# removers

Remove spaCy tokens.

This module contains functions that assist with removing spaCy tokens.
A typical usage example:

```python
import spacy
from spacy_cleaner import processing

nlp = spacy.load("en_core_web_md")
doc = nlp("and")
tok = doc[0]
processing.remove_stopword_token(tok)
```

`"and"` is a stopword, so an empty string is returned.
## remove_email_token(tok)

If the token is like an email, replace it with an empty string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tok` | `Token` | A spaCy token. | required |

Returns:

| Type | Description |
| --- | --- |
| `Union[str, Token]` | An empty string or the original token. |

Source code in `spacy_cleaner/processing/removers.py`
## remove_number_token(tok)

If the token is like a number, replace it with an empty string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tok` | `Token` | A spaCy token. | required |

Returns:

| Type | Description |
| --- | --- |
| `Union[str, Token]` | An empty string or the original token. |

Source code in `spacy_cleaner/processing/removers.py`
## remove_punctuation_token(tok)

If the token is punctuation, replace it with an empty string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tok` | `Token` | A spaCy token. | required |

Returns:

| Type | Description |
| --- | --- |
| `Union[str, Token]` | An empty string or the original token. |

Source code in `spacy_cleaner/processing/removers.py`
## remove_stopword_token(tok)

If the token is a stopword, replace it with an empty string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tok` | `Token` | A spaCy token. | required |

Returns:

| Type | Description |
| --- | --- |
| `Union[str, Token]` | An empty string or the original token. |

Source code in `spacy_cleaner/processing/removers.py`
## remove_url_token(tok)

If the token is like a URL, replace it with an empty string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `tok` | `Token` | A spaCy token. | required |

Returns:

| Type | Description |
| --- | --- |
| `Union[str, Token]` | An empty string or the original token. |

Source code in `spacy_cleaner/processing/removers.py`