replacers
Replace spaCy tokens.
This module contains functions that assist with replacing spaCy tokens.
A typical usage example:

```python
import spacy
from spacy_cleaner import processing

nlp = spacy.load("en_core_web_md")
doc = nlp(",")
tok = doc[0]
processing.replace_punctuation_token(tok)
```

`,` is replaced with `_IS_PUNCT_`.
replace_email_token(tok, replace='_LIKE_EMAIL_')
If the token is like an email, replace it with the string `_LIKE_EMAIL_`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | `Token` | A spaCy token. | required |
replace | `str` | The replacement string. | `'_LIKE_EMAIL_'` |
Returns:
Type | Description |
---|---|
`Union[str, Token]` | The replacement string or the original token. |
Source code in spacy_cleaner/processing/replacers.py
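The check-then-replace behaviour described above can be sketched without loading a spaCy model. This is not the library's source: `StubToken` and `replace_email_token_sketch` are hypothetical names used only for illustration, and the one assumption borrowed from spaCy is that a token exposes a boolean `like_email` attribute.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class StubToken:
    """Hypothetical stand-in for a spaCy token (illustration only)."""

    text: str
    like_email: bool = False


def replace_email_token_sketch(
    tok: StubToken, replace: str = "_LIKE_EMAIL_"
) -> Union[str, StubToken]:
    """Return `replace` if the token looks like an email, else the token itself."""
    return replace if tok.like_email else tok


email = StubToken("user@example.com", like_email=True)
word = StubToken("hello")

print(replace_email_token_sketch(email))                     # _LIKE_EMAIL_
print(replace_email_token_sketch(email, replace="<EMAIL>"))  # <EMAIL>
print(replace_email_token_sketch(word).text)                 # hello
```

Note that a non-matching token comes back unchanged as a token object, not as a string, so callers that need text have to handle both cases.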
replace_number_token(tok, replace='_LIKE_NUM_')
If the token is like a number, replace it with the string `_LIKE_NUM_`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | `Token` | A spaCy token. | required |
replace | `str` | The replacement string. | `'_LIKE_NUM_'` |
Returns:
Type | Description |
---|---|
`Union[str, Token]` | The replacement string or the original token. |
Source code in spacy_cleaner/processing/replacers.py
replace_punctuation_token(tok, replace='_IS_PUNCT_')
If the token is punctuation, replace it with the string `_IS_PUNCT_`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | `Token` | A spaCy token. | required |
replace | `str` | The replacement string. | `'_IS_PUNCT_'` |
Returns:
Type | Description |
---|---|
`Union[str, Token]` | The replacement string or the original token. |
Source code in spacy_cleaner/processing/replacers.py
replace_stopword_token(tok, replace='_IS_STOP_')
If the token is a stopword, replace it with the string `_IS_STOP_`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | `Token` | A spaCy token. | required |
replace | `str` | The replacement string. | `'_IS_STOP_'` |
Returns:
Type | Description |
---|---|
`Union[str, Token]` | The replacement string or the original token. |
Source code in spacy_cleaner/processing/replacers.py
replace_url_token(tok, replace='_LIKE_URL_')
If the token is like a URL, replace it with the string `_LIKE_URL_`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | `Token` | A spaCy token. | required |
replace | `str` | The replacement string. | `'_LIKE_URL_'` |
Returns:
Type | Description |
---|---|
`Union[str, Token]` | The replacement string or the original token. |
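Because every replacer returns either a string or the original token, code that applies several of them in sequence has to handle both cases. The following is a rough sketch of one way to do that, not part of `spacy_cleaner`: `StubToken`, `replace_url`, `replace_number`, and `clean` are hypothetical names, and the stub only assumes spaCy-style boolean attributes `like_url` and `like_num`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union


@dataclass
class StubToken:
    """Hypothetical stand-in for a spaCy token (illustration only)."""

    text: str
    like_url: bool = False
    like_num: bool = False


Replacer = Callable[[StubToken], Union[str, StubToken]]


def replace_url(tok: StubToken, replace: str = "_LIKE_URL_") -> Union[str, StubToken]:
    return replace if tok.like_url else tok


def replace_number(tok: StubToken, replace: str = "_LIKE_NUM_") -> Union[str, StubToken]:
    return replace if tok.like_num else tok


def clean(tokens: Iterable[StubToken], replacers: List[Replacer]) -> str:
    """Apply replacers in order; the first one that returns a string wins."""
    out: List[str] = []
    for tok in tokens:
        result: Union[str, StubToken] = tok
        for replacer in replacers:
            result = replacer(tok)
            if isinstance(result, str):
                break  # token was replaced, skip the remaining replacers
        # Unreplaced tokens are still token objects, so fall back to their text.
        out.append(result if isinstance(result, str) else result.text)
    return " ".join(out)


tokens = [
    StubToken("visit"),
    StubToken("https://example.com", like_url=True),
    StubToken("42", like_num=True),
    StubToken("times"),
]
cleaned = clean(tokens, [replace_url, replace_number])
print(cleaned)  # visit _LIKE_URL_ _LIKE_NUM_ times
```

Stopping at the first string result is a design choice made for this sketch; it avoids passing an already-replaced string to a replacer that expects a token.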