helpers
Processing helper functions.
clean_doc(doc, *processors)
Cleans a spaCy document and returns a cleaned string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
doc | Doc | spaCy document to be cleaned. | required |
*processors | Callable[[Token], Union[str, Token]] | Callable token processors. | () |
Returns:
Type | Description |
---|---|
str | A string of the cleaned text. |
Source code in spacy_cleaner/processing/helpers.py
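The flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's source: it assumes each token is run through the processors, the results are joined with single spaces, and leftover whitespace is collapsed and trimmed. `FakeToken`, `drop_stop_words`, and `sketch_clean` are hypothetical stand-ins so the example runs without spaCy.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Union


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (not part of spacy_cleaner)."""

    text: str
    is_stop: bool = False

    def __str__(self) -> str:
        return self.text


Processor = Callable[[FakeToken], Union[str, FakeToken]]


def drop_stop_words(tok: FakeToken) -> Union[str, FakeToken]:
    """Hypothetical processor: stop words become an empty string."""
    return "" if tok.is_stop else tok


def sketch_clean(tokens: Iterable[FakeToken], *processors: Processor) -> str:
    """Assumed flow: process each token, join with spaces, collapse whitespace."""
    parts = []
    for tok in tokens:
        result: Union[str, FakeToken] = tok
        for processor in processors:
            result = processor(result)
            if isinstance(result, str):
                break  # a processor produced the final string for this token
        parts.append(str(result))
    # Assumption: joining on spaces, then collapsing and trimming whitespace.
    return re.sub(r"\s+", " ", " ".join(parts)).strip()


tokens = [FakeToken("the", is_stop=True), FakeToken("quick"), FakeToken("fox")]
cleaned = sketch_clean(tokens, drop_stop_words)
print(cleaned)
```

With a real pipeline, the tokens would come from a spaCy `Doc` and the processors would accept spaCy `Token` objects, as the parameter table indicates.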
replace_multi_whitespace(s, replace=' ')
Replace multiple whitespace characters with a single space.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | str | The string whose whitespace is to be replaced. | required |
replace | str | The replacement string. | ' ' |
Returns:
Type | Description |
---|---|
str | A string with each run of multiple whitespace characters replaced by the replacement string (a single space by default). |
Source code in spacy_cleaner/processing/helpers.py
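A regular expression is one plausible way to get this behavior. The sketch below is an assumption about how such a helper could work, not the library's implementation: `collapse_whitespace` is a hypothetical name, and "multiple" is taken to mean a run of two or more whitespace characters.

```python
import re


def collapse_whitespace(s: str, replace: str = " ") -> str:
    """Hypothetical equivalent: swap each run of 2+ whitespace chars for `replace`."""
    # Assumption: a single whitespace character on its own is left untouched.
    return re.sub(r"\s{2,}", replace, s)


result = collapse_whitespace("a  b\t\n c")
dashed = collapse_whitespace("a  b", replace="-")
print(result)
print(dashed)
```

Passing a different `replace` value, as in the second call, substitutes that string for each run instead of a space.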
token_pipe(tok, *processors)
Applies a series of processors to a token until it becomes a string.
The processors are applied to the token in order, and processing stops as soon as one of them returns a string.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tok | Token | The token to be transformed. | required |
*processors | Callable[[Token], Union[str, Token]] | Callable token processors. | () |
Returns:
Type | Description |
---|---|
str | A string of the token after being processed. |
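The early-exit behavior described for `token_pipe` can be illustrated with a small sketch. It is not the library's code: `FakeToken`, `mask_numbers`, `lowercase`, and `sketch_token_pipe` are hypothetical names, and the fallback to `str()` when no processor returns a string is an assumption based on the documented `str` return type.

```python
from typing import Callable, Union


class FakeToken:
    """Hypothetical stand-in for a spaCy Token, so the sketch runs without spaCy."""

    def __init__(self, text: str, like_num: bool = False) -> None:
        self.text = text
        self.like_num = like_num

    def __str__(self) -> str:
        return self.text


Processor = Callable[[FakeToken], Union[str, FakeToken]]


def mask_numbers(tok: FakeToken) -> Union[str, FakeToken]:
    """Hypothetical processor: number-like tokens become a placeholder string."""
    return "<NUM>" if tok.like_num else tok


def lowercase(tok: FakeToken) -> Union[str, FakeToken]:
    """Hypothetical processor: always returns a string, so it ends the pipe."""
    return tok.text.lower()


def sketch_token_pipe(tok: FakeToken, *processors: Processor) -> str:
    current: Union[str, FakeToken] = tok
    for processor in processors:
        current = processor(current)
        if isinstance(current, str):
            return current  # stop at the first string result
    # Assumption: if no processor returned a string, fall back to str().
    return str(current)


masked = sketch_token_pipe(FakeToken("42", like_num=True), mask_numbers, lowercase)
lowered = sketch_token_pipe(FakeToken("Hello"), mask_numbers, lowercase)
untouched = sketch_token_pipe(FakeToken("Hello"))
print(masked, lowered, untouched)
```

In the first call `lowercase` never runs, because `mask_numbers` already returned a string. Processor order therefore matters: a processor that always returns a string should usually come last.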