Quickstart
For this example will be used simple ExecutionSchema with TokenSearcherNER task. This program will extract entities with provided labels and threshold.
To create program follow this steps:
1. Install package
pip install -U utca
2. Import modules that will be used
from utca.core import (
AddData,
RenameAttribute,
Flush
)
from utca.implementation.predictors import (
TokenSearcherPredictor, TokenSearcherPredictorConfig
)
from utca.implementation.tasks import (
TokenSearcherNER,
TokenSearcherNERPostprocessor,
)
3. Initialize components with desired configurations
Predictor that will be used by NER task
predictor = TokenSearcherPredictor(
TokenSearcherPredictorConfig(
device="cpu"
)
)
NER task
ner_task = TokenSearcherNER(
predictor=predictor,
postprocess=TokenSearcherNERPostprocessor(
threshold=0.5
)
)
Here, we set up a task using the created predictor and define a postprocess chain with a predefined threshold.
Alternatively, we can create an NER task without describing the configuration or predictor by simply:
ner_task = TokenSearcherNER()
It will create a default task, which differs from the one described above only by the threshold value, which defaults to 0.
To learn more about default parameters, refer to:
TokenSearcherNER4. Create ExecutionSchema
pipeline = (
AddData({"labels": ["scientist", "university", "city"]})
| ner_task
| Flush(keys=["labels"])
| RenameAttribute("output", "entities")
)
Here we described pipeline that will:
Add labels to input data with values ["scientist", "university", "city"]
Execute NER task
Remove labels from results
Rename output to entities
5. Run created pipeline
res = pipeline.run({
"text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience".
His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect.
Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment."""
})
Here, we run pupline with input text.
Result should look similar to:
{
"text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience".
His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect.
Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment.""",
"entities": [
{
"start": 4,
"end": 16,
"span": "Paul Hammond",
"score": 0.5637074708938599,
"entity": "scientist"
},
{
"start": 44,
"end": 68,
"span": "Johns Hopkins University",
"score": 0.8921091556549072,
"entity": "university"
},
{
"start": 347,
"end": 371,
"span": "University of California",
"score": 0.7202138900756836,
"entity": "university"
},
{
"start": 373,
"end": 386,
"span": "San Francisco",
"score": 0.7660449743270874,
"entity": "city"
}
]
}
Used components
TokenSearcherPredictorTokenSearcherNERAddDataFlushRenameAttributeExecutionSchemaWhat next
Explore more about components and concepts on the following pages, or jump to class descriptions and more advanced examples.
ConceptsExamplesLast updated