Quickstart

This example uses a simple ExecutionSchema with a TokenSearcherNER task. The program extracts entities for the provided labels, filtered by a score threshold.

To create the program, follow these steps:

1. Install the package

pip install -U utca

2. Import the modules that will be used

from utca.core import (
    AddData,
    RenameAttribute,
    Flush
)
from utca.implementation.predictors import (
    TokenSearcherPredictor, TokenSearcherPredictorConfig
)
from utca.implementation.tasks import (
    TokenSearcherNER,
    TokenSearcherNERPostprocessor,
)

3. Initialize components with desired configurations

Predictor that will be used by the NER task

predictor = TokenSearcherPredictor(
    TokenSearcherPredictorConfig(
        device="cpu"
    )
)

NER task

ner_task = TokenSearcherNER(
    predictor=predictor,
    postprocess=TokenSearcherNERPostprocessor(
        threshold=0.5
    )
)

Here, we set up a task using the created predictor and define a postprocess chain with a predefined threshold.

Alternatively, we can create an NER task without specifying a configuration or predictor:

ner_task = TokenSearcherNER()

This creates a default task that differs from the one described above only in its threshold, which defaults to 0.
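To make the effect of the threshold concrete, here is a minimal plain-Python sketch of score filtering. It does not call utca: `filter_by_threshold` is a hypothetical helper, the candidate scores are made up for illustration, and the inclusive comparison (`>=`) is an assumption about how scores equal to the threshold are treated.

```python
from typing import Any, Dict, List


def filter_by_threshold(
    entities: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Keep entities whose score is at least `threshold` (assumed inclusive)."""
    return [entity for entity in entities if entity["score"] >= threshold]


# Hypothetical candidates, not real model output.
candidates = [
    {"span": "Paul Hammond", "score": 0.56},
    {"span": "Nature Neuroscience", "score": 0.31},
]

strict = filter_by_threshold(candidates, 0.5)   # only the first candidate remains
lenient = filter_by_threshold(candidates, 0.0)  # both candidates remain
```

Under these assumptions, a threshold of 0 keeps every candidate, so the default task is likely to return more, and lower-confidence, entities than the one configured with 0.5.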

To learn more about default parameters, refer to:

4. Create ExecutionSchema

pipeline = (
    AddData({"labels": ["scientist", "university", "city"]})
    | ner_task
    | Flush(keys=["labels"])
    | RenameAttribute("output", "entities")
)

Here we describe a pipeline that will:

  1. Add labels to input data with values ["scientist", "university", "city"]

  2. Execute NER task

  3. Remove labels from results

  4. Rename output to entities
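The four steps can be pictured as successive transformations of one dictionary. The sketch below is plain Python, not utca's implementation: `add_data`, `flush`, `rename`, `fake_ner` and `run` are hypothetical stand-ins, and the idea that each step receives and returns the whole data dictionary is an assumption inferred from the step descriptions.

```python
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]
Step = Callable[[Data], Data]


def add_data(extra: Data) -> Step:
    # Step 1: merge extra keys (here, the labels) into the data.
    return lambda data: {**data, **extra}


def fake_ner(data: Data) -> Data:
    # Step 2: hypothetical stand-in for the NER task. It reads "text" and
    # "labels" and writes "output"; a real task would run a model instead
    # of this literal substring check.
    output = [
        {"span": label, "entity": label}
        for label in data["labels"]
        if label in data["text"]
    ]
    return {**data, "output": output}


def flush(keys: List[str]) -> Step:
    # Step 3: drop the listed keys from the data.
    return lambda data: {k: v for k, v in data.items() if k not in keys}


def rename(old: str, new: str) -> Step:
    # Step 4: move the value stored under `old` to `new`.
    def step(data: Data) -> Data:
        out = dict(data)
        out[new] = out.pop(old)
        return out
    return step


def run(steps: List[Step], data: Data) -> Data:
    # Apply the steps in order, feeding each result into the next step.
    for step in steps:
        data = step(data)
    return data


steps = [
    add_data({"labels": ["scientist", "university", "city"]}),
    fake_ner,
    flush(["labels"]),
    rename("output", "entities"),
]
result = run(steps, {"text": "The city has a university."})
```

In this sketch the final dictionary holds only `text` and `entities`: the labels are needed by the NER step but removed before the result is returned.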

5. Run the created pipeline

res = pipeline.run({
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". 
His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect. 
Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment."""
})

Here, we run the pipeline with the input text.

Note that the text and labels keys are expected by the TokenSearcherNER task described above. Refer to the class description in the Used components section.

The result should look similar to:
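Because `AddData` supplies `labels` inside the pipeline, the dictionary passed to `run` only needs `text`. A small pre-flight check can make a missing key obvious before the model runs. This is a hypothetical helper in plain Python, not part of utca; the required key names are taken from the note above, and the sample text is shortened for illustration.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(data: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from `data`, in order."""
    return [key for key in required if key not in data]


pipeline_input = {"text": "Dr. Paul Hammond works at Johns Hopkins University."}

# What the caller must provide when labels are added by the pipeline itself.
missing_for_pipeline = missing_keys(pipeline_input, ["text"])

# What the NER task would lack if it were run directly on the same input.
missing_for_task = missing_keys(pipeline_input, ["text", "labels"])
```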

{
    "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience". 
His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect. 
Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment.""", 
    "entities": [
        {
            "start": 4, 
            "end": 16, 
            "span": "Paul Hammond",
            "score": 0.5637074708938599, 
            "entity": "scientist"
        }, 
        {
            "start": 44, 
            "end": 68, 
            "span": "Johns Hopkins University", 
            "score": 0.8921091556549072, 
            "entity": "university"
        }, 
        {
            "start": 347, 
            "end": 371,
            "span": "University of California",
            "score": 0.7202138900756836, 
            "entity": "university"
        }, 
        {
            "start": 373, 
            "end": 386,
            "span": "San Francisco",
            "score": 0.7660449743270874, 
            "entity": "city"
        }
    ]
}
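Once the result is available, the entities can be processed like any list of dictionaries. The sketch below groups spans by label and checks that `start` and `end` slice the text back to the reported span. It assumes the offsets are character positions with an exclusive end, which is consistent with the sample above. To stay self-contained it uses a truncated copy of the sample text and only the first two entities instead of running the pipeline.

```python
from collections import defaultdict

# Truncated copy of the sample result shown above (first two entities only).
res = {
    "text": "Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University",
    "entities": [
        {"start": 4, "end": 16, "span": "Paul Hammond",
         "score": 0.5637074708938599, "entity": "scientist"},
        {"start": 44, "end": 68, "span": "Johns Hopkins University",
         "score": 0.8921091556549072, "entity": "university"},
    ],
}

grouped = defaultdict(list)
for entity in res["entities"]:
    # The offsets should reproduce the span when used to slice the text.
    assert res["text"][entity["start"]:entity["end"]] == entity["span"]
    grouped[entity["entity"]].append(entity["span"])

by_label = dict(grouped)
```

Here `by_label` maps each label to the list of spans found for it, which is often a more convenient shape for downstream code than the flat entity list.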

Used components

What's next

Explore more about components and concepts on the following pages, or jump to class descriptions and more advanced examples.
