Knowledgator UTCA

OpenAIWhisperPredictor

Whisper predictor

Last updated 1 year ago

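This predictor is built specifically for the OpenAI Whisper model. For more about Whisper, see the openai/whisper repository on GitHub ("Robust Speech Recognition via Large-Scale Weak Supervision").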

Subclass of Predictor.

Module: implementation.predictors

Methods and properties

Main methods and properties


__init__

Arguments:

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.




WhisperModelConfig


__init__

Arguments:

  • name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".

  • device (Union[str, torch.device]): The PyTorch device to put the model into.

  • download_root (Optional[str], optional): Path to download the model files; by default, it uses "~/.cache/whisper".

  • in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
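
The four fields above match the parameters of `whisper.load_model` in the openai-whisper package. As a rough sketch of how such a configuration could be turned into keyword arguments for that call, the snippet below uses a hypothetical stand-in dataclass, `ModelConfigSketch`, which is not UTCA's actual `WhisperModelConfig`. The `"cpu"` default for `device` is also an assumption, since no default is documented for it.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in mirroring the documented fields (not the UTCA class)."""

    name: str = "base"                   # documented default
    device: str = "cpu"                  # assumption: no default is documented
    download_root: Optional[str] = None  # None lets Whisper use ~/.cache/whisper
    in_memory: bool = False              # documented default


def load_model_kwargs(cfg: ModelConfigSketch) -> Dict[str, Any]:
    """Return the keyword arguments that would be passed to whisper.load_model."""
    return asdict(cfg)


kwargs = load_model_kwargs(ModelConfigSketch(name="tiny"))
print(kwargs)

# With the openai-whisper package installed, the model could then be loaded as:
#   import whisper
#   model = whisper.load_model(**kwargs)
```

Keeping the configuration as plain data makes it easy to log or serialize before any model weights are downloaded.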




WhisperTranscriptionConfig


__init__

Arguments:

  • verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all the details; if False, displays minimal details; if None, displays nothing.

  • temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.

  • compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.

  • logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.

  • no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.

  • condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.

  • word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.

  • prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.

  • append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.

  • decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
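
The interaction between temperature, compression_ratio_threshold, logprob_threshold and no_speech_threshold can be illustrated with a simplified sketch. It is not Whisper's or UTCA's actual code: `Attempt`, `decode_with_fallback`, `is_silent` and `fake_decode` are hypothetical names, the threshold values are only illustrative, and `zlib` is used here to compute the compression ratio.

```python
import zlib
from typing import Callable, NamedTuple, Sequence


class Attempt(NamedTuple):
    """Result of one decoding attempt (hypothetical container for this sketch)."""

    text: str
    avg_logprob: float
    no_speech_prob: float


def compression_ratio(text: str) -> float:
    """Ratio of raw size to compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def decode_with_fallback(
    decode: Callable[[float], Attempt],
    temperatures: Sequence[float],
    compression_ratio_threshold: float,
    logprob_threshold: float,
) -> Attempt:
    """Try each temperature in order; return the first attempt that passes both checks."""
    if not temperatures:
        raise ValueError("at least one temperature is required")
    attempt = decode(temperatures[0])
    for temperature in temperatures:
        attempt = decode(temperature)
        too_repetitive = compression_ratio(attempt.text) > compression_ratio_threshold
        too_unlikely = attempt.avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            break  # this attempt is accepted
    return attempt  # the last attempt is returned even if every one failed


def is_silent(attempt: Attempt, no_speech_threshold: float, logprob_threshold: float) -> bool:
    """A segment counts as silent only if both conditions hold."""
    return (
        attempt.no_speech_prob > no_speech_threshold
        and attempt.avg_logprob < logprob_threshold
    )


def fake_decode(temperature: float) -> Attempt:
    """Hypothetical decoder: gets stuck repeating at 0.0, recovers at higher temperatures."""
    if temperature == 0.0:
        return Attempt("la " * 200, avg_logprob=-0.3, no_speech_prob=0.05)
    return Attempt("The quick brown fox jumps over the lazy dog.", -0.4, 0.05)


result = decode_with_fallback(fake_decode, (0.0, 0.2, 0.4), 2.4, -1.0)
print(result.text)
```

In this sketch the first attempt is rejected because its highly repetitive text compresses too well, so the next temperature in the tuple is tried.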




WhisperInput


__init__

Arguments:

  • audio (NDArray[Any]): Audio waveform.

  • initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabularies or proper nouns, to make it more likely that those words are predicted correctly.
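
As a minimal sketch of preparing the audio waveform, the snippet below decodes a mono 16-bit PCM WAV file into floats in the range [-1.0, 1.0) using only the standard library. Whisper itself works on 16 kHz mono float32 audio; whether WhisperInput enforces a particular dtype or sample rate is not stated here, so treat that as an assumption. The helper names `pcm16_wav_to_floats` and `make_test_wav` are hypothetical.

```python
import io
import struct
import wave
from typing import List, Sequence


def pcm16_wav_to_floats(wav_bytes: bytes) -> List[float]:
    """Decode a mono 16-bit PCM WAV file into floats in [-1.0, 1.0)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)
    return [sample / 32768.0 for sample in samples]


def make_test_wav(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Build a tiny in-memory WAV file so the example needs no audio on disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


waveform = pcm16_wav_to_floats(make_test_wav([0, 16384, -32768, 32767]))
print(waveform)

# With NumPy installed, the list can be converted to an array for the audio field:
#   import numpy as np
#   audio = np.asarray(waveform, dtype=np.float32)
```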




WhisperOutput


__init__

Arguments:

  • text (str): Transcribed text.



Additional arguments of OpenAIWhisperPredictor.__init__:

  • model_cfg (WhisperModelConfig): Whisper model configuration. If None, the default configuration is used. Defaults to None.

  • transcription_cfg (Optional[WhisperTranscriptionConfig]): Transcription configuration. If None, the default configuration is used. Defaults to None.

  • input_class (Type[Input], optional): Class for input validation. Defaults to WhisperInput.

  • output_class (Type[Output], optional): Class for output validation. Defaults to WhisperOutput.

Class descriptions:

  • WhisperModelConfig: Prebuilt configuration that describes the default parameters for loading the Whisper model. Subclass of Config.

  • WhisperTranscriptionConfig: Prebuilt configuration that describes the default parameters for Whisper transcription. Subclass of Config.

  • WhisperInput: Subclass of IOModel.

  • WhisperOutput: Subclass of IOModel.
