
Whisper predictor

This predictor is built specifically for the OpenAI Whisper model. See more about Whisper:

Subclass of Predictor.

Module: implementation.predictors

Methods and properties

Main methods and properties



  • model_cfg (WhisperModelConfig): Whisper model configuration. If None, the default WhisperModelConfig configuration is used. Defaults to None.

  • transcription_cfg (Optional[WhisperTranscriptionConfig]): Transcription configuration. If None, the default WhisperTranscriptionConfig configuration is used. Defaults to None.

  • input_class (Type[Input], optional): Class for input validation. Defaults to WhisperInput.

  • output_class (Type[Output], optional): Class for output validation. Defaults to WhisperOutput.

  • name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.


Prebuilt configuration that describes the default parameters for loading the Whisper model. Subclass of Config.



  • name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".

  • device (Union[str, torch.device]): The PyTorch device to put the model into.

  • download_root (Optional[str], optional): Path to download the model files; by default, it uses "~/.cache/whisper".

  • in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
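
The four fields above match the keyword arguments of whisper.load_model, so one plausible reading is that the configuration is passed straight through to that call. The sketch below illustrates this under stated assumptions: ModelConfigSketch is a hypothetical stand-in rather than the library's WhisperModelConfig, and the "cpu" default for device is invented for the example because the source does not state one.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in that mirrors the fields listed above."""

    name: str = "base"
    device: str = "cpu"  # assumed default; the source does not give one
    download_root: Optional[str] = None  # None means "~/.cache/whisper"
    in_memory: bool = False


def load_model_kwargs(cfg: ModelConfigSketch) -> Dict[str, Any]:
    """Turn the config into keyword arguments for whisper.load_model."""
    return asdict(cfg)


def load(cfg: ModelConfigSketch):
    """Load the model; requires the openai-whisper package."""
    # Imported lazily so the rest of the sketch runs without Whisper installed.
    import whisper

    return whisper.load_model(**load_model_kwargs(cfg))


cfg = ModelConfigSketch(name="small")
print(load_model_kwargs(cfg))
```

Only load() touches Whisper itself, so the mapping can be inspected or logged before any weights are downloaded.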


Prebuilt configuration that describes the default parameters for Whisper transcription. Subclass of Config.



  • verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all the details. If False, displays minimal details. If None, does not display anything.

  • temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.

  • compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.

  • logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.

  • no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.

  • condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.

  • word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.

  • prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.

  • append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.

  • decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
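
The interaction between temperature, compression_ratio_threshold, logprob_threshold and no_speech_threshold is easier to follow in code. The following is a minimal sketch of the rules as described above, not the predictor's implementation: the decode callable is hypothetical, the ratio is computed with zlib as an assumption about what "gzip compression ratio" means, and the threshold values 2.4, -1.0 and 0.6 are assumed for illustration since the source does not list defaults.

```python
import zlib
from typing import Callable, Sequence, Tuple, Union

Decoded = Tuple[str, float]  # (text, average log probability)


def compression_ratio(text: str) -> float:
    """Raw size divided by compressed size; repetitive text scores high."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


def needs_fallback(
    text: str,
    avg_logprob: float,
    compression_ratio_threshold: float = 2.4,  # assumed value
    logprob_threshold: float = -1.0,  # assumed value
) -> bool:
    """A decode fails if it is too repetitive or too unlikely."""
    too_repetitive = compression_ratio(text) > compression_ratio_threshold
    too_unlikely = avg_logprob < logprob_threshold
    return too_repetitive or too_unlikely


def is_silent(
    no_speech_prob: float,
    avg_logprob: float,
    no_speech_threshold: float = 0.6,  # assumed value
    logprob_threshold: float = -1.0,  # assumed value
) -> bool:
    """Both conditions must hold for a segment to count as silent."""
    return no_speech_prob > no_speech_threshold and avg_logprob < logprob_threshold


def decode_with_fallback(
    decode: Callable[[float], Decoded],
    temperature: Union[float, Sequence[float]],
) -> Decoded:
    """Try each temperature in order and stop at the first acceptable decode."""
    if isinstance(temperature, (int, float)):
        schedule = (float(temperature),)
    else:
        schedule = tuple(temperature)
    if not schedule:
        raise ValueError("at least one temperature is required")
    for t in schedule:
        result = decode(t)
        if not needs_fallback(*result):
            break
    return result  # the last attempt is returned even if it also failed


def fake_decode(t: float) -> Decoded:
    """Hypothetical decoder: loops at temperature 0, recovers above it."""
    return ("la " * 200, -0.2) if t == 0.0 else ("A clean sentence.", -0.3)


print(decode_with_fallback(fake_decode, (0.0, 0.2, 0.4)))
```

With the fake decoder, the first attempt is rejected for being too repetitive and the second temperature in the tuple is used, which is the "successively used upon failures" behaviour the temperature entry describes.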


Subclass of IOModel.



  • audio (NDArray[Any]): Audio waveform.

  • initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabularies or proper nouns, to make it more likely that those words are predicted correctly.
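
As an illustration of the custom-vocabulary use of initial_prompt, the sketch below joins a list of domain terms into a single prompt string. The helper build_initial_prompt and the comma-separated format are assumptions made for this example; Whisper accepts any text as the prompt.

```python
from typing import Iterable, Optional


def build_initial_prompt(terms: Iterable[str]) -> Optional[str]:
    """Join domain terms into one prompt string, or None if there are none."""
    cleaned = (term.strip() for term in terms)
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    unique = list(dict.fromkeys(term for term in cleaned if term))
    return ", ".join(unique) if unique else None


prompt = build_initial_prompt(["Kubernetes", "PyTorch", " Whisper ", "PyTorch"])
print(prompt)  # Kubernetes, PyTorch, Whisper
```

Returning None for an empty list keeps the field at its documented "no prompt" value instead of passing an empty string.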


Subclass of IOModel.



  • text (str): Transcribed text.
