OpenAIWhisperPredictor
Whisper predictor
This predictor is built specifically for the OpenAI Whisper model. See the OpenAI Whisper repository for more about Whisper.
Subclass of Predictor.
Module: implementation.predictors
Methods and properties
Main methods and properties
__init__
Arguments:
model_cfg (WhisperModelConfig): Whisper model configuration. If None, the default WhisperModelConfig is used. Defaults to None.
transcription_cfg (Optional[WhisperTranscriptionConfig]): Transcription configuration. If None, the default WhisperTranscriptionConfig is used. Defaults to None.
input_class (Type[Input], optional): Class for input validation. Defaults to WhisperInput.
output_class (Type[Output], optional): Class for output validation. Defaults to WhisperOutput.
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
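A minimal construction sketch, assuming the classes are importable from implementation.predictors as stated above (the exact import path may differ in your installation; the argument values shown are illustrative, not required):

```python
# Hypothetical import path -- adjust to match your installation.
from implementation.predictors import (
    OpenAIWhisperPredictor,
    WhisperModelConfig,
    WhisperTranscriptionConfig,
)

# Passing None for either config falls back to the defaults documented above.
predictor = OpenAIWhisperPredictor(
    model_cfg=WhisperModelConfig(name="base"),
    transcription_cfg=WhisperTranscriptionConfig(word_timestamps=False),
)
```

Inputs are then validated against WhisperInput and outputs against WhisperOutput, as described below.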
WhisperModelConfig
Prebuilt configuration that describes the default model-loading parameters for Whisper. Subclass of Config.
__init__
Arguments:
name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".
device (Union[str, torch.device]): The PyTorch device to put the model into.
download_root (Optional[str], optional): Directory in which to download the model files. Defaults to "~/.cache/whisper".
in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
WhisperTranscriptionConfig
Prebuilt configuration that describes the default transcription parameters for Whisper. Subclass of Config.
__init__
Arguments:
verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all details; if False, displays minimal details; if None, displays nothing.
temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.
compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.
logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.
no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.
condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.
append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.
decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
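The interplay of temperature, compression_ratio_threshold, and logprob_threshold can be sketched in plain Python. This is a simplified illustration of the failure test Whisper applies before retrying at the next temperature, not the library's actual implementation; the function name is hypothetical:

```python
import gzip

def is_failed(text: str, avg_logprob: float,
              compression_ratio_threshold: float = 2.4,
              logprob_threshold: float = -1.0) -> bool:
    """Illustrative failure check: a decoding attempt is retried at the
    next temperature in the tuple when either test below trips."""
    data = text.encode("utf-8")
    # Highly repetitive output compresses very well under gzip, so a large
    # ratio of raw size to compressed size signals a repetition loop.
    compression_ratio = len(data) / len(gzip.compress(data))
    if compression_ratio > compression_ratio_threshold:
        return True  # too repetitive
    if avg_logprob < logprob_threshold:
        return True  # model not confident enough
    return False

# A repetition loop compresses extremely well and is flagged as failed:
print(is_failed("the the the " * 50, avg_logprob=-0.3))  # True
# Normal text with a decent average log probability passes:
print(is_failed("Hello, this is a normal sentence.", avg_logprob=-0.3))  # False
```

When a tuple of temperatures such as (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) is configured, each failure moves decoding to the next, higher temperature.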
WhisperInput
Subclass of IOModel.
__init__
Arguments:
audio (NDArray[Any]): Audio waveform.
initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabulary or proper nouns, making it more likely that the model predicts those words correctly.
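A sketch of preparing the audio field, assuming Whisper's usual expectation of 16 kHz mono float32 samples. Here a synthetic sine tone stands in for real audio; in practice you would load a file instead (e.g. with whisper.load_audio):

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

# One second of a 440 Hz sine tone as a float32 waveform, purely as a
# placeholder for a real recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(audio.shape, audio.dtype)  # (16000,) float32
```

This array is what would be passed as the audio argument of WhisperInput.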
WhisperOutput
Subclass of IOModel.
__init__
Arguments:
text (str): Transcribed text.