OpenAIWhisperPredictor
Whisper predictor
This predictor is built specifically for the OpenAI Whisper model.
Subclass of Predictor.
Main methods and properties
model_cfg (Optional[WhisperModelConfig], optional): Whisper model configuration. If None, the default WhisperModelConfig is used. Defaults to None.
transcription_cfg (Optional[WhisperTranscriptionConfig], optional): Transcription configuration. If None, the default WhisperTranscriptionConfig is used. Defaults to None.
input_class (Type[Input], optional): Class for input validation. Defaults to WhisperInput.
output_class (Type[Output], optional): Class for output validation. Defaults to WhisperOutput.
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
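Putting the parameters above together, a minimal construction sketch might look as follows. The import path is a placeholder assumption; only the class and parameter names come from this page.

```python
# The import path below is hypothetical -- adjust it to where your
# installation exposes these classes. Class and parameter names are
# as documented above.
from your_package.predictors import (  # hypothetical module path
    OpenAIWhisperPredictor,
    WhisperModelConfig,
    WhisperTranscriptionConfig,
)

predictor = OpenAIWhisperPredictor(
    model_cfg=WhisperModelConfig(name="base", device="cpu"),
    transcription_cfg=WhisperTranscriptionConfig(temperature=(0.0, 0.2, 0.4)),
    name="whisper-base",
)
```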
WhisperModelConfig
Prebuilt configuration that describes default parameters for the Whisper model. Subclass of Config.
name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".
device (Union[str, torch.device]): The PyTorch device to put the model into.
download_root (Optional[str], optional): Path to download the model files; by default, it uses "~/.cache/whisper".
in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
WhisperTranscriptionConfig
Prebuilt configuration that describes default transcription parameters for the Whisper pipeline. Subclass of Config.
verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all details; if False, displays minimal details; if None, displays nothing.
temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.
compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.
logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.
no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.
condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.
append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.
decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
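The interplay of temperature, compression_ratio_threshold, and logprob_threshold can be sketched as a retry loop: each temperature is tried in order, and a decoding is rejected (triggering the next temperature) when its gzip compression ratio is too high or its average log probability too low. This is a simplified stand-in for the real decoding logic, with the documented default thresholds assumed:

```python
import gzip

def compression_ratio(text: str) -> float:
    # Highly repetitive text compresses very well under gzip; Whisper
    # treats a high ratio as a sign of a degenerate transcription.
    data = text.encode("utf-8")
    return len(data) / len(gzip.compress(data))

def transcribe_with_fallback(
    decode,  # stand-in for one decoding pass: temperature -> (text, avg_logprob)
    temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold=2.4,
    logprob_threshold=-1.0,
):
    result = None
    for t in temperatures:
        text, avg_logprob = decode(t)
        result = text
        if compression_ratio(text) > compression_ratio_threshold:
            continue  # too repetitive -> retry at the next temperature
        if avg_logprob < logprob_threshold:
            continue  # too unlikely -> retry at the next temperature
        break  # passed both checks; keep this result
    return result
```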
WhisperInput
Subclass of IOModel.
audio (NDArray[Any]): Audio waveform.
initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabularies or proper nouns, making it more likely that those words are predicted correctly.
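The audio field is a raw waveform array. Whisper models operate on 16 kHz mono float32 samples, so a minimal sketch of building a valid input array (the sample-rate and dtype conventions come from Whisper itself, not from this page) could look like:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper operates on 16 kHz mono audio

def make_sine_waveform(freq_hz: float = 440.0, seconds: float = 1.0) -> np.ndarray:
    # Synthesize a mono sine tone as a float32 waveform in [-1, 1] --
    # the kind of NDArray the audio field carries. In practice you would
    # load and resample a real recording instead.
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32)
```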
WhisperOutput
Subclass of IOModel.
text (str): Transcribed text.