OpenAIWhisperPredictor
Whisper predictor
This predictor is built specifically for the OpenAI Whisper model. See more about Whisper:
Subclass of .
Main methods and properties
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".
device (Union[str, torch.device]): The PyTorch device to put the model into.
download_root (Optional[str], optional): Path to download the model files; by default, it uses "~/.cache/whisper".
in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
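The four model parameters above have the same names as the arguments of `whisper.load_model` in the upstream `openai-whisper` package. The following is a minimal sketch of loading a model directly with that function. It assumes `openai-whisper` is installed, and the helper name `load_whisper_model` is hypothetical; how the predictor itself forwards these values is not shown in this page.

```python
from typing import Any, Optional


def load_whisper_model(
    name: str = "base",
    device: Optional[str] = None,
    download_root: Optional[str] = None,
    in_memory: bool = False,
) -> Any:
    """Load a Whisper model (hypothetical helper, not part of the predictor API)."""
    # Imported lazily so that this module can be imported
    # even when openai-whisper is not installed.
    import whisper

    # `name` is either an official model name (see whisper.available_models())
    # or a path to a checkpoint file. With device=None, whisper selects the device itself.
    return whisper.load_model(
        name,
        device=device,
        download_root=download_root,
        in_memory=in_memory,
    )
```

Calling `load_whisper_model("base", device="cpu")` downloads the checkpoint on first use, into `download_root` if given or the default cache directory otherwise.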
verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all the details. If False, displays minimal details. If None, does not display anything.
temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.
compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.
logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.
no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.
condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.
append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.
decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
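The interaction between temperature, compression_ratio_threshold, logprob_threshold and no_speech_threshold can be sketched in plain Python. This is an illustration of the rules described above, not Whisper's actual implementation: the `Attempt` class, the function names and the threshold values used as defaults are all assumptions made for the example, and the compression ratio is approximated with `zlib`.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Attempt:
    """Result of one decoding attempt (hypothetical container)."""
    text: str
    avg_logprob: float
    no_speech_prob: float


def compression_ratio(text: str) -> float:
    # Highly repetitive text compresses well, which yields a large ratio.
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_fallback(
    attempt: Attempt,
    compression_ratio_threshold: Optional[float] = 2.4,  # illustrative value
    logprob_threshold: Optional[float] = -1.0,  # illustrative value
) -> bool:
    # An attempt fails if the text is too repetitive ...
    if (compression_ratio_threshold is not None
            and compression_ratio(attempt.text) > compression_ratio_threshold):
        return True
    # ... or if the model was not confident enough in its tokens.
    if logprob_threshold is not None and attempt.avg_logprob < logprob_threshold:
        return True
    return False


def decode_with_fallback(
    decode: Callable[[float], Attempt],
    temperatures: Sequence[float] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
) -> Attempt:
    # Try each temperature in order and stop at the first acceptable attempt.
    if not temperatures:
        raise ValueError("at least one temperature is required")
    attempt = decode(temperatures[0])
    for temperature in temperatures[1:]:
        if not needs_fallback(attempt):
            break
        attempt = decode(temperature)
    return attempt


def is_silent(
    attempt: Attempt,
    no_speech_threshold: float = 0.6,  # illustrative value
    logprob_threshold: float = -1.0,  # illustrative value
) -> bool:
    # Both conditions must hold for the segment to be treated as silent.
    return (attempt.no_speech_prob > no_speech_threshold
            and attempt.avg_logprob < logprob_threshold)
```

Here `decode` stands in for a single decoding pass at a given temperature. If every temperature fails, the sketch returns the last attempt rather than raising.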
audio (NDArray[Any]): Audio waveform.
initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabularies or proper nouns, to make it more likely that those words are predicted correctly.
text (str): Transcribed text.
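The input and output fields above map onto a single transcription call: a waveform and an optional prompt go in, and plain text comes out. The following sketch shows that flow against a loaded upstream Whisper model, whose `transcribe` method returns a dictionary with a `"text"` entry. The helper name `transcribe_text` is hypothetical, and the predictor's own call path is an assumption here.

```python
from typing import Any, Optional


def transcribe_text(
    model: Any,
    audio: Any,
    initial_prompt: Optional[str] = None,
    **options: Any,
) -> str:
    """Transcribe a waveform and return only the text (hypothetical helper)."""
    # `model` is expected to behave like a loaded Whisper model;
    # extra keyword arguments are passed through to its transcribe method.
    result = model.transcribe(audio, initial_prompt=initial_prompt, **options)
    # Strip the surrounding whitespace from the transcribed text.
    return result["text"].strip()
```

A prompt such as a short glossary of product names can be passed as `initial_prompt` to bias the first window toward that vocabulary.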
model_cfg (): Whisper model configuration. If None, the default configuration is used. Defaults to None.
transcription_cfg (Optional[]): Transcription configuration. If None, the default configuration is used. Defaults to None.
input_class (Type[], optional): Class for input validation. Defaults to .
output_class (Type[], optional): Class for output validation. Defaults to .
Prebuilt configuration that describes the default parameters for the GLiNER models pipeline. Subclass of .
Subclass of.