OpenAIWhisperPredictor
Whisper predictor
This predictor is built specifically for the OpenAI Whisper model. See the OpenAI Whisper repository for more about Whisper.
Subclass of Predictor.
Module: implementation.predictors
Methods and properties
Main methods and properties
__init__
Arguments:
model_cfg (WhisperModelConfig): Whisper model configuration. If None, the default WhisperModelConfig is used. Defaults to None.
transcription_cfg (Optional[WhisperTranscriptionConfig]): Transcription configuration. If None, the default WhisperTranscriptionConfig is used. Defaults to None.
input_class (Type[Input], optional): Class for input validation. Defaults to WhisperInput.
output_class (Type[Output], optional): Class for output validation. Defaults to WhisperOutput.
name (Optional[str], optional): Name for identification. If None, the class name is used. Defaults to None.
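A minimal construction sketch, assuming the classes are importable from implementation.predictors as stated above (the exact import path may differ in your installation; the argument values shown are illustrative, not required):

```python
# Hypothetical import path -- adjust to match your installation.
from implementation.predictors import (
    OpenAIWhisperPredictor,
    WhisperModelConfig,
    WhisperTranscriptionConfig,
)

# Passing None for either config falls back to the defaults documented above.
predictor = OpenAIWhisperPredictor(
    model_cfg=WhisperModelConfig(name="base"),
    transcription_cfg=WhisperTranscriptionConfig(word_timestamps=False),
)
```

Inputs are then validated against WhisperInput and outputs against WhisperOutput, as described below.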
WhisperModelConfig
Prebuilt configuration that describes the default model-loading parameters for Whisper. Subclass of Config.
__init__
Arguments:
name (str, optional): One of the official model names listed by whisper.available_models(), or path to a model checkpoint containing the model dimensions and the model state_dict. Defaults to "base".
device (Union[str, torch.device]): The PyTorch device to put the model into.
download_root (Optional[str], optional): Directory in which to download the model files. Defaults to "~/.cache/whisper".
in_memory (bool, optional): Whether to preload the model weights into host memory. Defaults to False.
WhisperTranscriptionConfig
Prebuilt configuration that describes the default transcription parameters for Whisper. Subclass of Config.
__init__
Arguments:
verbose (Optional[bool]): Whether to display the text being decoded to the console. If True, displays all details; if False, displays minimal details; if None, displays nothing.
temperature (Union[float, Tuple[float, ...]]): Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression_ratio_threshold or logprob_threshold.
compression_ratio_threshold (float): If the gzip compression ratio is above this value, treat as failed.
logprob_threshold (float): If the average log probability over sampled tokens is below this value, treat as failed.
no_speech_threshold (float): If the no_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob_threshold, consider the segment as silent.
condition_on_previous_text (bool): If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
word_timestamps (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
prepend_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the next word.
append_punctuations (str): If word_timestamps is True, merge these punctuation symbols with the previous word.
decode_options (Optional[Dict[str, Any]]): Keyword arguments to construct DecodingOptions instances.
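The interplay of temperature, compression_ratio_threshold, and logprob_threshold can be sketched in plain Python. This is a simplified illustration of the failure test Whisper applies before retrying at the next temperature, not the library's actual implementation; the function name is hypothetical:

```python
import gzip

def is_failed(text: str, avg_logprob: float,
              compression_ratio_threshold: float = 2.4,
              logprob_threshold: float = -1.0) -> bool:
    """Illustrative failure check: a decoding attempt is retried at the
    next temperature in the tuple when either test below trips."""
    data = text.encode("utf-8")
    # Highly repetitive output compresses very well under gzip, so a large
    # ratio of raw size to compressed size signals a repetition loop.
    compression_ratio = len(data) / len(gzip.compress(data))
    if compression_ratio > compression_ratio_threshold:
        return True  # too repetitive
    if avg_logprob < logprob_threshold:
        return True  # model not confident enough
    return False

# A repetition loop compresses extremely well and is flagged as failed:
print(is_failed("the the the " * 50, avg_logprob=-0.3))  # True
# Normal text with a decent average log probability passes:
print(is_failed("Hello, this is a normal sentence.", avg_logprob=-0.3))  # False
```

When a tuple of temperatures such as (0.0, 0.2, 0.4, 0.6, 0.8, 1.0) is configured, each failure moves decoding to the next, higher temperature.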
WhisperInput
Subclass of IOModel.
__init__
Arguments:
audio (NDArray[Any]): Audio waveform.
initial_prompt (Optional[str], optional): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabulary or proper nouns, making it more likely that the model predicts those words correctly.
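A sketch of preparing the audio field, assuming Whisper's usual expectation of 16 kHz mono float32 samples. Here a synthetic sine tone stands in for real audio; in practice you would load a file instead (e.g. with whisper.load_audio):

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

# One second of a 440 Hz sine tone as a float32 waveform, purely as a
# placeholder for a real recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
audio = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

print(audio.shape, audio.dtype)  # (16000,) float32
```

This array is what would be passed as the audio argument of WhisperInput.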
WhisperOutput
Subclass of IOModel.
__init__
Arguments:
text (str): Transcribed text.