# OpenAIWhisperPredictor

This predictor is specifically built for the OpenAI Whisper model. See more about Whisper:

{% embed url="https://github.com/openai/whisper" %}

Subclass of [**Predictor**](https://utca.knowledgator.com/predictors/predictor)**.**

## Module: [implementation](https://utca.knowledgator.com/framework-structure#implementation).predictors

## Methods and properties

Main methods and properties

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**model\_cfg**</mark>**&#x20;(Optional\[**[**WhisperModelConfig**](#whispermodelconfig)**], optional):** Whisper model configuration. If None, the default [**WhisperModelConfig**](#whispermodelconfig) configuration is used. Defaults to None.
* <mark style="color:orange;">**transcription\_cfg**</mark>**&#x20;(Optional\[**[**WhisperTranscriptionConfig**](#whispertranscriptionconfig)**], optional):** Transcription configuration. If None, the default [**WhisperTranscriptionConfig**](#whispertranscriptionconfig) configuration is used. Defaults to None.
* <mark style="color:orange;">**input\_class**</mark>**&#x20;(Type\[**[**Input**](https://utca.knowledgator.com/core/schemas#input)**], optional):** Class for input validation. Defaults to [**WhisperInput**](#whisperinput).
* <mark style="color:orange;">**output\_class**</mark>**&#x20;(Type\[**[**Output**](https://utca.knowledgator.com/core/schemas#output)**], optional):** Class for output validation. Defaults to [**WhisperOutput**](#whisperoutput).
* <mark style="color:orange;">**name**</mark>**&#x20;(Optional\[str], optional):** Name for identification. If None, the class name is used. Defaults to None.

***

***

***

## <mark style="color:green;">WhisperModelConfig</mark>

Prebuilt configuration that describes default parameters for the Whisper model. Subclass of [**Config**](https://utca.knowledgator.com/core/schemas#config).

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**name**</mark>**&#x20;(str, optional):** One of the official model names listed by whisper.available\_models(), or path to a model checkpoint containing the model dimensions and the model state\_dict. Defaults to "base".
* <mark style="color:orange;">**device**</mark>**&#x20;(Union\[str, torch.device]):** The PyTorch device to put the model into.
* <mark style="color:orange;">**download\_root**</mark>**&#x20;(Optional\[str], optional):** Path to download the model files; by default, it uses "\~/.cache/whisper".
* <mark style="color:orange;">**in\_memory**</mark>**&#x20;(bool, optional):** Whether to preload the model weights into host memory. Defaults to False.
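To make the defaults above concrete, here is a minimal self-contained sketch of what this configuration carries. Note that this is a plain dataclass mirroring the documented argument list for illustration, not the actual class from the utca package, and the `device` field is typed as a plain string here rather than `Union[str, torch.device]`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WhisperModelConfigSketch:
    # Illustrative stand-in mirroring the documented defaults; not the real utca class.
    name: str = "base"                   # official model name or path to a checkpoint
    device: str = "cpu"                  # PyTorch device the model is placed on
    download_root: Optional[str] = None  # None -> whisper falls back to "~/.cache/whisper"
    in_memory: bool = False              # whether to preload weights into host memory

# Override only what differs from the defaults:
cfg = WhisperModelConfigSketch(name="small", device="cpu")
```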

***

***

***

## <mark style="color:green;">WhisperTranscriptionConfig</mark>

Prebuilt configuration that describes default parameters for the Whisper transcription pipeline. Subclass of [**Config**](https://utca.knowledgator.com/core/schemas#config).

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**verbose**</mark>**&#x20;(Optional\[bool]):** Whether to display the text being decoded to the console. If True, displays all the details; if False, displays minimal details; if None, does not display anything.
* <mark style="color:orange;">**temperature**</mark>**&#x20;(Union\[float, Tuple\[float, ...]]):** Temperature for sampling. It can be a tuple of temperatures, which will be successively used upon failures according to either compression\_ratio\_threshold or logprob\_threshold.
* <mark style="color:orange;">**compression\_ratio\_threshold**</mark>**&#x20;(float):** If the gzip compression ratio is above this value, treat as failed.
* <mark style="color:orange;">**logprob\_threshold**</mark>**&#x20;(float):** If the average log probability over sampled tokens is below this value, treat as failed.
* <mark style="color:orange;">**no\_speech\_threshold**</mark>**&#x20;(float):** If the no\_speech probability is higher than this value AND the average log probability over sampled tokens is below logprob\_threshold, consider the segment as silent.
* <mark style="color:orange;">**condition\_on\_previous\_text**</mark>**&#x20;(bool):** If True, the previous output of the model is provided as a prompt for the next window; disabling may make the text inconsistent across windows, but the model becomes less prone to getting stuck in a failure loop, such as repetition looping or timestamps going out of sync.
* <mark style="color:orange;">**word\_timestamps**</mark>**&#x20;(bool):** Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
* <mark style="color:orange;">**prepend\_punctuations**</mark>**&#x20;(str):** If word\_timestamps is True, merge these punctuation symbols with the next word.
* <mark style="color:orange;">**append\_punctuations**</mark>**&#x20;(str):** If word\_timestamps is True, merge these punctuation symbols with the previous word.
* <mark style="color:orange;">**decode\_options**</mark>**&#x20;(Optional\[Dict\[str, Any]]):** Keyword arguments to construct DecodingOptions instances.
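The interaction between `temperature`, `compression_ratio_threshold`, and `logprob_threshold` can be sketched in plain Python: a decode at one temperature "fails" if its text is too repetitive (high gzip compression ratio) or too improbable (low average log probability), in which case the next temperature in the tuple is tried. The threshold values 2.4 and -1.0 are Whisper's documented defaults; the `decode` callable below is a hypothetical stand-in for the model's decoding step, not part of utca's API:

```python
import zlib

def compression_ratio(text: str) -> float:
    """Length of the UTF-8 text divided by its zlib-compressed length."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def needs_retry(text: str, avg_logprob: float,
                compression_ratio_threshold: float = 2.4,
                logprob_threshold: float = -1.0) -> bool:
    """A decode fails if it is too repetitive or too improbable."""
    if compression_ratio(text) > compression_ratio_threshold:
        return True  # highly compressible text usually means repetition loops
    if avg_logprob < logprob_threshold:
        return True  # the model itself considers the output unlikely
    return False

def pick_temperature(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Try temperatures in order until a decode passes both checks.

    `decode` is a hypothetical callable: temperature -> (text, avg_logprob).
    """
    for t in temperatures:
        text, avg_logprob = decode(t)
        if not needs_retry(text, avg_logprob):
            return t, text
    return temperatures[-1], text  # all failed: keep the last attempt
```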

***

***

***

## <mark style="color:green;">WhisperInput</mark>

Subclass of [**IOModel**](https://utca.knowledgator.com/core/schemas#iomodel).

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**audio**</mark>**&#x20;(NDArray\[Any]):** Audio waveform.
* <mark style="color:orange;">**initial\_prompt**</mark>**&#x20;(Optional\[str], optional):** Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer", a context for transcription, e.g. custom vocabularies or proper nouns, to make it more likely to predict those words correctly.
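Whisper models operate on 16 kHz mono float32 audio with samples in [-1.0, 1.0], so the `audio` waveform should arrive in that shape. A minimal NumPy sketch of preparing such a waveform (here a synthetic test tone; in practice you would load and resample a real recording):

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

def make_test_tone(freq_hz: float = 440.0, seconds: float = 1.0) -> np.ndarray:
    """Generate a float32 sine wave in [-0.5, 0.5], shaped like the
    `audio` waveform described above: a 1-D array of samples."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    return (0.5 * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)

audio = make_test_tone()  # 1-D float32 array, one second long
```

Such an array would then be passed as the `audio` field of the input.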

***

***

***

## <mark style="color:green;">WhisperOutput</mark>

Subclass of [**IOModel**](https://utca.knowledgator.com/core/schemas#iomodel).

***

### <mark style="color:blue;">\_\_init\_\_</mark>

#### Arguments:

* <mark style="color:orange;">**text**</mark>**&#x20;(str):** Transcribed text.

***

***


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://utca.knowledgator.com/predictors/openaiwhisperpredictor.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
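As a sketch, the request above can be composed with Python's standard library; only the page URL comes from this document, and `urlencode` takes care of percent-escaping the question:

```python
from urllib.parse import urlencode

BASE = "https://utca.knowledgator.com/predictors/openaiwhisperpredictor.md"

def ask_url(question: str) -> str:
    """Build the documentation-query URL for a natural-language question."""
    return f"{BASE}?{urlencode({'ask': question})}"

url = ask_url("What audio sample rate does WhisperInput expect?")
# The URL can then be fetched with any HTTP client via a GET request.
```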
