openai.resources.audio

The `audio` module provides classes for audio processing operations such as transcription, translation, and speech synthesis. Use the module through the `OpenAI.audio` attribute on the client object, then access the attribute for the feature you'd like to use:

- `OpenAI.audio.transcriptions` - Transcribe spoken audio to text
- `OpenAI.audio.translations` - Translate spoken audio to English
- `OpenAI.audio.speech` - Generate spoken audio from text
Examples:

```python
from pathlib import Path
from openai import OpenAI

openai = OpenAI()
speech_file_path = Path(__file__).parent / "speech.mp3"

# Create text-to-speech audio file
with openai.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="The quick brown fox jumps over the lazy dog.",
) as response:
    response.stream_to_file(speech_file_path)
```
Modules:

Name | Description
---|---
audio | Classes for audio processing operations like transcription, translation, and speech synthesis.
speech | Text-to-speech (speech synthesis) operations.
transcriptions | Audio-to-text transcription operations.
translations | Audio-to-English translation operations.
Classes:
AsyncAudio
AsyncAudio(client: AsyncOpenAI)
Methods:

Name | Description
---|---
speech | Text-to-speech operations.
transcriptions | Audio transcription operations.
translations | Audio translation operations.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
AsyncAudioWithRawResponse
AsyncAudioWithRawResponse(audio: AsyncAudio)
AsyncAudioWithStreamingResponse
AsyncAudioWithStreamingResponse(audio: AsyncAudio)
AsyncSpeech
AsyncSpeech(client: AsyncOpenAI)
Methods:

Name | Description
---|---
create | Generates audio from the input text.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create
async

```python
create(
    *,
    input: str,
    model: Union[str, Literal["tts-1", "tts-1-hd"]],
    voice: Literal[
        "alloy", "echo", "fable", "onyx", "nova", "shimmer"
    ],
    response_format: (
        Literal["mp3", "opus", "aac", "flac", "wav", "pcm"]
        | NotGiven
    ) = NOT_GIVEN,
    speed: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> HttpxBinaryResponseContent
```
Generates audio from the input text.
Parameters:

Name | Type | Description | Default
---|---|---|---
input | str | The text to generate audio for. The maximum length is 4096 characters. | required
model | Union[str, Literal['tts-1', 'tts-1-hd']] | One of the available TTS models: `tts-1` or `tts-1-hd`. | required
voice | Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] | The voice to use when generating the audio. Supported voices are `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`. | required
response_format | Literal['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'] \| NotGiven | The format to generate audio in. Supported formats are `mp3`, `opus`, `aac`, `flac`, `wav`, and `pcm`. | NOT_GIVEN
speed | float \| NotGiven | The speed of the generated audio. Select a value from 0.25 to 4.0; 1.0 is the default. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
AsyncSpeechWithRawResponse

AsyncSpeechWithRawResponse(speech: AsyncSpeech)

Attributes:

Name | Type | Description
---|---|---
create |  |

AsyncSpeechWithStreamingResponse

AsyncSpeechWithStreamingResponse(speech: AsyncSpeech)

Attributes:

Name | Type | Description
---|---|---
create |  |
AsyncTranscriptions
AsyncTranscriptions(client: AsyncOpenAI)
Methods:

Name | Description
---|---
create | Transcribes audio into the input language.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create
async

```python
create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    language: str | NotGiven = NOT_GIVEN,
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: (
        Literal["json", "text", "srt", "verbose_json", "vtt"]
        | NotGiven
    ) = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    timestamp_granularities: (
        List[Literal["word", "segment"]] | NotGiven
    ) = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Transcription
```
Transcribes audio into the input language.
Parameters:

Name | Type | Description | Default
---|---|---|---
file | FileTypes | The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. | required
model | Union[str, Literal['whisper-1']] | ID of the model to use. Only `whisper-1` is currently available. | required
language | str \| NotGiven | The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. | NOT_GIVEN
prompt | str \| NotGiven | An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. | NOT_GIVEN
response_format | Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] \| NotGiven | The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`. | NOT_GIVEN
temperature | float \| NotGiven | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. | NOT_GIVEN
timestamp_granularities | List[Literal['word', 'segment']] \| NotGiven | The timestamp granularities to populate for this transcription. Either or both of `word` and `segment` are supported; `response_format` must be set to `verbose_json` to use timestamp granularities. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
AsyncTranscriptionsWithRawResponse

AsyncTranscriptionsWithRawResponse(
    transcriptions: AsyncTranscriptions,
)

Attributes:

Name | Type | Description
---|---|---
create |  |

AsyncTranscriptionsWithStreamingResponse

AsyncTranscriptionsWithStreamingResponse(
    transcriptions: AsyncTranscriptions,
)

Attributes:

Name | Type | Description
---|---|---
create |  |
AsyncTranslations
AsyncTranslations(client: AsyncOpenAI)
Methods:

Name | Description
---|---
create | Translates audio into English.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create
async

```python
create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: str | NotGiven = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Translation
```
Translates audio into English.
Parameters:

Name | Type | Description | Default
---|---|---|---
file | FileTypes | The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. | required
model | Union[str, Literal['whisper-1']] | ID of the model to use. Only `whisper-1` is currently available. | required
prompt | str \| NotGiven | An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English. | NOT_GIVEN
response_format | str \| NotGiven | The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`. | NOT_GIVEN
temperature | float \| NotGiven | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
AsyncTranslationsWithRawResponse

AsyncTranslationsWithRawResponse(
    translations: AsyncTranslations,
)

Attributes:

Name | Type | Description
---|---|---
create |  |

AsyncTranslationsWithStreamingResponse

AsyncTranslationsWithStreamingResponse(
    translations: AsyncTranslations,
)

Attributes:

Name | Type | Description
---|---|---
create |  |
Audio
Audio(client: OpenAI)
Methods:

Name | Description
---|---
speech | Text-to-speech operations.
transcriptions | Audio transcription operations.
translations | Audio translation operations.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
AudioWithRawResponse
AudioWithRawResponse(audio: Audio)
AudioWithStreamingResponse
AudioWithStreamingResponse(audio: Audio)
Speech
Speech(client: OpenAI)
Methods:

Name | Description
---|---
create | Generates audio from the input text.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create

```python
create(
    *,
    input: str,
    model: Union[str, Literal["tts-1", "tts-1-hd"]],
    voice: Literal[
        "alloy", "echo", "fable", "onyx", "nova", "shimmer"
    ],
    response_format: (
        Literal["mp3", "opus", "aac", "flac", "wav", "pcm"]
        | NotGiven
    ) = NOT_GIVEN,
    speed: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> HttpxBinaryResponseContent
```
Generates audio from the input text.
Parameters:

Name | Type | Description | Default
---|---|---|---
input | str | The text to generate audio for. The maximum length is 4096 characters. | required
model | Union[str, Literal['tts-1', 'tts-1-hd']] | One of the available TTS models: `tts-1` or `tts-1-hd`. | required
voice | Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'] | The voice to use when generating the audio. Supported voices are `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`. | required
response_format | Literal['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'] \| NotGiven | The format to generate audio in. Supported formats are `mp3`, `opus`, `aac`, `flac`, `wav`, and `pcm`. | NOT_GIVEN
speed | float \| NotGiven | The speed of the generated audio. Select a value from 0.25 to 4.0; 1.0 is the default. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
SpeechWithRawResponse
SpeechWithRawResponse(speech: Speech)
Attributes:

Name | Type | Description
---|---|---
create |  |

SpeechWithStreamingResponse

SpeechWithStreamingResponse(speech: Speech)

Attributes:

Name | Type | Description
---|---|---
create |  |
Transcriptions
Transcriptions(client: OpenAI)
Methods:

Name | Description
---|---
create | Transcribes audio into the input language.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create

```python
create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    language: str | NotGiven = NOT_GIVEN,
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: (
        Literal["json", "text", "srt", "verbose_json", "vtt"]
        | NotGiven
    ) = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    timestamp_granularities: (
        List[Literal["word", "segment"]] | NotGiven
    ) = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Transcription
```
Transcribes audio into the input language.
Parameters:

Name | Type | Description | Default
---|---|---|---
file | FileTypes | The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. | required
model | Union[str, Literal['whisper-1']] | ID of the model to use. Only `whisper-1` is currently available. | required
language | str \| NotGiven | The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency. | NOT_GIVEN
prompt | str \| NotGiven | An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. | NOT_GIVEN
response_format | Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] \| NotGiven | The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`. | NOT_GIVEN
temperature | float \| NotGiven | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. | NOT_GIVEN
timestamp_granularities | List[Literal['word', 'segment']] \| NotGiven | The timestamp granularities to populate for this transcription. Either or both of `word` and `segment` are supported; `response_format` must be set to `verbose_json` to use timestamp granularities. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
TranscriptionsWithRawResponse

TranscriptionsWithRawResponse(
    transcriptions: Transcriptions,
)

Attributes:

Name | Type | Description
---|---|---
create |  |

TranscriptionsWithStreamingResponse

TranscriptionsWithStreamingResponse(
    transcriptions: Transcriptions,
)

Attributes:

Name | Type | Description
---|---|---
create |  |
Translations
Translations(client: OpenAI)
Methods:

Name | Description
---|---
create | Translates audio into English.
with_raw_response | Returns a version of this resource whose methods return raw HTTP responses.
with_streaming_response | Returns a version of this resource whose methods stream the response content.
create

```python
create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: str | NotGiven = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Translation
```
Translates audio into English.
Parameters:

Name | Type | Description | Default
---|---|---|---
file | FileTypes | The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. | required
model | Union[str, Literal['whisper-1']] | ID of the model to use. Only `whisper-1` is currently available. | required
prompt | str \| NotGiven | An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English. | NOT_GIVEN
response_format | str \| NotGiven | The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`. | NOT_GIVEN
temperature | float \| NotGiven | The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. | NOT_GIVEN
extra_headers | Headers \| None | Send extra headers. | None
extra_query | Query \| None | Add additional query parameters to the request. | None
extra_body | Body \| None | Add additional JSON properties to the request. | None
timeout | float \| Timeout \| None \| NotGiven | Override the client-level default timeout for this request, in seconds. | NOT_GIVEN
TranslationsWithRawResponse
TranslationsWithRawResponse(translations: Translations)
Attributes:

Name | Type | Description
---|---|---
create |  |

TranslationsWithStreamingResponse

TranslationsWithStreamingResponse(
    translations: Translations,
)

Attributes:

Name | Type | Description
---|---|---
create |  |