
openai.resources.audio

The audio module provides classes for audio processing operations like transcription, translation, and speech synthesis.

Use the audio module by accessing the OpenAI.audio attribute on the client object, and then access the attribute for the feature you'd like to use:

Examples:

from pathlib import Path

from openai import OpenAI

openai = OpenAI()

speech_file_path = Path(__file__).parent / "speech.mp3"

# Create text-to-speech audio file
with openai.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="The quick brown fox jumps over the lazy dog.",
) as response:
    response.stream_to_file(speech_file_path)

Modules:

Name Description
audio


speech
transcriptions
translations

Classes:

Name Description
AsyncAudio
AsyncAudioWithRawResponse
AsyncAudioWithStreamingResponse
AsyncSpeech
AsyncSpeechWithRawResponse
AsyncSpeechWithStreamingResponse
AsyncTranscriptions
AsyncTranscriptionsWithRawResponse
AsyncTranscriptionsWithStreamingResponse
AsyncTranslations
AsyncTranslationsWithRawResponse
AsyncTranslationsWithStreamingResponse
Audio
AudioWithRawResponse
AudioWithStreamingResponse
Speech
SpeechWithRawResponse
SpeechWithStreamingResponse
Transcriptions
TranscriptionsWithRawResponse
TranscriptionsWithStreamingResponse
Translations
TranslationsWithRawResponse
TranslationsWithStreamingResponse

AsyncAudio

AsyncAudio(client: AsyncOpenAI)

Methods:

Name Description
speech
transcriptions
translations
with_raw_response
with_streaming_response

speech

speech() -> AsyncSpeech

transcriptions

transcriptions() -> AsyncTranscriptions

translations

translations() -> AsyncTranslations

with_raw_response

with_raw_response() -> AsyncAudioWithRawResponse

with_streaming_response

with_streaming_response() -> (
    AsyncAudioWithStreamingResponse
)

AsyncAudioWithRawResponse

AsyncAudioWithRawResponse(audio: AsyncAudio)

Methods:

Name Description
speech
transcriptions
translations

speech

speech() -> AsyncSpeechWithRawResponse

transcriptions

transcriptions() -> AsyncTranscriptionsWithRawResponse

translations

translations() -> AsyncTranslationsWithRawResponse

AsyncAudioWithStreamingResponse

AsyncAudioWithStreamingResponse(audio: AsyncAudio)

Methods:

Name Description
speech
transcriptions
translations

speech

speech() -> AsyncSpeechWithStreamingResponse

transcriptions

transcriptions() -> AsyncTranscriptionsWithStreamingResponse

translations

translations() -> AsyncTranslationsWithStreamingResponse

AsyncSpeech

AsyncSpeech(client: AsyncOpenAI)

Methods:

Name Description
create

Generates audio from the input text.

with_raw_response
with_streaming_response

create async

create(
    *,
    input: str,
    model: Union[str, Literal["tts-1", "tts-1-hd"]],
    voice: Literal[
        "alloy", "echo", "fable", "onyx", "nova", "shimmer"
    ],
    response_format: (
        Literal["mp3", "opus", "aac", "flac", "wav", "pcm"]
        | NotGiven
    ) = NOT_GIVEN,
    speed: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> HttpxBinaryResponseContent

Generates audio from the input text.

Parameters:

Name Type Description Default
input str

The text to generate audio for. The maximum length is 4096 characters.

required
model Union[str, Literal['tts-1', 'tts-1-hd']]

One of the available TTS models: tts-1 or tts-1-hd

required
voice Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']

The voice to use when generating the audio. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Previews of the voices are available in the Text to speech guide.

required
response_format Literal['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'] | NotGiven

The format to return the audio in. Supported formats are mp3, opus, aac, flac, wav, and pcm.

NOT_GIVEN
speed float | NotGiven

The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> AsyncSpeechWithRawResponse

with_streaming_response

with_streaming_response() -> (
    AsyncSpeechWithStreamingResponse
)

AsyncSpeechWithRawResponse

AsyncSpeechWithRawResponse(speech: AsyncSpeech)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_raw_response_wrapper(create)

AsyncSpeechWithStreamingResponse

AsyncSpeechWithStreamingResponse(speech: AsyncSpeech)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_custom_streamed_response_wrapper(
    create, AsyncStreamedBinaryAPIResponse
)

AsyncTranscriptions

AsyncTranscriptions(client: AsyncOpenAI)

Methods:

Name Description
create

Transcribes audio into the input language.

with_raw_response
with_streaming_response

create async

create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    language: str | NotGiven = NOT_GIVEN,
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: (
        Literal[
            "json", "text", "srt", "verbose_json", "vtt"
        ]
        | NotGiven
    ) = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    timestamp_granularities: (
        List[Literal["word", "segment"]] | NotGiven
    ) = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Transcription

Transcribes audio into the input language.

Parameters:

Name Type Description Default
file FileTypes

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

required
model Union[str, Literal['whisper-1']]

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

required
language str | NotGiven

The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

NOT_GIVEN
prompt str | NotGiven

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

NOT_GIVEN
response_format Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | NotGiven

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

NOT_GIVEN
temperature float | NotGiven

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

NOT_GIVEN
timestamp_granularities List[Literal['word', 'segment']] | NotGiven

The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> AsyncTranscriptionsWithRawResponse

with_streaming_response

with_streaming_response() -> (
    AsyncTranscriptionsWithStreamingResponse
)

AsyncTranscriptionsWithRawResponse

AsyncTranscriptionsWithRawResponse(
    transcriptions: AsyncTranscriptions,
)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_raw_response_wrapper(create)

AsyncTranscriptionsWithStreamingResponse

AsyncTranscriptionsWithStreamingResponse(
    transcriptions: AsyncTranscriptions,
)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_streamed_response_wrapper(create)

AsyncTranslations

AsyncTranslations(client: AsyncOpenAI)

Methods:

Name Description
create

Translates audio into English.

with_raw_response
with_streaming_response

create async

create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: str | NotGiven = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Translation

Translates audio into English.

Parameters:

Name Type Description Default
file FileTypes

The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

required
model Union[str, Literal['whisper-1']]

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

required
prompt str | NotGiven

An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

NOT_GIVEN
response_format str | NotGiven

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

NOT_GIVEN
temperature float | NotGiven

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> AsyncTranslationsWithRawResponse

with_streaming_response

with_streaming_response() -> (
    AsyncTranslationsWithStreamingResponse
)

AsyncTranslationsWithRawResponse

AsyncTranslationsWithRawResponse(
    translations: AsyncTranslations,
)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_raw_response_wrapper(create)

AsyncTranslationsWithStreamingResponse

AsyncTranslationsWithStreamingResponse(
    translations: AsyncTranslations,
)

Attributes:

Name Type Description
create

create instance-attribute

create = async_to_streamed_response_wrapper(create)

Audio

Audio(client: OpenAI)

Methods:

Name Description
speech
transcriptions
translations
with_raw_response
with_streaming_response

speech

speech() -> Speech

transcriptions

transcriptions() -> Transcriptions

translations

translations() -> Translations

with_raw_response

with_raw_response() -> AudioWithRawResponse

with_streaming_response

with_streaming_response() -> AudioWithStreamingResponse

AudioWithRawResponse

AudioWithRawResponse(audio: Audio)

Methods:

Name Description
speech
transcriptions
translations

speech

speech() -> SpeechWithRawResponse

transcriptions

transcriptions() -> TranscriptionsWithRawResponse

translations

translations() -> TranslationsWithRawResponse

AudioWithStreamingResponse

AudioWithStreamingResponse(audio: Audio)

Methods:

Name Description
speech
transcriptions
translations

speech

speech() -> SpeechWithStreamingResponse

transcriptions

transcriptions() -> TranscriptionsWithStreamingResponse

translations

translations() -> TranslationsWithStreamingResponse

Speech

Speech(client: OpenAI)

Methods:

Name Description
create

Generates audio from the input text.

with_raw_response
with_streaming_response

create

create(
    *,
    input: str,
    model: Union[str, Literal["tts-1", "tts-1-hd"]],
    voice: Literal[
        "alloy", "echo", "fable", "onyx", "nova", "shimmer"
    ],
    response_format: (
        Literal["mp3", "opus", "aac", "flac", "wav", "pcm"]
        | NotGiven
    ) = NOT_GIVEN,
    speed: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> HttpxBinaryResponseContent

Generates audio from the input text.

Parameters:

Name Type Description Default
input str

The text to generate audio for. The maximum length is 4096 characters.

required
model Union[str, Literal['tts-1', 'tts-1-hd']]

One of the available TTS models: tts-1 or tts-1-hd

required
voice Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']

The voice to use when generating the audio. Supported voices are alloy, echo, fable, onyx, nova, and shimmer. Previews of the voices are available in the Text to speech guide.

required
response_format Literal['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'] | NotGiven

The format to return the audio in. Supported formats are mp3, opus, aac, flac, wav, and pcm.

NOT_GIVEN
speed float | NotGiven

The speed of the generated audio. Select a value from 0.25 to 4.0. 1.0 is the default.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> SpeechWithRawResponse

with_streaming_response

with_streaming_response() -> SpeechWithStreamingResponse

SpeechWithRawResponse

SpeechWithRawResponse(speech: Speech)

Attributes:

Name Type Description
create

create instance-attribute

create = to_raw_response_wrapper(create)

SpeechWithStreamingResponse

SpeechWithStreamingResponse(speech: Speech)

Attributes:

Name Type Description
create

create instance-attribute

create = to_custom_streamed_response_wrapper(
    create, StreamedBinaryAPIResponse
)

Transcriptions

Transcriptions(client: OpenAI)

Methods:

Name Description
create

Transcribes audio into the input language.

with_raw_response
with_streaming_response

create

create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    language: str | NotGiven = NOT_GIVEN,
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: (
        Literal[
            "json", "text", "srt", "verbose_json", "vtt"
        ]
        | NotGiven
    ) = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    timestamp_granularities: (
        List[Literal["word", "segment"]] | NotGiven
    ) = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Transcription

Transcribes audio into the input language.

Parameters:

Name Type Description Default
file FileTypes

The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

required
model Union[str, Literal['whisper-1']]

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

required
language str | NotGiven

The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.

NOT_GIVEN
prompt str | NotGiven

An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

NOT_GIVEN
response_format Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | NotGiven

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

NOT_GIVEN
temperature float | NotGiven

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

NOT_GIVEN
timestamp_granularities List[Literal['word', 'segment']] | NotGiven

The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word and segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> TranscriptionsWithRawResponse

with_streaming_response

with_streaming_response() -> (
    TranscriptionsWithStreamingResponse
)

TranscriptionsWithRawResponse

TranscriptionsWithRawResponse(
    transcriptions: Transcriptions,
)

Attributes:

Name Type Description
create

create instance-attribute

create = to_raw_response_wrapper(create)

TranscriptionsWithStreamingResponse

TranscriptionsWithStreamingResponse(
    transcriptions: Transcriptions,
)

Attributes:

Name Type Description
create

create instance-attribute

create = to_streamed_response_wrapper(create)

Translations

Translations(client: OpenAI)

Methods:

Name Description
create

Translates audio into English.

with_raw_response
with_streaming_response

create

create(
    *,
    file: FileTypes,
    model: Union[str, Literal["whisper-1"]],
    prompt: str | NotGiven = NOT_GIVEN,
    response_format: str | NotGiven = NOT_GIVEN,
    temperature: float | NotGiven = NOT_GIVEN,
    extra_headers: Headers | None = None,
    extra_query: Query | None = None,
    extra_body: Body | None = None,
    timeout: float | Timeout | None | NotGiven = NOT_GIVEN
) -> Translation

Translates audio into English.

Parameters:

Name Type Description Default
file FileTypes

The audio file object (not file name) to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.

required
model Union[str, Literal['whisper-1']]

ID of the model to use. Only whisper-1 (which is powered by our open source Whisper V2 model) is currently available.

required
prompt str | NotGiven

An optional text to guide the model's style or continue a previous audio segment. The prompt should be in English.

NOT_GIVEN
response_format str | NotGiven

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

NOT_GIVEN
temperature float | NotGiven

The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

NOT_GIVEN
extra_headers Headers | None

Send extra headers

None
extra_query Query | None

Add additional query parameters to the request

None
extra_body Body | None

Add additional JSON properties to the request

None
timeout float | Timeout | None | NotGiven

Override the client-level default timeout for this request, in seconds

NOT_GIVEN

with_raw_response

with_raw_response() -> TranslationsWithRawResponse

with_streaming_response

with_streaming_response() -> (
    TranslationsWithStreamingResponse
)

TranslationsWithRawResponse

TranslationsWithRawResponse(translations: Translations)

Attributes:

Name Type Description
create

create instance-attribute

create = to_raw_response_wrapper(create)

TranslationsWithStreamingResponse

TranslationsWithStreamingResponse(
    translations: Translations,
)

Attributes:

Name Type Description
create

create instance-attribute

create = to_streamed_response_wrapper(create)