RequestAudioSpeech
Function
Description: This function asynchronously requests audio speech synthesis from OpenAI's Audio Speech API, using parameters such as the API key, model, input text, voice, optional instructions, folder path, and file name. The synthesized speech is saved as an MP3 file in the designated location.
Parameters:
- apiKey (String): Your API key for accessing OpenAI's Audio Speech API.
- model (String): The model used for speech synthesis. Options include:
  - tts-1: Optimized for real-time text-to-speech applications.
  - tts-1-hd: Optimized for higher-quality speech synthesis.
  - gpt-4o-mini-tts: Built on GPT-4o mini, offering fast and powerful text-to-speech capabilities.
- textInput (String): The text to be synthesized into speech.
- voice (String): The voice selection for audio generation. Supported voices are: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer.
- instructions (String, optional): You can provide specific instructions to potentially influence the tone, style, or delivery of the speech (e.g., "Speak in a cheerful and positive tone."). Leave blank if not needed.
- folderPath (String): The directory path where the resulting MP3 file will be saved.
- fileName (String): The name assigned to the saved MP3 file.
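For reference, the work this block performs corresponds to a single POST to OpenAI's `/v1/audio/speech` endpoint followed by writing the returned bytes to disk. Below is a minimal Python sketch of that flow; the helper names (`build_speech_request`, `request_audio_speech`) and the use of `urllib` are illustrative and not part of the extension itself:

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/audio/speech"

def build_speech_request(api_key, model, text_input, voice, instructions=""):
    """Build the HTTP request equivalent to what the block sends."""
    payload = {
        "model": model,           # e.g. "tts-1", "tts-1-hd", or "gpt-4o-mini-tts"
        "input": text_input,
        "voice": voice,           # e.g. "alloy"
        "response_format": "mp3"  # the block saves the result as an MP3 file
    }
    if instructions:              # optional; omitted when left blank
        payload["instructions"] = instructions
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def request_audio_speech(api_key, model, text_input, voice,
                         instructions, folder_path, file_name):
    """Fetch synthesized speech and save it as an MP3 under folder_path."""
    req = build_speech_request(api_key, model, text_input, voice, instructions)
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(f"{folder_path}/{file_name}", "wb") as f:
        f.write(audio)
```

With a valid key, calling `request_audio_speech(...)` would save the synthesized audio as `folder_path/file_name`, mirroring the block's behavior.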
StreamAudioSpeech
Description
Streams audio speech synthesis from OpenAI in real time and plays it directly through the device speaker. This function is optimized for low-latency playback, starting audio almost immediately as it arrives.
It specifically uses the gpt-4o-mini-tts model and requests audio in PCM format, which is required for the direct playback mechanism. The function incorporates pre-buffering, frame alignment, boundary smoothing (cross-fading), and a final fade-out to minimize glitches and provide a smoother listening experience.
Use the separate StopStream block to halt playback manually.
Parameters
- apiKey (Type: Text)
- Required. Your unique API key for accessing OpenAI services.
- textInput (Type: Text)
- Required. The text content you want the AI to speak.
- voice (Type: Text)
- Required. The desired voice for the speech synthesis. Valid options include: alloy, echo, fable, onyx, nova, shimmer. (Note: The underlying model might support more voices than the older TTS-1).
- instructions (Type: Text)
- Optional. You can provide specific instructions to potentially influence the tone, style, or delivery of the speech (e.g., "Speak in a cheerful and positive tone."). Leave blank if not needed.
- Note: The model and responseFormat are fixed internally by this function:
- Model: gpt-4o-mini-tts
- Response Format: pcm (Raw Pulse Code Modulation audio data)
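In other words, the streaming request body is fixed except for the text, voice, and optional instructions. The following sketch shows an equivalent payload; field names follow OpenAI's Audio Speech API, and the helper name `build_stream_payload` is illustrative only:

```python
import json

def build_stream_payload(text_input, voice, instructions=""):
    """Payload for the streaming request; model and response_format are fixed."""
    payload = {
        "model": "gpt-4o-mini-tts",  # fixed internally by StreamAudioSpeech
        "input": text_input,
        "voice": voice,
        "response_format": "pcm",    # raw PCM, required for direct playback
    }
    if instructions:                 # omitted when left blank
        payload["instructions"] = instructions
    return json.dumps(payload)
```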
Events Triggered
This function can trigger the following events:
- StartedAudioStream()
- Fired shortly after the request is made, once enough initial audio data (pre-buffer) has been received and playback is about to begin. Use this to indicate to the user that the audio is starting.
- FinishedAudioStream()
- Fired only when the entire audio stream has been successfully received from OpenAI and played through to the end naturally (without being manually stopped).
- AudioStreamError(errorMessage Text)
- Fired if any error occurs during the process:
- Invalid API key or network issues.
- Errors reported by the OpenAI API.
- Problems initializing the device's audio player (AudioTrack).
- Errors during data writing or processing.
- The errorMessage parameter provides details about the failure.
- StoppedAudioStream()
- Important: This event is NOT directly fired by StreamAudioSpeech itself upon completion. It is fired when you explicitly call the StopStream block to manually interrupt the ongoing audio playback. If the stream is stopped manually, FinishedAudioStream will not be fired.
How It Works (Simplified)
- Request: Sends your text and parameters to OpenAI, specifically requesting a pcm audio stream from the gpt-4o-mini-tts model.
- Pre-buffer: Receives the first small portion of audio data from OpenAI and holds it temporarily.
- Initialize Player: Sets up the device's low-level audio player (AudioTrack), configured for the specific PCM format (e.g., 44.1 kHz, 16-bit mono).
- Write & Play: Writes the pre-buffered data to the player and immediately starts playback.
- Stream Loop: Continuously receives subsequent small chunks of audio data from OpenAI.
- Smooth & Play: For each incoming chunk:
- It checks if there's a large, potentially audible jump (discontinuity) between the end of the last played chunk and the start of the new one.
- If a jump is detected, it applies a very short fade-in to the beginning of the new chunk to smooth the transition, reducing clicks/pops.
- It ensures only complete audio frames (pairs of bytes for 16-bit PCM) are sent to the player.
- The processed chunk is sent to the player to be heard.
- Final Fade-Out: When the stream from OpenAI ends naturally, a short fade-out is applied to the very last segment of audio for a less abrupt ending.
- Finish/Error/Stop: Triggers the appropriate event (FinishedAudioStream, AudioStreamError, or relies on StopStream triggering StoppedAudioStream).
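The smoothing steps above can be sketched as follows. This is an illustrative Python model of 16-bit mono PCM handling (frame alignment, a short fade-in on a detected discontinuity, and a final fade-out); the actual extension does this internally around AudioTrack, and the threshold and fade lengths here are assumed values, not the extension's real constants:

```python
import struct

FRAME_BYTES = 2        # 16-bit mono PCM: one frame = 2 bytes
JUMP_THRESHOLD = 8000  # assumed discontinuity threshold (sample units)
FADE_SAMPLES = 64      # assumed fade-in length in samples

def align_frames(chunk: bytes):
    """Split a chunk into complete frames plus any leftover byte."""
    cut = len(chunk) - (len(chunk) % FRAME_BYTES)
    return chunk[:cut], chunk[cut:]

def to_samples(chunk: bytes):
    """Decode little-endian 16-bit PCM bytes into a list of samples."""
    return list(struct.unpack(f"<{len(chunk) // 2}h", chunk))

def to_bytes(samples):
    """Encode samples back into little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)

def smooth_chunk(prev_last_sample: int, chunk: bytes) -> bytes:
    """Apply a short linear fade-in if the chunk starts with an audible jump."""
    samples = to_samples(chunk)
    if samples and abs(samples[0] - prev_last_sample) > JUMP_THRESHOLD:
        n = min(FADE_SAMPLES, len(samples))
        for i in range(n):
            samples[i] = samples[i] * (i + 1) // n
    return to_bytes(samples)

def fade_out(chunk: bytes) -> bytes:
    """Apply a linear fade-out across the final segment of audio."""
    samples = to_samples(chunk)
    n = len(samples)
    for i in range(n):
        samples[i] = samples[i] * (n - 1 - i) // max(n - 1, 1)
    return to_bytes(samples)
```

In use, each incoming chunk would pass through `align_frames` and `smooth_chunk` before being written to the player, with `fade_out` applied only to the last segment when the stream ends naturally.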
Audio Generation: File Download vs. Real-time Stream