Synthesize speech
OpenAI-compatible text-to-speech. Returns audio bytes. Requires the voice:synthesize scope.
Authorizations
Use Authorization: Bearer pyai_live_... (or pyai_test_...).
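As a minimal sketch, the header can be built with a small helper. The function name auth_headers is hypothetical, and the local prefix check only mirrors the two key prefixes named above; the server remains the authority on whether a key is valid.

```python
def auth_headers(api_key: str) -> dict:
    """Return the Authorization header for a live or test key."""
    # Local sanity check only: the two prefixes are the ones named in the text.
    if not api_key.startswith(("pyai_live_", "pyai_test_")):
        raise ValueError("expected a key starting with pyai_live_ or pyai_test_")
    return {"Authorization": f"Bearer {api_key}"}
```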
Body
Text to synthesize.
A stock voice id from GET /v1/voices (e.g. stock_emma_en_gb) or a cloned voice id (e.g. voice_abc) created via /v1/voice/clones. Omit to use the account's default voice.
Output audio format. The response Content-Type varies by format (audio/wav, audio/mpeg, audio/ogg, audio/aac, audio/flac, audio/pcm, audio/basic). pcm returns raw, headerless 16-bit little-endian mono samples (no container) at sample_rate; this is the format voice-agent orchestrators (e.g. Vapi custom-voice, LiveKit/Pipecat) feed directly into their pipelines. g711_ulaw/g711_alaw return raw, headerless G.711 telephony audio at a fixed 8 kHz mono (for Twilio/Plivo/FreeSWITCH); sample_rate does not apply and is rejected unless set to 8000.
Allowed values: wav, mp3, opus, aac, flac, pcm, g711_ulaw, g711_alaw
Optional output sample rate in Hz (8000-48000), e.g. 8000/16000 for telephony or 24000 for wideband. Omit to use the native 24 kHz. Most relevant with response_format: pcm. Does not apply to g711_ulaw/g711_alaw, which are always 8 kHz mono (a conflicting value is rejected).
Range: 8000 <= x <= 48000
Optional determinism seed for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.
Optional sampling temperature for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.
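The format and sample-rate rules above can be checked client-side before a request is sent. The sketch below is illustrative only: response_format and sample_rate are the names used on this page, while the "input" and "voice" field names are assumptions borrowed from the OpenAI speech convention, and build_speech_payload is a hypothetical helper. The seed and temperature fields are left out because their names are not shown here.

```python
import json
from typing import Optional

FORMATS = {"wav", "mp3", "opus", "aac", "flac", "pcm", "g711_ulaw", "g711_alaw"}
G711_FORMATS = {"g711_ulaw", "g711_alaw"}


def build_speech_payload(
    text: str,
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    sample_rate: Optional[int] = None,
) -> dict:
    """Validate arguments locally and return a JSON-serializable body."""
    # "input" and "voice" are ASSUMED field names (OpenAI convention).
    payload = {"input": text}
    if voice is not None:
        payload["voice"] = voice  # omitted: the account's default voice is used
    if response_format is not None:
        if response_format not in FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        payload["response_format"] = response_format
    if sample_rate is not None:
        if not 8000 <= sample_rate <= 48000:
            raise ValueError("sample_rate must be between 8000 and 48000 Hz")
        # G.711 output is fixed at 8 kHz; any other rate is rejected.
        if response_format in G711_FORMATS and sample_rate != 8000:
            raise ValueError("g711_ulaw/g711_alaw are fixed at 8 kHz")
        payload["sample_rate"] = sample_rate
    return payload


# Example: 16 kHz raw PCM with a stock voice id taken from the text above.
body = json.dumps(
    build_speech_payload(
        "Hello", voice="stock_emma_en_gb", response_format="pcm", sample_rate=16000
    )
)
```

Optional fields are omitted from the body rather than sent as null, so the server-side defaults described above (default voice, native 24 kHz) still apply.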
Response
Audio stream. The Content-Type varies by response_format: audio/wav (wav), audio/mpeg (mp3), audio/ogg (opus), audio/aac (aac), audio/flac (flac), audio/pcm (pcm — raw/headerless), and audio/basic (g711_ulaw/g711_alaw — raw/headerless G.711 at 8 kHz mono).
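The mapping above can be written as a lookup table, for instance to confirm that a response carries the media type expected for the requested format. This is a sketch; content_type_matches is a hypothetical helper name, and stripping header parameters is a defensive assumption, not documented behavior.

```python
# response_format -> Content-Type, transcribed from the list above.
CONTENT_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "opus": "audio/ogg",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "g711_ulaw": "audio/basic",
    "g711_alaw": "audio/basic",
}


def content_type_matches(response_format: str, content_type_header: str) -> bool:
    """Compare a response's Content-Type with the one expected for the format."""
    # Drop any parameters (e.g. "; codecs=opus") and normalize case first.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    # An unknown response_format raises KeyError.
    return CONTENT_TYPES[response_format] == media_type
```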
The response is of type file.
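Because pcm output has no container, most players will not open it directly. One way to make it playable, sketched here with Python's standard wave module, is to wrap the bytes in a WAV header using the same parameters the response describes (16-bit, mono, the requested sample_rate or the native 24 kHz). The helper name pcm_to_wav is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit little-endian mono PCM in a WAV container."""
    if len(pcm_bytes) % 2:
        raise ValueError("16-bit PCM must contain an even number of bytes")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)  # must match the rate that was requested
        wav.writeframes(pcm_bytes)
    return buf.getvalue()
```

The sample_rate passed here has to match the one sent in the request; a mismatch produces audio that plays too fast or too slow.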