Skip to main content
POST
/
v1
/
audio
/
speech
Synthesize speech
curl --request POST \
  --url https://api.pyai.com/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "<string>",
  "model": "pyai-voice",
  "voice": "<string>",
  "response_format": "mp3",
  "sample_rate": 28000,
  "speed": 1,
  "seed": 123,
  "temperature": 123
}
'
"<string>"

Authorizations

Authorization
string
header
required

Use Authorization: Bearer pyai_live_... (or pyai_test_...).

Body

application/json
input
string
required

Text to synthesize.

model
string
default:pyai-voice
voice
string

A stock voice id from GET /v1/voices (e.g. stock_emma_en_gb) or a cloned voice id (e.g. voice_abc) created via /v1/voice/clones. Omit to use the account's default voice.

response_format
enum<string>
default:mp3

Output audio format. The response Content-Type varies by format (audio/wav, audio/mpeg, audio/ogg, audio/aac, audio/flac, audio/pcm, audio/basic). pcm returns raw, headerless 16-bit little-endian mono samples (no container) at sample_rate — the format voice-agent orchestrators (e.g. Vapi custom-voice, LiveKit/Pipecat) feed directly into their pipelines. g711_ulaw/g711_alaw return raw, headerless G.711 telephony audio at a fixed 8 kHz mono (for Twilio/Plivo/FreeSWITCH); sample_rate does not apply and is rejected unless set to 8000.

Available options:
wav,
mp3,
opus,
aac,
flac,
pcm,
g711_ulaw,
g711_alaw
sample_rate
integer

Optional output sample rate in Hz (8000-48000), e.g. 8000/16000 for telephony or 24000 for wideband. Omit to use the native 24 kHz. Most relevant with response_format: pcm. Does not apply to g711_ulaw/g711_alaw, which are always 8 kHz mono (a conflicting value is rejected).

Required range: 8000 <= x <= 48000
speed
number
default:1
seed
integer

Optional determinism seed for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.

temperature
number

Optional sampling temperature for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.

Response

Audio stream. The Content-Type varies by response_format: audio/wav (wav), audio/mpeg (mp3), audio/ogg (opus), audio/aac (aac), audio/flac (flac), audio/pcm (pcm — raw/headerless), and audio/basic (g711_ulaw/g711_alaw — raw/headerless G.711 at 8 kHz mono).

The response is of type file.