Synthesize speech

curl --request POST \
  --url https://api.pyai.com/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "<string>",
  "model": "pyai-voice",
  "voice": "<string>",
  "response_format": "mp3",
  "sample_rate": 28000,
  "speed": 1,
  "seed": 123,
  "temperature": 123
}
'

Authorizations

Authorization

string

header

required

Use Authorization: Bearer pyai_live_... (or pyai_test_...).
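
As an illustration of the header format above, here is a minimal Python sketch. The helper name `auth_headers` is hypothetical, and the prefix check is only a local sanity check based on the two key formats named above; it is an assumption, not documented server behavior.

```python
# Key prefixes taken from the description above.
KEY_PREFIXES = ("pyai_live_", "pyai_test_")


def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers for a JSON call (hypothetical helper)."""
    # Local sanity check only: the server's actual validation is not described here.
    if not api_key.startswith(KEY_PREFIXES):
        raise ValueError("expected a key starting with pyai_live_ or pyai_test_")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```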

Body

application/json

input

string

required

Text to synthesize.

model

string

default:pyai-voice

voice

string

A stock voice id from GET /v1/voices (e.g. stock_emma_en_gb) or a cloned voice id (e.g. voice_abc) created via /v1/voice/clones. Omit to use the account's default voice.

response_format

enum<string>

default:mp3

Output audio format. The response Content-Type varies by format (audio/wav, audio/mpeg, audio/ogg, audio/aac, audio/flac, audio/pcm, audio/basic). pcm returns raw, headerless 16-bit little-endian mono samples (no container) at sample_rate. This is the format that voice-agent orchestrators (e.g. Vapi custom-voice, LiveKit/Pipecat) feed directly into their pipelines. g711_ulaw/g711_alaw return raw, headerless G.711 telephony audio at a fixed 8 kHz mono (for Twilio/Plivo/FreeSWITCH). For these two formats sample_rate has no effect: omit it or set it to 8000, since any other value is rejected.

Available options: wav, mp3, opus, aac, flac, pcm, g711_ulaw, g711_alaw
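
A client may want to know in advance which Content-Type to expect and whether the bytes arrive without a container. The sketch below encodes the format-to-Content-Type pairs listed in this reference; the names `CONTENT_TYPES`, `RAW_FORMATS`, `expected_content_type`, and `is_headerless` are hypothetical.

```python
# Format -> Content-Type pairs as listed in this reference.
CONTENT_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "opus": "audio/ogg",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "g711_ulaw": "audio/basic",
    "g711_alaw": "audio/basic",
}

# Formats documented as raw/headerless (no container).
RAW_FORMATS = frozenset({"pcm", "g711_ulaw", "g711_alaw"})


def expected_content_type(response_format: str = "mp3") -> str:
    """Return the Content-Type documented for a response_format (default mp3)."""
    try:
        return CONTENT_TYPES[response_format]
    except KeyError:
        raise ValueError(f"unknown response_format: {response_format!r}") from None


def is_headerless(response_format: str) -> bool:
    """True when the audio has no container and the caller must know the rate."""
    return response_format in RAW_FORMATS
```

Note that both G.711 variants share audio/basic, so the Content-Type alone does not tell a client whether it received mu-law or A-law; the client has to remember which one it requested.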

sample_rate

integer

Optional output sample rate in Hz (8000-48000), e.g. 8000/16000 for telephony or 24000 for wideband. Omit to use the native 24 kHz. Most relevant with response_format: pcm. Does not apply to g711_ulaw/g711_alaw, which are always 8 kHz mono (a conflicting value is rejected).

Required range: 8000 <= x <= 48000
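
The sample-rate rules above can be checked client-side before sending a request. This is a sketch under the rules stated in this reference (8000-48000 Hz range, native 24 kHz when omitted, fixed 8 kHz for G.711); the function name `effective_sample_rate` is hypothetical, and the exact server-side error behavior is not specified here.

```python
NATIVE_RATE = 24000  # used when sample_rate is omitted
G711_RATE = 8000     # fixed rate for g711_ulaw / g711_alaw
G711_FORMATS = frozenset({"g711_ulaw", "g711_alaw"})


def effective_sample_rate(response_format, sample_rate=None):
    """Validate sample_rate and return the rate the audio should have, in Hz."""
    if response_format in G711_FORMATS:
        # Any value other than 8000 conflicts with the fixed G.711 rate.
        if sample_rate not in (None, G711_RATE):
            raise ValueError("g711 formats are always 8000 Hz")
        return G711_RATE
    if sample_rate is None:
        return NATIVE_RATE
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be between 8000 and 48000")
    return sample_rate
```

The returned value is mainly useful when decoding pcm output, where the rate is not stored in the audio itself.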

speed

number

default:1

seed

integer

Optional determinism seed for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.

temperature

number

Optional sampling temperature for reproducible eval runs. Forwarded to the engine and honored once the engine supports it (PLATFORM_ASK_EVALS_ENGINE); no effect when omitted.
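
Because every body field except input is optional, a client can leave unset fields out of the JSON entirely rather than sending nulls. The following sketch builds (but does not send) the request using only the Python standard library; `build_speech_request` is a hypothetical helper, and dropping `None` values is a client-side convention assumed here, not something this reference mandates.

```python
import json
import urllib.request

API_URL = "https://api.pyai.com/v1/audio/speech"


def build_speech_request(api_key, text, **options):
    """Return a POST request for the speech endpoint (hypothetical helper).

    Keyword options such as voice, response_format, sample_rate, speed,
    seed, or temperature are included only when they are not None.
    """
    payload = {"input": text}
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen` and read the audio bytes from the response, for example `with urllib.request.urlopen(req) as resp: audio = resp.read()`.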

Response

Audio stream. The Content-Type varies by response_format: audio/wav (wav), audio/mpeg (mp3), audio/ogg (opus), audio/aac (aac), audio/flac (flac), audio/pcm (pcm — raw/headerless), and audio/basic (g711_ulaw/g711_alaw — raw/headerless G.711 at 8 kHz mono).

The response is of type file.
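
Since pcm output is raw and headerless, most players cannot open it directly. A minimal sketch, assuming the bytes are 16-bit little-endian mono as described above, wraps them in a WAV container with Python's standard `wave` module; `pcm_to_wav` is a hypothetical helper, and its default rate is the native 24 kHz, so pass the sample_rate you requested if you set one.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit little-endian mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 16-bit samples = 2 bytes each
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buf.getvalue()
```

This does not apply to g711_ulaw/g711_alaw output, which uses 8-bit companded samples rather than linear PCM.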