Short version: English is GA and benchmarked across every product. Other
languages, including Indian languages and code-mixed speech, are an active
area of work: the language hint is accepted and forwarded to the engine, but
recognition accuracy is not yet published, and native non-English voices
(TTS) are not in the catalog yet. For non-English voice today, the supported
pattern is Cue (turn detection + your knowledge base) + your own LLM and TTS.
Status at a glance
| Product | English | Other languages (STT hint) | Native non-English voices |
|---|---|---|---|
| Hear — speech-to-text | GA · benchmarked | Accepted as a hint · accuracy not yet published | n/a |
| Cue — turn detection + KB grounding | GA · benchmarked | Accepted as a hint · accuracy not yet published | n/a (you bring TTS) |
| Omni — agentic voice (STT + brain + TTS) | GA · end-to-end | Roadmap (end-to-end non-English) | Roadmap |
| Speak — text-to-speech | GA | n/a | Not yet — English voices only (incl. Indian-English accents) |
Accepted as a hint = the language parameter is forwarded to the engine and
may work, but we make no published accuracy guarantee yet. Roadmap = planned,
not available today.
Hear & Cue (speech-to-text)
The streaming endpoint (GET /v1/audio/transcriptions/stream) and the batch
endpoint (POST /v1/transcription/jobs) accept an optional language parameter:
- language is an ISO-639-1 hint forwarded to the engine.
- It is one hint per session: there is no mid-session auto-detect and no per-turn language switching today.
- English is the GA, benchmarked language. Published English accuracy: ~1.6% WER on clean audio and ~4.8% WER on an 8 kHz telephony/accented corpus (see Benchmarks).
- Other ISO-639-1 codes are accepted and forwarded, but we do not yet publish accuracy for them. Treat non-English STT as unmeasured until the numbers land on the benchmarks page.
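As a minimal sketch of passing one hint per session, the helper below builds the streaming URL with an optional language value. The host (api.example.com), the wss scheme, and the assumption that language travels as a query-string parameter are all illustrative and not confirmed by this page; only the path and the parameter name come from the text above.

```python
import re
from urllib.parse import urlencode

# Hypothetical host and scheme; substitute the values for your account.
BASE_URL = "wss://api.example.com"
STREAM_PATH = "/v1/audio/transcriptions/stream"

# ISO-639-1 codes are exactly two letters.
_ISO_639_1 = re.compile(r"^[a-z]{2}$")


def stream_url(language=None):
    """Build the streaming URL with at most one language hint for the session."""
    if language is None:
        return BASE_URL + STREAM_PATH
    code = language.strip().lower()
    if not _ISO_639_1.match(code):
        raise ValueError(f"expected a two-letter ISO-639-1 code, got {language!r}")
    # Assumption: the hint is sent as a query-string parameter named "language".
    return f"{BASE_URL}{STREAM_PATH}?{urlencode({'language': code})}"
```

Because the hint is fixed for the session, a caller that needs a different language would open a new session with a new URL rather than change the value mid-stream.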
Indian languages & code-mixed speech
This is a frequent request, so we are explicit about it. Today PyAI does not publish word-error-rate numbers for Hindi, Tamil, Telugu, Bengali, Marathi, Kannada, Malayalam, Gujarati, or Punjabi, nor for code-mixed Hinglish / Tanglish (Roman + native script mixed with English). The language hint will
accept these codes, but you should not assume production accuracy until we
publish measured results.
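For reference, here is a small sketch that maps the languages named above to their ISO-639-1 codes and reports whether a code has published accuracy according to this page. The helper name stt_status and its status strings are illustrative, not part of any PyAI SDK. The page does not say which code to send for code-mixed speech, so the sketch leaves that case out.

```python
# ISO-639-1 codes for the Indian languages named above.
INDIAN_LANGUAGE_CODES = {
    "Hindi": "hi",
    "Tamil": "ta",
    "Telugu": "te",
    "Bengali": "bn",
    "Marathi": "mr",
    "Kannada": "kn",
    "Malayalam": "ml",
    "Gujarati": "gu",
    "Punjabi": "pa",
}

# Languages with published accuracy per this page: English only today.
BENCHMARKED = {"en"}


def stt_status(code):
    """Return a short status label for an ISO-639-1 code (illustrative helper)."""
    if code.strip().lower() in BENCHMARKED:
        return "benchmarked"
    return "accepted as a hint, accuracy not yet published"
```

A caller could use the label to gate production rollout, for example by logging a warning whenever the status is not "benchmarked".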
Omni (agentic voice)
Omni runs the full loop (speech-to-text, the brain, and text-to-speech) on the PyAI engine. Today that loop is English. Multilingual Omni (non-English end-to-end) is on the roadmap. The configure frame has a language field
reserved for this, but it is not yet honored: sending it today is a no-op
(see the Omni protocol reference).
For a non-English voice agent right now, use the composable path:
- Cue: stream call audio for turn detection (and optional knowledge-base grounding) on GET /v1/audio/transcriptions/stream.
- Your LLM: generate the reply from the Cue transcript + grounding.
- Your TTS: synthesize the reply in the target language and play it back.
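The three steps above can be sketched as a single per-turn function. Everything here is hypothetical scaffolding: the Turn type, handle_turn, and the two callables stand in for whatever Cue client, LLM, and TTS you actually use, and the way grounding is prepended to the prompt is just one reasonable choice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    """One completed user turn as delivered by your Cue client (hypothetical shape)."""
    transcript: str
    grounding: Optional[str] = None  # optional knowledge-base text


def handle_turn(
    turn: Turn,
    generate_reply: Callable[[str, str], str],   # your LLM: (prompt, language) -> reply text
    synthesize: Callable[[str, str], bytes],     # your TTS: (text, language) -> audio bytes
    language: str,
) -> bytes:
    """Turn a Cue transcript into reply audio in the target language."""
    # Put any grounding ahead of the transcript so the LLM sees it first.
    if turn.grounding:
        prompt = f"{turn.grounding}\n\n{turn.transcript}"
    else:
        prompt = turn.transcript
    reply_text = generate_reply(prompt, language)
    return synthesize(reply_text, language)
```

Injecting the LLM and TTS as callables keeps the language-specific parts outside PyAI, which matches the split described above: Cue handles turn detection and grounding, and you own generation and synthesis.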
Speak (text-to-speech)
The Speak catalog (GET /v1/voices) is English voices only today. This
includes several Indian-English accent voices (filter ?region=india) — but
those speak English (language: "en"), not Hindi/Tamil/etc.
There is no native Indian-language (or other non-English) TTS in the catalog
yet, and POST /v1/audio/speech has no language parameter. For non-English
speech output, bring your own TTS in the composable path above. Non-English
voices and non-English voice cloning are on the roadmap.
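To check the catalog yourself, a sketch along these lines builds the region-filtered URL and looks for any non-English entry. The host and the response shape (a list of objects with a language field, plus the id and region fields in the sample) are assumptions for illustration; only the path, the region filter, and the language value "en" come from this page.

```python
from urllib.parse import urlencode

VOICES_PATH = "/v1/voices"


def voices_url(base_url, region=None):
    """Build the catalog URL, optionally filtered by region (e.g. "india")."""
    query = f"?{urlencode({'region': region})}" if region else ""
    return f"{base_url}{VOICES_PATH}{query}"


def non_english(voices):
    """Return catalog entries whose language is not English."""
    return [v for v in voices if v.get("language") != "en"]


# Hypothetical sample shaped like a catalog response; ids are made up.
sample = [
    {"id": "voice-a", "region": "india", "language": "en"},
    {"id": "voice-b", "region": "india", "language": "en"},
]
```

With the catalog as described today, non_english should come back empty even for the india region, since those voices are Indian-English accents rather than Indian-language voices.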
How accuracy gets published
We gate engine quality in CI with an offline benchmark harness (evals/) and
publish the headline numbers on the Benchmarks
page. As languages are measured, their numbers appear there and this matrix is
updated. If you need a number that isn’t published yet, ask — an unpublished
number means “not measured to our bar,” not “hidden.”
See also
Stream speech-to-text
Feed live call audio into Hear / Cue streaming.
Omni wire protocol
Connect, configure, and the live-vs-roadmap field table.
Telephony audio (8 kHz)
μ-law ↔ PCM16 at 8 kHz for phone legs.
Pricing & metering
Per-second billing and the rate card.