PyAI is telephony-native Voice AI behind one API key: Hear (speech-to-text), Speak (text-to-speech + cloning), Cue (turn detection + knowledge-base context for your own pipeline), and Omni (realtime speech-to-speech voice agents). The API is OpenAI-compatible at https://api.pyai.com/v1. HTTP auth accepts either Authorization: Bearer <key> or the header alias x-api-key: <key>.
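The two accepted header forms can be sketched as a small helper. This is only an illustration: the function name `authHeaders` and its `style` parameter are hypothetical and not part of any PyAI SDK. Only the header names come from the text above.

```javascript
// Hypothetical helper: build request headers in either accepted auth style.
// The header names are from the text above; the function itself is illustrative.
function authHeaders(apiKey, style = "bearer") {
  if (!apiKey) {
    throw new Error("Missing API key");
  }
  if (style === "bearer") {
    return { Authorization: `Bearer ${apiKey}` };
  }
  if (style === "x-api-key") {
    return { "x-api-key": apiKey };
  }
  throw new Error(`Unknown auth style: ${style}`);
}
```

Either object can be passed as the `headers` option of a `fetch` call. Send one form per request rather than both.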
1. Get a key

Sign up at console.pyai.com, click Create API key, and copy it (shown once). Use a pyai_test_ sandbox key to start — it works instantly with hard daily caps and no billing.
export PYAI_API_KEY=pyai_test_...
2. Verify it in 5 seconds

GET /v1/me needs no special scope, so it’s the fastest possible first call — and it’s self-diagnosing: it echoes back the org, environment, granted scopes, and credit posture the gateway resolved for your key.
curl https://api.pyai.com/v1/me -H "Authorization: Bearer $PYAI_API_KEY"
A 200 means your key is live everywhere. A 401 means the key is wrong. A 402 is the billing gate, never a broken key; switch to a pyai_test_ key to get past it.
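The status handling described above could be wrapped in a small check like the following. It is a minimal sketch assuming Node 18+ (global `fetch`); `diagnoseStatus` and `verifyKey` are hypothetical names. The body is returned as raw text because the exact response schema is not shown here.

```javascript
// Map the status codes described above to a short diagnosis.
function diagnoseStatus(status) {
  if (status === 200) return "ok: key is live";
  if (status === 401) return "unauthorized: the key is wrong";
  if (status === 402) return "billing gate: not a broken key, try a pyai_test_ key";
  return `unexpected status ${status}`;
}

// Call GET /v1/me and return the status, a diagnosis, and the raw body text.
// The body is left unparsed: its schema is an unknown in this sketch.
async function verifyKey(apiKey, baseUrl = "https://api.pyai.com/v1") {
  const res = await fetch(`${baseUrl}/me`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  return {
    status: res.status,
    diagnosis: diagnoseStatus(res.status),
    body: await res.text(),
  };
}
```

To try it, call `verifyKey(process.env.PYAI_API_KEY).then(console.log)`.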
3. Synthesize speech

curl https://api.pyai.com/v1/audio/speech \
  -H "Authorization: Bearer $PYAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"stock_emma_en_gb"}' \
  --output hello.wav
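The same request can be made from Node. The sketch below assumes Node 18+ (global `fetch` and `Buffer`); `buildSpeechRequest` and `synthesize` are hypothetical names. The endpoint, model, and voice values are copied from the curl example above.

```javascript
// Build the same JSON body as the curl example above.
function buildSpeechRequest(input, voice = "stock_emma_en_gb", model = "pyai-voice") {
  return { model, input, voice };
}

// POST /v1/audio/speech and return the audio bytes as a Buffer.
async function synthesize(apiKey, input, voice) {
  const res = await fetch("https://api.pyai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildSpeechRequest(input, voice)),
  });
  if (!res.ok) {
    // Include the response text so the failure reason is visible.
    throw new Error(`Speech request failed: ${res.status} ${await res.text()}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

The returned Buffer can be written to disk, for example with `fs.promises.writeFile("hello.wav", bytes)` from Node's `fs` module.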
4. List voices

curl https://api.pyai.com/v1/voices -H "Authorization: Bearer $PYAI_API_KEY"
Use any returned id as the voice above, or create your own with voice cloning (/v1/voice/clones).
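Picking a voice id programmatically might look like the sketch below. The response schema is not shown in this guide, so the code assumes the payload is either a bare array of voices or an object wrapping that array under `data` or `voices`. That shape is a guess and should be adjusted to match the actual response. `extractVoiceIds` and `listVoiceIds` are hypothetical names.

```javascript
// ASSUMPTION: the payload is a bare array of voices, or an object that wraps
// the array under "data" or "voices". The guide only says each voice has an id.
function extractVoiceIds(payload) {
  const candidate = Array.isArray(payload)
    ? payload
    : payload?.data ?? payload?.voices;
  if (!Array.isArray(candidate)) return [];
  return candidate
    .filter((v) => v && typeof v.id === "string")
    .map((v) => v.id);
}

// GET /v1/voices and return the ids found in the response.
async function listVoiceIds(apiKey) {
  const res = await fetch("https://api.pyai.com/v1/voices", {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`List voices failed: ${res.status}`);
  return extractVoiceIds(await res.json());
}
```

Any id returned this way can be used as the `voice` field of the speech request in step 3.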
5. Talk to a realtime agent (Omni)

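Open a WebSocket and pass the key as a subprotocol. This is browser-safe, because browser WebSocket clients cannot set custom headers such as Authorization. The example below uses the Node ws package: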
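import WebSocket from "ws";

// The key rides in the subprotocol list, so no Authorization header is needed.
const ws = new WebSocket(
  "wss://api.pyai.com/v1/omni?agent_id=agent_123&format=pcm16&rate=24000",
  [`pyai-key.${process.env.PYAI_API_KEY}`],
);
// The first messages are hello and session_started.
ws.on("message", (d) => console.log(d.toString()));
// Log connection or auth failures instead of crashing on an unhandled error event.
ws.on("error", (err) => console.error("Omni socket error:", err.message));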
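The connection parameters can be assembled with small helpers rather than string concatenation, which keeps the query string correctly encoded. This is a sketch: `buildOmniUrl` and `omniSubprotocols` are hypothetical names. The query parameters and the `pyai-key.` subprotocol prefix are taken from the example above.

```javascript
// Build the Omni WebSocket URL with the query parameters used above.
function buildOmniUrl(agentId, { format = "pcm16", rate = 24000 } = {}) {
  const url = new URL("wss://api.pyai.com/v1/omni");
  url.searchParams.set("agent_id", agentId);
  url.searchParams.set("format", format);
  url.searchParams.set("rate", String(rate));
  return url.toString();
}

// The key travels as a WebSocket subprotocol of the form "pyai-key.<key>".
function omniSubprotocols(apiKey) {
  return [`pyai-key.${apiKey}`];
}
```

Both values can be passed to a WebSocket constructor, for example `new WebSocket(buildOmniUrl("agent_123"), omniSubprotocols(key))`. The two-argument form is the same for the `ws` package and the browser's built-in WebSocket.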
Prefer the official SDKs — they handle auth, retries, idempotency, and realtime for you: npm install @pyai/sdk or pip install pyai-sdk.

Next steps

Build a browser voice agent

Mic → Omni → speakers in ~10 minutes, all client-side.

Authentication

Keys, environments, rotation, revocation.

Pricing & metering

How usage is measured and billed.

Errors & limits

Error codes, rate limits, idempotency.

API reference

Full request/response schemas, right here in the docs.