How it fits together
Twilio’s <Connect><Stream> opens a bidirectional WebSocket to your server:
it sends the caller’s audio as base64 G.711 μ-law at 8 kHz, and accepts the
agent’s audio back the same way. Omni speaks PCM16. Run Omni at 8 kHz
(?rate=8000) and the only conversion you do is μ-law companding — no
resampling, because both sides are already at 8 kHz.
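To make the companding step concrete, here is a dependency-free sketch of standard G.711 μ-law decoding and encoding. The guide itself relies on the alawmulaw package for this; the names `mulawDecode` and `mulawEncode` below are illustrative only and are not part of any library.

```javascript
// Illustrative G.711 mu-law codec (not the alawmulaw implementation).
const MULAW_BIAS = 0x84; // 132, added before the segment search
const MULAW_CLIP = 32635; // largest magnitude that survives the bias

// One mu-law byte -> one signed 16-bit PCM sample.
function decodeSample(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + MULAW_BIAS) << exponent) - MULAW_BIAS;
  return sign ? -magnitude : magnitude;
}

// One signed 16-bit PCM sample -> one mu-law byte.
function encodeSample(sample) {
  const sign = (sample >> 8) & 0x80;
  let magnitude = sign ? -sample : sample;
  if (magnitude > MULAW_CLIP) magnitude = MULAW_CLIP;
  magnitude += MULAW_BIAS;
  // Find the segment: the position of the highest set bit, from bit 14 down to bit 7.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Uint8Array (or Buffer) of mu-law bytes -> Int16Array of PCM16 samples.
function mulawDecode(bytes) {
  return Int16Array.from(bytes, decodeSample);
}

// Int16Array of PCM16 samples -> Uint8Array of mu-law bytes.
function mulawEncode(pcm) {
  return Uint8Array.from(pcm, encodeSample);
}
```

Each sample maps to exactly one byte in both directions, which is why no resampling is involved when both legs run at 8 kHz.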
This guide is correct on transport, the Twilio Media Streams message types, the
codec/rate math, and the Omni event behaviors. The exact JSON payloads of Omni
events (e.g. the dtmf and transfer_to_human frames) are defined by the
Omni wire protocol; we isolate them in one place so they’re trivial to update.
Prerequisites
A Twilio number
A voice-capable phone number in your Twilio console, plus your Account SID and
Auth Token (for the transfer step).
An Omni agent + key
An agent_id and a pyai_test_ key. Set them as environment variables on the
server; never hard-code keys.
Project layout
package.json
alawmulaw does the G.711 companding (mulaw.decode → Int16, mulaw.encode →
μ-law bytes); ws is the client socket to Omni; twilio is only used for the
REST transfer.
Build it
Serve TwiML that opens a bidirectional stream
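A minimal sketch of the TwiML route in server.js, written as a plain Node request handler so it carries no framework dependency. The helper name `buildVoiceTwiml` and the `PUBLIC_HOST` environment variable are assumptions made for illustration; the `/media` path is the WebSocket route this guide bridges in the next step.

```javascript
// Hypothetical helper: builds the TwiML that the /voice route returns.
function buildVoiceTwiml(publicHost) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Connect>',
    `    <Stream url="wss://${publicHost}/media" />`,
    '  </Connect>',
    '</Response>',
  ].join('\n');
}

// Request handler for POST /voice. PUBLIC_HOST is an assumed env var; when it
// is unset, the Host header of the incoming request is used instead.
function voiceHandler(req, res) {
  const host = process.env.PUBLIC_HOST || req.headers.host;
  res.writeHead(200, { 'Content-Type': 'text/xml' });
  res.end(buildVoiceTwiml(host));
}
```

Because the handler only uses `writeHead` and `end`, it can be passed to `http.createServer` or mounted as an Express route without changes.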
When Twilio receives the call it fetches TwiML from your /voice route.
<Connect><Stream> (not <Start><Stream>) gives you a two-way socket so you
can send the agent’s audio back.
Set the number’s A call comes in webhook to
https://<PUBLIC_HOST>/voice (HTTP POST) in the Twilio console.
Bridge the media WebSocket to Omni
Twilio connects to /media and sends JSON frames: start (carries
streamSid + callSid), media (base64 μ-law), dtmf, and stop. For each
caller frame, decode μ-law → PCM16 and forward the raw bytes to Omni.
server.js (caller → agent)
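A sketch of the caller → agent direction, under a few stated assumptions: `decode` behaves like alawmulaw's `mulaw.decode` (μ-law bytes in, Int16Array out), while `sendToOmni` and `onDtmf` are hypothetical callbacks you would wire to the Omni socket. The Twilio field names (`start.streamSid`, `start.callSid`, `media.payload`, `dtmf.digit`) follow the Media Streams message format.

```javascript
// Builds a handler for Twilio Media Streams messages arriving on /media.
// decode, sendToOmni and onDtmf are injected so the sketch stays library-free.
function createCallerBridge({ decode, sendToOmni, onDtmf = () => {} }) {
  // Per-call state captured from the start frame and reused for outbound audio.
  const call = { streamSid: null, callSid: null, stopped: false };

  function handleTwilioMessage(raw) {
    const msg = JSON.parse(raw.toString());
    switch (msg.event) {
      case 'start':
        call.streamSid = msg.start.streamSid;
        call.callSid = msg.start.callSid;
        break;
      case 'media': {
        // base64 -> mu-law bytes -> PCM16 samples -> raw bytes for Omni.
        const mulawBytes = Buffer.from(msg.media.payload, 'base64');
        const pcm = decode(mulawBytes);
        // Assumption: Omni accepts PCM16 in the platform's native byte order
        // (little-endian on common hardware).
        sendToOmni(Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength));
        break;
      }
      case 'dtmf':
        onDtmf(msg.dtmf.digit);
        break;
      case 'stop':
        call.stopped = true;
        break;
      default:
        break; // other frames, such as connected and mark, are ignored here
    }
  }

  return { call, handleTwilioMessage };
}
```

In a real server you would pass `require('alawmulaw').mulaw.decode` as `decode` and call `handleTwilioMessage` from the WebSocket's message event.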
Relay agent audio back to the caller
Omni sends agent audio as binary PCM16 frames and session state as
text JSON. Encode PCM16 → μ-law, base64 it, and send a Twilio media
message tagged with the streamSid.
server.js (agent → caller)
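A sketch of the agent → caller direction: it turns one binary PCM16 frame from Omni into the JSON text of a Twilio media message. It assumes the PCM16 samples are little-endian and that `encode` behaves like alawmulaw's `mulaw.encode` (Int16Array in, μ-law bytes out); the function name `pcmToTwilioMedia` is illustrative.

```javascript
// Converts a Buffer of PCM16 audio into a Twilio "media" message string.
function pcmToTwilioMedia(pcmBuffer, streamSid, encode) {
  // Read samples one by one so buffer alignment never matters; a trailing
  // odd byte (half a sample) is dropped.
  const sampleCount = Math.floor(pcmBuffer.length / 2);
  const pcm = new Int16Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    pcm[i] = pcmBuffer.readInt16LE(i * 2); // assumption: little-endian PCM16
  }
  const mulawBytes = encode(pcm);
  return JSON.stringify({
    event: 'media',
    streamSid, // captured from Twilio's start frame
    media: { payload: Buffer.from(mulawBytes).toString('base64') },
  });
}
```

With the ws package (version 8 or later), the message listener receives `(data, isBinary)`, so binary frames can go through this function while text frames are parsed as JSON events.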
Barge-in, DTMF, and transfer to a human
Three behaviors live in one event handler. Barge-in is the important one:
when the caller talks over the agent, Omni sends flush. Twilio buffers
outbound audio, so you must tell it to drop what’s queued with a clear
message; otherwise the agent keeps talking over the caller.
server.js (Omni events + helpers)
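A sketch of the single event handler, with the Omni-specific assumptions kept in one place. The event names come from this guide, but the `type` field used to read them and the shape of the outgoing DTMF frame are hypothetical placeholders until checked against the Omni wire protocol. `sendToTwilio`, `transferCall`, and `endCall` are injected callbacks.

```javascript
// Handles one text (JSON) frame from Omni.
// Assumption: the event name is carried in a `type` field (hypothetical).
function handleOmniEvent(text, { streamSid, sendToTwilio, transferCall, endCall }) {
  const evt = JSON.parse(text);
  switch (evt.type) {
    case 'flush':
      // Barge-in: tell Twilio to drop any agent audio it has buffered.
      sendToTwilio(JSON.stringify({ event: 'clear', streamSid }));
      return 'cleared';
    case 'transfer_to_human':
      transferCall();
      return 'transferred';
    case 'session_ending':
      endCall();
      return 'ended';
    default:
      return 'ignored';
  }
}

// Forwards a caller keypress to Omni.
// Assumption: this frame shape is a placeholder, not the documented payload.
function forwardDtmf(digit, sendToOmni) {
  sendToOmni(JSON.stringify({ type: 'dtmf', digit }));
}
```

One way to implement `transferCall` is the Twilio REST client's `client.calls(callSid).update({ twiml })`, using the callSid captured from the start frame.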
The flush, dtmf, transfer_to_human, and session_ending event names
are stable; the exact JSON fields are defined in the
Omni wire protocol. forwardDtmf and the event
cases are the only places you’d touch when the frame reference lands.
Run it
Start the server, make sure the number’s webhook points at
https://<PUBLIC_HOST>/voice, then call the number. You should hear the agent
greet you within a second of the call connecting. Talk over it to confirm
barge-in cuts the agent off; press a key to confirm DTMF flows through.
Codec & rate notes
- μ-law ↔ PCM16 only. Twilio is 8 kHz μ-law; running Omni at rate=8000 means you never resample: mulaw.decode / mulaw.encode is the whole codec path. If you ever bridge an 8 kHz leg to a 16 kHz Omni session you’d upsample 2:1 (16000 / 8000 = 2); for Twilio, don’t. Keep both at 8 kHz.
- Frame size. Twilio sends ~20 ms (160 μ-law bytes) per media message. Relay agent audio in similar ~20 ms chunks for smooth playback; sending huge bursts can make Twilio’s jitter buffer stutter.
- Tag every outbound media with the streamSid from the start event, or Twilio drops it silently.
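The frame-size arithmetic in these notes can be sketched as a small chunker: at 8 kHz, 20 ms is 160 samples, which is 160 bytes of μ-law or 320 bytes of PCM16. The constant and function names below are illustrative, and how you pace the resulting frames is left to your server.

```javascript
const SAMPLE_RATE_HZ = 8000;
const FRAME_MS = 20;
const SAMPLES_PER_FRAME = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 160 samples
const PCM16_BYTES_PER_FRAME = SAMPLES_PER_FRAME * 2; // 320 bytes, 2 per sample

// Yields ~20 ms slices of a PCM16 Buffer; the last slice may be shorter.
function* pcmFrames(pcmBuffer) {
  for (let offset = 0; offset < pcmBuffer.length; offset += PCM16_BYTES_PER_FRAME) {
    yield pcmBuffer.subarray(offset, offset + PCM16_BYTES_PER_FRAME);
  }
}
```

Each slice can then be encoded and sent as its own media message, so a large burst of agent audio reaches Twilio as a series of small frames.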
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Silence both ways | Used <Start><Stream> (one-way) | Use <Connect><Stream> for a bidirectional socket |
| Caller hears nothing from the agent | Missing/incorrect streamSid on outbound media | Capture streamSid from the start event and tag every media message |
| Agent keeps talking over the caller | flush not wired to Twilio clear | Send { event: "clear", streamSid } on every flush |
| Garbled / static audio | Skipped μ-law companding or wrong rate | Decode/encode with mulaw; keep Omni at rate=8000 |
| WS to Omni closes at once | Bad key or missing scope | Check the close code in Errors & limits; 4401 = bad key, 4403 = the key lacks the omni:session scope |
| Transfer does nothing | Wrong callSid or REST creds | Use start.callSid; verify TWILIO_ACCOUNT_SID/TWILIO_AUTH_TOKEN |
| 429 concurrency_limit_exceeded under load | Plan concurrency cap | Pool/limit live calls or raise the cap |
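When the Omni socket closes immediately, logging a readable hint makes the table above quicker to apply. This sketch maps only the two close codes named in this guide; the helper name `describeOmniClose` is hypothetical, and any other code is reported as unrecognized rather than guessed at.

```javascript
// Close codes taken from the troubleshooting table; nothing else is assumed.
const OMNI_CLOSE_HINTS = {
  4401: 'bad key',
  4403: 'key lacks the omni:session scope',
};

// Returns a short human-readable hint for a WebSocket close code.
function describeOmniClose(code) {
  return OMNI_CLOSE_HINTS[code] || `unrecognized close code ${code}; see Errors & limits`;
}
```

With the ws package, the close listener receives the numeric code as its first argument, so the hint can be logged directly from that handler.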
Next steps
Omni wire protocol
Exact event payloads and close codes.
Browser voice agent
The same agent, in the browser with WebRTC.
FreeSWITCH integration
Fork SIP/PSTN audio into Omni at 16 kHz.
Errors & limits
Rate limits, concurrency, and retries.