Point a real phone number at a small Node server, and let callers talk to an Omni agent. This guide builds the bridge end to end: Twilio streams the call’s audio to your server over a WebSocket, you transcode and relay it to Omni, and you relay Omni’s voice back to the caller — with barge-in, DTMF, and “transfer me to a person” all wired up.

How it fits together

Twilio’s <Connect><Stream> opens a bidirectional WebSocket to your server: it sends the caller’s audio as base64 G.711 μ-law at 8 kHz, and accepts the agent’s audio back the same way. Omni speaks PCM16. Run Omni at 8 kHz (?rate=8000) and the only conversion you do is μ-law companding — no resampling, because both sides are already at 8 kHz.
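To make "μ-law companding" concrete, here is a from-scratch sketch of standard G.711 μ-law (bias 0x84, clip at 32635). It only illustrates the math. The bridge below uses alawmulaw, whose rounding may differ slightly, and the function names here are hypothetical.

```javascript
// Illustrative G.711 μ-law companding (not the alawmulaw implementation).
const BIAS = 0x84; // 132: added before locating the segment
const CLIP = 32635; // largest magnitude that still fits after biasing

// One 16-bit PCM sample -> one μ-law byte.
function ulawEncodeSample(sample) {
  const sign = sample < 0 ? 0x80 : 0;
  const magnitude = Math.min(Math.abs(sample), CLIP) + BIAS;
  // Segment (exponent) = position of the highest set bit, counted from bit 7.
  let exponent = 7;
  for (let mask = 0x4000; (magnitude & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;
  // G.711 transmits the bitwise complement.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// One μ-law byte -> one 16-bit PCM sample (midpoint of its quantization step).
function ulawDecodeSample(byte) {
  const u = ~byte & 0xff;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS;
  return u & 0x80 ? -magnitude : magnitude;
}

// Array-level helpers with the same shapes as mulaw.encode / mulaw.decode.
function ulawEncode(pcm) {
  return Uint8Array.from(pcm, ulawEncodeSample); // Int16Array -> Uint8Array
}
function ulawDecode(bytes) {
  return Int16Array.from(bytes, ulawDecodeSample); // Uint8Array -> Int16Array
}
```

Each μ-law byte stands for exactly one 8 kHz sample, which is why this path involves no resampling: 160 bytes in, 160 samples out.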
This guide is accurate for the transport, the Twilio Media Streams message types, the codec/rate math, and the Omni event behaviors. The exact JSON payloads of Omni events (e.g. the dtmf and transfer_to_human frames) are defined by the Omni wire protocol, so the code isolates them in one place to keep them easy to update.

Prerequisites

1. A Twilio number: a voice-capable phone number in your Twilio console, plus your Account SID, your Auth Token, and a phone number to transfer callers to (HUMAN_NUMBER). All three are used by the transfer step.
2. An Omni agent + key: an agent_id and a pyai_test_ key. Set them as environment variables on the server; never hard-code keys.
3. A public URL: Twilio must reach your server over TLS. For local dev, tunnel with ngrok http 8080 and use the https/wss host it prints.
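Because every secret and host comes from the environment, it can help to fail fast at startup rather than mid-call. A minimal sketch: the variable names match what server.js reads, while the helper and the REQUIRED list are hypothetical.

```javascript
// Hypothetical startup check: report required variables that are unset or blank.
function missingEnv(env, names) {
  return names.filter((name) => !env[name] || env[name].trim() === "");
}

const REQUIRED = [
  "PYAI_API_KEY",
  "PYAI_AGENT",
  "PUBLIC_HOST",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "HUMAN_NUMBER",
];

// In server.js you would run it against process.env before listening:
//   const missing = missingEnv(process.env, REQUIRED);
//   if (missing.length) {
//     console.error(`Missing env vars: ${missing.join(", ")}`);
//     process.exit(1);
//   }
```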

Project layout

twilio-omni-bridge/
├── server.js        # TwiML endpoint + Twilio↔Omni WebSocket bridge
├── package.json
└── .env             # PYAI_API_KEY, PYAI_AGENT, TWILIO_*, HUMAN_NUMBER, PUBLIC_HOST
package.json
{
  "name": "twilio-omni-bridge",
  "type": "module",
  "scripts": { "start": "node server.js" },
  "dependencies": {
    "@fastify/websocket": "^11.0.0",
    "alawmulaw": "^6.0.0",
    "dotenv": "^16.4.0",
    "fastify": "^5.0.0",
    "twilio": "^5.0.0",
    "ws": "^8.18.0"
  }
}
alawmulaw does the G.711 companding (mulaw.decode → Int16, mulaw.encode → μ-law bytes); ws is the client socket to Omni; twilio is only used for the REST transfer.
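alawmulaw works on typed arrays, while ws hands you Node Buffers, so the bridge converts between the two in both directions. A sketch of explicit little-endian conversions, assuming Omni's pcm16 is little-endian (the usual convention for raw PCM16); the helper names are hypothetical.

```javascript
// Int16Array of samples -> little-endian PCM16 bytes (what you send to Omni).
function pcm16ToBuffer(samples) {
  const out = Buffer.alloc(samples.length * 2);
  samples.forEach((s, i) => out.writeInt16LE(s, i * 2));
  return out;
}

// Little-endian PCM16 bytes -> Int16Array. Reads sample by sample, so it works
// even when the Buffer starts at an odd byteOffset inside a larger allocation,
// where `new Int16Array(buf.buffer, buf.byteOffset, ...)` would throw.
function bufferToPcm16(buf) {
  const samples = new Int16Array(buf.length >> 1); // a trailing odd byte is ignored
  for (let i = 0; i < samples.length; i++) samples[i] = buf.readInt16LE(i * 2);
  return samples;
}
```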

Build it

1. Serve TwiML that opens a bidirectional stream

When Twilio receives the call it fetches TwiML from your /voice route. <Connect><Stream> (not <Start><Stream>) gives you a two-way socket so you can send the agent’s audio back.
server.js (TwiML route)
import "dotenv/config";
import Fastify from "fastify";
import websocket from "@fastify/websocket";
import WebSocket from "ws";
import { mulaw } from "alawmulaw";
import twilio from "twilio";

const { PUBLIC_HOST, PYAI_API_KEY, PYAI_AGENT } = process.env;

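const app = Fastify();
await app.register(websocket);

// Twilio posts webhooks as application/x-www-form-urlencoded. Fastify only
// parses JSON and plain text out of the box and answers 415 to anything else,
// so register a pass-through parser (this route doesn't need the fields).
app.addContentTypeParser(
  "application/x-www-form-urlencoded",
  { parseAs: "string" },
  (req, body, done) => done(null, body),
);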

app.post("/voice", (req, reply) => {
  const twiml = `
    <Response>
      <Connect>
        <Stream url="wss://${PUBLIC_HOST}/media" />
      </Connect>
    </Response>`;
  reply.type("text/xml").send(twiml.trim());
});
In the Twilio console, set the number's "A call comes in" webhook to https://<PUBLIC_HOST>/voice (HTTP POST).
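A common slip is setting PUBLIC_HOST to the full ngrok URL (https://...), which yields wss://https://... inside the TwiML. A small hypothetical helper that accepts either form and keeps only host[:port] before building the <Stream> URL:

```javascript
// Hypothetical helpers: accept "abc.ngrok.app", "https://abc.ngrok.app/", etc.
function normalizeHost(value) {
  const trimmed = value.trim();
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  // URL drops the scheme, path, and any default port; a custom port is kept.
  return new URL(hasScheme ? trimmed : `https://${trimmed}`).host;
}

function streamTwiml(publicHost) {
  const url = `wss://${normalizeHost(publicHost)}/media`;
  return `<Response><Connect><Stream url="${url}" /></Connect></Response>`;
}
```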
2. Bridge the media WebSocket to Omni

Twilio connects to /media and sends JSON frames: start (carries streamSid + callSid), media (base64 μ-law), dtmf, and stop. For each caller frame, decode μ-law → PCM16 and forward the raw bytes to Omni.
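For reference while reading the handler, these are abbreviated examples of the four frames the bridge acts on. IDs and the payload are placeholders, and Twilio also sends a connected frame first and includes extra fields such as sequenceNumber. The summarize function is hypothetical and only shows where each field lives.

```javascript
// Abbreviated Twilio Media Streams frames (placeholder values).
const sampleFrames = [
  {
    event: "start",
    streamSid: "MZ0000",
    start: {
      streamSid: "MZ0000",
      callSid: "CA0000",
      mediaFormat: { encoding: "audio/x-mulaw", sampleRate: 8000, channels: 1 },
    },
  },
  { event: "media", streamSid: "MZ0000", media: { track: "inbound", payload: "//8=" } },
  { event: "dtmf", streamSid: "MZ0000", dtmf: { track: "inbound_track", digit: "5" } },
  { event: "stop", streamSid: "MZ0000", stop: { callSid: "CA0000" } },
];

// Where the interesting field lives in each frame type.
function summarize(frame) {
  switch (frame.event) {
    case "start":
      return `start ${frame.start.callSid} ${frame.start.mediaFormat.sampleRate} Hz`;
    case "media":
      return `media ${Buffer.from(frame.media.payload, "base64").length} bytes`;
    case "dtmf":
      return `dtmf ${frame.dtmf.digit}`;
    case "stop":
      return "stop";
    default:
      return `ignored ${frame.event}`; // e.g. "connected", "mark"
  }
}
```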
server.js (caller → agent)
const omniURL =
  `wss://api.pyai.com/v1/omni?agent_id=${PYAI_AGENT}` +
  `&format=pcm16&rate=8000`;

app.get("/media", { websocket: true }, (twilioWS) => {
  let streamSid = null;
  let callSid = null;

  const omni = new WebSocket(omniURL, [`pyai-key.${PYAI_API_KEY}`]);

  twilioWS.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    switch (msg.event) {
      case "start":
        streamSid = msg.start.streamSid;
        callSid = msg.start.callSid;
        break;

      case "media": {
        // base64 μ-law (8 kHz) → Int16 PCM (8 kHz) → Omni (binary)
        const ulaw = Buffer.from(msg.media.payload, "base64");
        const pcm = mulaw.decode(ulaw); // Int16Array
        if (omni.readyState === WebSocket.OPEN) {
          omni.send(Buffer.from(pcm.buffer, pcm.byteOffset, pcm.byteLength));
        }
        break;
      }

      case "dtmf":
        // Caller pressed a key. Forward the digit to Omni so the agent can
        // react (exact dtmf frame: see the protocol reference).
        forwardDtmf(omni, msg.dtmf.digit);
        break;

      case "stop":
        omni.close();
        break;
    }
  });

  // ...agent → caller wiring in the next step (same scope)

3. Relay agent audio back to the caller

Omni sends agent audio as binary PCM16 frames and session state as text JSON. Encode PCM16 → μ-law, base64 it, and send a Twilio media message tagged with the streamSid.
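Because the Omni socket opens as soon as Twilio connects, agent audio could in principle arrive before Twilio's start event has supplied the streamSid, and media sent with streamSid: null is dropped. One way to guard against that, sketched as a hypothetical wrapper around anything with a send(string) method, is to hold outbound frames until the streamSid is known.

```javascript
// Hypothetical wrapper: buffers outbound base64 μ-law payloads until streamSid is set.
function createTwilioSender(socket) {
  let streamSid = null;
  const pending = [];

  const sendNow = (base64) =>
    socket.send(JSON.stringify({ event: "media", streamSid, media: { payload: base64 } }));

  return {
    // Call from the "start" case; releases anything queued so far, in order.
    setStreamSid(sid) {
      streamSid = sid;
      while (pending.length) sendNow(pending.shift());
    },
    // Call with the base64 μ-law for each agent audio frame.
    sendAudio(base64) {
      if (streamSid) sendNow(base64);
      else pending.push(base64);
    },
  };
}
```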
server.js (agent → caller)
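  omni.on("message", (data, isBinary) => {
    if (isBinary) {
      // Omni PCM16 (8 kHz, little-endian) → μ-law → base64 → Twilio.
      // Read samples explicitly: ws can hand over a Buffer at an odd byteOffset,
      // and an Int16Array view over it would throw a RangeError.
      const pcm = new Int16Array(data.length >> 1);
      for (let i = 0; i < pcm.length; i++) pcm[i] = data.readInt16LE(i * 2);
      const ulaw = mulaw.encode(pcm); // Uint8Array
      twilioWS.send(JSON.stringify({
        event: "media",
        streamSid,
        media: { payload: Buffer.from(ulaw).toString("base64") },
      }));
    } else {
      handleOmniEvent(JSON.parse(data.toString()), { twilioWS, streamSid, callSid });
    }
  });

  // ws emits "error" on a failed handshake (e.g. a rejected key) or when closed
  // before it connects; without a listener Node crashes. "close" follows.
  omni.on("error", (err) => console.error("Omni socket error:", err.message));

  twilioWS.on("close", () => omni.close());
  omni.on("close", () => twilioWS.close());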
});

4. Barge-in, DTMF, and transfer to a human

Barge-in and transfer are handled in one Omni event handler, and DTMF goes out through a small helper. Barge-in is the important one: when the caller talks over the agent, Omni sends flush. Twilio buffers outbound audio, so you must tell it to drop what's queued with a clear message; otherwise the agent keeps talking over the caller.
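If your bridge paces agent audio itself (for example, releasing one 20 ms frame per timer tick), frames still waiting in your own queue are not covered by Twilio's clear, so flush has to empty both. A minimal hypothetical pacer; in production tick() would run from setInterval(..., 20), here it is called by hand.

```javascript
// Hypothetical outbound pacer: one frame per tick(), emptied on barge-in.
class Playback {
  constructor(send, streamSid) {
    this.send = send; // e.g. (msg) => twilioWS.send(msg)
    this.streamSid = streamSid;
    this.queue = [];
  }

  enqueue(base64Frame) {
    this.queue.push(base64Frame);
  }

  // Send the next queued frame, if any.
  tick() {
    const payload = this.queue.shift();
    if (payload === undefined) return;
    this.send(JSON.stringify({ event: "media", streamSid: this.streamSid, media: { payload } }));
  }

  // Barge-in: drop what we haven't sent, then tell Twilio to drop what it buffered.
  flush() {
    this.queue.length = 0;
    this.send(JSON.stringify({ event: "clear", streamSid: this.streamSid }));
  }
}
```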
server.js (Omni events + helpers)
function forwardDtmf(omni, digit) {
  // Localized: shape per the Omni wire protocol reference.
  if (omni.readyState === WebSocket.OPEN) {
    omni.send(JSON.stringify({ type: "dtmf", digit }));
  }
}

function handleOmniEvent(evt, { twilioWS, streamSid, callSid }) {
  switch (evt.type) {
    case "hello":
    case "session_started":
      break;

    case "flush":
      // Barge-in: flush Twilio's outbound buffer so the agent stops mid-word.
      twilioWS.send(JSON.stringify({ event: "clear", streamSid }));
      break;

    case "transfer_to_human":
      transferCall(callSid); // redirect the live call to a person
      break;

    case "session_ending":
      twilioWS.close();
      break;
  }
}

const rest = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

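function transferCall(callSid) {
  // Replace the live call's TwiML with a <Dial> to a human.
  rest.calls(callSid)
    .update({
      twiml: `<Response><Dial>${process.env.HUMAN_NUMBER}</Dial></Response>`,
    })
    // update() returns a Promise; an unhandled rejection would crash Node.
    .catch((err) => console.error("Transfer failed:", err.message));
}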

app.listen({ port: 8080, host: "0.0.0.0" });
The flush, dtmf, transfer_to_human, and session_ending event names are stable; the exact JSON fields are defined in the Omni wire protocol. If a frame's fields differ from what's shown here, forwardDtmf and the cases in handleOmniEvent are the only code you need to change.
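One way to keep that surface to a single spot is to route every Omni frame shape through one object. The object and its action names are hypothetical; the shapes simply mirror the ones used above ({ type: "dtmf", digit } outbound, evt.type inbound), so adjust them to the protocol reference.

```javascript
// Hypothetical single home for Omni JSON frame shapes.
const OmniFrames = {
  // Outbound: the caller pressed a key.
  dtmf: (digit) => JSON.stringify({ type: "dtmf", digit }),

  // Inbound: map a text frame to the bridge action it should trigger.
  action(text) {
    const evt = JSON.parse(text);
    switch (evt.type) {
      case "flush":
        return "clear"; // send Twilio { event: "clear", streamSid }
      case "transfer_to_human":
        return "transfer"; // call transferCall(callSid)
      case "session_ending":
        return "hangup"; // close the Twilio socket
      default:
        return "ignore"; // hello, session_started, unknown types
    }
  },
};
```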

Run it

npm install
ngrok http 8080            # in one terminal; copy the hostname it prints, without https://
PUBLIC_HOST=<your-ngrok-host> npm start   # in another
Set your Twilio number’s voice webhook to https://<PUBLIC_HOST>/voice, then call the number. You should hear the agent greet you within a second of the call connecting. Talk over it to confirm barge-in cuts the agent off; press a key to confirm DTMF flows through.

Codec & rate notes

  • μ-law ↔ PCM16 only. Twilio is 8 kHz μ-law; running Omni at rate=8000 means you never resample — mulaw.decode/mulaw.encode is the whole codec path. If you ever bridge an 8 kHz leg to a 16 kHz Omni session you’d upsample 2:1 (16000 / 8000 = 2); for Twilio, don’t — keep both at 8 kHz.
  • Frame size. Twilio sends ~20 ms (160 μ-law bytes) per media message. Relay agent audio in similar ~20 ms chunks for smooth playback; sending huge bursts can make Twilio’s jitter buffer stutter.
  • Tag every outbound media message with the streamSid from the start event; Twilio silently drops messages without it.
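The ~20 ms figure follows from the rate: 8000 samples/s × 0.020 s = 160 samples, which is 160 μ-law bytes (one byte per sample) or 320 bytes of PCM16. A minimal sketch that splits an encoded agent burst into frames of that size; the function name is hypothetical, and pacing the frames out (e.g. on a timer) is a separate step.

```javascript
const SAMPLE_RATE = 8000; // Hz, both legs
const FRAME_MS = 20;
const FRAME_BYTES = (SAMPLE_RATE * FRAME_MS) / 1000; // 160 μ-law bytes per frame

// Split μ-law bytes into ~20 ms frames (subarray views, no copying).
// The last frame may be shorter than frameBytes.
function splitIntoFrames(ulaw, frameBytes = FRAME_BYTES) {
  const frames = [];
  for (let i = 0; i < ulaw.length; i += frameBytes) {
    frames.push(ulaw.subarray(i, i + frameBytes));
  }
  return frames;
}
```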

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Silence both ways | Used <Start><Stream> (one-way) | Use <Connect><Stream> for a bidirectional socket |
| Caller hears nothing from the agent | Missing/incorrect streamSid on outbound media | Capture streamSid from the start event and tag every media message |
| Agent keeps talking over the caller | flush not wired to Twilio clear | Send { event: "clear", streamSid } on every flush |
| Garbled / static audio | Skipped μ-law companding or wrong rate | Decode/encode with mulaw; keep Omni at rate=8000 |
| WS to Omni closes at once | Bad key or missing scope | Check the close code in Errors & limits: 4401 = bad key, 4403 = the key lacks the omni:session scope |
| Transfer does nothing | Wrong callSid or REST creds | Use start.callSid; verify TWILIO_ACCOUNT_SID/TWILIO_AUTH_TOKEN |
| 429 concurrency_limit_exceeded under load | Plan concurrency cap | Pool/limit live calls or raise the cap |
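When the Omni socket closes right away, logging the close code makes the cause obvious. ws passes (code, reason) to its close listener, with reason as a Buffer. A hypothetical helper that covers only the two codes named in the table:

```javascript
// Hypothetical mapping for the Omni close codes listed in the troubleshooting table.
const OMNI_CLOSE_HINTS = {
  4401: "bad API key",
  4403: "key lacks the omni:session scope",
};

function describeOmniClose(code, reason) {
  const hint = OMNI_CLOSE_HINTS[code] ?? "unlisted code; see Errors & limits";
  const text = reason && reason.length ? ` (${reason.toString()})` : "";
  return `Omni closed with ${code}: ${hint}${text}`;
}

// In the bridge:
//   omni.on("close", (code, reason) => {
//     console.error(describeOmniClose(code, reason));
//     twilioWS.close();
//   });
```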

Next steps

  • Omni wire protocol: exact event payloads and close codes.
  • Browser voice agent: the same agent, in the browser with WebRTC.
  • FreeSWITCH integration: fork SIP/PSTN audio into Omni at 16 kHz.
  • Errors & limits: rate limits, concurrency, and retries.