# Answering-machine detection (WebSocket)

> Realtime answering-machine detection over a WebSocket. **This surface speaks Twilio's Media Streams protocol natively** (`start` / `media` / `stop` frames, G.711 μ-law 8 kHz base64, ~20 ms), so migrating from Twilio AMD is a one-line TwiML change: point the call's media at PyAI and keep your carrier and your code.

```xml
<Response><Connect>
  <Stream url="wss://api.pyai.com/v1/amd/stream">
    <Parameter name="aggressiveness" value="0.25"/>
    <Parameter name="webhook" value="https://you/amd-events"/>
  </Stream>
</Connect></Response>
```
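For testing the endpoint with your own audio rather than a live Twilio call, the framing described above works out to 160 bytes per frame (8,000 one-byte μ-law samples per second × 20 ms). The sketch below splits raw μ-law audio into `media` messages using the field names from Twilio's Media Streams protocol. That PyAI accepts exactly this shape from a non-Twilio client is an assumption, and `media_frames` and the `stream_sid` value are hypothetical.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000  # G.711 mu-law: one byte per sample
FRAME_MS = 20          # frame duration named in the description above
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 bytes per frame


def media_frames(mulaw_audio: bytes, stream_sid: str) -> list:
    """Split raw mu-law audio into JSON `media` messages of up to 20 ms each."""
    frames = []
    for offset in range(0, len(mulaw_audio), FRAME_BYTES):
        chunk = mulaw_audio[offset:offset + FRAME_BYTES]
        frames.append(json.dumps({
            "event": "media",
            "streamSid": stream_sid,  # hypothetical id for illustration
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        }))
    return frames
```

The final frame may be shorter than 20 ms if the audio length is not a multiple of 160 bytes; whether to pad it with silence is left to the caller.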

Authenticate with the `Sec-WebSocket-Protocol: pyai-key.<API_KEY>` subprotocol (or `?api_key=` server-side). Requires the `amd:detect` scope. Mid-call, PyAI pushes an `amd` decision event on the socket (and to your `webhook`): `answered_by` (PyAI's richer vocabulary), `answered_by_twilio` (Twilio's exact `AnsweredBy` enum for drop-in routing parity), `confidence`, `decision_ms`, and a human-readable `reason`. The per-call `aggressiveness` `<Parameter>` overrides the account default from `POST /v1/amd/config`.
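A minimal sketch of routing on the decision event, assuming it arrives as a JSON text frame with an `event` key set to `amd` and a `confidence` between 0 and 1 (neither detail is confirmed above). The field names `answered_by_twilio` and `confidence` come from the description; the machine values are Twilio's documented `AnsweredBy` values; `route_amd_event`, the action names, and the 0.8 threshold are hypothetical.

```python
import json

# Twilio's documented AnsweredBy values for answering machines.
MACHINE_VALUES = {
    "machine_start",
    "machine_end_beep",
    "machine_end_silence",
    "machine_end_other",
}


def route_amd_event(raw_frame: str, min_confidence: float = 0.8) -> str:
    """Map one socket frame to a routing action (action names are illustrative)."""
    message = json.loads(raw_frame)
    if message.get("event") != "amd":  # assumed envelope key
        return "ignore"
    answered_by = message.get("answered_by_twilio", "unknown")
    confidence = float(message.get("confidence", 0.0))
    if confidence < min_confidence:    # assumed 0..1 scale, arbitrary threshold
        return "fallback"
    if answered_by == "human":
        return "connect_agent"
    if answered_by in MACHINE_VALUES:
        return "leave_voicemail"
    if answered_by == "fax":
        return "hang_up"
    return "fallback"
```

Branching on `answered_by_twilio` keeps existing Twilio routing logic unchanged; code that wants the richer `answered_by` vocabulary would need its own mapping.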

Billed per **answered** call (`amd.calls`): the first 5,000 answered calls each month are free, then $0.004 per answered call. AMD bundled with PyAI telephony/Omni is included at no charge.
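As a rough worked example of that pricing, the sketch below estimates a month's standalone AMD charge from the answered-call count. It uses only the two figures stated above and ignores bundled calls, plan discounts, and credits; `amd_monthly_cost` is a hypothetical helper, not part of any PyAI SDK.

```python
from decimal import Decimal

FREE_ANSWERED_CALLS = 5_000                 # free tier per month
PRICE_PER_ANSWERED_CALL = Decimal("0.004")  # USD, after the free tier


def amd_monthly_cost(answered_calls: int) -> Decimal:
    """Estimate the monthly charge in USD for standalone AMD usage."""
    billable = max(0, answered_calls - FREE_ANSWERED_CALLS)
    return billable * PRICE_PER_ANSWERED_CALL
```

For instance, 25,000 answered calls leave 20,000 billable calls, which comes to $80.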



## OpenAPI

````yaml https://api.pyai.com/openapi.json get /v1/amd/stream
openapi: 3.1.0
info:
  title: PyAI API
  version: 1.0.0
  description: >-
    Telephony-native Voice AI behind one bearer key:


    - **Hear**: speech-to-text · `POST /v1/audio/transcriptions` (streaming +
    batch)

    - **Speak**: text-to-speech, stock voices & voice cloning · `POST
    /v1/audio/speech`, `GET /v1/voices`, `/v1/voice/clones`

    - **Cue**: streaming turn detection + knowledge-base context for your own
    LLM/voice pipeline · `GET /v1/audio/transcriptions/stream` with grounding

    - **Omni**: full-duplex agentic voice (speech-to-speech, grounded in your
    knowledge bases + tools) · `/v1/omni` (and the OpenAI-compatible
    `/v1/realtime`)

    - **AMD API**: answering-machine detection that tells you *who or what*
    answered a call (human, voicemail, IVR, iPhone/Google screening, dead
    number, fax) in a fraction of the dead-air dwell (under 300 ms for a human,
    under 800 ms for a machine, in-region), with the reason it decided · `wss
    …/v1/amd/stream` (Twilio Media Streams drop-in), `POST /v1/amd/config`, `GET
    /v1/amd/calls/{id}`

    - **Agents**: PyAI's feature to create, manage & track your Omni voice
    agents (no-code builder, hosted knowledge, evals, monitoring). _Coming
    soon._


    ## Authentication


    Create a key in the [console](https://console.pyai.com) (it is shown once)
    and send it as a bearer token:


    ```

    Authorization: Bearer pyai_live_...

    ```


    Keys are environment-scoped: `pyai_live_...` (production) and
    `pyai_test_...` (sandbox). New accounts receive up to $50 of prepaid credits
    after phone verification (graduated signup); `pyai_test_` keys skip the
    credit gate entirely.


    Keys are self-validating signed tokens: they work on every PyAI surface the
    instant they are created, with no activation or propagation delay. Treat
    them as opaque strings (up to 512 chars) and never parse their contents.


    WebSocket endpoints can't use request headers from a browser, so pass the
    key as a **subprotocol** instead:


    ```

    Sec-WebSocket-Protocol: pyai-key.pyai_live_...

    ```


    Server-side clients may instead append `?api_key=...` to the URL. The
    gateway authenticates the key on the WebSocket upgrade and swaps it for the
    internal upstream credential: your key never reaches the model, and the
    model's key never reaches you.


    ## Quickstart: Hear (speech-to-text)


    ```

    curl https://api.pyai.com/v1/audio/transcriptions \
      -H "Authorization: Bearer $PYAI_API_KEY" \
      -F file=@audio.wav -F model=pyai-hear
    # -> { "text": "..." }

    ```


    ## Quickstart: Speak (text-to-speech)


    ```

    curl https://api.pyai.com/v1/audio/speech \
      -H "Authorization: Bearer $PYAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"pyai-voice","input":"Hello from PyAI.","voice":"voice_abc"}' \
      --output speech.wav
    ```


    `voice` is a stock voice id from `GET /v1/voices` (a curated set of 12
    prebuilt voices with personas, avatars, and previews) or a cloned voice id
    from `/v1/voice/clones`. Omit it to use your account's default voice.


    ## Quickstart: Omni (realtime voice agent)


    Omni is **zero-state: there is nothing to create first.** Open a WebSocket,
    pass your key as a subprotocol, and send the agent's behavior (voice,
    persona, knowledge endpoint) in the first `configure` frame:


    ```

    wss://api.pyai.com/v1/omni?session_label=support&format=pcm16&rate=24000
      Sec-WebSocket-Protocol: pyai-key.$PYAI_API_KEY
    ```


    The session is authorized by your key's **organization**. `session_label` is
    an **optional, opaque** tag (echoed to your own knowledge endpoint for
    correlation); omit it or use any value. When `session_label` equals a
    **`/v1/agents` profile id**, the engine loads persona, voice, and **greeting
    message** from that profile (turn-0 playback). `format` and `rate` are
    load-bearing on the connect URL (the SDK sets them). Send PCM16 audio as
    binary frames and receive the agent's speech the same way. **Optional
    convenience:** pre-store config via `POST /v1/agents` (including `greeting`,
    `consent_line`, `recordings_enabled`) and pass its id as `session_label`, or
    send everything inline in the post-handshake `configure` frame. A stored
    profile is not required to connect. The OpenAI-realtime-compatible surface
    (`/v1/realtime`) is served by the same Omni engine; new integrations should
    prefer `/v1/omni`. (_Flow, the legacy voice-duplex engine, is retired for
    new customers; its `/v1/realtime` alias now routes to Omni._)


    For reproducible eval runs, determinism controls (`seed`/`temperature`) ride
    the Omni session's `configure` frame, which the gateway passes through
    unchanged. They are honored once the engine supports them; no platform
    change is required.


    ## Scopes


    | Scope | Grants |

    | --- | --- |

    | `hear:transcribe` | `POST /v1/audio/transcriptions` |

    | `hear:stream` | `GET /v1/audio/transcriptions/stream` (WebSocket) |

    | `voice:synthesize` | `POST /v1/audio/speech` (Speak) |

    | `voice:clone` | `/v1/voice/clones` (Speak) |

    | `voice:design` | `/v1/voice/design` (Speak) |

    | `flow:session` | _legacy_: `/v1/realtime` for existing Flow customers; new
    traffic on `/v1/realtime` uses Omni |

    | `omni:session` | `/v1/omni` (native), `/v1/realtime` (Omni), and `POST
    /v1/omni/sessions` (mint a browser session token) |

    | `omni:read` | `/v1/omni/calls` (Omni post-call records) |

    | `transcribe:jobs` | `/v1/transcription/jobs` |

    | `trace:configure` | `/v1/trace/config`, `/v1/trace/rule-packs` (Trace
    management) |

    | `trace:read` | `/v1/trace/interactions`, `/violations`, `/findings`,
    `/exposure` (Trace reads) |

    | `recap:configure` | `/v1/recap/config` (Recap management) |

    | `recap:configure` | `/v1/recap/crm-config` (Salesforce field mapping) |

    | `recap:read` | `/v1/recap/calls` (Recap reads) |

    | `amd:detect` | `wss …/v1/amd/stream` (AMD realtime detection, Twilio
    drop-in) |

    | `amd:configure` | `/v1/amd/config` (AMD operating-point dial + webhook) |

    | `amd:read` | `/v1/amd/calls` (AMD decision records) |

    | `telephony:manage` | `/v1/telephony/*` (managed numbers) |


    `GET /v1/models`, `GET /v1/voices`, and `GET /v1/me` need no specific scope;
    any active key may call them. Wildcards (`hear:*`, `voice:*`, …, and the
    global `*`) grant every scope in their family.


    ## Canonical endpoints


    One row per product surface: endpoint, auth, required scope, and lifecycle
    status. **live** = generally available; **deprecated** = works during a
    migration window (don't build new on it); **legacy** = supported for
    existing customers only; **planned** = not yet available.


    | Product | Endpoint | Auth | Scope | Status |

    | --- | --- | --- | --- | --- |

    | Identity | `GET /v1/me` | Bearer | _any active key_ | live |

    | Models | `GET /v1/models` | Bearer | _any active key_ | live |

    | Voices | `GET /v1/voices`, `GET /v1/voices/{id}` | Bearer | _any active
    key_ | live |

    | Hear (batch) | `POST /v1/audio/transcriptions` | Bearer |
    `hear:transcribe` | live |

    | Hear (streaming) | `GET /v1/audio/transcriptions/stream` (WS) |
    Subprotocol | `hear:stream` | live |

    | Cue | `GET /v1/audio/transcriptions/stream` + grounding (WS) | Subprotocol
    | `hear:stream` | live |

    | Hear (async batch) | `POST`/`GET /v1/transcription/jobs` | Bearer |
    `transcribe:jobs` | live |

    | Speak (TTS) | `POST /v1/audio/speech` | Bearer | `voice:synthesize` | live
    |

    | Speak (cloning) | `GET`/`POST /v1/voice/clones` | Bearer | `voice:clone` |
    live |

    | Speak (design) | `/v1/voice/design` | Bearer | `voice:design` | live |

    | Omni (native) | `wss …/v1/omni?agent_id=` | Subprotocol | `omni:session` |
    live |

    | Omni (OpenAI-compat) | `wss …/v1/realtime?model=pyai-omni-realtime` |
    Subprotocol | `omni:session` | live |

    | Omni (alias) | `wss …/v2/omni/chat` | Subprotocol | `omni:session` |
    deprecated |

    | Flow | `wss …/v1/realtime?model=pyai-flow-realtime` | Subprotocol |
    `flow:session` | legacy |

    | Agent profiles (optional config) | `/v1/agents`, `/v1/agents/{id}` |
    Bearer | `omni:session` | live |

    | Trace (config) | `/v1/trace/config`, `/v1/trace/rule-packs` | Bearer |
    `trace:configure` | live |

    | Trace (reads) | `/v1/trace/interactions`, `/violations`, `/findings`,
    `/exposure` | Bearer | `trace:read` | live |

    | Recap (config) | `/v1/recap/config` | Bearer | `recap:configure` | live |

    | Recap (CRM) | `/v1/recap/crm-config` | Bearer | `recap:configure` | live |

    | Recap (reads) | `/v1/recap/calls` | Bearer | `recap:read` | live |

    | Omni call records | `/v1/omni/calls`, `/v1/omni/calls/{id}` | Bearer |
    `omni:read` | live |

    | AMD (stream) | `wss …/v1/amd/stream` (Twilio Media Streams drop-in) |
    Subprotocol | `amd:detect` | live |

    | AMD (config) | `GET`/`POST /v1/amd/config` | Bearer | `amd:configure` |
    live |

    | AMD (reads) | `GET /v1/amd/calls`, `/v1/amd/calls/{id}` | Bearer |
    `amd:read` | live |

    | Telephony | `/v1/telephony/*` | Bearer | `telephony:manage` | live |

    | Agents (create/manage/track feature) | _coming soon_ | - | - | planned |


    WebSocket surfaces authenticate with the `Sec-WebSocket-Protocol:
    pyai-key.<API_KEY>` subprotocol (or `?api_key=` server-side); everything
    else takes the `Authorization: Bearer` key. Telephony's carrier-backed calls
    (search/provision/release) return 404 until a carrier is configured for the
    account.


    ## Rate limits & billing


    Every key has a per-second rate limit (with burst) and a cap on concurrent
    realtime sessions. Exceeding either returns `429` with a `Retry-After`
    header. Usage is metered per minute of audio (transcription minutes for
    Hear, synthesized audio minutes for Speak, and realtime session minutes for
    Cue and Omni) and billed against your plan and credits. List prices: Hear
    $0.001/min (async Transcribe $0.0005/min), Speak $0.04/min streaming
    ($0.04/min async), Cue $0.015/min, Omni $0.05/min, Agents $0.08/min (the
    create/manage/track feature; rolling out). The AMD API bills per
    **answered** call: the first 5,000 answered calls each month are free, then
    $0.004/answered call (no-answers, busies, and failed calls are free; AMD
    bundled with PyAI telephony/Omni is included at no charge). Because AMD
    decides in a fraction of the incumbent's dead-air dwell, its all-in cost is
    lower than legacy answering-machine detection. AI products (Hear, Speak,
    Cue, Omni) bill **per second by default**: the pulse is applied once to each
    meter's invoice-period total, so many short sessions are summed and rounded
    a single time (never minute-rounded per call), and an empty or failed call
    bills nothing. Coarser pulses are available as an optional enterprise
    override. Managed telephony minutes keep a 1-minute pulse (carrier
    economics). Per-character Speak billing is available on enterprise
    contracts.
  contact:
    name: PyAI
    url: https://pyai.com
servers:
  - url: https://api.pyai.com
    description: Production
security:
  - apiKey: []
  - xApiKey: []
tags:
  - name: Identity
    description: >-
      Introspect the calling key: org/project, env, granted scopes, and
      limits/credit posture. Use it to self-diagnose a 401/403/402.
  - name: Hear
    description: Speech-to-text (streaming + batch)
  - name: Speak
    description: Text-to-speech and voice cloning
  - name: Realtime
    description: >-
      Full-duplex WebSocket sessions: Omni (agentic speech-to-speech with
      knowledge bases + tools). The legacy Flow engine is retired for new
      customers; its /v1/realtime alias routes to Omni.
  - name: Models
    description: Model catalog
  - name: Sandbox
    description: >-
      Zero-friction onboarding for coding agents: mint a free, instant, no-card
      sandbox key with no human steps.
  - name: Transcription Jobs
    description: Async batch transcription
  - name: Agents
    description: >-
      Agent profiles: OPTIONAL pre-stored Omni session config (persona,
      greeting, voice, conversation knobs) you can reference by id instead of
      sending a `configure` frame each call. Optional convenience over the
      zero-state `/v1/omni` primitive; NOT required to connect. (Distinct from
      the upcoming Agents feature, the no-code create/manage/track surface.)
  - name: Trace
    description: >-
      Compliance & guardrails: per-agent config, rule packs, and the exposure /
      violations / interaction-evidence read views
  - name: AMD
    description: >-
      Answering-machine detection: know who or what answered a call (human,
      voicemail, IVR, iPhone/Google screening, dead number, fax) in a fraction
      of the dead-air dwell (under 300 ms for a human, under 800 ms for a
      machine, in-region), with the reason it decided. Twilio Media Streams
      drop-in over `wss …/v1/amd/stream`; one operating-point dial; billed per
      answered call.
  - name: Telephony
    description: >-
      Managed phone numbers: search, provision, route to an agent, and release.
      Call minutes bill on telephony.minutes ($0.01/min).
paths:
  /v1/amd/stream:
    get:
      tags:
        - AMD
      summary: Answering-machine detection (WebSocket)
      description: >-
        Realtime answering-machine detection over a WebSocket. **This surface
        speaks Twilio's Media Streams protocol natively** (`start` / `media` /
        `stop` frames, G.711 μ-law 8 kHz base64, ~20 ms), so migrating from
        Twilio AMD is a one-line TwiML change: point the call's media at PyAI
        and keep your carrier and your code.


        ```xml

        <Response><Connect>
          <Stream url="wss://api.pyai.com/v1/amd/stream">
            <Parameter name="aggressiveness" value="0.25"/>
            <Parameter name="webhook" value="https://you/amd-events"/>
          </Stream>
        </Connect></Response>

        ```


        Authenticate with the `Sec-WebSocket-Protocol: pyai-key.<API_KEY>`
        subprotocol (or `?api_key=` server-side). Requires the `amd:detect`
        scope. Mid-call, PyAI pushes an `amd` decision event on the socket (and
        to your `webhook`): `answered_by` (PyAI's richer vocabulary),
        `answered_by_twilio` (Twilio's exact `AnsweredBy` enum for drop-in
        routing parity), `confidence`, `decision_ms`, and a human-readable
        `reason`. The per-call `aggressiveness` `<Parameter>` overrides the
        account default from `POST /v1/amd/config`.


        Billed per **answered** call (`amd.calls`): the first 5,000 answered
        calls each month are free, then $0.004 per answered call. AMD bundled
        with PyAI telephony/Omni is included at no charge.
      operationId: amdStream
      responses:
        '101':
          description: >-
            WebSocket upgrade (Twilio Media Streams protocol); PyAI emits `amd`
            decision events.
        '401':
          $ref: '#/components/responses/Unauthorized'
        '403':
          $ref: '#/components/responses/Forbidden'
components:
  responses:
    Unauthorized:
      description: 'Missing or invalid API key (`code: unauthorized`)'
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
    Forbidden:
      description: >-
        Key lacks the required scope or the origin is not allow-listed (`code:
        forbidden | origin_not_allowed`)
      content:
        application/json:
          schema:
            $ref: '#/components/schemas/Error'
  schemas:
    Error:
      type: object
      description: >-
        OpenAI-compatible error envelope returned by the gateway data plane
        (401/402/403/429). Control-plane request/resource errors use Problem
        (application/problem+json) instead.
      required:
        - error
      properties:
        error:
          type: object
          required:
            - message
          properties:
            message:
              type: string
              description: Human-readable explanation.
            type:
              type: string
              description: Error category, e.g. rate_limit_error.
            code:
              $ref: '#/components/schemas/ErrorCode'
            param:
              type:
                - string
                - 'null'
              description: Offending parameter when applicable, else null.
    ErrorCode:
      type: string
      description: >-
        Stable, machine-readable error code. Branch on this rather than the
        human `message`.
      enum:
        - invalid_request_error
        - invalid_agent_id
        - unauthorized
        - forbidden
        - origin_not_allowed
        - credit_exhausted
        - key_budget_exceeded
        - insufficient_quota
        - rate_limit_exceeded
        - concurrency_limit_exceeded
        - daily_cap_exceeded
  securitySchemes:
    apiKey:
      type: http
      scheme: bearer
      description: 'Use `Authorization: Bearer pyai_live_...` (or `pyai_test_...`).'
    xApiKey:
      type: apiKey
      in: header
      name: x-api-key
      description: >-
        Header alias for bearer auth on HTTP endpoints. WebSocket auth uses
        subprotocol `pyai-key.<API_KEY>`.

````
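The `ErrorCode` description in the spec above recommends branching on the stable `code` rather than the human `message`. A minimal sketch of that pattern follows, assuming the error body is the JSON envelope defined by the `Error` schema and that `Retry-After` carries a number of seconds. The grouping of codes into actions, the action names, and `classify_error` itself are hypothetical choices for illustration.

```python
import json
from typing import Optional, Tuple

# Groupings are an assumption for this sketch; the codes come from the ErrorCode enum.
RETRY_CODES = {"rate_limit_exceeded", "concurrency_limit_exceeded"}
BILLING_CODES = {"credit_exhausted", "key_budget_exceeded",
                 "insufficient_quota", "daily_cap_exceeded"}
AUTH_CODES = {"unauthorized", "forbidden", "origin_not_allowed"}


def classify_error(body: str, retry_after: Optional[str] = None) -> Tuple[str, float]:
    """Return (action, delay_seconds) for one gateway error response."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = {}
    error = parsed.get("error") if isinstance(parsed, dict) else None
    code = error.get("code") if isinstance(error, dict) else None

    if code in RETRY_CODES:
        try:
            # Only the delta-seconds form of Retry-After is handled here.
            delay = float(retry_after) if retry_after is not None else 1.0
        except ValueError:
            delay = 1.0
        return "retry", max(delay, 0.0)
    if code in BILLING_CODES:
        return "check_billing", 0.0
    if code in AUTH_CODES:
        return "fix_credentials", 0.0
    return "fail", 0.0
```

Unknown or unparseable bodies fall through to `fail`, so a code added to the enum later is never silently retried.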