The Real-Time Conversational Stack

Every ANET call relies on three coordinated systems:

  1. Speech-to-Text (STT) – Converts caller speech into text

  2. Language Model (LLM) – Interprets meaning and determines next action

  3. Text-to-Speech (TTS) – Converts system responses into audio

These systems operate continuously throughout the call.

If any layer becomes unstable, guardrails may trigger a transfer of the call.
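
The flow above can be illustrated with a minimal sketch. It is not ANET's implementation: the layer functions (fake_stt, fake_llm, fake_tts), the Guardrail class, and the "two consecutive failures in one layer" threshold are all hypothetical stand-ins chosen for illustration, and the sketch only records that a transfer was triggered without modeling where the call goes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class LayerError(Exception):
    """Raised by a stand-in layer to signal an unstable result."""


@dataclass
class Guardrail:
    # Hypothetical policy: trip after this many consecutive failures in one layer.
    max_consecutive_failures: int = 2
    failures: Dict[str, int] = field(default_factory=dict)

    def record(self, layer: str, ok: bool) -> bool:
        """Update a layer's failure streak; return True if a transfer should trigger."""
        self.failures[layer] = 0 if ok else self.failures.get(layer, 0) + 1
        return self.failures[layer] >= self.max_consecutive_failures


def run_call(
    audio_chunks: List[str],
    stt: Callable[[str], str],
    llm: Callable[[str], str],
    tts: Callable[[str], str],
    guardrail: Guardrail,
) -> List[str]:
    """Pass each caller turn through STT -> LLM -> TTS; stop with 'transfer' if the guardrail trips."""
    log: List[str] = []
    for chunk in audio_chunks:
        value = chunk
        completed = True
        for name, layer in (("stt", stt), ("llm", llm), ("tts", tts)):
            try:
                value = layer(value)
                ok = True
            except LayerError:
                ok = False
            if guardrail.record(name, ok):
                log.append("transfer")
                return log
            if not ok:
                completed = False  # skip the rest of this turn, keep the call going
                break
        if completed:
            log.append(value)
    return log


# Stand-in layers (assumptions, not real services).
def fake_stt(audio: str) -> str:
    if audio == "<noise>":
        raise LayerError("no usable transcript")
    return audio.lower()


def fake_llm(text: str) -> str:
    return f"reply to: {text}"


def fake_tts(text: str) -> str:
    return f"<audio:{text}>"
```

In this sketch a single failed turn is tolerated and the streak resets on the next success, while repeated failures in the same layer end the loop with a transfer. A real system would likely also watch latency and confidence scores, which are left out here.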