Core System Architecture
Every ANET call involves three coordinated systems: Speech-to-Text (STT), the Language Model (LLM), and Text-to-Speech (TTS).
Speech-to-Text (STT)
STT:
Converts caller speech into transcript text
Detects pauses and end-of-turn signals
Provides confidence signals
Supports language detection
Accurate transcription directly affects intent identification.
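To make the STT signals above concrete, here is a minimal sketch of how a single STT result might be represented and checked before intent identification. The `SttResult` fields, the `MIN_CONFIDENCE` threshold, and the step names are all hypothetical, chosen for illustration; they are not ANET's actual data model or settings.

```python
from dataclasses import dataclass


@dataclass
class SttResult:
    """Hypothetical shape of one STT output; field names are illustrative."""
    transcript: str     # caller speech converted to text
    confidence: float   # assumed to range from 0.0 to 1.0
    end_of_turn: bool   # True once the caller has paused or finished speaking
    language: str       # detected language code, e.g. "en" or "es"


# Assumed threshold for illustration only; not an ANET setting.
MIN_CONFIDENCE = 0.6


def next_step(result: SttResult) -> str:
    """Decide what to do with an STT result before intent identification."""
    if not result.end_of_turn:
        # The caller is still speaking, so keep collecting audio.
        return "keep_listening"
    if result.confidence < MIN_CONFIDENCE or not result.transcript.strip():
        # A weak or empty transcript would make intent identification unreliable.
        return "ask_to_repeat"
    return "identify_intent"
```

In this sketch, a low-confidence transcript never reaches intent identification, which is one way to keep transcription errors from turning into wrong intents.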
Language Model (LLM)
The LLM:
Interprets the caller's meaning
Identifies likely intent
Determines the next best question
Applies configured routing logic
Generates structured summaries
The LLM does not operate independently. It follows configured intent and action rules.
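One way to picture "follows configured intent and action rules" is a lookup that constrains whatever intent the LLM proposes to the configured set. The sketch below is an assumption about how such a constraint could work; the intent names, action names, and `resolve_action` function are hypothetical and not taken from ANET's configuration.

```python
# Hypothetical configuration: each configured intent maps to exactly one action.
INTENT_ACTIONS = {
    "billing_question": "transfer_to_billing",
    "schedule_appointment": "collect_appointment_details",
}

# Assumed fallback used when the proposed intent is not configured.
FALLBACK_ACTION = "ask_clarifying_question"


def resolve_action(proposed_intent: str) -> str:
    """Return the configured action for an intent proposed by the LLM.

    An intent outside the configuration never triggers an action of its own;
    it falls back to asking the caller another question.
    """
    return INTENT_ACTIONS.get(proposed_intent, FALLBACK_ACTION)
```

The design choice illustrated here is that the model suggests and the configuration decides: an unrecognized intent leads to a clarifying question, not to an improvised action.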
Text-to-Speech (TTS)
TTS:
Converts system responses into audio
Maintains language alignment with STT
Produces real-time conversational responses
When language changes, the entire stack (STT, LLM, TTS) changes together.
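The "changes together" rule can be sketched by deriving all three language settings from a single value, so the components cannot drift apart. The `CallStack` type and both functions below are hypothetical names used for illustration, assuming each component accepts a language code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallStack:
    """Hypothetical per-call language settings for the three systems."""
    stt_language: str
    llm_language: str
    tts_language: str


def stack_for(language: str) -> CallStack:
    """Build a stack in which STT, LLM, and TTS all use the same language."""
    return CallStack(language, language, language)


def on_language_detected(current: CallStack, detected: str) -> CallStack:
    """Switch the whole stack when the detected language differs."""
    if detected == current.stt_language:
        return current
    # Replace all three settings at once; no component is updated alone.
    return stack_for(detected)
```

Because `CallStack` is frozen and only built through `stack_for`, a partial switch (for example, Spanish STT with English TTS) cannot be expressed in this sketch.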
