Core System Architecture

Every ANET call involves three coordinated systems: Speech-to-Text (STT), the Language Model (LLM), and Text-to-Speech (TTS).

Speech-to-Text (STT)

STT:

Converts caller speech into transcript text
Detects pauses and end-of-turn signals
Provides confidence signals
Supports language detection

Transcription accuracy directly affects intent identification, because the LLM interprets the transcript that STT produces.
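
As an illustration only, the signals listed above could be carried in a single result object that downstream logic checks before intent identification runs. The sketch below is not ANET's actual data model: the `SttResult` class, its field names, the `ready_for_intent` helper, and the 0.6 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SttResult:
    """Hypothetical container for the STT signals described above."""
    text: str          # transcript of the caller's speech
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    end_of_turn: bool  # True once a pause or end-of-turn signal is detected
    language: str      # detected language code, e.g. "en"


def ready_for_intent(result: SttResult, min_confidence: float = 0.6) -> bool:
    """Return True when the caller has finished speaking and the
    transcript is confident enough to pass on (threshold is assumed)."""
    return result.end_of_turn and result.confidence >= min_confidence


clear = SttResult("I need to reschedule", 0.92, True, "en")
noisy = SttResult("I need to ... shed yule", 0.35, True, "en")
partial = SttResult("I need to", 0.90, False, "en")
```

Under these assumptions, only `clear` would be handed to intent identification. The `noisy` result would more likely trigger a request to repeat, and the `partial` result would wait for the end of the caller's turn.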

Language Model (LLM)

The LLM:

Interprets the caller's meaning
Identifies likely intent
Determines the next best question
Applies configured routing logic
Generates structured summaries

The LLM does not operate independently. It follows configured intent and action rules.
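
One way to picture this constraint: the LLM proposes an intent, and the action taken is looked up in configuration rather than chosen freely. The sketch below assumes a simple lookup table. The intent names, action names, and the `resolve_action` function are hypothetical and do not come from ANET's configuration.

```python
# Hypothetical configured rules: each known intent maps to one action.
INTENT_ACTIONS = {
    "book_appointment": "transfer_to_scheduling",
    "billing_question": "transfer_to_billing",
}

# Assumed fallback when the proposed intent has no configured rule.
FALLBACK_ACTION = "ask_clarifying_question"


def resolve_action(proposed_intent: str) -> str:
    """Map the LLM's proposed intent to a configured action.

    An intent with no configured rule never produces an ad hoc action;
    it falls back to asking the caller another question.
    """
    return INTENT_ACTIONS.get(proposed_intent, FALLBACK_ACTION)
```

In this sketch, `resolve_action("billing_question")` yields the configured billing transfer, while an unconfigured intent such as `"order_pizza"` yields the fallback. The point is that routing decisions stay inside the configured rules.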

Text-to-Speech (TTS)

TTS:

Converts system responses into audio
Maintains language alignment with STT
Produces real-time conversational responses

When language changes, the entire stack (STT, LLM, TTS) changes together.
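
A minimal sketch of this behavior, assuming the stack is represented as one immutable object so that no component can be switched on its own. The `VoiceStack` class, its fields, and `switch_language` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceStack:
    """Hypothetical per-call language settings for all three systems."""
    stt_language: str
    llm_language: str
    tts_language: str


def switch_language(language: str) -> VoiceStack:
    """Build a new stack in which STT, LLM, and TTS all use the same
    language, so the three never drift out of alignment."""
    return VoiceStack(
        stt_language=language,
        llm_language=language,
        tts_language=language,
    )


english = switch_language("en")
spanish = switch_language("es")
```

Because the object is frozen and only created through `switch_language`, a partial switch (for example, Spanish STT with English TTS) cannot occur in this sketch.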
