Voice and Language Processing

Speech Processing and Conversational Stack
The Real-Time Conversational Stack
Every ANET call relies on three coordinated systems:
- Speech-to-Text (STT) – Converts caller speech into text
- Language Model (LLM) – Interprets meaning...
Speech-to-Text (STT)
STT is responsible for:
- Converting caller audio into transcript text
- Detecting end-of-turn pauses
- Supporting language detection
- Providing confidence s...
Smart Formatting
Smart formatting improves transcript clarity by:
- Formatting numbers
- Formatting dates
- Structuring addresses
- Improving readability of structured data
Th...
Language Model (LLM)
The LLM:
- Interprets transcript meaning
- Determines most likely intent
- Decides next question
- Applies routing logic
- Generates call summary
The LLM follow...
LLM Guardrails
LLM processing is subject to:
- Timeout limits
- Retry logic
- Safety override triggers
- Transfer safeguards
If the LLM:
- Fails to respond within the configur...
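The timeout, retry, and transfer safeguards listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ANET's implementation: the function name `respond_with_guardrails`, the `llm_call(transcript, timeout_s)` signature, the default values, and the `llm_timeout` reason label are all hypothetical.

```python
from typing import Callable, Dict


def respond_with_guardrails(
    llm_call: Callable[[str, float], str],
    transcript: str,
    timeout_s: float = 3.0,  # hypothetical default, not an ANET setting
    max_retries: int = 1,    # hypothetical default, not an ANET setting
) -> Dict[str, object]:
    """Try the LLM up to max_retries + 1 times, then fall back to a transfer.

    llm_call is assumed to raise TimeoutError when it exceeds timeout_s.
    """
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            text = llm_call(transcript, timeout_s)
            return {"action": "reply", "text": text, "attempts": attempt}
        except TimeoutError:
            continue  # retry until the attempt budget is spent
    # Safeguard: hand the call off instead of leaving the caller waiting.
    return {"action": "transfer", "reason": "llm_timeout", "attempts": attempts}


def make_flaky_llm(failures: int) -> Callable[[str, float], str]:
    """Fake provider for the example: times out `failures` times, then answers."""
    remaining = {"n": failures}

    def call(transcript: str, timeout_s: float) -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TimeoutError("simulated provider timeout")
        return f"ack: {transcript}"

    return call
```

Calling `respond_with_guardrails(make_flaky_llm(1), "noise complaint")` succeeds on the second attempt, while a provider that keeps timing out produces a transfer result once the attempt budget is exhausted.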
Text-to-Speech (TTS)
TTS converts system-generated text into spoken audio. TTS:
- Uses configured voice per language
- Must remain synchronized with STT language
- Delivers real...
Thinking Indicator
During LLM processing, ANET may emit a short audio “thinking” signal. This indicates active processing, not an error. The thinking in...
Barge-In (Caller Interruption)
Barge-in allows callers to interrupt ANET while it is speaking. When interruption confidence exceeds threshold:
- ANET stops speaking
- STT resumes listen...
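The barge-in rule above reduces to a threshold check on interruption confidence. The sketch below is illustrative only: `TurnState`, `handle_barge_in`, and the `0.6` default threshold are assumptions for the example, not ANET names or settings.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    speaking: bool = False  # TTS playback in progress
    listening: bool = True  # STT accepting caller audio


def handle_barge_in(state: TurnState, confidence: float, threshold: float = 0.6) -> bool:
    """Stop playback and resume listening when confidence exceeds the threshold.

    Returns True if the interruption was accepted.
    """
    if state.speaking and confidence > threshold:
        state.speaking = False
        state.listening = True
        return True
    return False  # below threshold (or nothing playing): keep current state
```

In this sketch, a low-confidence event such as background noise leaves playback untouched, which is why the threshold matters: too low and ANET cuts itself off, too high and callers cannot interrupt.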
Provider Dependencies
ANET relies on external providers for:
- STT
- LLM
- TTS
Provider instability may cause:
- Delays
- Timeouts
- Safeguard transfers
These events are visible in tim...
Limitations
Voice processing:
- Cannot interpret non-verbal cues
- May mishear low-quality audio
- Depends on caller clarity
- Cannot override safety triggers
Human overs...
Key Takeaways
ANET operates through STT, LLM, and TTS coordination. Transcript accuracy directly affects intent resolution. Guardrails protect against provider fail...
Language Handling and Switching
Overview: Structured Language Control
ANET supports controlled language transitions in three ways:
- Pre-call IVR selection
- Explicit in-call request
- Automatic language detection (if enabled)...
Supported Languages
ANET supports the languages configured for each center. If a caller speaks an unsupported language, ANET transfers the call. Transfer reason may be unsupported...
Pre-Call IVR Language Selection
At call start, the caller may hear:
“Thank you for calling non-emergency. Para español, presione dos.”
Behavior:
- Pressing configured option triggers l...
How to Test IVR Language Selection
1. Place a test call.
2. Select the alternate language option.
3. Confirm:
   - System switches immediately
   - Greeting changes language
   - Transcript reflects correct language...
In-Call Language Switching
ANET can switch mid-conversation when the caller explicitly requests it.
Example:
Caller: “Can we speak in Spanish?”
System behavior:
- Detect language ...
Automatic Language Detection (If Enabled)
When enabled, ANET evaluates speech early in the call. Typical parameters:
- Detection window: ~15 seconds
- Evaluation interval: every few seconds
- Confid...
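One way to picture the detection window and confidence filter is the sketch below. It assumes periodic evaluations arrive as `(elapsed seconds, language code, confidence)` tuples in time order. The function name, the tuple layout, and the `0.8` confidence value are hypothetical; only the roughly 15-second window comes from the parameters above.

```python
from typing import Iterable, Tuple

# (seconds since call start, detected language code, confidence 0..1)
Sample = Tuple[float, str, float]


def detect_language(
    samples: Iterable[Sample],
    current: str = "en",
    window_s: float = 15.0,       # approximate window from the text above
    min_confidence: float = 0.8,  # hypothetical threshold
) -> str:
    """Return the language to use after the detection window.

    Samples are assumed to be sorted by elapsed time.
    """
    for elapsed_s, language, confidence in samples:
        if elapsed_s > window_s:
            break  # detection window has closed; later samples are ignored
        if language != current and confidence >= min_confidence:
            return language  # first confident detection of a different language
    return current  # nothing confident enough: stay on the current language
```

With this logic, a confident Spanish sample at 6 seconds switches the call, while the same sample arriving at 20 seconds would be ignored because the window has already closed.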
Language Switching Guardrails
ANET enforces:
- Maximum number of language switches per call
- Detection duration limit
- Confidence-based filtering
These guardrails prevent oscillation. ...
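To show how a switch cap and a confidence filter together prevent oscillation, here is a small sketch. The class name `LanguageSwitchGuard`, its fields, and the default values are assumptions for illustration and do not reflect ANET's actual configuration keys.

```python
from dataclasses import dataclass


@dataclass
class LanguageSwitchGuard:
    current: str = "en"
    max_switches: int = 2        # hypothetical per-call cap
    min_confidence: float = 0.8  # hypothetical threshold
    switches: int = 0

    def request(self, language: str, confidence: float) -> bool:
        """Apply a switch only if every guardrail allows it."""
        if language == self.current:
            return False  # already in that language; nothing to do
        if confidence < self.min_confidence:
            return False  # confidence-based filtering
        if self.switches >= self.max_switches:
            return False  # per-call cap reached; prevents oscillation
        self.current = language
        self.switches += 1
        return True
```

In this sketch, a call that goes from English to Spanish and back has used both switches, so a third request is refused and the call stays in its current language.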
How to Evaluate Language Issues
When reviewing a call:
- Check transcript language consistency.
- Review timeline for language change events.
- Confirm:
  - No excessive switching
  - No partial-s...
Common Language Configuration Risks
❌ Enabling auto-detection without monitoring
❌ Setting low confidence threshold
❌ Ignoring maximum switch cap
❌ Not validating IVR routing
❌ Overlooki...
Supervisor Monitoring Considerations
Supervisors should monitor:
- Unsupported_language transfers
- Excessive language_handling low scores
- Repeated mid-call switching
- Agent_failure linked to ...
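As a rough illustration of this kind of monitoring, the sketch below tallies watched transfer reasons across a batch of call records. The record shape (a dict with a `transfer_reason` key) and the function name are assumptions. The reason labels are taken from the list above and lowercased. How call data is actually exported from ANET is not covered here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_watched_transfers(
    calls: Iterable[Dict[str, str]],
    watch: Tuple[str, ...] = ("unsupported_language", "agent_failure"),
) -> Counter:
    """Count calls whose transfer reason is on the watch list."""
    counts: Counter = Counter()
    for call in calls:
        # "transfer_reason" is a hypothetical field name for this example.
        reason = str(call.get("transfer_reason", "")).lower()
        if reason in watch:
            counts[reason] += 1
    return counts
```

A rising count for one reason over successive review periods is the kind of trend a supervisor might investigate further in the call timelines.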
Limitations
Language detection:
- Relies on early speech samples
- May misclassify short utterances
- Cannot interpret mixed-language nuance perfectly
- Cannot override s...
Key Takeaways
Language switching affects the entire conversational stack. IVR selection is deterministic. Auto-detection relies on confidence thresholds. Guardrails...
Location
After completing this guide, you will understand how ANET’s Verified Location feature works, how to configure it within your environment, and how the ...