Speech-to-Text (STT)

STT is responsible for:

  • Converting caller audio into transcript text

  • Detecting end-of-turn pauses

  • Supporting language detection

  • Providing confidence scores for recognized text
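End-of-turn detection is commonly based on how long the caller stays silent after speaking. Below is a minimal sketch. It assumes the audio pipeline labels each fixed-length frame as speech or non-speech, for example with a voice activity detector. `detect_end_of_turn`, the 20 ms frame length, and the 700 ms silence threshold are hypothetical values, not settings of any particular provider.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # hypothetical frame length in milliseconds


def detect_end_of_turn(
    speech_flags: Sequence[bool],
    silence_threshold_ms: int = 700,  # hypothetical default
    frame_ms: int = FRAME_MS,
) -> Optional[int]:
    """Return the frame index where end of turn is declared, or None.

    End of turn is declared once the caller has spoken at least one frame
    and has then been silent for `silence_threshold_ms` consecutive ms.
    """
    frames_needed = max(1, silence_threshold_ms // frame_ms)
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None  # caller has not finished (or never spoke)


# 10 speech frames, a 200 ms mid-sentence pause, 5 more speech frames,
# then 800 ms of silence.
flags = [True] * 10 + [False] * 10 + [True] * 5 + [False] * 40

print(detect_end_of_turn(flags))                            # 59
print(detect_end_of_turn(flags, silence_threshold_ms=150))  # 16 (cuts in during the pause)
```

A shorter silence threshold makes the agent respond sooner, but it can cut callers off during natural pauses. The 150 ms case in the sketch shows this happening.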

Transcript accuracy directly affects:

  • Intent identification

  • Question sequencing

  • Location capture

  • Summary quality


STT Configuration Elements

Depending on the deployment, configurable elements may include:

  • Language model selection

  • End-of-turn sensitivity

  • Smart formatting options

  • Confidence thresholds
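One option is to group the elements listed above into a single settings object and check it before deployment. This is a sketch only. `SttConfig`, its field names, and its default values are hypothetical. Actual option names depend on the STT provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SttConfig:
    """Hypothetical STT settings; real option names depend on the provider."""

    model: str = "general"             # model selection (hypothetical name)
    language: Optional[str] = "en-US"  # None here stands for "auto-detect"
    end_of_turn_silence_ms: int = 700  # end-of-turn sensitivity
    smart_formatting: bool = True      # e.g. render spoken digits as numerals
    min_confidence: float = 0.6        # below this, treat text as uncertain

    def validate(self) -> None:
        """Raise ValueError if any setting is out of range."""
        if not self.model:
            raise ValueError("model must be a non-empty string")
        if self.end_of_turn_silence_ms <= 0:
            raise ValueError("end_of_turn_silence_ms must be positive")
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0.0 and 1.0")


config = SttConfig(end_of_turn_silence_ms=900, min_confidence=0.7)
config.validate()  # passes silently
```

If the configuration is validated when it loads, out-of-range values are caught before they reach live calls. One example is a confidence threshold above 1.0.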

Improper STT configuration can lead to:

  • Misidentified intent

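  • Excessive clarification prompts asking the caller to repeat

  • Conversation instability, such as interrupting callers mid-sentence or responding after long delays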
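A confidence threshold set too high is one way that excessive clarification arises. Below is a minimal sketch. It assumes each caller turn arrives with a confidence score between 0.0 and 1.0, and that the agent asks the caller to repeat whenever the score is below the threshold. `Transcript`, `clarification_rate`, and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    text: str
    confidence: float  # assumed to be in [0.0, 1.0]


def needs_clarification(t: Transcript, min_confidence: float) -> bool:
    """Ask the caller to repeat when STT confidence is below the threshold."""
    return t.confidence < min_confidence


def clarification_rate(turns: List[Transcript], min_confidence: float) -> float:
    """Fraction of caller turns that would trigger a clarification prompt."""
    if not turns:
        return 0.0
    flagged = sum(needs_clarification(t, min_confidence) for t in turns)
    return flagged / len(turns)


turns = [
    Transcript("my address is 42 Elm Street", 0.91),
    Transcript("the water heater is leaking", 0.84),
    Transcript("call me back at 555 0100", 0.72),
    Transcript("yes", 0.65),
]

for threshold in (0.6, 0.8, 0.95):
    print(f"threshold={threshold}: {clarification_rate(turns, threshold):.2f}")
# threshold=0.6: 0.00
# threshold=0.8: 0.50
# threshold=0.95: 1.00
```

In this sample, raising the threshold from 0.6 to 0.95 changes the result from no clarification prompts to a prompt on every turn. Setting the threshold too low carries the opposite risk: misrecognized text passes through to intent identification.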


How to Validate STT Performance

  1. Place a controlled test call.

  2. Clearly speak structured data, such as an address and a phone number.

  3. Review the transcript in Triage.

  4. Confirm that:

    • Numbers are formatted correctly

    • No words are missing

    • No fragments are repeated

  5. Compare the transcript against what was actually said.
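The comparison in step 5 can be made repeatable by scoring the transcript with word error rate (WER). WER is the standard ratio of substitutions, deletions, and insertions to the number of words actually spoken. The sketch below assumes you type a reference transcript of what was said, written the way you expect the transcript to render it (numerals rather than spelled-out digits). `normalize`, `word_error_rate`, and `digits` are hypothetical helpers, not part of Triage. Missing words count as deletions and repeated fragments count as insertions. A separate digits-only comparison covers the number check in step 4.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on anything that isn't a letter, digit, or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: a spoken word is missing
                curr[j - 1] + 1,     # insertion: e.g. a repeated fragment
                prev[j - 1] + cost,  # match, or substitution if cost == 1
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def digits(text: str) -> str:
    """Keep only digits so '555-0100' and '555 0100' compare equal."""
    return re.sub(r"\D", "", text)


reference = "My address is 42 Elm Street and my number is 555-0100"
hypothesis = "my address is 42 elm elm street and my number is 555 0100"

print(round(word_error_rate(reference, hypothesis), 3))  # 0.083 (1 insertion / 12 words)
print(digits("555-0100") in digits(hypothesis))          # True
```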

Frequent transcript errors may indicate an issue with the STT provider or an STT configuration that needs tuning.
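To tell whether errors are frequent rather than one-off, you can pool the per-call error counts from several test calls into a single WER and compare it against a threshold. The `CallResult` type, `corpus_wer`, `needs_investigation`, the 10% threshold, and the counts below are hypothetical. An acceptable error rate depends on the deployment and provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallResult:
    call_id: str
    reference_words: int  # number of words actually spoken
    word_errors: int      # substitutions + deletions + insertions


def corpus_wer(results: List[CallResult]) -> float:
    """Pool errors across calls so short calls don't skew the average."""
    total_words = sum(r.reference_words for r in results)
    if total_words == 0:
        raise ValueError("no reference words to score")
    return sum(r.word_errors for r in results) / total_words


def needs_investigation(results: List[CallResult], max_wer: float = 0.10) -> bool:
    """True when pooled WER exceeds the (hypothetical) acceptable threshold."""
    return corpus_wer(results) > max_wer


results = [
    CallResult("test-1", reference_words=40, word_errors=2),
    CallResult("test-2", reference_words=25, word_errors=6),
    CallResult("test-3", reference_words=35, word_errors=4),
]
print(corpus_wer(results))           # 0.12 (12 errors / 100 words)
print(needs_investigation(results))  # True with the 10% threshold
```

This sketch pools the raw counts rather than averaging per-call rates. Otherwise, a short call with a few errors (such as test-2) would dominate the result.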