Enterprise Speech-to-Text API for English, Cantonese & Mandarin

Achieve human-level transcription accuracy with our specialized trilingual engine. Built for developers needing high-performance speech recognition in mission-critical applications.

Speech-to-Text API

The multilingual, conversational speech-to-text model

Fano is the world-leading multilingual speech-to-text model designed for conversation, not just transcription. With built-in turn detection, ultra-low latency, and natural interruption handling, Fano enables real-time, human-like voice agents.

  • Unmatched accuracy for multilingual conversations
  • Ultra-low latency for real-time applications
  • Seamless integration with voice agents

One API for the Real World of Mixed Speech

No more juggling models for Hong Kong’s Cantonese-English-Mandarin conversations, or for any other multilingual region. Just send audio and get back a single, accurate transcript.

  • Hong Kong: English, Cantonese, Mandarin
  • Singapore: English, Singlish, Bahasa (Malaysia), Tamil, Mandarin
  • Taiwan: Mandarin, Taiwanese
  • Malaysia: Bahasa (Malaysia), English, Tamil
  • Thailand: Thai, English
  • Indonesia: Bahasa (Indonesia), English
  • Mainland China: Cantonese, Mandarin
  • Philippines: Tagalog, English
  • Vietnam: Vietnamese, English
  • France: French, English
  • Saudi Arabia: Arabic, English
United Kingdom
English (Global)
English (US)
English (UK)
Cantonese
Mandarin
Taiwanese
Singlish
Bahasa (Malaysia)
Bahasa (Indonesia)
Thai
Vietnamese
Arabic
French
Tamil
Japanese

Built for Versatility

Switch context to see how our API adapts to different workflows.

Customer Service & Compliance

Automate QA & Compliance

Analyze 100% of customer calls. Our API accurately separates speakers and handles noisy audio environments typical of call centers.

  • Asynchronous batch processing
  • High accuracy in challenging audio conditions
  • Multi-speaker diarization

Meeting Intelligence

Effortlessly capture, transcribe, and summarize your meetings

No more language barriers. Our API perfectly captures code-switching between Cantonese, English, and Mandarin, ensuring every detail is recorded accurately without manual language selection.

  • Identifies and labels 10+ different speakers
  • Understands context across language switches
  • Real-time transcription

Voice Agents and Applications

Build voicebots that listen, understand, and respond in real time

Power voicebot experiences with Fano’s Speech API, designed for fast, accurate speech recognition across multilingual and mixed-language conversations. From customer service to appointment booking and support workflows, our API helps developers build voicebots that respond naturally without forcing users to change how they speak.

  • Low-latency speech recognition for live voice workflows
  • Built for multilingual and mixed-language conversations
  • Strong performance on contact center and phone-quality audio

Speaker Diarization

Distinguish between speakers in a single audio stream.
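
As a rough illustration of what diarized output makes possible, the sketch below merges consecutive segments from the same speaker into conversational turns. The segment shape (`speaker`, `start`, `end`, `text`) is an assumption made for this example and is not the documented response format of the Fano API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape; the real API response may differ.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive segments from the same speaker into one turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Extend the current turn instead of starting a new one.
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


if __name__ == "__main__":
    demo = [
        Segment("agent", 0.0, 1.2, "Good morning,"),
        Segment("agent", 1.2, 2.5, "how can I help?"),
        Segment("customer", 2.8, 4.0, "I'd like to check my balance."),
    ]
    for turn in merge_turns(demo):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```

Merging on the client side keeps the example independent of how the service itself groups segments.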

Auto Punctuation

Adds punctuation and casing for readable transcripts.

Timestamps

Precise start and end times for every recognized segment.
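
One common use of segment timestamps is caption generation. The following is a minimal sketch that renders `(start, end, text)` tuples as SRT caption blocks; the tuple layout is an assumption for illustration, so adapt it to whatever fields the API actually returns.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT caption blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line, as SRT players expect.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.25, "Hello"), (1.5, 3.0, "你好")]))
```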

Custom Vocabulary

Boost accuracy for product names and jargon.

Multilingual Support

Automatic mixed-language detection with seamless code-switching.

Format Support

Supports WAV, MP3, FLAC, OGG, and AAC, plus telephony audio.

Frequently Asked Questions

Which languages are supported?

We specialize in English, Cantonese, and Mandarin, and also support 10+ ASEAN languages. Our model is designed to handle mixed-language speech (code-switching) within a single audio stream, with no language hints required.

Is on-premise deployment available?

Yes, for enterprise customers with strict data sovereignty or security requirements, we offer on-premise deployment options.

What is the latency for real-time streaming?

Our streaming API typically achieves latencies under 300 ms, making it suitable for live voice assistants and real-time captioning.
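
End-to-end latency also depends on how much audio the client buffers before sending each chunk. As a hedged sketch, the helper below computes the chunk size for raw 16-bit PCM and splits a buffer accordingly. The sample rates, the 100 ms chunk length, and the use of raw PCM are assumptions for illustration, not documented requirements of the streaming API.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate_hz: int, chunk_ms: int,
                     sample_width_bytes: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM audio that cover chunk_ms milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * sample_width_bytes * channels


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield successive fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


if __name__ == "__main__":
    size = chunk_size_bytes(16_000, 100)  # assumed 16 kHz mono, 100 ms chunks
    one_second = bytes(16_000 * 2)        # one second of silence
    print(size, sum(1 for _ in iter_chunks(one_second, size)))
```

Smaller chunks reduce the buffering delay added on the client but increase the number of messages sent, so the right size is a trade-off to measure in your own setup.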

Can I fine-tune the model?

Yes, through custom vocabulary: you can upload vocabulary lists via the API to improve recognition of product names, acronyms, and industry-specific jargon.
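
Before uploading a vocabulary list, it usually helps to trim and de-duplicate the terms. The sketch below does that and serializes the result as JSON. The `phrases` field name is hypothetical and no endpoint is shown, because the actual request format is not described here; check the API reference for the real schema.

```python
import json


def build_vocabulary_payload(terms: list[str]) -> str:
    """Clean a term list and serialize it as a JSON request body.

    The "phrases" field name is hypothetical, chosen for illustration only.
    """
    seen: set[str] = set()
    cleaned: list[str] = []
    for term in terms:
        normalized = " ".join(term.split())  # trim and collapse whitespace
        key = normalized.casefold()          # case-insensitive de-duplication
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    # ensure_ascii=False keeps Chinese terms readable in the payload.
    return json.dumps({"phrases": cleaned}, ensure_ascii=False)


if __name__ == "__main__":
    print(build_vocabulary_payload(["  Octopus  card ", "octopus card", "八達通", ""]))
```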

Contact Us

Try It Free. Scale When You’re Ready.

Get started without limits. Explore all features at your own pace, and upgrade only when your business grows.