Enterprise Speech-to-Text API for English, Cantonese & Mandarin

Achieve human-level transcription accuracy with our specialized trilingual engine. Built for developers needing high-performance speech recognition in mission-critical applications.

Start Building

Speech-to-Text API

The multilingual, conversational speech-to-text model

Fano is the world-leading multilingual speech-to-text model designed for conversation, not just transcription. With built-in turn detection, ultra-low latency, and natural interruption handling, Fano enables real-time, human-like voice agents.

Unmatched accuracy for multilingual conversations
Ultra-low latency for real-time applications
Seamless integration with voice agents


One API for the Real World of Mixed Speech

No more juggling models for Hong Kong’s Cantonese-English-Mandarin conversations, or for any other multilingual region. Just send audio and get an accurate transcription back.

Hong Kong: English, Cantonese, Mandarin
Singapore: English, Singlish, Bahasa (Malaysia), Tamil, Mandarin
Taiwan: Mandarin, Taiwanese
Malaysia: Bahasa (Malaysia), English, Tamil
Thailand: Thai, English
Indonesia: Bahasa (Indonesia), English
Mainland China: Cantonese, Mandarin
Philippines: Tagalog, English
Vietnam: Vietnamese, English
France: French, English
Saudi Arabia: Arabic, English

United Kingdom

English (Global), English (US), English (UK), Cantonese, Mandarin, Taiwanese, Singlish, Bahasa (Malaysia), Bahasa (Indonesia), Thai, Vietnamese, Arabic, French, Tamil, Japanese

Built for Versatility

See how our API adapts to different workflows.

Customer Service & Compliance

Automate QA & Compliance

Analyze 100% of customer calls. Our API accurately separates speakers and handles noisy audio environments typical of call centers.

Asynchronous batch processing
High accuracy in challenging audio conditions
Multi-speaker diarization
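
Asynchronous batch jobs are usually submitted once and then polled until they finish. The sketch below shows one way to poll with capped exponential backoff. The `get_status` callable and the status values (`queued`, `running`, `done`, `failed`) are assumptions made for illustration, not the documented API.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_status: Callable[[], Dict[str, str]],
    max_wait_s: float = 600.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll a batch transcription job until it finishes or the time budget runs out.

    get_status is a hypothetical callable supplied by the caller. It is assumed
    to return a dict whose "status" key is "queued", "running", "done" or "failed".
    """
    waited = 0.0
    delay = first_delay_s
    while True:
        job = get_status()
        if job["status"] in ("done", "failed"):
            return job
        if waited >= max_wait_s:
            raise TimeoutError(f"job still {job['status']!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay_s)  # double the wait, up to the cap
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. The caller decides how to handle a returned `failed` job.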

Meeting Intelligence

Effortlessly capture, transcribe, and summarize your meetings

No more language barriers. Our API perfectly captures code-switching between Cantonese, English, and Mandarin, ensuring every detail is recorded accurately without manual language selection.

Accurately identify and label 10+ different speakers
Understands context across language switches
Real-time transcription
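
Speaker-labeled output is easier to read once consecutive segments from the same speaker are merged into turns. A minimal sketch, assuming each segment carries a speaker label, start and end times in seconds, and text. The `Segment` shape is hypothetical and the real response format may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical shape of one diarized segment."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: Iterable[Segment], max_gap_s: float = 1.0) -> List[Segment]:
    """Merge adjacent segments from the same speaker into one turn.

    Two segments are merged when the speaker is unchanged and the silence
    between them is at most max_gap_s. Input segments are not modified.
    """
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = turns[-1] if turns else None
        same_turn = (
            last is not None
            and last.speaker == seg.speaker
            and seg.start - last.end <= max_gap_s
        )
        if same_turn:
            last.end = max(last.end, seg.end)
            last.text = f"{last.text} {seg.text}"
        else:
            # Copy so the caller's segments stay untouched.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns
```

Joining with a space suits English text. For Cantonese or Mandarin segments you may prefer to join without a separator.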

Voice Agents and Applications

Build voicebots that listen, understand, and respond in real time

Power voicebot experiences with Fano’s Speech API, designed for fast, accurate speech recognition across multilingual and mixed-language conversations. From customer service to appointment booking and support workflows, our API helps developers build voicebots that respond naturally without forcing users to change how they speak.

Low-latency speech recognition for live voice workflows
Built for multilingual and mixed-language conversations
Strong performance on contact center and phone-quality audio
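
Live voice workflows generally stream audio in small fixed-duration frames rather than whole files. The sketch below slices raw PCM into 20 ms frames. The 16 kHz, 16-bit mono defaults and the frame size are illustrative assumptions, since the required streaming format is not stated here.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16000,
    frame_ms: int = 20,
    sample_width: int = 2,
    channels: int = 1,
) -> Iterator[bytes]:
    """Yield fixed-duration frames from raw PCM audio.

    The last frame may be shorter than the others if the audio length
    is not a multiple of the frame size.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_bytes = samples_per_frame * sample_width * channels
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

With the defaults, each frame is 640 bytes (320 samples of 2 bytes). For 8 kHz phone-quality audio, pass `sample_rate=8000` to get 320-byte frames.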

Speaker Diarization

Distinguish between speakers in a single audio stream.

Auto Punctuation

Adds punctuation and casing for readable transcripts.

Timestamps

Precise start and end times for every recognized segment.
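
Segment timestamps map directly onto caption formats. As an illustration, the sketch below turns `(start, end, text)` tuples into SubRip (SRT) captions. The tuple shape is an assumption about how segments might be represented after parsing a response.

```python
from typing import List, Tuple


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_s, end_s, text) tuples, numbered from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # Each block ends with a newline, so joining with "\n" leaves one blank line between cues.
    return "\n".join(blocks)
```

For example, `srt_time(3661.5)` returns `01:01:01,500`.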

Custom Vocabulary

Boost accuracy for product names and jargon.

Multilingual Support

Auto mixed-language detection with seamless code-switching capabilities.

Format Support

WAV, MP3, FLAC, OGG, and AAC files, plus telephony audio.
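
Before uploading, it can help to check a file locally. The sketch below uses Python's standard `wave` module to report the channel count, sample rate, and duration of a WAV file, alongside a simple extension check. The extension set is an assumption derived from the format names above, and a matching extension does not guarantee the file contents are valid.

```python
import wave
from pathlib import Path

# Assumed mapping from the listed format names to common file extensions.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".aac"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def describe_wav(path: str) -> dict:
    """Read basic properties from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

An 8000 Hz sample rate in the result is a common sign of phone-quality audio.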

Frequently Asked Questions

Which languages are supported?

We specialize in English, Cantonese, and Mandarin, and also support 10+ ASEAN languages. Our model is uniquely designed to handle mixed-language speech (code-switching) within a single audio stream without requiring language hints.

Is on-premise deployment available?

Yes, for enterprise customers with strict data sovereignty or security requirements, we offer on-premise deployment options.

What is the latency for real-time streaming?

Our streaming API typically achieves latencies under 300 ms, making it suitable for live voice assistants and real-time captioning.
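
To check a latency figure like this against your own traffic, you can record when each audio chunk is sent and when its transcript arrives, then look at percentiles rather than the average. This is a sketch under those assumptions: how latency is defined and measured for the figure above is not specified, and the 300 ms default budget is simply taken from it.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_report(
    sent_at: Sequence[float],
    received_at: Sequence[float],
    budget_ms: float = 300.0,
) -> dict:
    """Summarize round-trip latency from paired timestamps in seconds.

    sent_at[i] is when chunk i was sent and received_at[i] is when its
    transcript arrived, both taken from the same monotonic clock.
    """
    latencies_ms = [(r - s) * 1000 for s, r in zip(sent_at, received_at)]
    p50 = percentile(latencies_ms, 50)
    p95 = percentile(latencies_ms, 95)
    return {"p50_ms": p50, "p95_ms": p95, "within_budget": p95 < budget_ms}
```

Use `time.monotonic()` for both timestamps so that system clock adjustments do not distort the measurement.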

Can I fine-tune the model?

Yes, in the form of custom vocabulary: you can upload custom vocabulary lists via the API to improve recognition of product names, acronyms, and industry-specific jargon.
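
It is worth tidying a vocabulary list before uploading it. The sketch below normalizes Unicode width variants, collapses whitespace, and removes case-insensitive duplicates while keeping the original order. The upload call itself is omitted because the endpoint and payload format are not described here.

```python
import unicodedata
from typing import Iterable, List


def clean_vocabulary(terms: Iterable[str]) -> List[str]:
    """Return de-duplicated, normalized vocabulary terms in first-seen order."""
    seen = set()
    cleaned: List[str] = []
    for term in terms:
        # NFKC folds full-width letters and similar variants into one form.
        normalized = unicodedata.normalize("NFKC", term)
        # Trim the ends and collapse internal runs of whitespace.
        normalized = " ".join(normalized.split())
        key = normalized.casefold()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned
```

The first spelling of each term is kept, so list the preferred casing first.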

Contact Us

Try It Free. Scale When You’re Ready.

Get started without limits. Explore all features at your own pace — upgrade only when your business grows.