Introduction
Learn how to convert spoken audio into text inside your app.
The Speech-to-Text integration allows your app to transcribe audio into written text using ElevenLabs Scribe, a powerful speech recognition system. This integration enables apps to capture voice input and convert it into accurate, readable text in real time or from recorded audio. Speech-to-text is widely used for accessibility, productivity, and voice-driven interactions.
What You Can Build
With Speech-to-Text enabled, your app can support features such as:
- Voice Input – Allow users to speak instead of typing, converting their speech into text instantly.
- Meeting Transcription – Record conversations and generate accurate transcripts.
- Voice Commands – Enable users to control app features using spoken instructions.
- Podcast and Audio Transcription – Convert long-form audio into readable text for summaries or content reuse.
- Accessibility Features – Help users interact with your app using voice, improving inclusivity.
How It Works
When the Speech-to-Text integration is enabled, your app sends audio input to ElevenLabs Scribe, which processes the speech and returns a text transcription. Your app can:
- Capture live voice input from users
- Process recorded audio files
- Convert speech into structured text
- Use transcribed text for search, summaries, or actions
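The last step, using transcribed text for actions, can be sketched roughly as follows. This is a minimal illustration, not the integration's actual interface: `transcribe` is a hypothetical stand-in for whatever call your app uses to send audio to the integration and get text back, and the command phrases and action names are made up for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the call that sends audio to the Speech-to-Text
# integration and returns the transcribed text.
Transcriber = Callable[[bytes], str]


def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation so matching is forgiving."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def route_command(transcript: str, commands: Dict[str, str]) -> Optional[str]:
    """Return the action for the first command phrase found in the transcript."""
    spoken = normalize(transcript)
    for phrase, action in commands.items():
        if normalize(phrase) in spoken:
            return action
    return None


def handle_audio(audio: bytes, transcribe: Transcriber,
                 commands: Dict[str, str]) -> dict:
    """Transcribe the audio, then attach a matching action if there is one."""
    transcript = transcribe(audio)
    return {"text": transcript, "action": route_command(transcript, commands)}


# Illustrative command table (assumed names, not part of the integration).
COMMANDS = {"new note": "create_note", "search for": "open_search"}


def fake_transcribe(audio: bytes) -> str:
    """Fake transcriber for local testing; a real app would call the integration."""
    return "Please search for quarterly reports."
```

Passing the transcriber in as a parameter keeps the routing logic testable without a microphone or network access. Plain substring matching is deliberately simple here; an app with many commands would likely want stricter matching.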
Example Prompts
You can use prompts like these when building your app:
- Add voice input – "Add a voice input button to my app that transcribes what the user says using ElevenLabs Scribe."
- Add meeting transcription – "Add a meeting transcription screen to my app that records audio and returns a text summary."
These prompts help you quickly implement voice-to-text functionality.
Common Use Cases
Developers commonly use the Speech-to-Text integration for:
- Note-taking and productivity apps
- Meeting assistants
- Voice-enabled interfaces
- Transcription tools
- Accessibility-focused applications
Best Practices
When implementing speech-to-text features, consider the following:
- Provide clear indicators when recording is active
- Allow users to edit transcribed text
- Handle background noise for better accuracy
- Break long recordings into manageable segments
- Combine transcription with summaries for better usability
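One way to break a long recording into manageable segments is to compute time windows up front and transcribe each window separately. The sketch below is only an illustration: the 60-second segment length and 2-second overlap are assumed defaults, not limits documented for ElevenLabs Scribe, and `segment_bounds` is a hypothetical helper name.

```python
from typing import List, Tuple


def segment_bounds(total_seconds: float, segment_seconds: float = 60.0,
                   overlap_seconds: float = 2.0) -> List[Tuple[float, float]]:
    """Split a recording into (start, end) windows, in seconds.

    Consecutive windows overlap slightly so that a word spoken across a
    boundary appears whole in at least one segment.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    if not 0 <= overlap_seconds < segment_seconds:
        raise ValueError("overlap_seconds must be >= 0 and below segment_seconds")

    bounds: List[Tuple[float, float]] = []
    start = 0.0
    step = segment_seconds - overlap_seconds  # how far each window advances
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        if end >= total_seconds:
            break  # the final window reached the end of the recording
        start += step
    return bounds
```

For a 150-second recording with the defaults, this yields three windows: 0 to 60, 58 to 118, and 116 to 150. When the per-segment transcripts are joined, the overlapping seconds may produce a few repeated words that the app should trim or let the user edit.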