Speech-to-Text

Introduction

Learn how to convert spoken audio into text inside your app. The Speech-to-Text integration lets your app transcribe audio into written text using ElevenLabs Scribe, a speech recognition system. Your app can capture voice input and convert it into readable text, either in real time or from recorded audio. Speech-to-text is widely used for accessibility, productivity, and voice-driven interactions.

What You Can Build

With Speech-to-Text enabled, your app can support features such as:

Voice Input – Allow users to speak instead of typing, converting their speech into text instantly.
Meeting Transcription – Record conversations and generate accurate transcripts.
Voice Commands – Enable users to control app features using spoken instructions.
Podcast and Audio Transcription – Convert long-form audio into readable text for summaries or content reuse.
Accessibility Features – Help users interact with your app using voice, improving inclusivity.
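
As an illustration of the Voice Commands idea above, the sketch below maps a transcript to an app action. The command phrases, action names, and the `match_command` helper are hypothetical and only show one simple way to route transcribed text; a real app would likely need fuzzier matching.

```python
import re
from typing import Optional

# Hypothetical command table: spoken phrase -> app action name.
COMMANDS = {
    "new note": "create_note",
    "start recording": "start_recording",
    "stop recording": "stop_recording",
}


def normalize(transcript: str) -> str:
    """Lowercase the transcript, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", transcript.lower())
    return " ".join(cleaned.split())


def match_command(transcript: str) -> Optional[str]:
    """Return the action for the first known phrase found as whole words."""
    text = f" {normalize(transcript)} "
    for phrase, action in COMMANDS.items():
        # Padding with spaces avoids matching inside longer words.
        if f" {phrase} " in text:
            return action
    return None
```

Normalizing before matching makes the lookup tolerant of the capitalization and punctuation a transcription service typically adds.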

How It Works

When the Speech-to-Text integration is enabled, your app sends audio input to ElevenLabs Scribe, which processes the speech and returns a text transcription. Your app can:

Capture live voice input from users
Process recorded audio files
Convert speech into structured text
Use transcribed text for search, summaries, or actions

This allows you to build voice-driven experiences without needing to implement complex speech recognition systems yourself. Because transcription is handled by ElevenLabs, you can focus on how the text is used within your app.
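
The request and response flow described above might look roughly like the following sketch. It assumes the ElevenLabs speech-to-text REST endpoint, the `xi-api-key` header, the `model_id` value `scribe_v1`, and a `text` field in the JSON response; none of these are stated in this guide, so check them against the current ElevenLabs API reference. The `build_multipart` and `transcribe` helpers and the `opener` parameter are illustrative names, not part of any library.

```python
import json
import urllib.request
import uuid

# Assumptions: endpoint URL and model ID are taken from ElevenLabs' public
# API as understood at the time of writing; verify before relying on them.
STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"
MODEL_ID = "scribe_v1"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_bytes, api_key, file_name="recording.wav",
               opener=urllib.request.urlopen):
    """Send audio to the transcription endpoint and return the transcript text."""
    body, content_type = build_multipart(
        {"model_id": MODEL_ID}, file_name, audio_bytes
    )
    request = urllib.request.Request(
        STT_URL,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": content_type},
    )
    # `opener` is injectable so the function can be exercised without a network call.
    with opener(request) as response:
        return json.loads(response.read())["text"]
```

A call such as `transcribe(open("meeting.wav", "rb").read(), api_key)` would return the transcript as a string, which the app can then use for search, summaries, or actions.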

Example Prompts

You can use prompts like these when building your app:

Add voice input – "Add a voice input button to my app that transcribes what the user says using ElevenLabs Scribe."
Add meeting transcription – "Add a meeting transcription screen to my app that records audio and returns a text summary."

These prompts help you quickly implement voice-to-text functionality.

Common Use Cases

Developers commonly use the Speech-to-Text integration for:

Note-taking and productivity apps
Meeting assistants
Voice-enabled interfaces
Transcription tools
Accessibility-focused applications

This integration is especially useful when users prefer speaking over typing.

Best Practices

When implementing speech-to-text features, consider the following:

Provide clear indicators when recording is active
Allow users to edit transcribed text
Reduce background noise where possible, since noisy audio lowers transcription accuracy
Break long recordings into manageable segments
Combine transcription with summaries for better usability
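
To illustrate breaking a long recording into segments, the sketch below computes start and end times for each segment, with a small overlap so words at a boundary are not cut off. The `segment_bounds` helper and its default lengths are assumptions for illustration, not values recommended by ElevenLabs.

```python
def segment_bounds(total_seconds, segment_seconds=60.0, overlap_seconds=2.0):
    """Return (start, end) times in seconds covering the whole recording.

    Consecutive segments overlap by `overlap_seconds`.
    """
    if segment_seconds <= overlap_seconds:
        raise ValueError("segment_seconds must be larger than overlap_seconds")
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        bounds.append((start, end))
        if end >= total_seconds:
            break
        # Step back by the overlap so boundary words appear in both segments.
        start = end - overlap_seconds
    return bounds
```

For a 150-second recording with the defaults, this yields segments covering 0 to 60, 58 to 118, and 116 to 150 seconds. Each segment can be transcribed separately and the results joined, with duplicated words in the overlap removed during merging.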
