Introduction

Learn how to enable real-time voice conversations in your app. Gemini Live allows your app to support fast, natural voice interactions using WebSocket-based audio streaming. With sub-second latency, interruption handling, and multimodal input support, it enables dynamic, real-time conversational experiences.

What You Can Build

With Gemini Live, your app can support:
  • Real-Time Voice Assistants – Enable users to speak and receive instant responses.
  • Multilingual Voice Experiences – Support conversations across multiple languages.
  • Interruptible Conversations – Allow natural back-and-forth dialogue with interruptions.
  • Multimodal Interactions – Combine voice with text, images, or other inputs.
  • Voice Concierge Systems – Build assistants that guide users in real time.

How It Works

When Gemini Live is enabled, your app establishes a real-time connection using WebSocket audio streaming. Your app can:
  • capture and stream user audio input
  • process conversations in real time
  • receive instant AI-generated responses
  • handle interruptions during dialogue
  • combine voice with other input types
This creates a seamless conversational experience with fast response times and flexible interaction modes.
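
The capture-and-stream step above can be sketched in a few lines. The following Python is a minimal illustration, not the Gemini Live API: the names `chunk_pcm` and `stream_audio`, the `send` callback, and the audio parameters (16 kHz, 16-bit mono PCM, 100 ms frames) are all assumptions chosen for the example. In a real app, `send` would wrap whatever WebSocket client your stack provides.

```python
from typing import Callable, Iterator


def chunk_pcm(
    audio: bytes,
    sample_rate: int = 16_000,  # assumed sample rate, in Hz
    sample_width: int = 2,      # assumed 16-bit samples (2 bytes each), mono
    frame_ms: int = 100,        # assumed frame duration, in milliseconds
) -> Iterator[bytes]:
    """Yield fixed-duration frames of raw mono PCM audio.

    The last frame may be shorter than the others if the audio length
    is not an exact multiple of the frame size.
    """
    frame_bytes = sample_rate * sample_width * frame_ms // 1000
    if frame_bytes <= 0:
        raise ValueError("frame size must be at least one byte")
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


def stream_audio(audio: bytes, send: Callable[[bytes], None]) -> int:
    """Pass each frame to `send` (a hypothetical transport callback).

    Returns the number of frames sent.
    """
    count = 0
    for frame in chunk_pcm(audio):
        send(frame)
        count += 1
    return count
```

Small frames keep the delay between speaking and sending short, at the cost of more messages per second. The 100 ms value here is only a starting point for experimentation.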

Example Prompts

You can use prompts like these to implement features:
  • Add a real-time voice assistant – "Add a real-time voice assistant to my app powered by Gemini that responds instantly with natural conversation."
  • Add a multilingual voice concierge – "Add a multilingual voice concierge to my app using Gemini Live that handles questions with sub-second latency."

Common Use Cases

Gemini Live is commonly used for:
  • AI voice assistants
  • multilingual support systems
  • real-time conversational apps
  • voice-driven navigation or guidance
  • multimodal AI experiences

Best Practices

To get the best results:
  • keep latency low so the conversation flows without noticeable pauses
  • stop the AI response promptly when the user starts speaking, so interruptions feel natural
  • design clear conversational states (listening, speaking)
  • support multiple input types for flexibility
  • keep responses concise and context-aware