The Voice primitive transforms text-based agents into voice-enabled conversational AI systems. With built-in speech-to-text (STT) and text-to-speech (TTS) capabilities, agents can understand spoken language and respond with natural-sounding speech, enabling hands-free interactions and accessibility features. Voice is essential for scenarios like phone-based customer support, multilingual assistants, and accessible, hands-free experiences.
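The STT → agent → TTS flow described above can be sketched as a plain function pipeline. This is an illustration only, not the agentbase API: `transcribe`, `runAgent`, and `synthesize` are hypothetical stand-ins for the real speech services.

```javascript
// Illustration only: the three stages a voice turn passes through.
// transcribe() and synthesize() are stand-ins for real STT/TTS services.
function transcribe(audio) {
  // A real STT service would decode the audio; here we assume the
  // transcript is already attached to the payload.
  return audio.transcript;
}

function runAgent(text) {
  // Stand-in for the text agent; returns a canned reply.
  return `You said: "${text}"`;
}

function synthesize(text) {
  // A real TTS service would return encoded audio; here we just tag
  // the payload with a format so the shape is visible.
  return { format: "pcm16", text };
}

function handleVoiceTurn(audio) {
  const userText = transcribe(audio); // speech-to-text
  const reply = runAgent(userText);   // text agent
  return synthesize(reply);           // text-to-speech
}

const out = handleVoiceTurn({ transcript: "hello" });
console.log(out.text); // You said: "hello"
```

The point of the sketch is the ordering: every spoken turn is transcribed before the agent runs, and the agent's text reply is synthesized before anything is played back.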
```javascript
// Multilingual voice support
const result = await agentbase.runAgent({
  message: "Bonjour, comment allez-vous?",
  voice: {
    enabled: true,
    input: {
      language: "auto-detect", // Automatically detect language
      supportedLanguages: ["en-US", "fr-FR", "es-ES", "de-DE"]
    },
    output: {
      voice: "fr-FR-Neural2-A", // French voice
      speed: 1.0
    }
  },
  system: "You are a multilingual assistant. Respond in the same language as the user."
});
```
```javascript
const phoneAgent = await agentbase.createVoiceStream({
  voice: {
    input: {
      language: "en-US",
      profanityFilter: true
    },
    output: {
      voice: "en-US-Neural2-C",
      style: "friendly"
    }
  },
  system: `You are a customer service representative for TechCorp.

    Call Flow:
    1. Greet the customer warmly
    2. Ask how you can help
    3. Listen to their issue
    4. Provide solution or escalate if needed
    5. Confirm resolution
    6. Thank them for calling

    Guidelines:
    - Be empathetic and patient
    - Keep responses concise (2-3 sentences)
    - Confirm understanding before providing solution
    - Offer to escalate if you cannot help`,
  mcpServers: [
    {
      serverName: "crm-system",
      serverUrl: "https://api.company.com/crm"
    }
  ]
});

// Handle call flow
phoneAgent.on('transcript', async (text) => {
  if (text.toLowerCase().includes('speak to human')) {
    await phoneAgent.transfer({
      destination: "human-support",
      context: phoneAgent.getConversationContext()
    });
  }
});
```
```javascript
const healthcareAgent = await agentbase.createVoiceStream({
  voice: {
    input: {
      language: "en-US",
      medicalTerminology: true
    },
    output: {
      voice: "en-US-Neural2-C",
      style: "empathetic",
      speed: 0.9
    }
  },
  system: `You are a healthcare information assistant.

    Guidelines:
    - Use clear, simple language
    - Be empathetic and reassuring
    - Never provide medical diagnosis
    - Always recommend consulting healthcare provider
    - Maintain HIPAA compliance
    - Focus on general health information

    Remind users this is not medical advice.`,
  rules: [
    "Never provide specific medical diagnoses",
    "Always recommend consulting a healthcare provider for medical concerns",
    "Do not access or share patient health information without authorization"
  ]
});
```
```javascript
// Good: Brief, conversational responses
system: `Keep responses to 1-2 sentences.
Be conversational and natural.
Avoid long explanations unless asked.`

// Avoid: Long, verbose responses
system: `Provide detailed, comprehensive explanations
covering all aspects of the topic...`
```
Design for Interruptions
```javascript
// Enable graceful interruption handling
voice: {
  output: {
    interruptible: true,
    pauseOnInterruption: true
  },
  input: {
    interruptionHandling: "graceful"
  }
}

system: `If interrupted:
1. Stop speaking immediately
2. Acknowledge the interruption
3. Address the new question
4. Offer to continue previous topic if relevant`
```
Use Natural Language
```javascript
// Good: Conversational, natural
"Sure, I can help with that. Let me check..."
"Great question! The answer is..."

// Avoid: Robotic, formal
"Affirmative. Processing request. Please wait."
"Query acknowledged. Retrieving data."
```
Provide Audio Feedback
```javascript
// Use audio cues for better UX
voiceStream.on('listening', () => {
  playSound('listening-chime.mp3');
});

voiceStream.on('processing', () => {
  playSound('thinking.mp3');
});

voiceStream.on('error', () => {
  playSound('error-tone.mp3');
});
```
Always Handle Audio Errors: Network issues, microphone problems, and audio format incompatibilities can disrupt voice interactions.
```javascript
voiceStream.on('error', (error) => {
  switch (error.type) {
    case 'microphone_access_denied':
      speakText("I need microphone access to hear you. Please check your settings.");
      break;
    case 'audio_format_unsupported':
      console.error('Audio format not supported:', error.format);
      // Fall back to a supported format
      break;
    case 'network_error':
      speakText("I'm having trouble connecting. Please check your internet connection.");
      break;
    case 'transcription_failed':
      speakText("Sorry, I didn't catch that. Could you repeat?");
      break;
    default:
      speakText("Sorry, I'm having technical difficulties. Please try again.");
  }
});
```
```javascript
const voiceStream = await agentbase.createVoiceStream({
  voice: {
    input: {
      emotionDetection: true,
      sentimentAnalysis: true
    }
  },
  system: `Adapt your tone based on user emotion.
    If user sounds frustrated, be more empathetic.
    If user sounds happy, match their energy.`
});

voiceStream.on('emotion', (emotion) => {
  console.log('Detected emotion:', emotion.type);
  console.log('Confidence:', emotion.confidence);
  // Adjust response accordingly
});
```
```javascript
const voiceStream = await agentbase.createVoiceStream({
  voice: {
    input: {
      multiSpeaker: true,
      speakerDiarization: true // Separate different speakers
    }
  },
  system: `You are moderating a group conversation.
    Address speakers by name when identified.
    Manage turn-taking and keep discussion on track.`
});

voiceStream.on('speaker-change', (event) => {
  console.log('Now speaking:', event.speakerId);
});
```
Remember: Voice interfaces require different design principles than text interfaces. Keep responses concise, design for interruptions, and provide clear audio feedback.
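The interruption guidance above can be made concrete with a small state machine. This is a minimal sketch, not the agentbase API: `SpeechTurn` is a hypothetical class showing how an agent might stop output immediately, acknowledge the interruption, and remember the cut-off topic so it can offer to resume.

```javascript
// Minimal sketch of graceful interruption handling (illustration only).
class SpeechTurn {
  constructor() {
    this.state = "idle";      // "idle" | "speaking" | "listening"
    this.pendingTopic = null; // topic that was cut off, if any
  }

  startSpeaking(topic) {
    this.state = "speaking";
    this.pendingTopic = topic;
  }

  interrupt() {
    // Ignore interruptions when nothing is being spoken.
    if (this.state !== "speaking") return null;
    this.state = "listening"; // stop output immediately
    const resumable = this.pendingTopic; // remember what was cut off
    return `Sure, go ahead. We can come back to ${resumable} afterward.`;
  }
}

const turn = new SpeechTurn();
turn.startSpeaking("billing options");
console.log(turn.interrupt());
```

The design choice worth noting is that stopping output and acknowledging the user are unconditional, while resuming the previous topic is merely offered; in voice UIs, yielding the floor immediately matters more than finishing a sentence.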