Module 4 finale. The agent stops just writing native and starts SPEAKING native: ElevenLabs voice cloning, one cloned voice per language. Optional layer, big perceived value. Marginal cost: $22–99/mo.
By the end: when a customer messages in Kirundi, they hear back in a native Kirundi voice. Same for French, English, Swahili, and Arabic.
Record 5 min of CLEAN audio per language from a native speaker. No noise, no music. Varied script: tones, numbers, greetings. Quality matters more than length. Always get the speaker's written consent before cloning.
Voice Lab → Add Voice → Instant Voice Clone. Upload the 5-min sample. Name clearly: agent_voice_french, agent_voice_swahili. ElevenLabs returns a voice_id for each. Clone is ready in under 60 seconds. Test before production.
Same pattern as Lesson 4.3 — config map. Key = lang code, value = voice ID. VOICE_MAP = { 'fr': voice_id_fr, 'en': voice_id_en, 'sw': voice_id_sw, 'ar': voice_id_ar, 'rn': voice_id_rn }. Store IDs in env vars (don't hardcode).
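A minimal sketch of that config map, assuming the voice IDs live in env vars named ELEVENLABS_VOICE_FR, ELEVENLABS_VOICE_EN, and so on. That naming scheme, the helper names, and the fallback to English are illustrative assumptions, not part of the lesson.

```python
import os

# Language codes from the lesson: French, English, Swahili, Arabic, Kirundi.
SUPPORTED_LANGS = ("fr", "en", "sw", "ar", "rn")


def load_voice_map(env=None):
    """Build {lang_code: voice_id} from env vars such as ELEVENLABS_VOICE_FR.

    The ELEVENLABS_VOICE_<LANG> naming is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    voice_map = {}
    for lang in SUPPORTED_LANGS:
        voice_id = env.get(f"ELEVENLABS_VOICE_{lang.upper()}")
        if voice_id:  # skip languages whose voice has not been cloned yet
            voice_map[lang] = voice_id
    return voice_map


def pick_voice(voice_map, lang, default_lang="en"):
    """Return the voice ID for lang, else the default language's voice, else None."""
    return voice_map.get(lang) or voice_map.get(default_lang)
```

Usage: call VOICE_MAP = load_voice_map() once at startup, then pick_voice(VOICE_MAP, detected_lang) per message. A None result means no voice is configured, so the agent can fall back to a text-only reply.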
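After Claude generates the reply in the matched language, POST the text to https://api.elevenlabs.io/v1/text-to-speech/{voice_id}. Use eleven_multilingual_v2, but check its supported-language list and listen to a test clip in each of the 5 languages before launch; Swahili and Kirundi coverage is the part to verify. Response: MP3 audio. Save it to temp storage on your server.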
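The text-to-speech call could look roughly like this, using only the Python standard library. The function names are hypothetical, and the sketch assumes the API key is sent in the xi-api-key header and that the response body is the raw MP3 bytes. Building the request is kept separate from sending it so it can be inspected without touching the network.

```python
import json
import tempfile
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, voice_id, api_key):
    """Prepare (but do not send) the POST request for one reply."""
    body = json.dumps({"text": text, "model_id": "eleven_multilingual_v2"}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize_to_mp3(text, voice_id, api_key):
    """Send the request, save the returned audio bytes to a temp .mp3, return its path."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as handle:
        handle.write(audio)
        return handle.name
```

Usage: path = synthesize_to_mp3(reply_text, voice_id, api_key). Delete the temp file once the audio has been uploaded to WhatsApp.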
WhatsApp Cloud API supports audio natively. Upload MP3 to Meta's /media endpoint → returns media_id. Send a message of type 'audio' with that media_id. Customer's phone receives a voice note. They tap play → hear the agent speak in their language. Magic — but it's just 5 connected APIs.
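A sketch of the two WhatsApp calls, again with the standard library only. The Graph API version in GRAPH_BASE, the function names, and the reply.mp3 filename are assumptions. phone_number_id and token come from your Meta app setup, and the upload response is assumed to be JSON with an "id" field holding the media_id.

```python
import json
import uuid
import urllib.request

GRAPH_BASE = "https://graph.facebook.com/v19.0"  # assumption: pin the version you actually use


def build_media_upload(phone_number_id, token, mp3_bytes, filename="reply.mp3"):
    """Prepare the multipart POST that uploads the MP3 to the /media endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields first, then the file part carrying the audio bytes.
    for name, value in (("messaging_product", "whatsapp"), ("type", "audio/mpeg")):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/mpeg\r\n\r\n"
    ).encode()
    parts.append(file_header + mp3_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{GRAPH_BASE}/{phone_number_id}/media",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def build_audio_message(phone_number_id, token, to, media_id):
    """Prepare the POST that sends a message of type 'audio' referencing media_id."""
    payload = {
        "messaging_product": "whatsapp",
        "to": to,
        "type": "audio",
        "audio": {"id": media_id},
    }
    return urllib.request.Request(
        f"{GRAPH_BASE}/{phone_number_id}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Usage: media_id = send(build_media_upload(phone_number_id, token, mp3_bytes))["id"], then send(build_audio_message(phone_number_id, token, customer_number, media_id)).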
Top AI Africa deploys the full Module 4 stack — language detection, native prompts, native KB, voice cloning per language. WhatsApp agents that sound native in French, English, Swahili, Arabic, Kirundi. Free 15-min strategy call.