Kokoro TTS generates voice audio that sounds human. The legal status of that audio — whether it requires disclosure, whether consent is required to receive it, and whether it can be regulated as a deepfake — is actively being determined across multiple jurisdictions.
Key Analysis
Few jurisdictions currently mandate disclosure in AI-generated voice messages, but this is changing rapidly: the NO FAKES Act (US, proposed 2024), the DSA (EU, in force), and IT Act amendments (India) all address synthetic voice.
The distinction between 'informational AI audio' and 'AI audio that impersonates a specific person' is the regulatory dividing line. A WhatsApp AI agent using a generic voice carries lower risk than one that sounds like a named lawyer.
Consent to receive AI-generated audio is not the same as consent to have voice notes transcribed — they are separate processing activities requiring separate lawful bases.
Risk Signals
AI voice responses with no disclosure that they are AI-generated.
Using a TTS voice profile that could be mistaken for a specific named individual.
No mechanism for users to opt out of receiving AI voice messages.
Action Items
Include a spoken disclosure at the beginning of each AI voice response: 'This is an automated message from [firm name]'.
Select a generic voice profile that is clearly not intended to impersonate any natural person.
Provide an opt-out mechanism: users who prefer text-only responses should be able to set that preference and have it respected.
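The first and third action items can be sketched in code. The following is a minimal illustration, not a definitive implementation: `synthesize` is a hypothetical callable standing in for whatever TTS call the deployment actually uses (it is not Kokoro's API), and `text_only_users` is an assumed stand-in for stored user preferences. The disclosure wording is the one given in the action items.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

# Wording taken from the action item; the firm name is filled in per deployment.
DISCLOSURE_TEMPLATE = "This is an automated message from {firm_name}."


@dataclass
class Reply:
    channel: str                  # "voice" or "text"
    text: str                     # text that is sent, or the script that is spoken
    audio: Optional[bytes] = None  # synthesized audio, only for voice replies


def build_voice_script(body: str, firm_name: str) -> str:
    """Put the spoken disclosure at the start of the response."""
    disclosure = DISCLOSURE_TEMPLATE.format(firm_name=firm_name)
    return f"{disclosure} {body.strip()}"


def build_reply(
    user_id: str,
    body: str,
    firm_name: str,
    text_only_users: Set[str],          # assumption: opt-outs stored as user IDs
    synthesize: Callable[[str], bytes],  # assumption: hypothetical TTS wrapper
) -> Reply:
    """Respect the opt-out first; otherwise speak the disclosure plus the body."""
    if user_id in text_only_users:
        # Opted-out users never receive AI voice audio.
        return Reply(channel="text", text=body.strip())
    script = build_voice_script(body, firm_name)
    return Reply(channel="voice", text=script, audio=synthesize(script))


if __name__ == "__main__":
    # Stand-in synthesizer for demonstration only; it just encodes the script.
    fake_tts = lambda script: script.encode("utf-8")
    opted_out = {"user-2"}
    print(build_reply("user-1", "Your appointment is confirmed.", "Example LLP", opted_out, fake_tts))
    print(build_reply("user-2", "Your appointment is confirmed.", "Example LLP", opted_out, fake_tts))
```

Checking the opt-out before any synthesis means an opted-out user never has audio generated for them at all, and building the disclosure into the script itself (rather than as a separate clip) makes it harder to send a voice response without it.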