The WhatsApp AI agent downloads voice notes from Meta's media API, stores them as temp files on the Raspberry Pi, transcribes them with Whisper, and then links the transcript to workspace records. At each step, the voice data is both sensitive (potentially privileged legal communications) and at risk.
Key Analysis
Voice notes downloaded from Meta transit the network, so HTTPS is required and the server certificate must be verified (no certificate validation bypass).
Temp files on disk are readable by any process running as the same user, and by processes in the same group (or any other user) whenever file permissions allow it. On a Pi running multiple services, this is a broader attack surface than on a single-purpose server.
Transcripts of legal voice notes may be more sensitive than the audio: they are searchable, quotable, and easily copied. They require the same access controls as the original communication.
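The HTTPS point above can be sketched in Python using only the standard library. This is a minimal illustration, not the agent's actual download code: the function names, the Bearer-token header, and the 10-second timeout are assumptions made for the example. It builds a TLS context that keeps certificate and hostname verification on, and refuses any URL that is not HTTPS.

```python
import ssl
import urllib.request


def build_verified_context() -> ssl.SSLContext:
    """Return a TLS context with certificate and hostname checks enabled."""
    # create_default_context() already sets CERT_REQUIRED and check_hostname;
    # the point is to never override those defaults.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def download_voice_note(media_url: str, access_token: str,
                        timeout: float = 10.0) -> bytes:
    """Fetch a voice note over verified HTTPS (hypothetical helper)."""
    if not media_url.startswith("https://"):
        raise ValueError("refusing non-HTTPS media URL")
    # Assumption: the media endpoint expects a Bearer token header.
    request = urllib.request.Request(
        media_url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(
        request, timeout=timeout, context=build_verified_context()
    ) as response:
        return response.read()
```

If an HTTP client library is used instead, the equivalent rule is to leave its certificate verification setting at the default and never disable it to work around a failing handshake.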
Risk Signals
Temp files with predictable names (e.g., audio_12345.wav) in a world-readable temp directory.
Transcripts stored in the database without access logging.
Voice note audio retained after transcription without a deletion policy.
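The first risk signal can be checked mechanically. The sketch below is one possible audit, assuming the temp directory path is known. The function name and the pattern used to spot predictable names are illustrative choices, not part of the existing system. It flags files that are readable by group or others and files whose names follow a word-underscore-number pattern such as audio_12345.wav.

```python
import re
import stat
import os
from pathlib import Path

# Illustrative pattern for names like audio_12345.wav (an assumption).
PREDICTABLE_NAME = re.compile(r"^[A-Za-z]+_\d+\.\w+$")


def audit_temp_dir(directory):
    """Return (file name, problem) pairs for risky temp files."""
    findings = []
    for entry in sorted(Path(directory).iterdir()):
        if not entry.is_file():
            continue
        mode = stat.S_IMODE(entry.stat().st_mode)
        # Any group or other read bit means more than the owner can read it.
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            findings.append((entry.name, "readable by group or others"))
        if PREDICTABLE_NAME.match(entry.name):
            findings.append((entry.name, "predictable name"))
    return findings
```

Running this periodically against the agent's temp directory and alerting on any finding would surface regressions early. It does not cover the other two signals, which need database-level access logging and a retention policy.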
Action Items
Use random, unguessable temp file names (at least 8 random bytes, hex-encoded).
Set a maximum temp file lifetime of 60 seconds: if the conversion and upload don't complete in 60 seconds, delete and retry.
Implement the same access controls on transcripts as on the workspace items they relate to.
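The first two action items can be sketched together. This is a hedged example, not the agent's implementation: the function names, the .ogg suffix, and the use of file modification time as the age reference are assumptions. The file is created with an unguessable name and owner-only permissions, and a sweep deletes anything older than the 60-second limit.

```python
import os
import secrets
import time
from pathlib import Path

MAX_LIFETIME_SECONDS = 60  # lifetime limit from the action items


def create_private_temp(directory, suffix=".ogg"):
    """Create an owner-only temp file with a random name; return (fd, path)."""
    # 16 random bytes (32 hex characters), above the stated minimum.
    path = Path(directory) / (secrets.token_hex(16) + suffix)
    # O_EXCL fails if the name already exists; 0o600 is owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    return fd, path


def sweep_expired(directory, now=None, max_age=MAX_LIFETIME_SECONDS):
    """Delete files older than max_age seconds; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(directory).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age:
            entry.unlink(missing_ok=True)
            removed.append(entry)
    return removed
```

The standard library's tempfile.mkstemp offers similar guarantees (random name, owner-only mode) and may be a simpler choice. The explicit version here just makes the name length and permissions visible. The sweep would need to run on a timer or before each retry so that a stalled job does not leave audio on disk.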