twelve seconds
We tend to think of our voice as something deeply personal. It is how people recognize us on the phone, how our kids know we are home, how we carry tone and warmth and sarcasm across a room. It is one of the few things about a person that feels genuinely unreplicable.
Except it is not. A short twelve-second recording, a publicly available model, and a bit of compute will produce a synthetic version of anyone's voice that is, at minimum, unsettling in its accuracy. Which raises some questions that nobody has good answers to yet. If your voice can be cloned with surprising accuracy, do you still own it? What does identity even mean when the most recognizable part of you can be copied and pasted?
This project sits somewhere between art project and cautionary tale. The technology to clone a voice from a few seconds of audio is now trivially accessible. That fact seemed worth demonstrating rather than just writing about.
The original version of this project was presented in Summer 2024 at the Provocation Art show on Salt Spring Island. The technology has advanced significantly since then. What used to take more than a minute of recording to yield a passable voice clone now takes roughly 12 seconds.
This is a live experiment. The voice model runs in real time. Results vary.