Read aloud
A free, private text-to-speech tool: it turns any text into natural-sounding speech entirely on your device — nothing you paste is ever uploaded.
Paste anything. Pick a narrator. Hear it spoken back — fully in your browser.
FAQ
How this works under the hood, what model it uses, and what stays on your device.
Is it free?
Yes — completely free, with no sign-up, no account, and no usage limits. The voice model runs on your own device, so there is no per-character API cost to pass on to you.
Is my text private?
Yes. Everything runs in your browser — the text you paste is never sent to a server, and no audio is uploaded. Once the model has loaded you can even disconnect from the network and it keeps working.
Do I need to install anything or sign up?
No. It is a web page — no extension, no app, and no account. Open it, paste text, and press Play. The only download is the voice model itself, which caches in your browser on first use.
Can I download the audio?
Yes. After a clip plays you can save it as a WAV file, or generate and download the audio without playing it first. The file is created in your browser from the same on-device synthesis.
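As a rough illustration of the save step, the sketch below packs mono float samples into a 16-bit PCM WAV container. The function name `encodeWav` is hypothetical, and the 24 kHz rate in the usage note is an assumption about the model's output rather than something stated on this page; the booth's actual encoder may differ.

```typescript
// Hypothetical helper: encode mono float samples (range -1..1) as a
// 16-bit PCM WAV file. Returns the raw bytes of the file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + PCM data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = s < 0 ? s * 0x8000 : s * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}
```

In a browser, wrapping the result as `new Blob([encodeWav(samples, 24000)], { type: "audio/wav" })` and passing it to `URL.createObjectURL` would give a link that can be downloaded without any server round trip.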
Where does the audio come from?
Your browser. Kokoro-82M runs as ONNX via Transformers.js in a Web Worker. The text you paste is tokenized, fed through the model, and the resulting waveform is played back through your own AudioContext. Nothing is sent to any server.
What model is used?
onnx-community/Kokoro-82M-v1.0-ONNX, 8-bit quantized (~92 MB). It ships 54 voices across American and British accents; the booth surfaces a curated 8 to keep the cartridge rack legible.
What happens on first use?
Roughly 92 MB of model weights download and cache in your browser the first time you press Play. They go into the browser's built-in Cache Storage (the Cache API, which Transformers.js manages); subsequent runs read from that cache and start within a second or two.
Does it work offline?
After the first model download, yes. The whole pipeline — tokenizer, model, audio playback — runs locally, so you can pull the network cable and the booth still works.
Does it use WebGPU?
The default backend is WebAssembly with the 92 MB q8 model, so there is no WebGPU gate at the door. A “studio quality” toggle can switch to fp16 weights on WebGPU when available.
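The toggle logic described above could look something like the sketch below. The function `pickBackend` and the `Backend` type are hypothetical names; the `device` and `dtype` values mirror the option names Transformers.js accepts, but how the booth actually wires its toggle is an assumption here.

```typescript
// Hypothetical shape of the settings handed to the model loader.
type Backend = { device: "wasm" | "webgpu"; dtype: "q8" | "fp16" };

// Choose fp16 on WebGPU only when the user asked for it AND the browser
// can provide it; otherwise fall back to the q8 WebAssembly default.
function pickBackend(studioQuality: boolean, webgpuAvailable: boolean): Backend {
  if (studioQuality && webgpuAvailable) {
    return { device: "webgpu", dtype: "fp16" };
  }
  return { device: "wasm", dtype: "q8" };
}

// In a browser, `webgpuAvailable` could be derived roughly like this:
//   const ok = "gpu" in navigator
//     && (await navigator.gpu.requestAdapter()) !== null;
```

Keeping the decision in a pure function means the toggle can never strand a browser without WebGPU: it simply stays on the default path.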
How is the player wired together?
A small client library (kokoro-js) wraps the model. We drive its streaming API one sentence at a time and queue each chunk onto a single AudioContext so that chunks play back to back, with no gaps or overlaps, even when synthesis runs ahead of realtime playback. The VU meter is bound to an AnalyserNode on that same context, so it shows real RMS levels rather than a timer-driven animation.
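Here is a minimal sketch of the two ideas in that answer: scheduling chunks end to end on one clock, and computing an RMS level from time-domain samples. `GaplessQueue`, `Clock`, `rms`, and the `leadTime` parameter are all hypothetical names for illustration, not the booth's actual code.

```typescript
// Minimal shape of the clock we need; a real AudioContext satisfies it
// through its `currentTime` property (seconds).
interface Clock {
  currentTime: number;
}

class GaplessQueue {
  private clock: Clock;
  private leadTime: number;
  private nextStart = 0;

  constructor(clock: Clock, leadTime: number = 0.05) {
    this.clock = clock;
    this.leadTime = leadTime; // small safety margin before the first chunk
  }

  // Returns the clock time at which a chunk lasting `duration` seconds
  // should start: right where the previous chunk ends, or slightly in the
  // future if the queue has run dry.
  schedule(duration: number): number {
    const earliest = this.clock.currentTime + this.leadTime;
    const start = Math.max(this.nextStart, earliest);
    this.nextStart = start + duration;
    return start;
  }
}

// Root mean square of a block of samples, the level a VU meter displays.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}
```

In a browser, each synthesized chunk would become an `AudioBufferSourceNode` started with `source.start(queue.schedule(buffer.duration))`, and the meter would fill a `Float32Array` via `analyser.getFloatTimeDomainData(block)` on each animation frame before calling `rms(block)`.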