Question 1

Is it free?

Accepted Answer

Yes — completely free, with no sign-up, no account, and no usage limits. The voice model runs on your own device, so there is no per-character API cost to pass on to you.

Question 2

Is my text private?

Accepted Answer

Yes. Everything runs in your browser — the text you paste is never sent to a server, and no audio is uploaded. Once the model has loaded you can even disconnect from the network and it keeps working.

Question 3

Do I need to install anything or sign up?

Accepted Answer

No. It is a web page — no extension, no app, and no account. Open it, paste text, and press Play. The only download is the voice model itself, which caches in your browser on first use.

Question 4

Can I download the audio?

Accepted Answer

Yes. After a clip plays you can save it as a WAV file, or generate and download the audio without playing it first. The file is created in your browser from the same on-device synthesis.
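As a rough illustration of how a WAV file can be built in the browser from synthesized samples, here is a minimal sketch. It assumes the synthesis step yields mono `Float32Array` samples in the range [-1, 1] and that the sample rate is 24000 Hz; both are assumptions for the example, and `encodeWav` is a hypothetical helper, not the page's actual code.

```typescript
// Hypothetical helper: encode mono float samples as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte header + PCM data
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // one channel (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] so out-of-range samples do not wrap around.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    view.setInt16(44 + i * bytesPerSample, Math.round(clamped * 0x7fff), true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped in `new Blob([buffer], { type: "audio/wav" })` and offered for download through `URL.createObjectURL` and a link with a `download` attribute.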

Question 5

Where does the audio come from?

Accepted Answer

Your browser. Kokoro-82M (a text-to-speech model) runs as ONNX via Transformers.js, fully on-device. The text you paste is tokenized in a Web Worker, fed through the model, and the resulting waveform is played back through your own AudioContext. Nothing is sent to any server.

Question 6

What model is used?

Accepted Answer

Kokoro-82M v1.0 from onnx-community, 8-bit quantized (~92 MB). It ships 54 voices across American and British accents; the booth surfaces a curated set of 8.

Question 7

What happens on first use?

Accepted Answer

About 92 MB of model weights download and cache in your browser the first time you press Play. Subsequent runs read from cache and start within a second or two.

Question 8

Does it work offline?

Accepted Answer

After the first model download, yes. The whole pipeline — tokenizer, model, audio playback — runs locally, so you can pull the network cable and the booth still works.

Question 9

Does it use WebGPU?

Accepted Answer

The default backend is WASM with the 92 MB q8 model, so WebGPU is not required. A "studio quality" toggle switches to fp16 weights on WebGPU when the browser supports it.
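The fallback logic described here could look roughly like the sketch below. `chooseBackend` and `detectWebGPU` are hypothetical names; the `device` and `dtype` values are assumed to match the options kokoro-js accepts when loading the model, and the real page may structure this differently.

```typescript
type Backend = { device: "wasm" | "webgpu"; dtype: "q8" | "fp16" };

// Hypothetical helper: fp16 on WebGPU only when the user asked for studio
// quality AND the browser can provide it; otherwise the q8 WASM default.
function chooseBackend(studioQuality: boolean, webgpuAvailable: boolean): Backend {
  if (studioQuality && webgpuAvailable) {
    return { device: "webgpu", dtype: "fp16" };
  }
  return { device: "wasm", dtype: "q8" };
}

// Minimal shape of the parts of `navigator.gpu` used here, declared locally
// so the sketch does not depend on WebGPU type definitions being installed.
type GpuLike = { requestAdapter(): Promise<unknown> };

// Hypothetical helper: true only if a WebGPU adapter can be obtained.
async function detectWebGPU(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  const gpu = nav?.gpu;
  if (!gpu) return false; // API missing: older browser or non-browser runtime
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    return (await gpu.requestAdapter()) != null;
  } catch {
    return false;
  }
}
```

A caller would typically await `detectWebGPU()` once at startup and pass the result, together with the toggle state, to `chooseBackend`.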

Question 10

How is the player wired together?

Accepted Answer

A small client library (kokoro-js) wraps the model. We drive its streaming API one sentence at a time and queue each chunk onto a single AudioContext, so chunks play back to back without gaps or overlap, even when synthesis runs faster than realtime.
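One common way to get this kind of gapless queueing is to track the time at which the previously scheduled chunk ends and start the next chunk exactly there. The sketch below shows only that bookkeeping. `ChunkScheduler` is a hypothetical name, and this is an assumption about the approach rather than the page's actual code.

```typescript
// Hypothetical scheduler: decides when each audio chunk should start so that
// chunks neither overlap nor leave silence between them.
class ChunkScheduler {
  // Time (in seconds, on the audio clock) at which the last queued chunk ends.
  private nextStart = 0;

  // `now` is the audio clock's current time; `duration` is the chunk length.
  // Returns the start time to use for this chunk.
  schedule(now: number, duration: number): number {
    // If synthesis is ahead of playback, wait for the previous chunk to end.
    // If playback has caught up (an underrun), start immediately.
    const startAt = Math.max(now, this.nextStart);
    this.nextStart = startAt + duration;
    return startAt;
  }
}

// In a browser, each chunk would be copied into an AudioBuffer and started
// on one shared AudioContext `ctx`, for example:
//   const source = ctx.createBufferSource();
//   source.buffer = audioBuffer;
//   source.connect(ctx.destination);
//   source.start(scheduler.schedule(ctx.currentTime, audioBuffer.duration));
```

Because every start time comes from the same audio clock, chunks that arrive early simply wait their turn, which is what keeps faster-than-realtime synthesis from producing overlapping audio.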
