FAQ
How this works under the hood, what models it uses, and what stays on your device.
Where does my audio go?
Nowhere. Transcription runs entirely in your browser via Transformers.js (WebGPU or WebAssembly), and the optional summary uses WebLLM. The audio bytes are read into memory, resampled to 16 kHz mono, and handed to a Web Worker — they never cross the network.
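As a rough illustration of the "resampled to 16 kHz mono" step, here is a minimal sketch on plain arrays. It is an assumption, not the page's actual code: the real app may well use the Web Audio API for this, and the function names `toMono` and `resampleLinear` are made up for the example. It averages the channels and then resamples with simple linear interpolation.

```typescript
// Target sample rate that Whisper expects.
const TARGET_RATE = 16000;

// Downmix any number of equal-length channels to mono by averaging them.
function toMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample by linear interpolation between neighbouring input samples.
// Hypothetical helper: a production resampler would low-pass filter first.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = TARGET_RATE
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last);
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}
```

Calling `resampleLinear(toMono([left, right]), 48000)` on one second of 48 kHz stereo yields 16,000 mono samples. Linear interpolation keeps the sketch short, but it can alias when downsampling, so treat it as a demonstration of the data flow rather than a quality resampler.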
What models are used?
Whisper tiny.en (~75 MB) for the transcript — fast, English-only, runs as ONNX via onnx-community/whisper-tiny.en. Llama 3.2 1B Instruct (~750 MB, q4f16 quantized via MLC-LLM) for the optional summary.
What happens on first use?
Whisper downloads (~75 MB) and caches in your browser the first time you press Rec or Upload. If you ask for a summary, Llama 3.2 1B downloads (~750 MB) on that click. Both are stored in IndexedDB, so subsequent runs load straight from cache.
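The download-once, load-from-cache behaviour can be sketched as follows. Everything here is illustrative: `BlobStore`, `loadModel`, and `memoryStore` are hypothetical names, and in practice Transformers.js and WebLLM handle their own caching. The sketch only shows the control flow, with an in-memory store standing in for browser storage.

```typescript
// Hypothetical minimal key-value interface; in a browser it could wrap IndexedDB.
interface BlobStore {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, value: Uint8Array): Promise<void>;
}

interface LoadResult {
  bytes: Uint8Array;
  fromCache: boolean;
}

// Return cached model bytes if present; otherwise download once and store them.
async function loadModel(
  key: string,
  store: BlobStore,
  download: () => Promise<Uint8Array>
): Promise<LoadResult> {
  const cached = await store.get(key);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download();
  await store.set(key, bytes);
  return { bytes, fromCache: false };
}

// In-memory stand-in for persistent storage, useful for trying the flow out.
function memoryStore(): BlobStore {
  const map = new Map<string, Uint8Array>();
  return {
    get: async (key: string) => map.get(key),
    set: async (key: string, value: Uint8Array) => {
      map.set(key, value);
    },
  };
}
```

With this shape, the first `loadModel("whisper-tiny.en", store, download)` call invokes `download`, and later calls with the same key and store skip it entirely.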
Does it work offline?
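Mostly. After the initial downloads both models live in IndexedDB, so transcription and summaries run without any network. The page shell (HTML/JS) still has to load, so a visit with no connection only works if the browser already has the page cached. Useful for in-flight notes or when you don't want to advertise that you're transcribing something.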
How accurate is it?
Whisper tiny.en is the smallest English variant — great for clear speech, but it can struggle with heavy accents, noisy audio, or specialised jargon. Larger Whisper variants (base, small, medium) are more accurate at the cost of download size and inference time.
What is WebGPU? What if my browser doesn’t have it?
WebGPU is the browser API that exposes the GPU for graphics and general-purpose compute. Recent versions of Chrome and Edge ship it by default; Safari support depends on the version and may require enabling a feature flag. When WebGPU isn't available, the page automatically falls back to WebAssembly, which is slower but runs in every modern browser.
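The fallback decision can be sketched as a small feature check. This is a guess at the logic, not the page's actual code: `pickBackend`, `Backend`, and `NavigatorLike` are hypothetical names. The one real API it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type for the part of `navigator` this sketch touches.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Prefer WebGPU when an adapter can actually be obtained; otherwise use WebAssembly.
async function pickBackend(nav: NavigatorLike | undefined): Promise<Backend> {
  // No navigator (non-browser runtime) or no `gpu` property: WebGPU is not exposed.
  if (!nav || !nav.gpu) return "wasm";
  try {
    const adapter = await nav.gpu.requestAdapter();
    // The API can exist while no usable adapter does, in which case this is null.
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure during adapter discovery as "no WebGPU".
    return "wasm";
  }
}
```

In a browser this would be called with the global `navigator` (cast to `NavigatorLike` if the TypeScript DOM typings in use lack `gpu`). Checking for a real adapter, rather than just the presence of `navigator.gpu`, covers machines where the API is exposed but the GPU or driver is blocked.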
Can I run this offline as a PWA?
Not yet — the page is still a regular web app. Once the models are cached, only the HTML/JS shell needs the network; a service worker would make it a proper installable PWA. On the roadmap.