Local-first voice cloning, text-to-speech, PDF reader, and audiobook creator. Runs on macOS (MPS), Windows (CUDA), and Web. Tested on RTX 4090 & 5090.
Supported TTS & Voice Cloning Engines
Each engine brings unique strengths. Use the right one for your task, or combine them for maximum flexibility.
82M-parameter model with sub-200 ms latency. 21 British and American voices with speed control.
Clone any voice from just 3 seconds of audio. 9 premium preset speakers with style instructions.
Multilingual voice cloning across 23 languages. Clone voices and speak in any supported language.
High-fidelity voice cloning with a large 24 GB model for maximum quality and naturalness.
Upload a short audio clip and Qwen3-TTS will learn the voice characteristics. Use style instructions to control emotion, pace, and tone.
The fastest engine in the studio. Generate speech in under 200 ms with fine-grained speed control. Includes Emma IPA for phonetic transcription, powered by your choice of LLM.
Read PDFs aloud with sentence-by-sentence highlighting, or convert entire documents to audiobooks with chapter markers. Supports PDF, EPUB, TXT, Markdown, and DOCX.
Integrate MimikaStudio into your workflow with a comprehensive REST API and Model Context Protocol server. Use it programmatically from Claude Code, scripts, or your own applications.
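As a rough illustration of scripted use, the sketch below builds a JSON POST request for a text-to-speech call using only the Python standard library. The base URL, the endpoint path, and the field names (`text`, `voice`, `speed`) are placeholders assumed for this example, not documented MimikaStudio values; check the API reference for the real ones. The request is built but not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: both values below are placeholders for illustration only.
BASE_URL = "http://localhost:8000"   # hypothetical local API address
TTS_PATH = "/api/tts/generate"       # hypothetical endpoint path


def build_tts_request(text, voice, speed=1.0, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a hypothetical TTS endpoint."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Hypothetical payload fields; the real API may name these differently.
    payload = {"text": text, "voice": voice, "speed": speed}
    return urllib.request.Request(
        url=base_url + TTS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("Hello from a script.", voice="example_voice", speed=1.1)
print(req.get_method(), req.full_url)
```

With the local server running and the real endpoint substituted, passing `req` to `urllib.request.urlopen` would send the request.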
The built-in model manager lets you download and manage TTS models directly from the app. See model sizes, status, and switch between engines instantly.
Chatterbox brings multilingual voice cloning. Clone a voice in English and speak in Japanese, or any other supported language.
MimikaStudio runs natively on macOS with MPS acceleration, on Windows with NVIDIA CUDA (tested on RTX 4090 & 5090), and in any browser via Flutter Web.
Metal Performance Shaders acceleration for M1, M2, M3, and M4. Optimized neural inference on Apple Silicon and Intel.
Full CUDA support on Windows. Tested on RTX 4090 and RTX 5090 for maximum inference throughput.
Access MimikaStudio from any browser. The same Flutter app runs as a web UI backed by the local API server.
Kokoro TTS generates speech almost instantly. Real-time performance for interactive use cases.
No cloud, no accounts, no data leaves your machine. All processing happens on-device with local storage.
Full command-line interface and MCP server for automation. Integrate into Claude Code or any workflow.
Free and open source. Runs locally on macOS and Windows, and in the browser. No account needed.