Local-first voice cloning, text-to-speech, PDF reader, and audiobook creator. Now with agentic MCP support for Codex and Claude Code. Optimized for Apple Silicon with native Metal acceleration via MLX and ONNX runtimes for multilingual TTS.
Clone any voice from a 3-second sample. Qwen3-TTS learns timbre, pitch, and cadence while style prompts let you control emotion, pace, and tone.
Turn documents into full audiobooks with chapter markers and natural pacing. Follow along with sentence highlighting, or export hours of polished audio.
Generate narration for YouTube, explainers, and ads. Switch between five TTS engines to find the perfect voice, then fine-tune with style controls.
Everything runs on your Mac with native Metal acceleration. No cloud uploads, no API limits, no subscriptions. Your voices stay on your device.
Listen to samples generated by each TTS engine. For voice cloning demos, compare the reference voice with the generated output.
Voice cloning from a 3-second sample. Compare the reference voice with the generated output.
Genesis4 Style
Generated speech keeps the reference timbre while applying Genesis4 style control.
Genesis4 Style
Reference and generated clips align on pitch contour with a more polished studio-like cadence.
Genesis4 Preset Voice
Preset speaker profile with dependable pronunciation for general narration.
Qwen3-TTS
Multilingual Demo
Cross-language cloning maps the same vocal identity into multilingual output.
Expressive voice cloning with emotion control. Natural, emotive speech synthesis.
Emotional Speech Demo
Generated output adds emotional dynamics while preserving speaker identity.
Emotional Speech Demo
Emotion-aware rendering demonstrates richer contour and emphasis with natural prosody shifts.
Supported TTS & Voice Cloning Engines
Each engine brings unique strengths. Use the right one for your task, or combine them for maximum flexibility.
82M parameter model with sub-200ms latency. 21 British and American voices with speed control.
Clone any voice from just 3 seconds of audio. 9 premium preset speakers with style instructions.
Multilingual voice cloning across 23 languages. Clone voices and speak in any supported language.
ONNX-based multilingual synthesis with low-latency local generation across preset voices and multiple languages.
Standalone ONNX-based expressive speech synthesis with preset voices, style control, and emotional delivery for narration.
All models below appear in the in-app model manager with independent download and status tracking.
Standalone CosyVoice3 model package (separate from Supertonic) with expressive preset voices.
Built-in voices with natural prosody. Reading from the Book of Job.
Job 6:1-2 · British Female
Balanced pacing and crisp articulation for long-form reading with book-like clarity.
Kokoro
Job 14:7 · British Male
Lower-register delivery with stable rhythm suited for document narration and reports.
Kokoro
Job 42:5-6 · British Female
Warm and expressive tone with subtle emphasis on key phrases for natural spoken prose.
Kokoro
Full audiobook excerpts generated with Kokoro TTS. H.G. Wells' "A Short History of the World" (~17 minutes each) at 0.95x speed for natural pacing.
British Female · 17+ minutes
Full audiobook narration with Emma's crisp British accent. Perfect for long-form document reading and study material.
Kokoro
British Male · 19+ minutes
Rich male narration with George's authoritative tone. Ideal for history and non-fiction audiobook exports.
Kokoro
Emotionally expressive standalone ONNX generation with preset voice styles and instant local playback.
Genesis 4 Preview
Female preset tuned for expressive scripture narration with stable pacing and natural emphasis.
CosyVoice3
Genesis 4 Preview
Male preset with grounded tone and clear phrasing for audiobook-style read-aloud playback.
CosyVoice3
Fast multilingual ONNX synthesis. Instant playback from bundled pre-generated voices.
Genesis 4 Preview
Clear high-register narration tuned for scripture-style reading with quick response time.
Supertonic
Genesis 4 Preview
Lower-register biblical narration that keeps a grounded tone while preserving clear verse cadence.
Supertonic
The Voice Prompts tab is the staging area for cloning. Upload or import a prompt once, preview it, tag it, and then reuse it across Qwen3 Clone, Chatterbox, and IndexTTS-2 without duplicating setup.
MimikaStudio can convert a YouTube clip into a reusable prompt locally. Paste the URL, choose an optional start time, listen to the extracted preview, then save it into the shared voice library with transcript and metadata.
The fastest engine in the studio. Generate speech in under 200ms with fine-grained speed control and high naturalness for narration and dialogue.
Read PDFs aloud with sentence-by-sentence highlighting, or convert full PDFs into audiobooks with chapter markers. Audiobook generation uses Kokoro voices.
The new Jobs tab shows all executed tasks across TTS, voice cloning, and audiobook exports. Review status instantly and replay outputs directly from the queue.
Integrate MimikaStudio into your workflow with a comprehensive REST API and the Mimika MCP (Model Context Protocol) server. Use it programmatically from Claude Code, Codex, scripts, or your own applications.
Example prompts once your client is connected to Mimika MCP at http://127.0.0.1:8010:
Codex: Use Mimika MCP tool audiobook_generate_from_file with file_path=/absolute/path/to/document.pdf, voice=bf_emma, output_format=mp3. Then poll audiobook_status until completed.
Claude Code: Call Mimika MCP audiobook_generate_from_file for /absolute/path/to/document.pdf with voice bf_emma and output_format mp3, then track audiobook_status every 10 seconds.
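The polling step in these prompts can also be scripted. The sketch below is a generic POSIX shell loop, not part of MimikaStudio: `poll_until_done` and the status command passed to it are hypothetical, and the only status value taken from the prompts above is `completed` (`failed` is an assumption). It assumes the status command prints a single word on stdout.

```shell
# Sketch only: poll a status command until it reports "completed".
# Usage: poll_until_done INTERVAL_SECONDS MAX_TRIES STATUS_COMMAND [ARGS...]
# STATUS_COMMAND is a placeholder for whatever wrapper you write around
# the audiobook_status tool; it is not a real MimikaStudio command.
poll_until_done() {
    interval="$1"
    max_tries="$2"
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Run the status command; give up if the command itself fails.
        state=$("$@") || return 2
        case "$state" in
            completed)
                echo "completed"
                return 0
                ;;
            failed)
                # "failed" is an assumed status value, not from the docs.
                echo "failed" >&2
                return 1
                ;;
        esac
        tries=$((tries + 1))
        sleep "$interval"
    done
    echo "timed out" >&2
    return 3
}
```

For example, `poll_until_done 10 60 my_status_cmd` would check every 10 seconds for up to 60 attempts, where `my_status_cmd` stands in for your own status wrapper.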
The built-in model manager lets you download and manage TTS models directly from the app. See model sizes, status, and switch between engines instantly.
Configure output folders, view app information, and manage your preferences to tailor MimikaStudio to your workflow.
Chatterbox brings multilingual voice cloning. Clone a voice in English and speak in Japanese, or any other supported language. Hebrew requires the Dicta model, which can be downloaded directly from the app.
MimikaStudio runs natively on macOS with MLX-Audio, which builds on MLX, Apple's machine learning framework. Native Metal acceleration on M1, M2, M3, and M4 chips. Windows support coming soon.
Native Metal acceleration via Apple's MLX framework. Optimized neural inference on M1, M2, M3, and M4 chips.
The codebase supports Windows with CUDA. Pre-built Windows binaries will be available in a future release.
Access MimikaStudio from any browser. The same Flutter app runs as a web UI backed by the local API server.
Kokoro TTS generates speech with sub-200ms latency, fast enough for real-time, interactive use cases.
No cloud, no accounts, no data leaves your machine. All processing happens on-device with local storage.
Full command-line interface and MCP server for automation. Integrate into Claude Code, Codex, or any workflow.
Start with a free trial, then upgrade to a lifetime license. No subscriptions, no recurring fees.
No credit card required
Lifetime access
Runs locally on macOS (Apple Silicon). No account needed. Windows support coming soon.
This is an early alpha version intended for testing and development. Features may be incomplete, unstable, or change significantly before the stable release. Please report any issues on GitHub.
As of February 19, 2026, the MimikaStudio DMG is not yet signed/notarized by Apple. If macOS blocks launch, you must remove the quarantine attribute and approve it via Gatekeeper.
# If installed to /Applications (system-wide):
xattr -d com.apple.quarantine /Applications/MimikaStudio.app
# If installed to ~/Applications (user-only):
xattr -d com.apple.quarantine ~/Applications/MimikaStudio.app
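Why? macOS attaches a quarantine attribute to apps downloaded through browsers and other quarantine-aware tools. For unsigned apps, Gatekeeper may block execution while that attribute is present. The xattr command above removes the quarantine flag.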
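If you are unsure where the app was installed, or whether the flag is set at all, the two commands above can be wrapped in a small script. This is a minimal sketch, not an official installer step: the helper names `find_app` and `clear_quarantine` are made up for illustration, and it relies on `xattr -p` exiting with a non-zero status when the attribute is absent.

```shell
# Sketch only: find MimikaStudio.app and clear the quarantine flag if set.

# Print the first MimikaStudio.app bundle found in the given directories.
find_app() {
    for dir in "$@"; do
        if [ -d "$dir/MimikaStudio.app" ]; then
            printf '%s\n' "$dir/MimikaStudio.app"
            return 0
        fi
    done
    return 1
}

# Remove com.apple.quarantine from the bundle only when it is present.
clear_quarantine() {
    app="$1"
    # xattr -p prints the attribute value and fails if it does not exist.
    if xattr -p com.apple.quarantine "$app" >/dev/null 2>&1; then
        xattr -d com.apple.quarantine "$app"
        echo "Removed quarantine flag from $app"
    else
        echo "No quarantine flag on $app"
    fi
}

# Usage (macOS):
#   app=$(find_app /Applications "$HOME/Applications") && clear_quarantine "$app"
```

Checking before deleting avoids the "No such xattr" error that `xattr -d` prints when the flag has already been removed.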