LunaVox

High-performance C++ inference engine for Qwen3-TTS.

LunaVox is a specialized inference engine that targets a single TTS model family and optimizes it aggressively for commodity hardware. It uses a layered architecture (C++ core, ctypes-bound Python orchestration, desktop GUI shell) and integrates deeply with ONNX Runtime and llama.cpp. For the 0.6B model on an RTX 3090, it delivers sub-200 ms time to first byte (TTFB) and a real-time factor (RTF) as low as 0.152.

Quick links

- :material-rocket-launch: **[Usage Tutorial](en/guide/usage_tutorial.md)** Walk through Base, Voice Cloning, Custom Voice, and Voice Design modes.
- :material-console: **[CLI Reference](en/guide/cli_reference.md)** Every flag of `lunavox-cli` with examples.
- :material-chart-bar: **[Benchmarks](en/benchmark/windows_performance.md)** Reproducible 100-run perf sweeps across CUDA / Vulkan+DML / CPU.
- :material-code-braces: **[Python API](en/api/runtime.md)** Embed LunaVox in your own apps via `lunavox.runtime.Engine`.
- :material-cog: **[Synthesis Pathway](en/technical/synthesis_pathway.md)** How text becomes audio end-to-end inside the C++ engine.
- :material-file-chart: **[Stats Schema](en/technical/stats_schema.md)** Contract between `--stats-json` and downstream consumers.

Performance snapshot

| Configuration | TTFB | RTF | Peak RAM | Speedup vs. PyTorch (CPU) |
| --- | --- | --- | --- | --- |
| Official PyTorch (CPU) | — | 5.066 | 5.06 GB | 1.00× |
| Official PyTorch (GPU) | — | 3.788 | 1.59 GB | 1.34× |
| LunaVox (Full CPU) | 1248 ms | 0.858 | 1.19 GB | 5.90× |
| LunaVox (CUDA 13) | 175 ms | 0.213 | 1.41 GB | 23.78× |
| LunaVox (Vulkan + DML) | 194 ms | 0.152 | 0.97 GB | 33.33× |

Measured on an Intel i9-12900K + RTX 3090 with Qwen3-TTS-12Hz-0.6B-Base in voice cloning mode, with 100 measurement runs per backend after 5 warmup runs. The full distribution and raw stats are in benchmark/report.md.
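
The speedup column in the table above appears to be the ratio of the PyTorch (CPU) RTF to each configuration's RTF. That is inferred from the published numbers, not stated by the benchmark itself. The sketch below reproduces the column from the RTF values. It also estimates synthesis time for a clip, assuming the conventional definition RTF = synthesis time / audio duration. The helper names `speedup` and `synthesis_seconds` are illustrative and are not part of the LunaVox API.

```python
# RTF values copied from the performance table above.
RTF = {
    "Official PyTorch (CPU)": 5.066,
    "Official PyTorch (GPU)": 3.788,
    "LunaVox (Full CPU)": 0.858,
    "LunaVox (CUDA 13)": 0.213,
    "LunaVox (Vulkan + DML)": 0.152,
}

# Assumption: the table's speedup column uses PyTorch on CPU as the baseline.
BASELINE = "Official PyTorch (CPU)"


def speedup(config: str, baseline: str = BASELINE) -> float:
    """Return baseline RTF divided by the configuration's RTF."""
    return RTF[baseline] / RTF[config]


def synthesis_seconds(config: str, audio_seconds: float) -> float:
    """Estimate wall-clock synthesis time, assuming RTF = time / audio length."""
    return RTF[config] * audio_seconds


if __name__ == "__main__":
    for name in RTF:
        print(
            f"{name:<26} {speedup(name):6.2f}x  "
            f"{synthesis_seconds(name, 10.0):6.2f} s per 10 s of audio"
        )
```

Under these assumptions, an RTF below 1.0 means audio is produced faster than it plays back, which is the case for all three LunaVox configurations in the table.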

License

MIT. See LICENSE.