Stats Schema
Three producers emit the same structured stats:
| Producer | Surface | Source |
|---|---|---|
| `lunavox-cli --stats-json report.json` | JSON file | `src/main.cpp` |
| `LunavoxAudio` from the C API | In-memory struct | `src/lunavox_c_api.h` |
| `lunavox.runtime.SynthesisStats` | Python dataclass | `src/lunavox/runtime/binding.py` |
The shared shape is pinned in `src/lunavox/core/stats_schema.py`. Adding a field means editing that module together with `src/main.cpp`, `src/lunavox_c_api.h`, and `src/lunavox/runtime/binding.py` in the same commit, so all three producers stay in sync.
Top-level StatsJSON
```jsonc
{
  "t_load_ms": 1714,   // wall time inside Engine::load_models
  "t_warmup_ms": 565,  // warmup portion of t_load_ms (decoder first-run)
  "runs": [ ... RunStats ... ]
}
```
RunStats
```jsonc
{
  "run_id": 1,
  "sample_rate": 24000,
  "n_samples": 71040,
  "audio_duration_s": 2.96,
  "rtf": 0.175,
  "effective_language_id": -1,
  "timing_ms": { ... TimingMs ... },
  "stream": { ... StreamStats ... },
  "mem": { ... MemoryBytes ... }
}
```
`rtf = timing_ms.total / 1000 / audio_duration_s`. Lower is faster; a value below 1.0 means synthesis ran faster than realtime.
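As a worked check against the RunStats example above: 71040 samples at 24000 Hz is 2.96 s of audio, and an `rtf` of 0.175 corresponds to about 0.175 × 2.96 ≈ 0.518 s of `timing_ms.total`. The minimal C sketch below spells out the same arithmetic; `lv_audio_duration_s`, `lv_rtf`, and `lv_faster_than_realtime` are hypothetical helpers for illustration, not part of the LunaVox API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: audio length in seconds from the RunStats
 * fields n_samples and sample_rate. */
double lv_audio_duration_s(int64_t n_samples, int32_t sample_rate) {
    assert(sample_rate > 0);
    return (double)n_samples / (double)sample_rate;
}

/* Hypothetical helper mirroring the documented formula
 * rtf = timing_ms.total / 1000 / audio_duration_s. */
double lv_rtf(double total_ms, double audio_duration_s) {
    assert(audio_duration_s > 0.0);
    return total_ms / 1000.0 / audio_duration_s;
}

/* rtf < 1.0 means synthesis ran faster than realtime. */
int lv_faster_than_realtime(double rtf) {
    return rtf < 1.0;
}
```

For the example run, `lv_rtf(518.0, lv_audio_duration_s(71040, 24000))` evaluates to about 0.175.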
TimingMs (milliseconds)
| Field | Always populated | Description |
|---|---|---|
| `tokenize` | ✓ | Text → token IDs |
| `encode` | ✓ | Speaker encoder (0 when using a pre-computed embedding JSON) |
| `generate` | ✓ | LLM sequence generation (talker + predictor + sampling) |
| `decode` | ✓ | ONNX decoder session + post-processing |
| `total` | ✓ | Sum of the four stages above plus overhead |
| `first_audio` | ✓ | Wall time to first decoded PCM chunk (streaming pipeline) |
| `llama_prefill` / `llama_decode_loop` / `talker_post` / `predictor_sample` / `talker_decode` | | Detailed sub-timings; require the `LUNAVOX_TIMING` build flag |
| `decoder_tensor_prep` / `decoder_ort_run` / `decoder_tensor_extract` / `decoder_state_trim` / `pcm_gather` | | Detailed decoder sub-timings; also require the `LUNAVOX_TIMING` build flag |
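Because `total` is documented as the four stage timings plus overhead, the overhead can be recovered by subtraction. A minimal C sketch, assuming a hypothetical `LvTimingMs` struct whose members mirror the always-populated JSON keys (it is not a LunaVox type). Since `first_audio` measures time to the first chunk rather than a pipeline stage, the sketch assumes it is not part of the sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the always-populated TimingMs fields
 * (milliseconds); not part of the LunaVox headers. */
typedef struct LvTimingMs {
    int64_t tokenize;
    int64_t encode;      /* 0 when a pre-computed embedding JSON is used */
    int64_t generate;
    int64_t decode;
    int64_t total;
    int64_t first_audio; /* latency marker, excluded from the stage sum */
} LvTimingMs;

/* overhead = total - (tokenize + encode + generate + decode) */
int64_t lv_timing_overhead_ms(const LvTimingMs *t) {
    assert(t != NULL);
    int64_t stages = t->tokenize + t->encode + t->generate + t->decode;
    return t->total - stages;
}
```

A persistently large overhead relative to the stage sum would point at time spent outside the four instrumented stages.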
StreamStats
| Field | Description |
|---|---|
| `first_chunk_frames` | Frames in the first decoded chunk (TTFB tuning knob) |
| `t_first_audio_ms` | Same value the C API exposes as `audio.first_audio_ms` |
MemoryBytes (bytes)
| Field | Description |
|---|---|
| `rss_start` / `rss_end` | Process RSS at synth entry / exit |
| `rss_peak` | High-water RSS during the synth |
| `phys_start` / `phys_end` / `phys_peak` | macOS `phys_footprint` (equal to RSS on Windows / Linux) |
C API Subset
`LunavoxAudio` exposes a subset of the above directly on the audio struct, so the C / Python binding does not need an extra round-trip. Memory and VRAM samples live in a nested `LunavoxMemStats` block, so callers compute `mem.*_peak_delta_bytes` through a single path instead of juggling flat fields:
```c
#include <stdint.h>

typedef struct LunavoxMemStats {
    uint64_t rss_start_bytes;
    uint64_t rss_end_bytes;
    uint64_t rss_peak_bytes;
    uint64_t vram_start_bytes;
    uint64_t vram_end_bytes;
    uint64_t vram_peak_bytes;
    uint32_t vram_measured; /* 1 = NVML returned per-PID attributed bytes; 0 = not measured */
    uint32_t _pad;
} LunavoxMemStats;

typedef struct LunavoxAudio {
    const float* samples;
    int32_t n_samples;
    int32_t sample_rate;
    int64_t t_tokenize_ms;
    int64_t t_encode_ms;
    int64_t t_generate_ms;
    int64_t t_decode_ms;
    int64_t t_total_ms;
    int64_t audio_duration_ms;
    float rtf;
    float _pad;
    LunavoxMemStats mem;
} LunavoxAudio;
```
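Because `audio_duration_ms` is already in milliseconds, the documented `rtf` formula reduces to `t_total_ms / audio_duration_ms` on the C struct. The sketch below assumes `LunavoxAudio.rtf` follows that same definition (not confirmed by this page) and uses a hypothetical `lv_rtf_from_audio` helper; it re-declares the two structs from the header excerpt above so it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Re-declared from the header excerpt so this sketch is self-contained. */
typedef struct LunavoxMemStats {
    uint64_t rss_start_bytes;
    uint64_t rss_end_bytes;
    uint64_t rss_peak_bytes;
    uint64_t vram_start_bytes;
    uint64_t vram_end_bytes;
    uint64_t vram_peak_bytes;
    uint32_t vram_measured;
    uint32_t _pad;
} LunavoxMemStats;

typedef struct LunavoxAudio {
    const float* samples;
    int32_t n_samples;
    int32_t sample_rate;
    int64_t t_tokenize_ms;
    int64_t t_encode_ms;
    int64_t t_generate_ms;
    int64_t t_decode_ms;
    int64_t t_total_ms;
    int64_t audio_duration_ms;
    float rtf;
    float _pad;
    LunavoxMemStats mem;
} LunavoxAudio;

/* Hypothetical helper: (t_total_ms / 1000) / (audio_duration_ms / 1000)
 * simplifies to t_total_ms / audio_duration_ms.
 * Returns -1.0 when there is no audio to normalize against. */
double lv_rtf_from_audio(const LunavoxAudio *a) {
    assert(a != NULL);
    if (a->audio_duration_ms <= 0) {
        return -1.0;
    }
    return (double)a->t_total_ms / (double)a->audio_duration_ms;
}
```

Comparing this value against `audio.rtf` is a cheap sanity check in client code, under the assumption stated above.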
VRAM attribution
The `vram_*` fields are sampled via NVML using `nvmlDevice*RunningProcesses` and filtered to the engine's own PID, so readings reflect LunaVox's own allocations rather than whole-device usage. When NVML is unavailable, or when the driver cannot attribute a per-process byte count to the current process, `vram_measured` stays 0 and the `vram_*` fields are undefined. Clients MUST gate VRAM rendering on `vram_measured`, not on `vram_peak_bytes > 0`: a zero reading on a CPU-only run is a legitimate measurement.
Python / HTTP mirrors
The Python binding mirrors this as `MemStats` + `SynthesisStats` (see `lunavox.runtime.params`), and the HTTP / WS API echoes it as `SynthStatsResponse { mem: MemStatsResponse }` with the same field names. `MemStats.rss_peak_delta_bytes` and `vram_peak_delta_bytes` are computed properties: each returns `peak - start` clamped at zero, which is the single correct "synthesis-driven growth" figure for UIs to display. The finer `llama_*` / `decoder_*` sub-timings appear only in the JSON file output.
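A C client can mirror the clamped `rss_peak_delta_bytes` / `vram_peak_delta_bytes` figures and the `vram_measured` gate described under VRAM attribution. A minimal sketch; `lv_peak_delta`, `lv_rss_peak_delta_bytes`, and `lv_vram_peak_delta_bytes` are hypothetical helper names, and `LunavoxMemStats` is re-declared from the header excerpt so the block compiles alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Re-declared from the header excerpt so this sketch is self-contained. */
typedef struct LunavoxMemStats {
    uint64_t rss_start_bytes;
    uint64_t rss_end_bytes;
    uint64_t rss_peak_bytes;
    uint64_t vram_start_bytes;
    uint64_t vram_end_bytes;
    uint64_t vram_peak_bytes;
    uint32_t vram_measured; /* 1 = measured, 0 = vram_* fields undefined */
    uint32_t _pad;
} LunavoxMemStats;

/* peak - start clamped at zero; a plain uint64_t subtraction would wrap
 * to a huge value if peak were ever below start. */
uint64_t lv_peak_delta(uint64_t start, uint64_t peak) {
    return peak > start ? peak - start : 0;
}

uint64_t lv_rss_peak_delta_bytes(const LunavoxMemStats *m) {
    assert(m != NULL);
    return lv_peak_delta(m->rss_start_bytes, m->rss_peak_bytes);
}

/* Returns 1 and writes the VRAM delta to *out only when vram_measured is set.
 * Returns 0 and leaves *out untouched otherwise. Callers gate rendering on
 * this return value, never on vram_peak_bytes > 0, because a measured zero
 * is a valid reading. */
int lv_vram_peak_delta_bytes(const LunavoxMemStats *m, uint64_t *out) {
    assert(m != NULL && out != NULL);
    if (!m->vram_measured) {
        return 0;
    }
    *out = lv_peak_delta(m->vram_start_bytes, m->vram_peak_bytes);
    return 1;
}
```

Returning a flag instead of a sentinel byte count keeps "not measured" distinct from "measured as zero".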
Consumers
- `benchmark/run_benchmark.py`: reads `--stats-json` to compute the 100-run latency / TTFB / RTF / memory distribution that powers `benchmark/report.md`.
- `lunavox.gui.widgets.stats_card::StatsCard.update_stats`: renders `SynthesisStats` directly into the GUI's stats card (no legacy dict projection).
- New tooling should import from `lunavox.core.stats_schema` instead of poking at free-form dicts.