Stats Schema

Three producers emit the same structured stats:

| Producer | Surface | Source |
| --- | --- | --- |
| `lunavox-cli --stats-json report.json` | JSON file | `src/main.cpp` |
| `LunavoxAudio` from the C API | In-memory struct | `src/lunavox_c_api.h` |
| `lunavox.runtime.SynthesisStats` | Python dataclass | `src/lunavox/runtime/binding.py` |

The shared shape is pinned in src/lunavox/core/stats_schema.py. Adding a field means editing that module together with all three producers (src/main.cpp, src/lunavox_c_api.h, and src/lunavox/runtime/binding.py) in the same commit.

Top-level `StatsJSON`

{
  "t_load_ms":    1714,    // wall time inside Engine::load_models
  "t_warmup_ms":   565,    // warmup portion of t_load_ms (decoder first-run)
  "runs": [ ... RunStats ... ]
}

`RunStats`

{
  "run_id":              1,
  "sample_rate":         24000,
  "n_samples":           71040,
  "audio_duration_s":    2.96,
  "rtf":                 0.175,
  "effective_language_id": -1,
  "timing_ms": { ... TimingMs ... },
  "stream":    { ... StreamStats ... },
  "mem":       { ... MemoryBytes ... }
}

`rtf` is the real-time factor: rtf = timing_ms.total / 1000 / audio_duration_s. Lower is faster; a value below 1.0 means synthesis ran faster than realtime.
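The formula can be checked by hand against the sample values shown in `RunStats`. The snippet below is a minimal sketch, not the engine's implementation: `realtime_factor` is a hypothetical helper name, and the 518 ms total is an illustrative figure chosen to match the sample `rtf` of 0.175 (the sample record elides `timing_ms`).

```python
def realtime_factor(total_ms: float, audio_duration_s: float) -> float:
    """Synthesis wall time in seconds divided by audio duration in seconds."""
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    return total_ms / 1000.0 / audio_duration_s


# Sample RunStats values: 71040 samples at 24000 Hz is 2.96 s of audio.
audio_duration_s = 71040 / 24000

# Illustrative total (assumption): 518 ms of wall time for that clip.
rtf = realtime_factor(518, audio_duration_s)
print(f"rtf={rtf:.3f}")  # rtf=0.175
```

In the sample record, `audio_duration_s` equals `n_samples / sample_rate`, which is why the sketch derives it that way rather than hard-coding it.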

`TimingMs` (milliseconds)

| Field | Always populated | Description |
| --- | --- | --- |
| `tokenize` | ✓ | Text → token IDs |
| `encode` | ✓ | Speaker encoder (0 when using a pre-computed embedding JSON) |
| `generate` | ✓ | LLM sequence generation (talker + predictor + sampling) |
| `decode` | ✓ | ONNX decoder session + post-processing |
| `total` | ✓ | Sum of the above + overhead |
| `first_audio` | ✓ | Wall time to first decoded PCM chunk (streaming pipeline) |
| `llama_prefill` / `llama_decode_loop` / `talker_post` / `predictor_sample` / `talker_decode` |  | Detailed sub-timings; require the `LUNAVOX_TIMING` build flag |
| `decoder_tensor_prep` / `decoder_ort_run` / `decoder_tensor_extract` / `decoder_state_trim` / `pcm_gather` |  | Detailed decoder sub-timings; also require the `LUNAVOX_TIMING` build flag |

`StreamStats`

| Field | Description |
| --- | --- |
| `first_chunk_frames` | Frames in the first decoded chunk (TTFB tuning knob) |
| `t_first_audio_ms` | Same value the C API exposes as `audio.first_audio_ms` |

`MemoryBytes` (bytes)

| Field | Description |
| --- | --- |
| `rss_start` / `rss_end` | Process RSS at synth entry / exit |
| `rss_peak` | High-water RSS during the synth |
| `phys_start` / `phys_end` / `phys_peak` | macOS `phys_footprint` (equal to RSS on Windows / Linux) |

C API Subset

LunavoxAudio exposes a subset of the above directly on the audio struct, so the C / Python binding does not need an extra round-trip. Memory / VRAM samples live in a nested LunavoxMemStats block so callers see a single mem.*_peak_delta_bytes computation path instead of juggling flat fields:

typedef struct LunavoxMemStats {
    uint64_t rss_start_bytes;
    uint64_t rss_end_bytes;
    uint64_t rss_peak_bytes;
    uint64_t vram_start_bytes;
    uint64_t vram_end_bytes;
    uint64_t vram_peak_bytes;
    uint32_t vram_measured;  /* 1 = NVML returned per-PID attributed bytes; 0 = not measured */
    uint32_t _pad;
} LunavoxMemStats;

typedef struct LunavoxAudio {
    const float* samples;
    int32_t  n_samples;
    int32_t  sample_rate;
    int64_t  t_tokenize_ms;
    int64_t  t_encode_ms;
    int64_t  t_generate_ms;
    int64_t  t_decode_ms;
    int64_t  t_total_ms;
    int64_t  audio_duration_ms;
    float    rtf;
    float    _pad;
    LunavoxMemStats mem;
} LunavoxAudio;
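For readers consuming the struct from Python, the layout above can be mirrored field by field with `ctypes`. This is only a sketch under the assumption that a caller wants a raw `ctypes` view; it is not necessarily how `src/lunavox/runtime/binding.py` is written, and the class definitions here are illustrative copies of the C declarations, not an import from the project.

```python
import ctypes


class LunavoxMemStats(ctypes.Structure):
    # Field order and widths copied from the C typedef above.
    _fields_ = [
        ("rss_start_bytes", ctypes.c_uint64),
        ("rss_end_bytes", ctypes.c_uint64),
        ("rss_peak_bytes", ctypes.c_uint64),
        ("vram_start_bytes", ctypes.c_uint64),
        ("vram_end_bytes", ctypes.c_uint64),
        ("vram_peak_bytes", ctypes.c_uint64),
        ("vram_measured", ctypes.c_uint32),
        ("_pad", ctypes.c_uint32),
    ]


class LunavoxAudio(ctypes.Structure):
    _fields_ = [
        ("samples", ctypes.POINTER(ctypes.c_float)),
        ("n_samples", ctypes.c_int32),
        ("sample_rate", ctypes.c_int32),
        ("t_tokenize_ms", ctypes.c_int64),
        ("t_encode_ms", ctypes.c_int64),
        ("t_generate_ms", ctypes.c_int64),
        ("t_decode_ms", ctypes.c_int64),
        ("t_total_ms", ctypes.c_int64),
        ("audio_duration_ms", ctypes.c_int64),
        ("rtf", ctypes.c_float),
        ("_pad", ctypes.c_float),
        ("mem", LunavoxMemStats),
    ]


# Six uint64 fields plus two uint32 fields: 6 * 8 + 2 * 4 = 56 bytes.
mem_size = ctypes.sizeof(LunavoxMemStats)
print(mem_size)  # 56

# Populate a struct by hand to show nested access (values are illustrative).
audio = LunavoxAudio(n_samples=71040, sample_rate=24000)
audio.mem.rss_start_bytes = 100
audio.mem.rss_peak_bytes = 250
print(audio.mem.rss_peak_bytes - audio.mem.rss_start_bytes)  # 150
```

The explicit `_pad` members keep both structs free of implicit compiler padding, which is what makes a field-for-field mirror like this straightforward.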

VRAM attribution

vram_* fields are sampled via NVML using nvmlDevice*RunningProcesses and filtered to the engine's own PID — readings reflect LunaVox's own allocations, not whole-device usage. When NVML is unavailable, or when the driver cannot attribute a per-process byte count for the current process, vram_measured stays 0 and the vram_* fields are undefined. Clients MUST gate VRAM rendering on vram_measured, not on vram_peak_bytes > 0 (a zero reading on a CPU-only run is a legitimate measurement).
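The gating rule can be illustrated with a small client-side formatter. This is a sketch only: `MemSample` and `vram_label` are hypothetical names standing in for whatever type a client receives, and the label strings are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class MemSample:
    """Hypothetical stand-in carrying the VRAM fields of LunavoxMemStats."""
    vram_peak_bytes: int
    vram_measured: int


def vram_label(mem: MemSample) -> str:
    # Gate on the flag, never on the byte count: when vram_measured is 0
    # the vram_* values are undefined and must not be shown.
    if not mem.vram_measured:
        return "VRAM: n/a"
    return f"VRAM peak: {mem.vram_peak_bytes / 2**20:.1f} MiB"


print(vram_label(MemSample(vram_peak_bytes=0, vram_measured=1)))
# VRAM peak: 0.0 MiB
print(vram_label(MemSample(vram_peak_bytes=123456, vram_measured=0)))
# VRAM: n/a
```

The first call shows why `vram_peak_bytes > 0` is the wrong test: a measured zero would be hidden, while the second call shows an unmeasured, undefined value being suppressed.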

Python / HTTP mirrors

The Python binding mirrors this as MemStats + SynthesisStats (see lunavox.runtime.params), and the HTTP / WS API echoes it as SynthStatsResponse { mem: MemStatsResponse } with the same field names. MemStats.rss_peak_delta_bytes / vram_peak_delta_bytes are computed properties — they return peak - start clamped at zero, which is the single correct "synthesis-driven growth" figure for UIs to display. The finer llama_* / decoder_* sub-timings only appear in the JSON file output.
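The "peak minus start, clamped at zero" rule can be sketched as a pair of read-only properties. `MemStatsSketch` below is a hypothetical illustration of that rule, not the actual `MemStats` class; only the two property names and the field names are taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemStatsSketch:
    """Illustrative subset of the memory fields; not the real MemStats."""
    rss_start_bytes: int = 0
    rss_peak_bytes: int = 0
    vram_start_bytes: int = 0
    vram_peak_bytes: int = 0

    @property
    def rss_peak_delta_bytes(self) -> int:
        # Growth attributable to the synthesis; never negative.
        return max(0, self.rss_peak_bytes - self.rss_start_bytes)

    @property
    def vram_peak_delta_bytes(self) -> int:
        return max(0, self.vram_peak_bytes - self.vram_start_bytes)


grew = MemStatsSketch(rss_start_bytes=1_000, rss_peak_bytes=1_750)
print(grew.rss_peak_delta_bytes)  # 750

# A peak below the start sample clamps to zero instead of going negative.
shrank = MemStatsSketch(rss_start_bytes=2_000, rss_peak_bytes=1_500)
print(shrank.rss_peak_delta_bytes)  # 0
```

Keeping the delta as a derived property rather than a stored field means there is only one place where the clamping rule lives.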

Consumers

benchmark/run_benchmark.py — reads --stats-json to compute the 100-run latency / TTFB / RTF / memory distribution that powers benchmark/report.md.
lunavox.gui.widgets.stats_card::StatsCard.update_stats — renders SynthesisStats directly into the GUI's stats card (no legacy dict projection).
New tooling should import from lunavox.core.stats_schema instead of poking at free-form dicts.
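As a rough idea of what a consumer of the `--stats-json` output might do, the sketch below parses a report with the documented shape and summarizes it. It is not the code in benchmark/run_benchmark.py; `summarize` is a hypothetical helper, the inline sample data is invented for illustration, and it assumes the real file is plain JSON without the explanatory `//` comments used in this page.

```python
import json
import statistics

# Invented sample following the StatsJSON / RunStats shape (trimmed fields).
SAMPLE_REPORT = """
{
  "t_load_ms": 1714,
  "t_warmup_ms": 565,
  "runs": [
    {"run_id": 1, "rtf": 0.175, "timing_ms": {"total": 518, "first_audio": 210}},
    {"run_id": 2, "rtf": 0.181, "timing_ms": {"total": 536, "first_audio": 198}},
    {"run_id": 3, "rtf": 0.169, "timing_ms": {"total": 500, "first_audio": 205}}
  ]
}
"""


def summarize(report: dict) -> dict:
    """Collapse per-run stats into a few headline medians."""
    runs = report["runs"]
    return {
        "n_runs": len(runs),
        "median_rtf": statistics.median(r["rtf"] for r in runs),
        "median_first_audio_ms": statistics.median(
            r["timing_ms"]["first_audio"] for r in runs
        ),
    }


summary = summarize(json.loads(SAMPLE_REPORT))
print(summary)
```

Real tooling would read the typed definitions from `lunavox.core.stats_schema` rather than indexing dictionaries by string key as this standalone sketch does.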
