# Model Profile & Runtime Contract
`model_profile.json` is the contract every loader honours. It lives in three places, and all three must be edited in the same commit:
| Site | Type / file | How it is read |
|---|---|---|
| C++ engine (authoritative) | `lunavox::ModelProfile` in `src/model_profile.h` | `src/lunavox_engine.cpp::load_model_profile` |
| Python runtime (display only) | `lunavox.model.ModelProfile` in `src/lunavox/model/profile.py` | `ModelProfile.load(model_dir)` |
| Disk artifact | `models/<name>/model_profile.json` | Written by `lunavox convert` |
The C++ side is strict (`is_valid` aborts `Engine::load_models` on any mismatch). The Python side is permissive: it reads display metadata only.
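The permissive Python-side read can be sketched as follows. This is a simplified stand-in for `ModelProfile.load(model_dir)`, not the real implementation; it assumes only the identity fields documented below:

```python
import json
from pathlib import Path

def load_profile_display(model_dir):
    """Permissive read of model_profile.json for display purposes.

    Sketch only: missing fields fall back to their documented defaults
    instead of aborting, mirroring the lenient Python side (the strict
    validation lives in the C++ is_valid check).
    """
    raw = json.loads((Path(model_dir) / "model_profile.json").read_text())
    return {
        "version": raw.get("version", 1),
        "model_type": raw.get("model_type", "base"),
        "model_size": raw.get("model_size", "unknown"),
        "instruct_support": raw.get("instruct_support", False),
    }
```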
## 1. Identity

| Field | Type | Default | Meaning |
|---|---|---|---|
| `version` | int | 1 | Schema version. Readers refuse unknown majors. |
| `model_type` | string | `"base"` | One of `base`, `custom`, `design`. Gates which `--mode` values the model accepts. |
| `model_size` | string | `"unknown"` | Display label, e.g. `"0.6b"`, `"1.7b"`. |
| `instruct_support` | bool | false | Whether the model accepts `--instruct` (custom / design only). |
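As a concrete illustration, the identity block of a hypothetical 1.7B custom model's `model_profile.json` could look like this (the values are made up, not taken from a shipped model):

```json
{
  "version": 1,
  "model_type": "custom",
  "model_size": "1.7b",
  "instruct_support": true
}
```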
## 2. Runtime Limits

| Field | Notes |
|---|---|
| `talker_n_ctx` | Talker LLM context cap. Must be ≤ `talker_n_ctx_train`. |
| `talker_n_ctx_train` | Training context length of the underlying weights. |
| `predictor_n_ctx` | Q1..Q15 predictor context cap. |
| `codec_num_codebooks` | Must be 16 for the current runtime. |
| `codec_id_start` / `codec_id_end` | Codec token ID range (start inclusive, end exclusive). |
| `predictor_vocab_size` | Must equal `(codec_id_end - codec_id_start) * (codec_num_codebooks - 1)`. |
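The `predictor_vocab_size` invariant is plain arithmetic and easy to sanity-check outside the engine. A sketch with illustrative numbers — the authoritative check is the C++ `is_valid`, and the ID values below are made up:

```python
def check_predictor_vocab(profile):
    """Verify predictor_vocab_size against the codec ID range (sketch)."""
    span = profile["codec_id_end"] - profile["codec_id_start"]  # end is exclusive
    expected = span * (profile["codec_num_codebooks"] - 1)      # Q1..Q15 only
    return profile["predictor_vocab_size"] == expected

# Illustrative profile: 16 codebooks, a 2048-wide codec ID range.
example = {
    "codec_id_start": 151000,
    "codec_id_end": 153048,
    "codec_num_codebooks": 16,
    "predictor_vocab_size": 2048 * 15,
}
```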
## 3. Special Tokens

| Group | Fields |
|---|---|
| Codec framing | `codec_pad_id`, `codec_bos_id`, `codec_eos_id` |
| Think gating | `codec_think_id`, `codec_nothink_id`, `codec_think_bos_id`, `codec_think_eos_id` |
| Text framing | `tts_pad_id`, `tts_bos_id`, `tts_eos_id` |
## 4. Generation Defaults

Pulled from upstream `generation_config.json`. Each field is overridable on the CLI.

| Field | Default | CLI override |
|---|---|---|
| `default_max_new_tokens` | 400 | `--max-tokens` |
| `default_temperature` | 0.6 | `--temperature` |
| `default_top_p` | 1.0 | `--top-p` |
| `default_top_k` | 50 | `--top-k` |
| `default_repetition_penalty` | 1.05 | `--repetition-penalty` |
| `default_predictor_do_sample` | true | `--predictor-greedy` |
| `default_predictor_temperature` | 0.6 | `--predictor-temperature` |
| `default_predictor_top_p` | 1.0 | `--predictor-top-p` |
| `default_predictor_top_k` | 50 | `--predictor-top-k` |
| `default_seed` | 42 | `--seed` |
| `default_predictor_seed` | 45 | `--predictor-seed` |
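Default-versus-flag precedence can be sketched as a simple merge: an explicitly passed flag wins, and anything left unset falls back to the profile default. The helper and its argument names are hypothetical, not the CLI's actual code:

```python
def effective_sampling(profile, cli_args):
    """Merge profile defaults with CLI overrides (hypothetical helper).

    cli_args maps flag names to parsed values; None means "not passed".
    """
    def pick(flag, field):
        value = cli_args.get(flag)
        return value if value is not None else profile[field]

    return {
        "max_new_tokens": pick("max_tokens", "default_max_new_tokens"),
        "temperature": pick("temperature", "default_temperature"),
        "top_p": pick("top_p", "default_top_p"),
        "top_k": pick("top_k", "default_top_k"),
    }
```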
## 5. Language / Speaker Maps

| Field | Shape |
|---|---|
| `language_map` | `{lowercase_name: language_id}` |
| `speaker_map` | `{lowercase_name: speaker_id}` (populated only for custom models) |
| `speaker_dialect_map` | `{lowercase_name: lowercase_dialect_tag}` (optional) |
Keys are pre-lowercased by the writer; `ModelProfile::resolve_*` lowercases lookups, so callers do not need to normalize.
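That lookup contract can be sketched in Python terms. The real resolvers are the C++ `ModelProfile::resolve_*` family; this helper is purely illustrative:

```python
def resolve_id(mapping, name):
    """Lowercase the caller's input before lookup, so callers never
    need to normalize. Keys are already lowercase on disk (sketch)."""
    key = name.lower()
    if key not in mapping:
        raise KeyError(f"unknown name: {name!r}")
    return mapping[key]
```

With this contract, `resolve_id(language_map, "English")` and `resolve_id(language_map, "english")` hit the same key.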
## 6. Mode Routing & Hard Errors

`--mode` is optional. Routing follows `model_type`:

- `base` → `base` (auto-switches to `clone` when `--reference` is given)
- `custom` → `custom`
- `design` → `design`
Hard errors (loader aborts):
- `base` model + `--instruct` → `--instruct is forbidden in base mode`
- `custom`/`design` + `--reference` → `mode 'clone' is incompatible with model_type ...`
- Any 0.6B model + `--instruct` (no model in this size class supports it)
- Talker weight other than `qwen3_tts_talker.q5_k.gguf`
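The routing and hard-error rules above can be condensed into a single sketch. This illustrates the rules, it is not the engine's code, and it omits the size-class and weight-filename checks:

```python
def route_mode(model_type, mode=None, reference=None, instruct=False):
    """Pick the effective mode from model_type, per the routing rules (sketch)."""
    if model_type == "base":
        if instruct:
            raise ValueError("--instruct is forbidden in base mode")
        if mode is None:
            # Auto-switch to clone when a reference clip is supplied.
            mode = "clone" if reference else "base"
    elif model_type in ("custom", "design"):
        if reference:
            raise ValueError(
                f"mode 'clone' is incompatible with model_type {model_type!r}")
        if mode is None:
            mode = model_type
    else:
        raise ValueError(f"unknown model_type {model_type!r}")
    return mode
```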
## 7. Quality Gate

```
./build/lunavox-cli --help   # CLI smoke test
./build/lunavox-cli -m models/base_small -t "hello" -o out.wav
```
A single inference should finish in under 20 s on the smallest model. Always do a manual listening check before any quantization comparison — automated metrics never replace ears.