Train Qwen3-1.7B to expand search queries into structured hyde:/lex:/vec: output for QMD's hybrid retrieval pipeline.
hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation
- hyde: always comes FIRST (one line max)
- lex: lines for BM25 keyword search (1-3 lines, short keywords)
- vec: lines for vector similarity search (1-3 lines, natural language)

Single destination: tobil/qmd-query-expansion-1.7B
- No versioned repos (-v1, -v2, -v4, etc.)
- No separate -sft or -grpo repos for final models
- GGUF: tobil/qmd-query-expansion-1.7B-gguf

All JSONL files in data/ are training data:
data/
├── qmd_expansion_v2.jsonl
├── qmd_expansion_handcrafted_only.jsonl
├── qmd_only_sampled.jsonl
├── qmd_only_variants.jsonl
└── ... any additional .jsonl files
All .jsonl files in data/ should be concatenated for training runs.
Each JSONL line: {"input": "query", "output": "hyde:...\nlex:...\nvec:..."}
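The concatenation step and the output layout can be sanity-checked before a training run. The following is a minimal sketch, not the project's own loader: `load_records` and `format_problems` are hypothetical helper names, and the check assumes exactly one hyde: line (read from "one line max") plus 1-3 lex: and 1-3 vec: lines.

```python
import json
from pathlib import Path


def load_records(data_dir):
    """Concatenate every *.jsonl file in data_dir into one list of records."""
    records = []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():  # skip blank lines
                    records.append(json.loads(line))
    return records


def format_problems(output):
    """Return a list of format violations for one expansion (empty if valid)."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    problems = []
    if not lines or not lines[0].startswith("hyde:"):
        problems.append("hyde: must be the first line")
    counts = {"hyde": 0, "lex": 0, "vec": 0}
    for ln in lines:
        prefix = ln.split(":", 1)[0]
        if prefix in counts:
            counts[prefix] += 1
        else:
            problems.append(f"unexpected line: {ln!r}")
    # Assumption: "one line max" is read as exactly one hyde: line.
    if counts["hyde"] != 1:
        problems.append("expected exactly one hyde: line")
    for kind in ("lex", "vec"):
        if not 1 <= counts[kind] <= 3:
            problems.append(f"expected 1-3 {kind}: lines, found {counts[kind]}")
    return problems
```

Calling `format_problems(record["output"])` on each record from `load_records("data")` would list any entries that deviate from the layout. Records produced for the /only:lex and /only:vec variants may deviate intentionally, so a real check would likely need to exempt them.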
| Script | Purpose |
|---|---|
| dataset/generate_data.py | Generate via Claude API (high quality) |
| dataset/generate_data_offline.py | Transform from HuggingFace datasets |
| dataset/prepare_data.py | Format for Qwen3 chat template |
| dataset/clean_data.py | Detect and fix technical term issues |
| generate_only_variants.py | Generate /only:lex and /only:vec variants |
All training outputs go to outputs/ (gitignored):
outputs/
├── sft/ # SFT checkpoint
└── grpo/ # GRPO checkpoint
Always use Qwen3-1.7B as the base model unless explicitly stated otherwise.
Training can run locally (requires CUDA GPU) or via HuggingFace Jobs (cloud GPU, no local hardware needed).
# Local (requires CUDA)
uv run train.py sft --config configs/sft.yaml
# Output: outputs/sft/
# Cloud (HuggingFace Jobs - no local GPU needed)
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 2h jobs/sft.py
# Local (requires CUDA)
uv run train.py grpo --config configs/grpo.yaml
# Output: outputs/grpo/
# Cloud (HuggingFace Jobs - no local GPU needed)
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN --timeout 4h jobs/grpo.py
If no local CUDA device is available, use hf jobs to run training in the cloud:
hf jobs ps # List running jobs
hf jobs logs <job-id> # Stream logs
hf jobs inspect <job-id> # Check status
hf jobs cancel <job-id> # Cancel a job
The jobs/ directory contains self-contained scripts that include all dependencies inline.
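The choice between local and cloud training described above could be scripted roughly as follows. This is a sketch under assumptions: `has_cuda_gpu` and `training_command` are hypothetical helpers, and checking for a working `nvidia-smi` is only a heuristic for "a CUDA GPU is available". The commands it builds are the ones listed in this document.

```python
import shutil
import subprocess


def has_cuda_gpu():
    """Heuristic (assumption): True if nvidia-smi is on PATH and exits with 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def training_command(stage, local):
    """Build the documented training command for stage 'sft' or 'grpo'."""
    if stage not in ("sft", "grpo"):
        raise ValueError(f"unknown stage: {stage!r}")
    if local:
        return ["uv", "run", "train.py", stage, "--config", f"configs/{stage}.yaml"]
    # Timeouts mirror the cloud commands listed in this document.
    timeout = {"sft": "2h", "grpo": "4h"}[stage]
    return ["hf", "jobs", "uv", "run", "--flavor", "a10g-large",
            "--secrets", "HF_TOKEN", "--timeout", timeout, f"jobs/{stage}.py"]
```

For example, `subprocess.run(training_command("sft", has_cuda_gpu()), check=True)` would start SFT locally when a GPU is detected and submit a HuggingFace Job otherwise.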
# Eval local model
uv run eval.py --model ./outputs/grpo
# Eval HuggingFace model
uv run eval.py --model tobil/qmd-query-expansion-1.7B
# Save eval results to file
uv run eval.py --model ./outputs/grpo -o eval_results.json
reward.py is the single source of truth for scoring:
# Self-test the reward function
uv run reward.py
See SCORING.md for the full rubric.
Never upload without eval. Every model push must include eval results.
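One way to enforce this rule is a small check that runs before any upload step. The sketch below is illustrative only: `require_eval_results` is a hypothetical helper, and because the schema of eval_results.json is not specified here, it only verifies that the file exists, parses as JSON, and is non-empty.

```python
import json
from pathlib import Path


def require_eval_results(path="eval_results.json"):
    """Raise if the eval results file is missing, unparseable, or empty."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} not found: run eval.py with -o first")
    try:
        results = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{p} is not valid JSON") from exc
    if not results:
        raise ValueError(f"{p} contains no results")
    return results
```

Calling `require_eval_results()` at the top of an upload script makes the push fail early when the eval step was skipped.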
1. SFT: data/*.jsonl → outputs/sft/
2. GRPO: outputs/grpo/
3. Eval: uv run eval.py --model ./outputs/grpo -o eval_results.json
4. Upload: tobil/qmd-query-expansion-1.7B
5. GGUF: tobil/qmd-query-expansion-1.7B-gguf
6. Update src/llm.ts DEFAULT_GENERATE_MODEL if repo name changed

finetune/
├── reward.py # Scoring function (single source of truth)
├── train.py # Unified SFT + GRPO training
├── eval.py # Generate and score expansions
├── convert_gguf.py # GGUF conversion
├── SCORING.md # Detailed scoring rubric
├── CLAUDE.md # This file
├── data/ # All training JSONL files
├── outputs/ # Local training outputs (gitignored)
├── dataset/ # Data generation scripts
├── jobs/ # Self-contained HuggingFace Jobs scripts
├── configs/ # Training configs (sft.yaml, grpo.yaml)
└── evals/ # Test queries and results