# GRPO (Experimental) This folder contains the **experimental** GRPO training path for query expansion. It is not part of the default production pipeline. ## Files - `grpo.yaml` – experimental GRPO hyperparameters - `grpo.py` – standalone GRPO training script ## Run ```bash # Recommended default: run from repo root cd /home/tobi/qmd uv run finetune/experiments/grpo/grpo.py # Or use unified entrypoint (deprecated in main pipeline): uv run train.py grpo --config finetune/experiments/grpo/grpo.yaml ``` ## Notes - Current mainline focuses on SFT-only quality and benchmarks. - Keep this workflow isolated unless you are explicitly experimenting with reinforcement-learning refinement.