suby/qmd

Author	SHA1 Message	Date
Tobi Lutke	9b3a209a97 Fix GRPO training: apply chat template to prompts	4 months ago
Tobi Lutke	891f3262cf Fix GRPO reward function to handle think blocks and end tokens	4 months ago
Tobi Lutke	8a1c4cdab0 Add 1.7B and 4B GRPO training and GGUF conversion scripts	4 months ago