Tobi Lutke
|
9b3a209a97
Fix GRPO training: apply chat template to prompts
|
4 months ago |
Tobi Lutke
|
891f3262cf
Fix GRPO reward function to handle think blocks and end tokens
|
4 months ago |
Tobi Lutke
|
8a1c4cdab0
Add 1.7B and 4B GRPO training and GGUF conversion scripts
|
4 months ago |