
Merge pull request #38 from odysseus0/fix/readme-model-sizes

docs: fix query expansion model size (Qwen3-1.7B, not 0.6B)
Tobias Lütke 4 months ago
Parent
Commit
5b1671d2f6
1 changed file with 3 additions and 3 deletions

+ 3 - 3
README.md

@@ -112,7 +112,7 @@ Although the tool works perfectly fine when you just tell your agent to use it o
                         ▼                             ▼
                ┌────────────────┐            ┌────────────────┐
                │ Query Expansion│            │  Original Query│
-               │   (Qwen3-0.6B) │            │   (×2 weight)  │
+               │   (Qwen3-1.7B) │            │   (×2 weight)  │
                └───────┬────────┘            └───────┬────────┘
                        │                             │
                        │ 2 alternative queries       │
@@ -213,7 +213,7 @@ QMD uses three local GGUF models (auto-downloaded on first use):
 |-------|---------|------|
 | `embeddinggemma-300M-Q8_0` | Vector embeddings | ~300MB |
 | `qwen3-reranker-0.6b-q8_0` | Re-ranking | ~640MB |
-| `Qwen3-0.6B-Q8_0` | Query expansion | ~640MB |
+| `Qwen3-1.7B-Q8_0` | Query expansion | ~2.2GB |
 
 Models are downloaded from HuggingFace and cached in `~/.cache/qmd/models/`.
 
@@ -515,7 +515,7 @@ Models are configured in `src/llm.ts` as HuggingFace URIs:
 ```typescript
 const DEFAULT_EMBED_MODEL = "hf:ggml-org/embeddinggemma-300M-GGUF/embeddinggemma-300M-Q8_0.gguf";
 const DEFAULT_RERANK_MODEL = "hf:ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF/qwen3-reranker-0.6b-q8_0.gguf";
-const DEFAULT_GENERATE_MODEL = "hf:ggml-org/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf";
+const DEFAULT_GENERATE_MODEL = "hf:ggml-org/Qwen3-1.7B-GGUF/Qwen3-1.7B-Q8_0.gguf";
 ```
 
 ### EmbeddingGemma Prompt Format
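
Reviewer note on the last hunk: the commit keeps three places in sync (the diagram label, the models table, and the constant in `src/llm.ts`). A minimal sketch of how the table name could be derived from the URI, assuming the `hf:<owner>/<repo>/<file>` layout seen in the constants above. `parseHfUri` and `modelDisplayName` are hypothetical helpers for illustration and are not part of QMD.

```typescript
// Hypothetical helpers (not part of QMD): split an "hf:<owner>/<repo>/<file>" URI
// and derive the short model name that the README table lists.
interface HfModelRef {
  owner: string;
  repo: string;
  file: string;
}

function parseHfUri(uri: string): HfModelRef {
  const prefix = "hf:";
  if (!uri.startsWith(prefix)) {
    throw new Error(`not an hf: URI: ${uri}`);
  }
  const parts = uri.slice(prefix.length).split("/");
  // Assumption: at least owner, repo, and one file path segment, none empty.
  if (parts.length < 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`expected hf:<owner>/<repo>/<file>, got: ${uri}`);
  }
  const [owner, repo, ...rest] = parts;
  return { owner, repo, file: rest.join("/") };
}

// File name without directories and without the ".gguf" extension.
function modelDisplayName(uri: string): string {
  const file = parseHfUri(uri).file;
  const base = file.split("/").pop() ?? file;
  return base.replace(/\.gguf$/i, "");
}

// Value copied from the "+" line of the diff above.
const DEFAULT_GENERATE_MODEL = "hf:ggml-org/Qwen3-1.7B-GGUF/Qwen3-1.7B-Q8_0.gguf";

console.log(modelDisplayName(DEFAULT_GENERATE_MODEL)); // Qwen3-1.7B-Q8_0
```

A check like this could run in a docs test so that a model swap in `src/llm.ts` without a matching README update is caught early; whether QMD wants such a test is a project decision.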