4 months ago · 0ff9bec129
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,11 +2,54 @@
 
 ## [Unreleased]
 
+13 community PRs merged. GPU initialization replaced with node-llama-cpp's
+built-in `autoAttempt`, deleting ~220 lines of manual fallback code and
+fixing GPU issues reported across 10+ PRs in one shot. Reranking is faster
+thanks to chunk deduplication and a parallelism cap that prevents VRAM
+exhaustion.
+
 ### Changes
 
-- Query: add `--explain` for `qmd query` to expose retrieval score traces
-  in JSON and CLI output. Includes backend scores (FTS/vector), per-list
-  RRF contributions, top-rank bonus, reranker score, and final blended score.
+- **GPU init**: use node-llama-cpp's `build: "autoAttempt"` instead of manual
+  GPU backend detection. Automatically tries Metal/CUDA/Vulkan and falls back
+  gracefully. #310 (thanks @giladgd, the node-llama-cpp author)
+- **Query `--explain`**: `qmd query --explain` exposes retrieval score traces:
+  backend scores, per-list RRF contributions, top-rank bonus, reranker
+  score, and final blended score. Works in JSON and CLI output. #242
+  (thanks @vyalamar)
+- **Collection ignore patterns**: `ignore: ["Sessions/**", "*.tmp"]` in
+  collection config to exclude files from indexing. #304 (thanks @sebkouba)
+- **Multilingual embeddings**: `QMD_EMBED_MODEL` env var lets you swap in
+  models like Qwen3-Embedding for non-English collections. #273 (thanks
+  @daocoding)
+- **Configurable expansion context**: `QMD_EXPAND_CONTEXT_SIZE` env var
+  (default 2048). Previously the model's full 40960-token window was used,
+  wasting VRAM. #313 (thanks @0xble)
+- **`candidateLimit` exposed**: `-C` / `--candidate-limit` flag and MCP
+  parameter to tune how many candidates reach the reranker. #255 (thanks
+  @pandysp)
+- **MCP multi-session**: HTTP transport now supports multiple concurrent
+  client sessions, each with its own server instance. #286 (thanks @joelev)
+
+### Fixes
+
+- **Reranking performance**: cap parallel rerank contexts at 4 to prevent
+  VRAM exhaustion on high-core machines. Deduplicate identical chunk texts
+  before reranking, so the same content from different files now shares a
+  single reranker call. Cache scores by content hash instead of file path.
+- Deactivate stale docs when all files are removed from a collection and
+  `qmd update` is run. #312 (thanks @0xble)
+- Handle emoji-only filenames (`🐘.md` → `1f418.md`) instead of crashing.
+  #308 (thanks @debugerman)
+- Skip unreadable files during indexing (e.g. iCloud-evicted files returning
+  EAGAIN) instead of crashing. #253 (thanks @jimmynail)
+- Suppress progress bar escape sequences when stderr is not a TTY. #230
+  (thanks @dgilperez)
+- Emit format-appropriate empty output (`[]` for JSON, CSV header for CSV,
+  etc.) instead of plain text "No results." #228 (thanks @amsminn)
+- Correct Windows sqlite-vec package name (`sqlite-vec-windows-x64`) and add
+  `sqlite-vec-linux-arm64`. #225 (thanks @ilepn)
+- Fix Claude plugin setup CLI commands in README. #311 (thanks @gi11es)
 
 ## [1.1.1] - 2026-03-06
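The `--explain` entry mentions per-list RRF contributions. Reciprocal Rank Fusion conventionally scores a document as the sum, over every ranked list that contains it, of 1/(k + rank). The sketch below keeps each list's term so it can be printed as a trace. The constant `k = 60`, the 1-based ranks, and the names `fuseWithTrace` and `RankedList` are assumptions, not qmd's code; the top-rank bonus and final blending are left out because their values are not given here.

```typescript
// Hypothetical input shape: each backend (e.g. FTS, vector) returns doc ids
// best-first, with each id appearing at most once per list.
interface RankedList {
  name: string;
  docIds: string[];
}

interface FusedEntry {
  docId: string;
  score: number;
  // Contribution from each list, kept so it can be printed as a trace.
  contributions: Record<string, number>;
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank 1-based.
// k = 60 is the commonly used default, assumed here rather than taken from qmd.
function fuseWithTrace(lists: RankedList[], k = 60): FusedEntry[] {
  const byDoc = new Map<string, FusedEntry>();
  for (const list of lists) {
    list.docIds.forEach((docId, index) => {
      const contribution = 1 / (k + index + 1);
      const entry: FusedEntry = byDoc.get(docId) ?? { docId, score: 0, contributions: {} };
      entry.score += contribution;
      entry.contributions[list.name] = contribution;
      byDoc.set(docId, entry);
    });
  }
  // Highest fused score first.
  return Array.from(byDoc.values()).sort((a, b) => b.score - a.score);
}
```

Printing `contributions` next to `score` gives the kind of per-list breakdown the entry describes.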
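A sketch of how the `ignore` globs from the collection-config entry could filter paths. It assumes each pattern is matched against the whole collection-relative path, with `**` crossing directories and `*` staying within one path segment; qmd's actual matcher may treat edge cases differently. `globToRegExp` and `filterIgnored` are hypothetical names.

```typescript
// Convert a simple glob to an anchored RegExp under the assumed semantics:
// "**" matches anything including "/", "*" matches within one segment,
// "?" matches one non-"/" character, everything else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*" && glob[i + 1] === "*") {
      source += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      source += "[^/]*";
    } else if (ch === "?") {
      source += "[^/]";
    } else {
      // Escape regex metacharacters so "." and friends stay literal.
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// Keep only paths that match none of the ignore patterns.
function filterIgnored(paths: string[], ignore: string[]): string[] {
  const matchers = ignore.map(globToRegExp);
  return paths.filter((p) => !matchers.some((re) => re.test(p)));
}
```

Under these assumed semantics `*.tmp` only matches top-level files; a gitignore-style matcher would also exclude `notes/b.tmp`.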
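The expansion-context entry is an env-var override with a default of 2048. A smaller context saves VRAM mainly because the llama.cpp KV cache grows roughly linearly with context size. The sketch below reads the variable and assumes that an unset, empty, or non-positive-integer value falls back to the default; qmd's actual validation is not described, and `expandContextSize` is a hypothetical name.

```typescript
// Default from the changelog entry; the parsing rules below are assumptions.
const DEFAULT_EXPAND_CONTEXT_SIZE = 2048;

// Accepts any env-like map, e.g. process.env.
function expandContextSize(env: Record<string, string | undefined>): number {
  const raw = env["QMD_EXPAND_CONTEXT_SIZE"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_EXPAND_CONTEXT_SIZE;
  const n = Number(raw);
  // Reject NaN, fractions, zero, and negatives.
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_EXPAND_CONTEXT_SIZE;
}
```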
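For the MCP multi-session entry, one common shape is a registry that maps session ids to server instances; in the MCP Streamable HTTP transport the id travels in the `Mcp-Session-Id` header. The sketch assumes a request without an id starts a new session. `SessionRegistry` and `SessionServer` are hypothetical stand-ins, not qmd's or the MCP SDK's types.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical stand-in for a per-session MCP server instance.
interface SessionServer {
  sessionId: string;
}

// Keeps one server instance per client session.
class SessionRegistry {
  private readonly sessions = new Map<string, SessionServer>();
  private readonly createServer: (sessionId: string) => SessionServer;

  constructor(createServer: (sessionId: string) => SessionServer) {
    this.createServer = createServer;
  }

  // No id: a new client, so mint an id and build a fresh server for it.
  // Known id: reuse that session's server. Unknown id: undefined, so the
  // caller can reject the request.
  resolve(sessionId?: string): SessionServer | undefined {
    if (sessionId === undefined) {
      const id = randomUUID();
      const server = this.createServer(id);
      this.sessions.set(id, server);
      return server;
    }
    return this.sessions.get(sessionId);
  }

  close(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Giving each session its own instance keeps per-client state from leaking between concurrent clients.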
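The reranking fix combines three ideas: deduplicate identical chunk texts, key cached scores by content rather than file path, and cap concurrent reranker calls at 4. A sketch under stated assumptions: `RerankFn` stands in for the real model call, SHA-256 is an arbitrary hash choice, the cache key also includes the query (a reranker score depends on both), and all function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

interface Candidate {
  file: string;
  text: string;
}

// Hypothetical signature standing in for one reranker model call.
type RerankFn = (query: string, text: string) => Promise<number>;

const MAX_PARALLEL_RERANK = 4; // the cap named in the changelog entry

// Cache key from query + chunk text (including the query is an assumption).
function scoreKey(query: string, text: string): string {
  return createHash("sha256").update(query).update("\u0000").update(text).digest("hex");
}

// Run fn over items with at most `limit` calls in flight at once.
async function mapWithLimit<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

async function rerank(
  query: string,
  candidates: Candidate[],
  scoreFn: RerankFn,
  cache: Map<string, number>,
): Promise<Array<Candidate & { score: number }>> {
  // Deduplicate: identical texts map to one key, hence one reranker call,
  // no matter which file they came from.
  const unique = new Map<string, string>();
  for (const c of candidates) unique.set(scoreKey(query, c.text), c.text);

  // Score only keys missing from the cache, with bounded parallelism.
  const pending = Array.from(unique).filter(([key]) => !cache.has(key));
  const scores = await mapWithLimit(pending, MAX_PARALLEL_RERANK, ([, text]) => scoreFn(query, text));
  pending.forEach(([key], i) => cache.set(key, scores[i]));

  return candidates
    .map((c) => ({ ...c, score: cache.get(scoreKey(query, c.text)) ?? 0 }))
    .sort((a, b) => b.score - a.score);
}
```

The cap uses a small worker pool; a plain `Promise.all` over every candidate would start all calls at once.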
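The stale-docs fix concerns the case where every file has been removed. A sketch of deactivation as a set difference between indexed paths and paths found on disk; `IndexedDoc` and `deactivateMissing` are hypothetical, and a real index would update stored rows rather than in-memory objects.

```typescript
interface IndexedDoc {
  id: number;
  path: string;
  active: boolean;
}

// Deactivate every active doc whose path is no longer on disk and return
// the affected ids. An empty scan is not special-cased: with zero files on
// disk, every active doc is deactivated.
function deactivateMissing(docs: IndexedDoc[], pathsOnDisk: string[]): number[] {
  const present = new Set(pathsOnDisk);
  const deactivated: number[] = [];
  for (const doc of docs) {
    if (doc.active && !present.has(doc.path)) {
      doc.active = false;
      deactivated.push(doc.id);
    }
  }
  return deactivated;
}
```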
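One way to produce the `🐘.md` → `1f418.md` mapping: if normalizing a name leaves nothing usable, fall back to the hex code points of the original stem (U+1F418 is the elephant emoji, hence `1f418`). The slug rule, the `-` separator for multi-code-point names, and the function names are assumptions for illustration.

```typescript
// Hypothetical slug rule: lowercase, keep letters/digits, collapse the rest to "-".
function slugify(stem: string): string {
  return stem
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, "-")
    .replace(/^-+|-+$/g, "");
}

// If slugging leaves an empty string (e.g. an emoji-only name), use the hex
// code points of the original stem instead, so "🐘" becomes "1f418".
function safeFileName(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName;
  const ext = dot > 0 ? fileName.slice(dot) : "";
  const slug = slugify(stem);
  if (slug !== "") return slug + ext;
  // Array.from iterates by code point, so surrogate pairs stay together.
  const hex = Array.from(stem, (ch) => ch.codePointAt(0)!.toString(16)).join("-");
  return hex + ext;
}
```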
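The EAGAIN fix amounts to treating a failed read as a per-file event rather than a fatal one. A sketch with `node:fs/promises`; it skips on any error code, which may be broader than what qmd does, and `readAllSkippingErrors` is a hypothetical name.

```typescript
import { readFile } from "node:fs/promises";

interface ReadOutcome {
  contents: Map<string, string>;
  skipped: Array<{ path: string; code: string }>;
}

// Read every file, recording a failed read (e.g. EAGAIN from an iCloud-evicted
// file) and moving on, so one bad file cannot abort the whole indexing run.
async function readAllSkippingErrors(paths: string[]): Promise<ReadOutcome> {
  const contents = new Map<string, string>();
  const skipped: ReadOutcome["skipped"] = [];
  for (const path of paths) {
    try {
      contents.set(path, await readFile(path, "utf8"));
    } catch (err) {
      const code = (err as { code?: string }).code ?? "UNKNOWN";
      skipped.push({ path, code });
    }
  }
  return { contents, skipped };
}
```

Returning the skipped list lets the caller report which files were left out instead of failing silently.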
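The TTY fix can be sketched as a guard on the stream's `isTTY` flag, which Node sets on `process.stderr` only when it is attached to a terminal. The bar layout, `OutStream`, and `drawProgress` are hypothetical.

```typescript
// Minimal stream shape; process.stderr satisfies it in Node.
interface OutStream {
  isTTY?: boolean;
  write(chunk: string): boolean;
}

// Redraw a one-line progress bar in place. "\r" returns to column 0 and
// "\x1b[2K" clears the line; both only make sense on a terminal, so nothing
// is written when the stream is a pipe or file (isTTY is then falsy).
function drawProgress(stream: OutStream, done: number, total: number, width = 20): void {
  if (!stream.isTTY) return;
  const ratio = total > 0 ? done / total : 0;
  const filled = Math.min(width, Math.max(0, Math.round(ratio * width)));
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  stream.write(`\r\x1b[2K[${bar}] ${done}/${total}`);
}
```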
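For the empty-output fix, the point is that an empty result should still parse in the requested format. A sketch; the format names, the `csvColumns` parameter, and keeping "No results." for plain text are assumptions.

```typescript
type OutputFormat = "json" | "csv" | "text";

// What to print when a query matches nothing. csvColumns is hypothetical:
// the real column set is whatever the CSV formatter emits for results.
function emptyOutput(format: OutputFormat, csvColumns: string[]): string {
  switch (format) {
    case "json":
      return "[]"; // still parses as an (empty) JSON array
    case "csv":
      return csvColumns.join(","); // header row, zero data rows
    case "text":
      return "No results."; // human-readable form, assumed unchanged
  }
}
```

Consumers such as `jq` or a CSV reader then see a valid, empty document instead of a parse error.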