Changelog
All notable changes to QMD will be documented in this file.
1.0.0 - 2026-02-15
Node.js Compatibility
QMD now runs on both Node.js (>=22) and Bun. Install with npm install -g @tobilu/qmd or bun install -g @tobilu/qmd — your choice. The qmd wrapper auto-detects Node.js via tsx and works out of the box with mise, asdf, nvm, and Homebrew installs.
Performance
- Parallel embedding & reranking — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to 2.7x faster reranking and significantly faster embedding on multi-core machines
- Flash attention — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
- Right-sized contexts — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
- Adaptive parallelism — automatically scales context count based on available VRAM (GPU) or CPU math cores
- CPU thread splitting — each context runs on its own cores for true parallelism instead of contending on a single context
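
The adaptive parallelism and thread splitting described above can be sketched roughly as follows. This is a minimal illustration, not QMD's actual code: `planContexts`, its option names, the cap of 8 contexts, and the minimum of 2 threads per context are all hypothetical values chosen for the example.

```typescript
// Hypothetical sketch: none of these names or defaults come from QMD itself.
interface ContextPlan {
  contexts: number;          // how many parallel contexts to create
  threadsPerContext: number; // CPU threads given to each context
}

interface PlanOptions {
  freeVramBytes?: number;        // set when a GPU is in use, undefined on CPU
  bytesPerContext: number;       // assumed memory cost of one context
  cpuCores: number;              // cores available for math work
  maxContexts?: number;          // assumed upper bound (default 8)
  minThreadsPerContext?: number; // assumed lower bound (default 2)
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

function planContexts(opts: PlanOptions): ContextPlan {
  const maxContexts = opts.maxContexts ?? 8;
  const minThreads = opts.minThreadsPerContext ?? 2;

  if (opts.freeVramBytes !== undefined) {
    // GPU path: create as many contexts as fit in free VRAM.
    const fit = Math.floor(opts.freeVramBytes / opts.bytesPerContext);
    return { contexts: clamp(fit, 1, maxContexts), threadsPerContext: 1 };
  }

  // CPU path: split cores so each context gets its own share of threads.
  const contexts = clamp(Math.floor(opts.cpuCores / minThreads), 1, maxContexts);
  const threadsPerContext = Math.max(1, Math.floor(opts.cpuCores / contexts));
  return { contexts, threadsPerContext };
}
```

Under these assumptions, an 8-core machine without a GPU would get 4 contexts of 2 threads each, and a single-core machine would still get one context.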
GPU Auto-Detection
- Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
- Falls back gracefully to CPU with a warning if GPU init fails
- qmd status now shows device info (GPU type, VRAM usage)
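
The probe-then-fall-back behavior described above might look something like the sketch below. The `probe` callback, the `pickBackend` name, and the CUDA, Metal, Vulkan preference order are assumptions made for illustration; the changelog only says that the best available backend is used.

```typescript
// Hypothetical sketch: the probe callback stands in for whatever GPU
// initialization the real runtime performs.
type Backend = "cuda" | "metal" | "vulkan" | "cpu";

function pickBackend(
  probe: (backend: Backend) => boolean,
  warn: (message: string) => void = (m) => console.warn(m),
): Backend {
  // Assumed preference order; the real ordering may differ.
  const preferred: Backend[] = ["cuda", "metal", "vulkan"];

  for (const backend of preferred) {
    try {
      if (probe(backend)) return backend;
    } catch (err) {
      // A failed GPU init should not crash startup; note it and move on.
      warn(`GPU init failed for ${backend}: ${String(err)}`);
    }
  }

  warn("No usable GPU backend found, falling back to CPU");
  return "cpu";
}
```

Wrapping each probe in its own `try`/`catch` means a driver that throws during initialization is treated the same as one that is simply absent.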
Test Suite
- Tests split into src/*.test.ts (unit), src/models/*.test.ts (model), and src/integration/*.test.ts (CLI/integration)
- Vitest config for Node.js; bun test still works for Bun
- New eval-bm25 and store.helpers.unit test suites
Fixes
- Prevent VRAM waste from duplicate context creation during concurrent loads
- Collection-aware FTS filtering for scoped keyword search
0.9.0 - 2026-02-15
Initial public release.
Features
- Hybrid search pipeline — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
- Smart chunking — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
- Query expansion — fine-tuned Qwen3 1.7B model generates search variations for better recall
- Cross-encoder reranking — Qwen3-Reranker scores candidates with position-aware blending
- Vector embeddings — EmbeddingGemma 300M via node-llama-cpp, all on-device
- MCP server — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
- Collection management — index multiple directories with glob patterns
- Context annotations — add descriptions to collections and paths for richer search
- Document IDs — 6-char content hash for stable references across re-indexes
- Multi-get — retrieve multiple documents by glob pattern, comma list, or docids
- Multiple output formats — JSON, CSV, Markdown, XML, files list
- Claude Code plugin — inline status checks and MCP integration
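
For readers unfamiliar with Reciprocal Rank Fusion, mentioned in the hybrid search entry above, here is a minimal generic sketch of the technique. It is not QMD's implementation: the function name is hypothetical, and the constant `k = 60` is the commonly used default for RRF, not a value confirmed by this changelog.

```typescript
// Generic Reciprocal Rank Fusion: each ranked list contributes
// 1 / (k + rank) to a document's fused score, with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed constant; the value QMD uses is not stated
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: fuse a keyword ranking with a vector ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // e.g. BM25 order
  ["b", "c", "a"], // e.g. vector similarity order
]);
```

Because RRF only looks at ranks, it can combine BM25 and vector results without first putting their raw scores on a common scale.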
Fixes
- Handle dense content (code) that tokenizes beyond expected chunk size
- Proper cleanup of Metal GPU resources
- SQLite-vec readiness verification after extension load
- Reactivate deactivated documents on re-index
- BM25 score normalization with Math.abs
- Bun UTF-8 path corruption workaround
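
As background for the BM25 normalization fix above: SQLite's FTS5 `bm25()` function returns negative numbers, where a more negative value means a better match. Assuming QMD's keyword search is built on FTS5 (the changelog does not say so explicitly), a normalization step using `Math.abs` could look like the sketch below. The function name and the divide-by-maximum scaling are illustrative assumptions.

```typescript
// Hypothetical sketch: convert raw FTS5-style BM25 scores (negative,
// more negative = better) into 0..1 values where higher = better.
function normalizeBm25(rawScores: number[]): number[] {
  const magnitudes = rawScores.map((score) => Math.abs(score));
  const max = Math.max(0, ...magnitudes);

  // Avoid dividing by zero when there are no scores or all are zero.
  if (max === 0) return magnitudes.map(() => 0);

  return magnitudes.map((magnitude) => magnitude / max);
}
```

With this scaling the best match in a result set always maps to 1, which makes the values easier to blend with other signals.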