Changelog

All notable changes to QMD will be documented in this file.

1.0.0 - 2026-02-15

Node.js Compatibility

QMD now runs on both Node.js (>=22) and Bun. Install with npm install -g @tobilu/qmd or bun install -g @tobilu/qmd — your choice. The qmd wrapper auto-detects Node.js via tsx and works out of the box with mise, asdf, nvm, and Homebrew installs.

Performance

Parallel embedding & reranking — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to 2.7x faster reranking and significantly faster embedding on multi-core machines
Flash attention — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
Right-sized contexts — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
Adaptive parallelism — automatically scales context count based on available VRAM (GPU) or CPU math cores
CPU thread splitting — each context runs on its own cores for true parallelism instead of contending on a single context
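
As a rough illustration of how adaptive parallelism and CPU thread splitting could fit together, the sketch below picks a context count from free VRAM on GPU or from core count on CPU, then divides CPU threads evenly across contexts. Every name, default, and limit here (planContexts, bytesPerContext, maxContexts, minThreadsPerContext) is hypothetical and not taken from QMD's source.

```typescript
// Hypothetical sketch: these types and defaults are assumptions, not QMD's API.
interface DeviceInfo {
  kind: "gpu" | "cpu";
  freeVramBytes?: number; // only meaningful when kind === "gpu"
  cpuCores: number;       // cores usable for math work
}

interface ContextPlan {
  contexts: number;          // number of parallel contexts to create
  threadsPerContext: number; // CPU threads per context (0 when running on GPU)
}

function planContexts(
  device: DeviceInfo,
  bytesPerContext: number,  // assumed memory footprint of one context
  maxContexts = 8,          // assumed upper bound
  minThreadsPerContext = 2, // assumed floor below which splitting is not worthwhile
): ContextPlan {
  if (device.kind === "gpu") {
    // Fit as many contexts as free VRAM allows, but always create at least one.
    const fit = Math.floor((device.freeVramBytes ?? 0) / bytesPerContext);
    return { contexts: Math.max(1, Math.min(maxContexts, fit)), threadsPerContext: 0 };
  }
  // On CPU, give each context its own share of cores so contexts do not contend.
  const byCores = Math.floor(device.cpuCores / minThreadsPerContext);
  const contexts = Math.max(1, Math.min(maxContexts, byCores));
  const threadsPerContext = Math.max(1, Math.floor(device.cpuCores / contexts));
  return { contexts, threadsPerContext };
}
```

With these assumed defaults, an 8-core CPU yields 4 contexts of 2 threads each.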

GPU Auto-Detection

Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
Falls back gracefully to CPU with a warning if GPU init fails
qmd status now shows device info (GPU type, VRAM usage)
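
The probe-then-fall-back behavior can be sketched as a loop over candidate backends. This is a minimal illustration only: the pickBackend function, the Probe callback, and the CUDA, Metal, Vulkan preference order are assumptions, since the entry above says only that the best available backend is used.

```typescript
type GpuBackend = "cuda" | "metal" | "vulkan";
type Backend = GpuBackend | "cpu";

// Hypothetical probe: returns true when the backend initializes successfully.
type Probe = (backend: GpuBackend) => boolean;

function pickBackend(
  probe: Probe,
  warn: (msg: string) => void = console.warn,
): Backend {
  // Assumed preference order, not confirmed by the changelog.
  const order: GpuBackend[] = ["cuda", "metal", "vulkan"];
  for (const backend of order) {
    try {
      if (probe(backend)) return backend;
    } catch (err) {
      // A failed init is reported but does not abort the search.
      warn(`${backend} init failed: ${String(err)}`);
    }
  }
  warn("No GPU backend available, falling back to CPU");
  return "cpu";
}
```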

Test Suite

Tests split into src/*.test.ts (unit), src/models/*.test.ts (model), and src/integration/*.test.ts (CLI/integration)
Vitest config for Node.js; bun test still works for Bun
New eval-bm25 and store.helpers.unit test suites

Fixes

Prevent VRAM waste from duplicate context creation during concurrent loads
Collection-aware FTS filtering for scoped keyword search

0.9.0 - 2026-02-15

Initial public release.

Features

Hybrid search pipeline — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
Smart chunking — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
Query expansion — fine-tuned Qwen3 1.7B model generates search variations for better recall
Cross-encoder reranking — Qwen3-Reranker scores candidates with position-aware blending
Vector embeddings — EmbeddingGemma 300M via node-llama-cpp, all on-device
MCP server — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
Collection management — index multiple directories with glob patterns
Context annotations — add descriptions to collections and paths for richer search
Document IDs — 6-char content hash for stable references across re-indexes
Multi-get — retrieve multiple documents by glob pattern, comma list, or docids
Multiple output formats — JSON, CSV, Markdown, XML, files list
Claude Code plugin — inline status checks and MCP integration
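
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formulation: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The function name and k = 60 (a commonly used default) are assumptions; QMD's actual constants, weights, and position-aware blending are not described here.

```typescript
// Generic RRF sketch, not QMD's implementation.
// Each inner array is one ranked result list of document IDs, best first.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Example: a keyword ranking and a vector ranking over the same three documents.
const fused = reciprocalRankFusion([
  ["a", "b", "c"],
  ["b", "c", "a"],
]);
console.log(fused.map(([id]) => id)); // ["b", "a", "c"]
```

Because RRF uses only ranks, it can combine BM25 and vector results without reconciling their score scales.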

Fixes

Handle dense content (code) that tokenizes beyond expected chunk size
Proper cleanup of Metal GPU resources
SQLite-vec readiness verification after extension load
Reactivate deactivated documents on re-index
BM25 score normalization with Math.abs
Bun UTF-8 path corruption workaround
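
As background for the BM25 normalization fix: SQLite FTS5's bm25() function returns lower (more negative) values for better matches, so a raw score cannot be used directly as a "higher is better" relevance value. The sketch below assumes an FTS5-style raw score and shows one possible mapping through Math.abs into the range [0, 1). The function name and the x / (1 + x) squashing are illustrative assumptions, not QMD's actual formula.

```typescript
// Hypothetical normalization: assumes raw scores follow the FTS5 bm25() convention,
// where a more negative value means a better match.
function normalizeBm25(raw: number): number {
  const magnitude = Math.abs(raw);     // flip the sign so larger means better
  return magnitude / (1 + magnitude);  // squash into [0, 1)
}

console.log(normalizeBm25(-3)); // 0.75
console.log(normalizeBm25(-1)); // 0.5
```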
