Changelog

All notable changes to QMD will be documented in this file.

1.0.0 - 2026-02-15

Node.js Compatibility

QMD now runs on both Node.js (>=22) and Bun. Install it with either npm install -g @tobilu/qmd or bun install -g @tobilu/qmd. The qmd wrapper auto-detects Node.js via tsx and works out of the box with mise, asdf, nvm, and Homebrew installs.

Performance

  • Parallel embedding & reranking — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to 2.7x faster reranking and significantly faster embedding on multi-core machines
  • Flash attention — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
  • Right-sized contexts — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
  • Adaptive parallelism — automatically scales context count based on available VRAM (GPU) or CPU math cores
  • CPU thread splitting — each context runs on its own cores for true parallelism instead of contending on a single context
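
The adaptive parallelism and CPU thread splitting described above can be pictured with a minimal sketch. It is not QMD's actual code: the function name, the per-context memory cost, the cap of 8 contexts, and the floor of 4 threads per context are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch; names and defaults are illustrative, not QMD's API.
interface Resources {
  cpuMathCores: number;
  freeVramBytes?: number; // set only when a GPU backend is active
}

interface ContextPlan {
  contexts: number;
  threadsPerContext?: number; // only meaningful on CPU
}

function planContexts(
  res: Resources,
  bytesPerContext: number, // assumed memory cost of one context
  maxContexts = 8, // assumed upper bound
  minThreadsPerContext = 4, // assumed floor so contexts are not starved
): ContextPlan {
  if (res.freeVramBytes !== undefined) {
    // GPU: create as many contexts as fit in free VRAM, within the cap.
    const fit = Math.floor(res.freeVramBytes / bytesPerContext);
    return { contexts: Math.max(1, Math.min(maxContexts, fit)) };
  }
  // CPU: give every context its own slice of cores.
  const byCores = Math.floor(res.cpuMathCores / minThreadsPerContext);
  const contexts = Math.max(1, Math.min(maxContexts, byCores));
  const threadsPerContext = Math.max(1, Math.floor(res.cpuMathCores / contexts));
  return { contexts, threadsPerContext };
}
```

Under these assumptions a 16-core CPU yields 4 contexts with 4 threads each, while a GPU with room for three contexts yields 3.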

GPU Auto-Detection

  • Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
  • Falls back gracefully to CPU with a warning if GPU init fails
  • qmd status now shows device info (GPU type, VRAM usage)
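
The probe-and-fall-back behaviour above can be sketched as follows. This is an illustration only: the probe order, the tryInit callback, and the warning text are assumptions, and a real backend initialization would most likely be asynchronous rather than the synchronous callback used here for brevity.

```typescript
// Hypothetical sketch; QMD's real probing logic is not shown in this changelog.
type Backend = "cuda" | "metal" | "vulkan" | "cpu";

function pickBackend(
  tryInit: (backend: Backend) => boolean, // assumed probe: true if init succeeded
  warn: (message: string) => void = console.warn,
): Backend {
  const gpuBackends: Backend[] = ["cuda", "metal", "vulkan"]; // assumed order
  for (const backend of gpuBackends) {
    try {
      if (tryInit(backend)) return backend;
    } catch {
      // A throwing probe is treated the same as an unavailable backend.
    }
  }
  warn("GPU initialization failed; falling back to CPU");
  return "cpu";
}
```

Passing the probe in as a callback keeps the selection logic testable without any GPU present.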

Test Suite

  • Tests split into src/*.test.ts (unit), src/models/*.test.ts (model), and src/integration/*.test.ts (CLI/integration)
  • Vitest config for Node.js; bun test still works for Bun
  • New eval-bm25 and store.helpers.unit test suites

Fixes

  • Prevent VRAM waste from duplicate context creation during concurrent loads
  • Collection-aware FTS filtering for scoped keyword search
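
One common way to avoid duplicate context creation under concurrent loads is to cache the in-flight promise, so that later callers share the first caller's work. The sketch below shows that general technique. The changelog does not say how QMD implements its fix, and loadOnce and its key scheme are hypothetical names.

```typescript
// Hypothetical sketch of promise memoization; not QMD's actual fix.
const inflight = new Map<string, Promise<unknown>>();

function loadOnce<T>(key: string, create: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>; // reuse the pending or finished load

  const pending = create().catch((err) => {
    inflight.delete(key); // forget failed loads so a later call can retry
    throw err;
  });
  inflight.set(key, pending);
  return pending;
}
```

Two callers that request the same key at the same time receive the same promise, so the expensive create function runs once.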

0.9.0 - 2026-02-15

Initial public release.

Features

  • Hybrid search pipeline — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
  • Smart chunking — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
  • Query expansion — fine-tuned Qwen3 1.7B model generates search variations for better recall
  • Cross-encoder reranking — Qwen3-Reranker scores candidates with position-aware blending
  • Vector embeddings — EmbeddingGemma 300M via node-llama-cpp, all on-device
  • MCP server — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
  • Collection management — index multiple directories with glob patterns
  • Context annotations — add descriptions to collections and paths for richer search
  • Document IDs — 6-char content hash for stable references across re-indexes
  • Multi-get — retrieve multiple documents by glob pattern, comma list, or docids
  • Multiple output formats — JSON, CSV, Markdown, XML, files list
  • Claude Code plugin — inline status checks and MCP integration
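
For readers unfamiliar with Reciprocal Rank Fusion, mentioned in the hybrid search entry above, here is a minimal sketch of the standard formula: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The constant k = 60 is the conventional default and is an assumption here, as is the function name. QMD's own parameters and its position-aware blending are not reproduced.

```typescript
// Generic RRF sketch; k and the function name are assumptions, not QMD's API.
function reciprocalRankFusion(
  rankings: string[][], // each inner array lists document IDs, best first
  k = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const fused = reciprocalRankFusion([
  ["a", "b", "c"], // e.g. a BM25 ranking
  ["b", "c", "a"], // e.g. a vector-similarity ranking
]);
// Fused order is b, a, c: b ranks near the top of both lists.
```

Because RRF uses only ranks, it can combine BM25 and vector results without first putting their raw scores on a common scale.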

Fixes

  • Handle dense content (code) that tokenizes beyond expected chunk size
  • Proper cleanup of Metal GPU resources
  • SQLite-vec readiness verification after extension load
  • Reactivate deactivated documents on re-index
  • BM25 score normalization with Math.abs
  • Bun UTF-8 path corruption workaround
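
As background for the BM25 normalization fix: SQLite's FTS5 bm25() function returns scores where better matches are more negative, so the sign has to be handled before scores are blended with other signals. The sketch below assumes QMD's keyword search is built on FTS5 and uses one plausible mapping into the range [0, 1). The changelog does not state the exact formula, so both the formula and the function name are illustrative.

```typescript
// Hypothetical normalization; QMD's exact formula is not given in the changelog.
function normalizeBm25(rawScore: number): number {
  const magnitude = Math.abs(rawScore); // FTS5: more negative means a better match
  return magnitude / (1 + magnitude); // squash into [0, 1), higher is better
}

const strong = normalizeBm25(-3); // 0.75
const weak = normalizeBm25(-1); // 0.5
```

Taking the absolute value first keeps the ordering intuitive: a stronger match gets a larger normalized score.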