|
|
@@ -1,68 +1,275 @@
|
|
|
# Changelog
|
|
|
|
|
|
-All notable changes to QMD will be documented in this file.
|
|
|
+## [Unreleased]
|
|
|
|
|
|
## [1.0.0] - 2026-02-15
|
|
|
|
|
|
-### Node.js Compatibility
|
|
|
+QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
|
|
|
+through parallel GPU contexts. GPU auto-detection replaces the unreliable
|
|
|
+`gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.
|
|
|
|
|
|
-QMD now runs on both **Node.js (>=22)** and **Bun**. Install with `npm install -g @tobilu/qmd` or `bun install -g @tobilu/qmd` — your choice. The `qmd` wrapper auto-detects Node.js via `tsx` and works out of the box with mise, asdf, nvm, and Homebrew installs.
|
|
|
+### Changes
|
|
|
|
|
|
-### Performance
|
|
|
+- Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
|
|
|
+ abstraction layer (`src/db.ts`). `bun:sqlite` on Bun, `better-sqlite3` on
|
|
|
+ Node. The `qmd` wrapper auto-detects a suitable Node.js install via PATH,
|
|
|
+ then falls back to mise, asdf, nvm, and Homebrew locations.
|
|
|
+- Performance: parallel embedding & reranking via multiple LlamaContext
|
|
|
+ instances — up to 2.7x faster on multi-core machines.
|
|
|
+- Performance: flash attention for ~20% less VRAM per reranking context,
|
|
|
+ enabling more parallel contexts on GPU.
|
|
|
+- Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
|
|
|
+ memory) since chunks are capped at ~900 tokens.
|
|
|
+- Performance: adaptive parallelism — context count computed from available
|
|
|
+ VRAM (GPU) or CPU math cores rather than hardcoded.
|
|
|
+- GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of
|
|
|
+ relying on node-llama-cpp's `gpu: "auto"`. `qmd status` shows device info.
|
|
|
+- Tests: reorganized into flat `test/` directory with vitest for Node.js and
|
|
|
+ bun test for Bun. New `eval-bm25` and `store.helpers.unit` suites.
|
|
|
|
|
|
-- **Parallel embedding & reranking** — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to **2.7x faster reranking** and significantly faster embedding on multi-core machines
|
|
|
-- **Flash attention** — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
|
|
|
-- **Right-sized contexts** — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
|
|
|
-- **Adaptive parallelism** — automatically scales context count based on available VRAM (GPU) or CPU math cores
|
|
|
-- **CPU thread splitting** — each context runs on its own cores for true parallelism instead of contending on a single context
|
|
|
+### Fixes
|
|
|
+
|
|
|
+- Prevent VRAM waste from duplicate context creation during concurrent
|
|
|
+ `embedBatch` calls — initialization lock now covers the full path.
|
|
|
+- Collection-aware FTS filtering so scoped keyword search actually restricts
|
|
|
+ results to the requested collection.
|
|
|
|
|
|
-### GPU Auto-Detection
|
|
|
+## [0.9.0] - 2026-02-15
|
|
|
|
|
|
-- Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
|
|
|
-- Falls back gracefully to CPU with a warning if GPU init fails
|
|
|
-- `qmd status` now shows device info (GPU type, VRAM usage)
|
|
|
+First published release on npm as `@tobilu/qmd`. MCP HTTP transport with
|
|
|
+daemon mode cuts warm query latency from ~16s to ~10s by keeping models
|
|
|
+loaded between requests.
|
|
|
|
|
|
-### Test Suite
|
|
|
+### Changes
|
|
|
|
|
|
-- Tests split into `src/*.test.ts` (unit), `src/models/*.test.ts` (model), and `src/integration/*.test.ts` (CLI/integration)
|
|
|
-- Vitest config for Node.js; bun test still works for Bun
|
|
|
-- New `eval-bm25` and `store.helpers.unit` test suites
|
|
|
+- MCP: HTTP transport with daemon lifecycle — `qmd mcp --http --daemon`
|
|
|
+ starts a background server, `qmd mcp stop` shuts it down. Models stay warm
|
|
|
+ in VRAM between queries. #149 (thanks @igrigorik)
|
|
|
+- Search: type-routed query expansion preserves lex/vec/hyde type info and
|
|
|
+ routes to the appropriate backend. Eliminates ~4 wasted backend calls per
|
|
|
+ query (10.0 → 6.0 calls, 1278ms → 549ms). #149 (thanks @igrigorik)
|
|
|
+- Search: unified pipeline — extracted `hybridQuery()` and
|
|
|
+ `vectorSearchQuery()` to `store.ts` so CLI and MCP share identical logic.
|
|
|
+ Fixes a class of bugs where results differed between the two. #149 (thanks
|
|
|
+ @igrigorik)
|
|
|
+- MCP: dynamic instructions generated at startup from actual index state —
|
|
|
+ LLMs see collection names, doc counts, and content descriptions. #149
|
|
|
+ (thanks @igrigorik)
|
|
|
+- MCP: tool renames (vsearch → vector_search, query → deep_search) with
|
|
|
+ rewritten descriptions for better tool selection. #149 (thanks @igrigorik)
|
|
|
+- Integration: Claude Code plugin with inline status checks and MCP
|
|
|
+ integration. #99 (thanks @galligan)
|
|
|
|
|
|
### Fixes
|
|
|
|
|
|
-- Prevent VRAM waste from duplicate context creation during concurrent loads
|
|
|
-- Collection-aware FTS filtering for scoped keyword search
|
|
|
+- BM25 score normalization — formula was inverted (`1/(1+|x|)` instead of
|
|
|
+ `|x|/(1+|x|)`), so strong matches scored *lowest*. Broke `--min-score`
|
|
|
+ filtering and made the "strong signal" short-circuit dead code. #76 (thanks
|
|
|
+ @dgilperez)
|
|
|
+- Normalize Unicode paths to NFC for macOS compatibility. #82 (thanks
|
|
|
+ @c-stoeckl)
|
|
|
+- Handle dense content (code) that tokenizes beyond expected chunk size.
|
|
|
+- Proper cleanup of Metal GPU resources on process exit.
|
|
|
+- SQLite-vec readiness verification after extension load.
|
|
|
+- Reactivate deactivated documents on re-index instead of creating duplicates.
|
|
|
+- Bun UTF-8 path corruption workaround for non-ASCII filenames.
|
|
|
+- Disable following symlinks in glob.scan to avoid infinite loops.
|
|
|
|
|
|
----
|
|
|
+## [0.8.0] - 2026-01-28
|
|
|
|
|
|
-## [0.9.0] - 2026-02-15
|
|
|
+Fine-tuned query expansion model trained with GRPO replaces the stock Qwen3
|
|
|
+0.6B. The training pipeline scores expansions on named entity preservation,
|
|
|
+format compliance, and diversity — producing noticeably better lexical
|
|
|
+variations and HyDE documents.
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- LLM: deploy GRPO-trained (Group Relative Policy Optimization) query
|
|
|
+ expansion model, hosted on HuggingFace and auto-downloaded on first use.
|
|
|
+ Better preservation of proper nouns and technical terms in expansions.
|
|
|
+- LLM: `/only:lex` mode for single-type expansions — useful when you know
|
|
|
+ which search backend will help.
|
|
|
+- LLM: HyDE output moved to first position so vector search can start
|
|
|
+ embedding while other expansions generate.
|
|
|
+- LLM: session lifecycle management via `withLLMSession()` pattern — ensures
|
|
|
+ cleanup even on failure, similar to database transactions.
|
|
|
+- Integration: org-mode title extraction support. #50 (thanks @sh54)
|
|
|
+- Integration: SQLite extension loading in Nix devshell. #48 (thanks @sh54)
|
|
|
+- Integration: AI agent discovery via skills.sh. #64 (thanks @Algiras)
|
|
|
+
|
|
|
+### Fixes
|
|
|
+
|
|
|
+- Use sequential embedding on CPU-only systems — parallel contexts caused a
|
|
|
+ race condition where contexts competed for CPU cores, making things slower.
|
|
|
+ #54 (thanks @freeman-jiang)
|
|
|
+- Fix `collectionName` column in vector search SQL (was still using old
|
|
|
+ `collectionId` from before YAML migration). #61 (thanks @jdvmi00)
|
|
|
+- Fix Qwen3 sampling params to prevent repetition loops — stock
|
|
|
+ temperature/top-p caused occasional infinite repeat patterns.
|
|
|
+- Add `--index` option to CLI argument parser (was documented but not wired
|
|
|
+ up). #84 (thanks @Tritlo)
|
|
|
+- Fix DisposedError during slow batch embedding. #41 (thanks @wuhup)
|
|
|
|
|
|
-Initial public release.
|
|
|
+## [0.7.0] - 2026-01-09
|
|
|
|
|
|
-### Features
|
|
|
+First community contributions. The project gained external contributors,
|
|
|
+surfacing bugs that only appear in diverse environments — Homebrew sqlite-vec
|
|
|
+paths, case-sensitive model filenames, and sqlite-vec JOIN incompatibilities.
|
|
|
|
|
|
-- **Hybrid search pipeline** — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
|
|
|
-- **Smart chunking** — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
|
|
|
-- **Query expansion** — fine-tuned Qwen3 1.7B model generates search variations for better recall
|
|
|
-- **Cross-encoder reranking** — Qwen3-Reranker scores candidates with position-aware blending
|
|
|
-- **Vector embeddings** — EmbeddingGemma 300M via node-llama-cpp, all on-device
|
|
|
-- **MCP server** — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
|
|
|
-- **Collection management** — index multiple directories with glob patterns
|
|
|
-- **Context annotations** — add descriptions to collections and paths for richer search
|
|
|
-- **Document IDs** — 6-char content hash for stable references across re-indexes
|
|
|
-- **Multi-get** — retrieve multiple documents by glob pattern, comma list, or docids
|
|
|
-- **Multiple output formats** — JSON, CSV, Markdown, XML, files list
|
|
|
-- **Claude Code plugin** — inline status checks and MCP integration
|
|
|
+### Changes
|
|
|
+
|
|
|
+- Indexing: native `realpathSync()` replaces `readlink -f` subprocess spawn
|
|
|
+ per file. On a 5000-file collection this eliminates 5000 shell spawns,
|
|
|
+ ~15% faster. #8 (thanks @burke)
|
|
|
+- Indexing: single-pass tokenization — chunking algorithm tokenized each
|
|
|
+ document twice (count then split); now tokenizes once and reuses. #9
|
|
|
+ (thanks @burke)
|
|
|
|
|
|
### Fixes
|
|
|
|
|
|
-- Handle dense content (code) that tokenizes beyond expected chunk size
|
|
|
-- Proper cleanup of Metal GPU resources
|
|
|
-- SQLite-vec readiness verification after extension load
|
|
|
-- Reactivate deactivated documents on re-index
|
|
|
-- BM25 score normalization with Math.abs
|
|
|
-- Bun UTF-8 path corruption workaround
|
|
|
+- Fix `vsearch` and `query` hanging — sqlite-vec's virtual table doesn't
|
|
|
+ support the JOIN pattern used; rewrote to subquery. #23 (thanks @mbrendan)
|
|
|
+- Fix MCP server exiting immediately after startup — process had no active
|
|
|
+ handles keeping the event loop alive. #29 (thanks @mostlydev)
|
|
|
+- Fix collection filter SQL to properly restrict vector search results.
|
|
|
+- Support non-ASCII filenames in collection filter.
|
|
|
+- Skip empty files during indexing instead of crashing on zero-length content.
|
|
|
+- Fix case sensitivity in Qwen3 model filename resolution. #15 (thanks
|
|
|
+ @gavrix)
|
|
|
+- Fix sqlite-vec loading on macOS with Homebrew (`BREW_PREFIX` detection).
|
|
|
+ #42 (thanks @komsit37)
|
|
|
+- Fix Nix flake to use correct `src/qmd.ts` path. #7 (thanks @burke)
|
|
|
+- Fix docid lookup with quotes support in get command. #36 (thanks
|
|
|
+ @JoshuaLelon)
|
|
|
+- Fix query expansion model size in documentation. #38 (thanks @odysseus0)
|
|
|
+
|
|
|
+## [0.6.0] - 2025-12-28
|
|
|
+
|
|
|
+Replaced Ollama HTTP API with node-llama-cpp for all LLM operations. Ollama
|
|
|
+adds convenience but also a running server dependency. node-llama-cpp loads
|
|
|
+GGUF models directly in-process — zero external dependencies. Models
|
|
|
+auto-download from HuggingFace on first use.
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- LLM: structured query expansion via JSON schema grammar constraints.
|
|
|
+ Model produces typed expansions — **lexical** (BM25 keywords), **vector**
|
|
|
+ (semantic rephrasings), **HyDE** (hypothetical document excerpts) — so each
|
|
|
+ routes to the right backend instead of sending everything everywhere.
|
|
|
+- LLM: lazy model loading with 2-minute inactivity auto-unload. Keeps memory
|
|
|
+ low when idle while avoiding ~3s model load on every query.
|
|
|
+- Search: conditional query expansion — when BM25 returns strong results, the
|
|
|
+ expensive LLM expansion is skipped entirely.
|
|
|
+- Search: multi-chunk reranking — documents with multiple relevant chunks
|
|
|
+ scored by aggregating across all chunks rather than best single chunk.
|
|
|
+- Search: cosine distance for vector search (was L2).
|
|
|
+- Search: embeddinggemma nomic-style prompt formatting.
|
|
|
+- Testing: evaluation harness with synthetic test documents and Hit@K metrics
|
|
|
+ for BM25, vector, and hybrid RRF.
|
|
|
+
|
|
|
+## [0.5.0] - 2025-12-13
|
|
|
+
|
|
|
+Collections and contexts moved from SQLite tables to YAML at
|
|
|
+`~/.config/qmd/index.yml`. SQLite was overkill for config — you can't share
|
|
|
+it, and it's opaque. YAML is human-readable and version-controllable. The
|
|
|
+migration was extensive (35+ commits) because every part of the system that
|
|
|
+touched collections or contexts had to be updated.
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- Config: YAML-based collections and contexts replace SQLite tables.
|
|
|
+ `collections` and `path_contexts` tables dropped from schema. Collections
|
|
|
+ support an optional `update:` command (e.g., `git pull`) before re-index.
|
|
|
+- CLI: `qmd collection add/list/remove/rename` commands with `--name` and
|
|
|
+ `--mask` glob pattern support.
|
|
|
+- CLI: `qmd ls` virtual file tree — list collections, files in a collection,
|
|
|
+ or files under a path prefix.
|
|
|
+- CLI: `qmd context add/list/check/rm` with hierarchical context inheritance.
|
|
|
+ A query to `qmd://notes/2024/jan/` inherits context from `notes/`,
|
|
|
+ `notes/2024/`, and `notes/2024/jan/`.
|
|
|
+- CLI: `qmd context add / "text"` for global context across all collections.
|
|
|
+- CLI: `qmd context check` audit command to find paths without context.
|
|
|
+- Paths: `qmd://` virtual URI scheme for portable document references.
|
|
|
+ `qmd://notes/ideas.md` works regardless of where the collection lives on
|
|
|
+ disk. Works in `get`, `multi-get`, `ls`, and context commands.
|
|
|
+- CLI: document IDs (docid) — first 6 chars of content hash for stable
|
|
|
+ references. Shown as `#abc123` in search results, usable with `get` and
|
|
|
+ `multi-get`.
|
|
|
+- CLI: `--line-numbers` flag for get command output.
|
|
|
+
|
|
|
+## [0.4.0] - 2025-12-10
|
|
|
+
|
|
|
+MCP server for AI agent integration. Without it, agents had to shell out to
|
|
|
+`qmd search` and parse CLI output. The monolithic `qmd.ts` (1840 lines) was
|
|
|
+split into focused modules with the project's first test suite (215 tests).
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- MCP: stdio server with tools for search, vector search, hybrid query,
|
|
|
+ document retrieval, and status. Runs over stdio transport for Claude
|
|
|
+ Desktop and MCP clients.
|
|
|
+- MCP: spec-compliant with June 2025 MCP specification — removed non-spec
|
|
|
+ `mimeType`, added `isError: true` to errors, `structuredContent` for
|
|
|
+ machine-readable results, proper URI encoding.
|
|
|
+- MCP: simplified tool naming (`qmd_search` → `search`) since MCP already
|
|
|
+ namespaces by server.
|
|
|
+- Architecture: extract `store.ts` (1221 LOC), `llm.ts` (539 LOC),
|
|
|
+ `formatter.ts` (359 LOC), `mcp.ts` (503 LOC) from monolithic `qmd.ts`.
|
|
|
+- Testing: 215 tests (store: 96, llm: 60, mcp: 59) with mocked Ollama for
|
|
|
+ fast, deterministic runs. Before this: zero tests.
|
|
|
+
|
|
|
+## [0.3.0] - 2025-12-08
|
|
|
+
|
|
|
+Document chunking for vector search. A 5000-word document about many topics
|
|
|
+gets a single embedding that averages everything together, matching poorly for
|
|
|
+specific queries. Chunking produces one embedding per ~900-token section with
|
|
|
+focused semantic signal.
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- Search: markdown-aware chunking — prefers heading boundaries, then paragraph
|
|
|
+ breaks, then sentence boundaries. 15% overlap between chunks ensures
|
|
|
+ cross-boundary queries still match.
|
|
|
+- Search: multi-chunk scoring bonus (+0.02 per additional chunk, capped at
|
|
|
+ +0.1 for 5+ chunks). Documents relevant in multiple sections rank higher.
|
|
|
+- CLI: display paths show collection-relative paths and extracted titles
|
|
|
+ (from H1 headings or YAML frontmatter) instead of raw filesystem paths.
|
|
|
+- CLI: `--all` flag returns all matches (use with `--min-score` to filter).
|
|
|
+- CLI: byte-based progress bar with ETA for `embed` command.
|
|
|
+- CLI: human-readable time formatting ("15m 4s" instead of "904.2s").
|
|
|
+- CLI: documents >64KB truncated with warning during embedding.
|
|
|
+
|
|
|
+## [0.2.0] - 2025-12-08
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- CLI: `--json`, `--csv`, `--files`, `--md`, `--xml` output format flags.
|
|
|
+ `--json` for programmatic access, `--files` for piping, `--md`/`--xml` for
|
|
|
+ LLM consumption, `--csv` for spreadsheets.
|
|
|
+- CLI: `qmd status` shows index health — document count, size, embedding
|
|
|
+ coverage, time since last update.
|
|
|
+- Search: weighted RRF — original query gets 2x weight relative to expanded
|
|
|
+ queries since the user's actual words are a more reliable signal.
|
|
|
+
|
|
|
+## [0.1.0] - 2025-12-07
|
|
|
+
|
|
|
+Initial implementation. Built in a single day for searching personal markdown
|
|
|
+notes, journals, and meeting transcripts.
|
|
|
+
|
|
|
+### Changes
|
|
|
+
|
|
|
+- Search: SQLite FTS5 with BM25 ranking. Chose SQLite over Elasticsearch
|
|
|
+ because QMD is a personal tool — single binary, no server dependencies.
|
|
|
+- Search: sqlite-vec for vector similarity. Same rationale: in-process, no
|
|
|
+ external vector database.
|
|
|
+- Search: Reciprocal Rank Fusion to combine BM25 and vector results. RRF is
|
|
|
+ parameter-free and handles missing signals gracefully.
|
|
|
+- LLM: Ollama for embeddings, reranking, and query expansion. Later replaced
|
|
|
+ with node-llama-cpp in 0.6.0.
|
|
|
+- CLI: `qmd add`, `qmd embed`, `qmd search`, `qmd vsearch`, `qmd query`,
|
|
|
+ `qmd get`. ~1800 lines of TypeScript in a single `qmd.ts` file.
|
|
|
|
|
|
+[Unreleased]: https://github.com/tobi/qmd/compare/v1.0.0...HEAD
|
|
|
[1.0.0]: https://github.com/tobi/qmd/releases/tag/v1.0.0
|
|
|
-[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0
|
|
|
+[0.9.0]: https://github.com/tobi/qmd/compare/v0.8.0...v0.9.0
|
|
|
|