před 4 měsíci · 09803a75b7
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -33,6 +33,23 @@ jobs:
 
				           node-version: 22
			
 
				           registry-url: https://registry.npmjs.org
			
 
				 
			
 
				+      - run: npm run build
			
 
				       - run: npm publish --provenance --access public
			
 
				         env:
			
 
				           NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
			
 
				+
			
 
				+      - name: Extract release notes
			
 
				+        id: notes
			
 
				+        run: |
			
 
				+          VERSION="${GITHUB_REF_NAME#v}"
			
 
				+          NOTES=$(./scripts/extract-changelog.sh "$VERSION")
			
 
				+          # Write to file for gh release (avoids quoting issues)
			
 
				+          echo "$NOTES" > /tmp/release-notes.md
			
 
				+
			
 
				+      - name: Create GitHub release
			
 
				+        run: |
			
 
				+          gh release create "$GITHUB_REF_NAME" \
			
 
				+            --title "$GITHUB_REF_NAME" \
			
 
				+            --notes-file /tmp/release-notes.md
			
 
				+        env:
			
 
				+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
			
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 
				 node_modules/
			
 
				+dist/
			
 
				 .npmrc
			
 
				 *.sqlite
			
 
				 .DS_Store
			
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,68 +1,275 @@
 
				 # Changelog
			
 
				 
			
 
				-All notable changes to QMD will be documented in this file.
			
 
				+## [Unreleased]
			
 
				 
			
 
				 ## [1.0.0] - 2026-02-15
			
 
				 
			
 
				-### Node.js Compatibility
			
 
				+QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
			
 
				+through parallel GPU contexts. GPU auto-detection replaces the unreliable
			
 
				+`gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.
			
 
				 
			
 
				-QMD now runs on both **Node.js (>=22)** and **Bun**. Install with `npm install -g @tobilu/qmd` or `bun install -g @tobilu/qmd` — your choice. The `qmd` wrapper auto-detects Node.js via `tsx` and works out of the box with mise, asdf, nvm, and Homebrew installs.
			
 
				+### Changes
			
 
				 
			
 
				-### Performance
			
 
				+- Runtime: support Node.js (>=22) alongside Bun via a cross-runtime SQLite
			
 
				+  abstraction layer (`src/db.ts`). `bun:sqlite` on Bun, `better-sqlite3` on
			
 
				+  Node. The `qmd` wrapper auto-detects a suitable Node.js install via PATH,
			
 
				+  then falls back to mise, asdf, nvm, and Homebrew locations.
			
 
				+- Performance: parallel embedding & reranking via multiple LlamaContext
			
 
				+  instances — up to 2.7x faster on multi-core machines.
			
 
				+- Performance: flash attention for ~20% less VRAM per reranking context,
			
 
				+  enabling more parallel contexts on GPU.
			
 
				+- Performance: right-sized reranker context (40960 → 2048 tokens, 17x less
			
 
				+  memory) since chunks are capped at ~900 tokens.
			
 
				+- Performance: adaptive parallelism — context count computed from available
			
 
				+  VRAM (GPU) or CPU math cores rather than hardcoded.
			
 
				+- GPU: probe for CUDA, Metal, Vulkan explicitly at startup instead of
			
 
				+  relying on node-llama-cpp's `gpu: "auto"`. `qmd status` shows device info.
			
 
				+- Tests: reorganized into flat `test/` directory with vitest for Node.js and
			
 
				+  bun test for Bun. New `eval-bm25` and `store.helpers.unit` suites.
			
 
				 
			
 
				-- **Parallel embedding & reranking** — multiple contexts split work across CPU cores (or VRAM on GPU), delivering up to **2.7x faster reranking** and significantly faster embedding on multi-core machines
			
 
				-- **Flash attention** — ~20% less VRAM per reranking context, enabling more parallel contexts on GPU
			
 
				-- **Right-sized contexts** — reranker context dropped from 40960 to 2048 tokens (17x less memory), since chunks are capped at ~900 tokens
			
 
				-- **Adaptive parallelism** — automatically scales context count based on available VRAM (GPU) or CPU math cores
			
 
				-- **CPU thread splitting** — each context runs on its own cores for true parallelism instead of contending on a single context
			
 
				+### Fixes
			
 
				+
			
 
				+- Prevent VRAM waste from duplicate context creation during concurrent
			
 
				+  `embedBatch` calls — initialization lock now covers the full path.
			
 
				+- Collection-aware FTS filtering so scoped keyword search actually restricts
			
 
				+  results to the requested collection.
			
 
				 
			
 
				-### GPU Auto-Detection
			
 
				+## [0.9.0] - 2026-02-15
			
 
				 
			
 
				-- Probes for CUDA, Metal, and Vulkan at startup — uses the best available backend
			
 
				-- Falls back gracefully to CPU with a warning if GPU init fails
			
 
				-- `qmd status` now shows device info (GPU type, VRAM usage)
			
 
				+First published release on npm as `@tobilu/qmd`. MCP HTTP transport with
			
 
				+daemon mode cuts warm query latency from ~16s to ~10s by keeping models
			
 
				+loaded between requests.
			
 
				 
			
 
				-### Test Suite
			
 
				+### Changes
			
 
				 
			
 
				-- Tests split into `src/*.test.ts` (unit), `src/models/*.test.ts` (model), and `src/integration/*.test.ts` (CLI/integration)
			
 
				-- Vitest config for Node.js; bun test still works for Bun
			
 
				-- New `eval-bm25` and `store.helpers.unit` test suites
			
 
				+- MCP: HTTP transport with daemon lifecycle — `qmd mcp --http --daemon`
			
 
				+  starts a background server, `qmd mcp stop` shuts it down. Models stay warm
			
 
				+  in VRAM between queries. #149 (thanks @igrigorik)
			
 
				+- Search: type-routed query expansion preserves lex/vec/hyde type info and
			
 
				+  routes to the appropriate backend. Eliminates ~4 wasted backend calls per
			
 
				+  query (10.0 → 6.0 calls, 1278ms → 549ms). #149 (thanks @igrigorik)
			
 
				+- Search: unified pipeline — extracted `hybridQuery()` and
			
 
				+  `vectorSearchQuery()` to `store.ts` so CLI and MCP share identical logic.
			
 
				+  Fixes a class of bugs where results differed between the two. #149 (thanks
			
 
				+  @igrigorik)
			
 
				+- MCP: dynamic instructions generated at startup from actual index state —
			
 
				+  LLMs see collection names, doc counts, and content descriptions. #149
			
 
				+  (thanks @igrigorik)
			
 
				+- MCP: tool renames (vsearch → vector_search, query → deep_search) with
			
 
				+  rewritten descriptions for better tool selection. #149 (thanks @igrigorik)
			
 
				+- Integration: Claude Code plugin with inline status checks and MCP
			
 
				+  integration. #99 (thanks @galligan)
			
 
				 
			
 
				 ### Fixes
			
 
				 
			
 
				-- Prevent VRAM waste from duplicate context creation during concurrent loads
			
 
				-- Collection-aware FTS filtering for scoped keyword search
			
 
				+- BM25 score normalization — formula was inverted (`1/(1+|x|)` instead of
			
 
				+  `|x|/(1+|x|)`), so strong matches scored *lowest*. Broke `--min-score`
			
 
				+  filtering and made the "strong signal" short-circuit dead code. #76 (thanks
			
 
				+  @dgilperez)
			
 
				+- Normalize Unicode paths to NFC for macOS compatibility. #82 (thanks
			
 
				+  @c-stoeckl)
			
 
				+- Handle dense content (code) that tokenizes beyond expected chunk size.
			
 
				+- Proper cleanup of Metal GPU resources on process exit.
			
 
				+- SQLite-vec readiness verification after extension load.
			
 
				+- Reactivate deactivated documents on re-index instead of creating duplicates.
			
 
				+- Bun UTF-8 path corruption workaround for non-ASCII filenames.
			
 
				+- Disable following symlinks in glob.scan to avoid infinite loops.
			
 
				 
			
 
				----
			
 
				+## [0.8.0] - 2026-01-28
			
 
				 
			
 
				-## [0.9.0] - 2026-02-15
			
 
				+Fine-tuned query expansion model trained with GRPO replaces the stock Qwen3
			
 
				+0.6B. The training pipeline scores expansions on named entity preservation,
			
 
				+format compliance, and diversity — producing noticeably better lexical
			
 
				+variations and HyDE documents.
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- LLM: deploy GRPO-trained (Group Relative Policy Optimization) query
			
 
				+  expansion model, hosted on HuggingFace and auto-downloaded on first use.
			
 
				+  Better preservation of proper nouns and technical terms in expansions.
			
 
				+- LLM: `/only:lex` mode for single-type expansions — useful when you know
			
 
				+  which search backend will help.
			
 
				+- LLM: HyDE output moved to first position so vector search can start
			
 
				+  embedding while other expansions generate.
			
 
				+- LLM: session lifecycle management via `withLLMSession()` pattern — ensures
			
 
				+  cleanup even on failure, similar to database transactions.
			
 
				+- Integration: org-mode title extraction support. #50 (thanks @sh54)
			
 
				+- Integration: SQLite extension loading in Nix devshell. #48 (thanks @sh54)
			
 
				+- Integration: AI agent discovery via skills.sh. #64 (thanks @Algiras)
			
 
				+
			
 
				+### Fixes
			
 
				+
			
 
				+- Use sequential embedding on CPU-only systems — parallel contexts caused a
			
 
				+  race condition where contexts competed for CPU cores, making things slower.
			
 
				+  #54 (thanks @freeman-jiang)
			
 
				+- Fix `collectionName` column in vector search SQL (was still using old
			
 
				+  `collectionId` from before YAML migration). #61 (thanks @jdvmi00)
			
 
				+- Fix Qwen3 sampling params to prevent repetition loops — stock
			
 
				+  temperature/top-p caused occasional infinite repeat patterns.
			
 
				+- Add `--index` option to CLI argument parser (was documented but not wired
			
 
				+  up). #84 (thanks @Tritlo)
			
 
				+- Fix DisposedError during slow batch embedding. #41 (thanks @wuhup)
			
 
				 
			
 
				-Initial public release.
			
 
				+## [0.7.0] - 2026-01-09
			
 
				 
			
 
				-### Features
			
 
				+First community contributions. The project gained external contributors,
			
 
				+surfacing bugs that only appear in diverse environments — Homebrew sqlite-vec
			
 
				+paths, case-sensitive model filenames, and sqlite-vec JOIN incompatibilities.
			
 
				 
			
 
				-- **Hybrid search pipeline** — BM25 full-text + vector similarity + LLM reranking with Reciprocal Rank Fusion
			
 
				-- **Smart chunking** — scored markdown break points keep sections, paragraphs, and code blocks intact (~900 tokens/chunk, 15% overlap)
			
 
				-- **Query expansion** — fine-tuned Qwen3 1.7B model generates search variations for better recall
			
 
				-- **Cross-encoder reranking** — Qwen3-Reranker scores candidates with position-aware blending
			
 
				-- **Vector embeddings** — EmbeddingGemma 300M via node-llama-cpp, all on-device
			
 
				-- **MCP server** — stdio and HTTP transports for Claude Desktop, Claude Code, and any MCP client
			
 
				-- **Collection management** — index multiple directories with glob patterns
			
 
				-- **Context annotations** — add descriptions to collections and paths for richer search
			
 
				-- **Document IDs** — 6-char content hash for stable references across re-indexes
			
 
				-- **Multi-get** — retrieve multiple documents by glob pattern, comma list, or docids
			
 
				-- **Multiple output formats** — JSON, CSV, Markdown, XML, files list
			
 
				-- **Claude Code plugin** — inline status checks and MCP integration
			
 
				+### Changes
			
 
				+
			
 
				+- Indexing: native `realpathSync()` replaces `readlink -f` subprocess spawn
			
 
				+  per file. On a 5000-file collection this eliminates 5000 shell spawns,
			
 
				+  ~15% faster. #8 (thanks @burke)
			
 
				+- Indexing: single-pass tokenization — chunking algorithm tokenized each
			
 
				+  document twice (count then split); now tokenizes once and reuses. #9
			
 
				+  (thanks @burke)
			
 
				 
			
 
				 ### Fixes
			
 
				 
			
 
				-- Handle dense content (code) that tokenizes beyond expected chunk size
			
 
				-- Proper cleanup of Metal GPU resources
			
 
				-- SQLite-vec readiness verification after extension load
			
 
				-- Reactivate deactivated documents on re-index
			
 
				-- BM25 score normalization with Math.abs
			
 
				-- Bun UTF-8 path corruption workaround
			
 
				+- Fix `vsearch` and `query` hanging — sqlite-vec's virtual table doesn't
			
 
				+  support the JOIN pattern used; rewrote to subquery. #23 (thanks @mbrendan)
			
 
				+- Fix MCP server exiting immediately after startup — process had no active
			
 
				+  handles keeping the event loop alive. #29 (thanks @mostlydev)
			
 
				+- Fix collection filter SQL to properly restrict vector search results.
			
 
				+- Support non-ASCII filenames in collection filter.
			
 
				+- Skip empty files during indexing instead of crashing on zero-length content.
			
 
				+- Fix case sensitivity in Qwen3 model filename resolution. #15 (thanks
			
 
				+  @gavrix)
			
 
				+- Fix sqlite-vec loading on macOS with Homebrew (`BREW_PREFIX` detection).
			
 
				+  #42 (thanks @komsit37)
			
 
				+- Fix Nix flake to use correct `src/qmd.ts` path. #7 (thanks @burke)
			
 
				+- Fix docid lookup with quotes support in get command. #36 (thanks
			
 
				+  @JoshuaLelon)
			
 
				+- Fix query expansion model size in documentation. #38 (thanks @odysseus0)
			
 
				+
			
 
				+## [0.6.0] - 2025-12-28
			
 
				+
			
 
				+Replaced Ollama HTTP API with node-llama-cpp for all LLM operations. Ollama
			
 
				+adds convenience but also a running server dependency. node-llama-cpp loads
			
 
				+GGUF models directly in-process — zero external dependencies. Models
			
 
				+auto-download from HuggingFace on first use.
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- LLM: structured query expansion via JSON schema grammar constraints.
			
 
				+  Model produces typed expansions — **lexical** (BM25 keywords), **vector**
			
 
				+  (semantic rephrasings), **HyDE** (hypothetical document excerpts) — so each
			
 
				+  routes to the right backend instead of sending everything everywhere.
			
 
				+- LLM: lazy model loading with 2-minute inactivity auto-unload. Keeps memory
			
 
				+  low when idle while avoiding ~3s model load on every query.
			
 
				+- Search: conditional query expansion — when BM25 returns strong results, the
			
 
				+  expensive LLM expansion is skipped entirely.
			
 
				+- Search: multi-chunk reranking — documents with multiple relevant chunks
			
 
				+  scored by aggregating across all chunks rather than best single chunk.
			
 
				+- Search: cosine distance for vector search (was L2).
			
 
				+- Search: embeddinggemma nomic-style prompt formatting.
			
 
				+- Testing: evaluation harness with synthetic test documents and Hit@K metrics
			
 
				+  for BM25, vector, and hybrid RRF.
			
 
				+
			
 
				+## [0.5.0] - 2025-12-13
			
 
				+
			
 
				+Collections and contexts moved from SQLite tables to YAML at
			
 
				+`~/.config/qmd/index.yml`. SQLite was overkill for config — you can't share
			
 
				+it, and it's opaque. YAML is human-readable and version-controllable. The
			
 
				+migration was extensive (35+ commits) because every part of the system that
			
 
				+touched collections or contexts had to be updated.
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- Config: YAML-based collections and contexts replace SQLite tables.
			
 
				+  `collections` and `path_contexts` tables dropped from schema. Collections
			
 
				+  support an optional `update:` command (e.g., `git pull`) before re-index.
			
 
				+- CLI: `qmd collection add/list/remove/rename` commands with `--name` and
			
 
				+  `--mask` glob pattern support.
			
 
				+- CLI: `qmd ls` virtual file tree — list collections, files in a collection,
			
 
				+  or files under a path prefix.
			
 
				+- CLI: `qmd context add/list/check/rm` with hierarchical context inheritance.
			
 
				+  A query to `qmd://notes/2024/jan/` inherits context from `notes/`,
			
 
				+  `notes/2024/`, and `notes/2024/jan/`.
			
 
				+- CLI: `qmd context add / "text"` for global context across all collections.
			
 
				+- CLI: `qmd context check` audit command to find paths without context.
			
 
				+- Paths: `qmd://` virtual URI scheme for portable document references.
			
 
				+  `qmd://notes/ideas.md` works regardless of where the collection lives on
			
 
				+  disk. Works in `get`, `multi-get`, `ls`, and context commands.
			
 
				+- CLI: document IDs (docid) — first 6 chars of content hash for stable
			
 
				+  references. Shown as `#abc123` in search results, usable with `get` and
			
 
				+  `multi-get`.
			
 
				+- CLI: `--line-numbers` flag for get command output.
			
 
				+
			
 
				+## [0.4.0] - 2025-12-10
			
 
				+
			
 
				+MCP server for AI agent integration. Without it, agents had to shell out to
			
 
				+`qmd search` and parse CLI output. The monolithic `qmd.ts` (1840 lines) was
			
 
				+split into focused modules with the project's first test suite (215 tests).
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- MCP: stdio server with tools for search, vector search, hybrid query,
			
 
				+  document retrieval, and status. Runs over stdio transport for Claude
			
 
				+  Desktop and MCP clients.
			
 
				+- MCP: spec-compliant with June 2025 MCP specification — removed non-spec
			
 
				+  `mimeType`, added `isError: true` to errors, `structuredContent` for
			
 
				+  machine-readable results, proper URI encoding.
			
 
				+- MCP: simplified tool naming (`qmd_search` → `search`) since MCP already
			
 
				+  namespaces by server.
			
 
				+- Architecture: extract `store.ts` (1221 LOC), `llm.ts` (539 LOC),
			
 
				+  `formatter.ts` (359 LOC), `mcp.ts` (503 LOC) from monolithic `qmd.ts`.
			
 
				+- Testing: 215 tests (store: 96, llm: 60, mcp: 59) with mocked Ollama for
			
 
				+  fast, deterministic runs. Before this: zero tests.
			
 
				+
			
 
				+## [0.3.0] - 2025-12-08
			
 
				+
			
 
				+Document chunking for vector search. A 5000-word document about many topics
			
 
				+gets a single embedding that averages everything together, matching poorly for
			
 
				+specific queries. Chunking produces one embedding per ~900-token section with
			
 
				+focused semantic signal.
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- Search: markdown-aware chunking — prefers heading boundaries, then paragraph
			
 
				+  breaks, then sentence boundaries. 15% overlap between chunks ensures
			
 
				+  cross-boundary queries still match.
			
 
				+- Search: multi-chunk scoring bonus (+0.02 per additional chunk, capped at
			
 
				+  +0.1 for 5+ chunks). Documents relevant in multiple sections rank higher.
			
 
				+- CLI: display paths show collection-relative paths and extracted titles
			
 
				+  (from H1 headings or YAML frontmatter) instead of raw filesystem paths.
			
 
				+- CLI: `--all` flag returns all matches (use with `--min-score` to filter).
			
 
				+- CLI: byte-based progress bar with ETA for `embed` command.
			
 
				+- CLI: human-readable time formatting ("15m 4s" instead of "904.2s").
			
 
				+- CLI: documents >64KB truncated with warning during embedding.
			
 
				+
			
 
				+## [0.2.0] - 2025-12-08
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- CLI: `--json`, `--csv`, `--files`, `--md`, `--xml` output format flags.
			
 
				+  `--json` for programmatic access, `--files` for piping, `--md`/`--xml` for
			
 
				+  LLM consumption, `--csv` for spreadsheets.
			
 
				+- CLI: `qmd status` shows index health — document count, size, embedding
			
 
				+  coverage, time since last update.
			
 
				+- Search: weighted RRF — original query gets 2x weight relative to expanded
			
 
				+  queries since the user's actual words are a more reliable signal.
			
 
				+
			
 
				+## [0.1.0] - 2025-12-07
			
 
				+
			
 
				+Initial implementation. Built in a single day for searching personal markdown
			
 
				+notes, journals, and meeting transcripts.
			
 
				+
			
 
				+### Changes
			
 
				+
			
 
				+- Search: SQLite FTS5 with BM25 ranking. Chose SQLite over Elasticsearch
			
 
				+  because QMD is a personal tool — single binary, no server dependencies.
			
 
				+- Search: sqlite-vec for vector similarity. Same rationale: in-process, no
			
 
				+  external vector database.
			
 
				+- Search: Reciprocal Rank Fusion to combine BM25 and vector results. RRF is
			
 
				+  parameter-free and handles missing signals gracefully.
			
 
				+- LLM: Ollama for embeddings, reranking, and query expansion. Later replaced
			
 
				+  with node-llama-cpp in 0.6.0.
			
 
				+- CLI: `qmd add`, `qmd embed`, `qmd search`, `qmd vsearch`, `qmd query`,
			
 
				+  `qmd get`. ~1800 lines of TypeScript in a single `qmd.ts` file.
			
 
				 
			
 
				+[Unreleased]: https://github.com/tobi/qmd/compare/v1.0.0...HEAD
			
 
				 [1.0.0]: https://github.com/tobi/qmd/releases/tag/v1.0.0
			
 
				-[0.9.0]: https://github.com/tobi/qmd/releases/tag/v0.9.0
			
 
				+[0.9.0]: https://github.com/tobi/qmd/compare/v0.8.0...v0.9.0
			
 
				 
			
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -149,4 +149,17 @@ bun test --preload ./src/test-preload.ts test/
 
				 ## Do NOT compile
			
 
				 
			
 
				 - Never run `bun build --compile` - it overwrites the shell wrapper and breaks sqlite-vec
			
 
				-- The `qmd` file is a shell script that runs `bun src/qmd.ts` - do not replace it
			
 
				+- The `qmd` file is a shell script that runs compiled JS from `dist/` - do not replace it
			
 
				+- `npm run build` compiles TypeScript to `dist/` via `tsc -p tsconfig.build.json`
			
 
				+
			
 
				+## Releasing
			
 
				+
			
 
				+Use `/release <version>` to cut a release. Full changelog standards,
			
 
				+release workflow, and git hook setup are documented in the
			
 
				+[release skill](skills/release/SKILL.md).
			
 
				+
			
 
				+Key points:
			
 
				+- Add changelog entries under `## [Unreleased]` **as you make changes**
			
 
				+- The release script renames `[Unreleased]` → `[X.Y.Z] - date` at release time
			
 
				+- Credit external PRs with `#NNN (thanks @username)`
			
 
				+- GitHub releases roll up the full minor series (e.g. 1.2.0 through 1.2.3)
			
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@ QMD combines BM25 full-text search, vector semantic search, and LLM re-ranking
 
				 
			
 
				 ![QMD Architecture](assets/qmd-architecture.png)
			
 
				 
			
 
				+You can read more about QMD's progress in the [CHANGELOG](CHANGELOG.md).
			
 
				+
			
 
				 ## Quick Start
			
 
				 
			
 
				 ```sh
			
@@ -23,7 +25,7 @@ qmd collection add ~/notes --name notes
 
				 qmd collection add ~/Documents/meetings --name meetings
			
 
				 qmd collection add ~/work/docs --name docs
			
 
				 
			
 
				-# Add context to help with search results
			
 
				+# Add context to help with search results, each piece of context will be returned when matching sub documents are returned. This works as a tree. This is the key feature of QMD as it allows LLMs to make much better contextual choices when selecting documents. Don't sleep on it!
			
 
				 qmd context add qmd://notes "Personal notes and ideas"
			
 
				 qmd context add qmd://meetings "Meeting transcripts and notes"
			
 
				 qmd context add qmd://docs "Work documentation"
			
--- a/package.json
+++ b/package.json
@@ -7,14 +7,14 @@
 
				     "qmd": "qmd"
			
 
				   },
			
 
				   "files": [
			
 
				-    "src/**/*.ts",
			
 
				-    "!src/**/*.test.ts",
			
 
				-    "!src/test-preload.ts",
			
 
				+    "dist/",
			
 
				     "qmd",
			
 
				     "LICENSE",
			
 
				     "CHANGELOG.md"
			
 
				   ],
			
 
				   "scripts": {
			
 
				+    "prepare": "[ -d .git ] && ./scripts/install-hooks.sh || true",
			
 
				+    "build": "tsc -p tsconfig.build.json",
			
 
				     "test": "vitest run --reporter=verbose test/",
			
 
				     "qmd": "tsx src/qmd.ts",
			
 
				     "index": "tsx src/qmd.ts index",
			
--- a/qmd
+++ b/qmd
@@ -43,4 +43,4 @@ while [[ -L "$SOURCE" ]]; do
 
				 done
			
 
				 SCRIPT_DIR="$(cd -P "$(dirname "$SOURCE")" && pwd)"
			
 
				 
			
 
				-exec "$NODE" --import tsx "$SCRIPT_DIR/src/qmd.ts" "$@"
			
 
				+exec "$NODE" "$SCRIPT_DIR/dist/qmd.js" "$@"
			
--- a/scripts/extract-changelog.sh
+++ b/scripts/extract-changelog.sh
@@ -0,0 +1,78 @@
 
				+#!/usr/bin/env bash
			
 
				+set -euo pipefail
			
 
				+
			
 
				+# Extract cumulative release notes from CHANGELOG.md.
			
 
				+#
			
 
				+# For a given version (e.g. 1.0.5), extracts all entries from the current
			
 
				+# minor series back to x.x.0 (e.g. 1.0.0 through 1.0.5). This means each
			
 
				+# GitHub release restates the full arc of changes for the minor series.
			
 
				+#
			
 
				+# The [Unreleased] section is included — it contains the content that will
			
 
				+# become [X.Y.Z] when the release script runs. If the version is already
			
 
				+# released, [Unreleased] may be empty and is omitted.
			
 
				+#
			
 
				+# Fails if neither [Unreleased] nor [X.Y.Z] has content in the changelog.
			
 
				+#
			
 
				+# Usage: scripts/extract-changelog.sh <version>
			
 
				+# Example: scripts/extract-changelog.sh 1.0.5
			
 
				+#   -> extracts [Unreleased] + [1.0.5], [1.0.4], ..., [1.0.0]
			
 
				+
			
 
				+VERSION="${1:?Usage: extract-changelog.sh <version>}"
			
 
				+
			
 
				+# Parse major.minor.patch from version
			
 
				+IFS='.' read -r MAJOR MINOR PATCH <<< "$VERSION"
			
 
				+
			
 
				+if [[ ! -f CHANGELOG.md ]]; then
			
 
				+  echo "CHANGELOG.md not found" >&2
			
 
				+  exit 1
			
 
				+fi
			
 
				+
			
 
				+# Extract [Unreleased] section and all [X.Y.Z] sections matching our minor series.
			
 
				+OUTPUT=""
			
 
				+CAPTURING=false
			
 
				+UNRELEASED_CONTENT=""
			
 
				+IN_UNRELEASED=false
			
 
				+
			
 
				+while IFS= read -r line; do
			
 
				+  if [[ "$line" =~ ^##\ \[Unreleased\] ]]; then
			
 
				+    CAPTURING=true
			
 
				+    IN_UNRELEASED=true
			
 
				+  elif [[ "$line" =~ ^##\ \[([0-9]+\.[0-9]+\.[0-9]+)\] ]]; then
			
 
				+    IN_UNRELEASED=false
			
 
				+    ENTRY_VERSION="${BASH_REMATCH[1]}"
			
 
				+    IFS='.' read -r E_MAJOR E_MINOR E_PATCH <<< "$ENTRY_VERSION"
			
 
				+    if [[ "$E_MAJOR" == "$MAJOR" && "$E_MINOR" == "$MINOR" ]]; then
			
 
				+      CAPTURING=true
			
 
				+      OUTPUT+="$line"$'\n'
			
 
				+    else
			
 
				+      CAPTURING=false
			
 
				+    fi
			
 
				+  elif [[ "$line" =~ ^##\  ]]; then
			
 
				+    IN_UNRELEASED=false
			
 
				+    CAPTURING=false
			
 
				+  elif $CAPTURING; then
			
 
				+    if $IN_UNRELEASED; then
			
 
				+      UNRELEASED_CONTENT+="$line"$'\n'
			
 
				+    else
			
 
				+      OUTPUT+="$line"$'\n'
			
 
				+    fi
			
 
				+  fi
			
 
				+done < CHANGELOG.md
			
 
				+
			
 
				+# Only include [Unreleased] if it has non-blank content
			
 
				+TRIMMED=$(echo "$UNRELEASED_CONTENT" | sed '/^[[:space:]]*$/d')
			
 
				+if [[ -n "$TRIMMED" ]]; then
			
 
				+  OUTPUT="## [Unreleased]"$'\n'"$UNRELEASED_CONTENT$OUTPUT"
			
 
				+fi
			
 
				+
			
 
				+# Fail if we got nothing
			
 
				+TRIMMED_OUTPUT=$(echo "$OUTPUT" | sed '/^[[:space:]]*$/d')
			
 
				+if [[ -z "$TRIMMED_OUTPUT" ]]; then
			
 
				+  echo "error: no changelog content found for $VERSION" >&2
			
 
				+  echo "Expected either:" >&2
			
 
				+  echo "  ## [Unreleased]  (with content)" >&2
			
 
				+  echo "  ## [$VERSION] - YYYY-MM-DD" >&2
			
 
				+  exit 1
			
 
				+fi
			
 
				+
			
 
				+printf '%s' "$OUTPUT"
			
--- a/scripts/install-hooks.sh
+++ b/scripts/install-hooks.sh
@@ -0,0 +1,19 @@
 
				+#!/usr/bin/env bash
			
 
				+set -euo pipefail
			
 
				+
			
 
				+# Self-installing git hooks for qmd
			
 
				+# Called from package.json "prepare" script after bun install
			
 
				+
			
 
				+REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
			
 
				+HOOKS_DIR="$REPO_ROOT/.git/hooks"
			
 
				+
			
 
				+if [[ ! -d "$HOOKS_DIR" ]]; then
			
 
				+  echo "Not a git repository, skipping hook install"
			
 
				+  exit 0
			
 
				+fi
			
 
				+
			
 
				+# Install pre-push hook
			
 
				+cp "$REPO_ROOT/scripts/pre-push" "$HOOKS_DIR/pre-push"
			
 
				+chmod +x "$HOOKS_DIR/pre-push"
			
 
				+
			
 
				+echo "Installed git hooks: pre-push"
			
--- a/scripts/pre-push
+++ b/scripts/pre-push
@@ -0,0 +1,112 @@
 
				+#!/usr/bin/env bash
			
 
				+set -euo pipefail
			
 
				+
			
 
				+# Pre-push hook: validates v* tag pushes before they reach the remote.
			
 
				+#
			
 
				+# Checks:
			
 
				+#   1. package.json version matches the tag
			
 
				+#   2. CHANGELOG.md has a "## [{version}] - {date}" entry
			
 
				+#   3. CI passed upstream on GitHub for the tagged commit
			
 
				+#
			
 
				+# Installed automatically by: bun install (via prepare script)
			
 
				+
			
 
				+while read -r local_ref local_sha remote_ref remote_sha; do
			
 
				+  # Only validate v* tag pushes
			
 
				+  if [[ "$local_ref" != refs/tags/v* ]]; then
			
 
				+    continue
			
 
				+  fi
			
 
				+
			
 
				+  # Skip tag deletions
			
 
				+  if [[ "$local_sha" == "0000000000000000000000000000000000000000" ]]; then
			
 
				+    continue
			
 
				+  fi
			
 
				+
			
 
				+  TAG="${local_ref#refs/tags/}"
			
 
				+  VERSION="${TAG#v}"
			
 
				+
			
 
				+  echo "Validating release $TAG..."
			
 
				+
			
 
				+  # --- 1. package.json version must match the tag ---
			
 
				+  PKG_VERSION=$(jq -r .version package.json)
			
 
				+  if [[ "$PKG_VERSION" != "$VERSION" ]]; then
			
 
				+    echo ""
			
 
				+    echo "ABORT: package.json version is $PKG_VERSION but tag is $TAG"
			
 
				+    echo "Run: jq --arg v '$VERSION' '.version = \$v' package.json > tmp && mv tmp package.json"
			
 
				+    exit 1
			
 
				+  fi
			
 
				+  echo "  package.json version: $PKG_VERSION"
			
 
				+
			
 
				+  # --- 2. CHANGELOG.md must have an entry for this version ---
			
 
				+  if [[ ! -f CHANGELOG.md ]]; then
			
 
				+    echo ""
			
 
				+    echo "ABORT: CHANGELOG.md not found"
			
 
				+    exit 1
			
 
				+  fi
			
 
				+
			
 
				+  if ! grep -q "^## \[$VERSION\] - " CHANGELOG.md; then
			
 
				+    echo ""
			
 
				+    echo "ABORT: CHANGELOG.md has no entry for this release"
			
 
				+    echo "Expected heading: ## [$VERSION] - $(date +%Y-%m-%d)"
			
 
				+    echo ""
			
 
				+    echo "Write the changelog entry first. See CLAUDE.md for guidelines."
			
 
				+    exit 1
			
 
				+  fi
			
 
				+  echo "  CHANGELOG.md: entry found"
			
 
				+
			
 
				+  # --- 3. CI must have passed on GitHub for this commit ---
			
 
				+  COMMIT=$(git rev-parse "$TAG" 2>/dev/null || git rev-parse HEAD)
			
 
				+
			
 
				+  if ! command -v gh &>/dev/null; then
			
 
				+    echo "  CI check: skipped (gh CLI not installed)"
			
 
				+    continue
			
 
				+  fi
			
 
				+
			
 
				+  # Check GitHub Actions check runs
			
 
				+  CHECK_JSON=$(gh api "repos/{owner}/{repo}/commits/$COMMIT/check-runs" 2>/dev/null || echo "")
			
 
				+
			
 
				+  if [[ -z "$CHECK_JSON" ]]; then
			
 
				+    echo "  CI check: skipped (could not reach GitHub API)"
			
 
				+    continue
			
 
				+  fi
			
 
				+
			
 
				+  TOTAL=$(echo "$CHECK_JSON" | jq -r '.total_count // 0' 2>/dev/null || echo "0")
			
 
				+
			
 
				+  if [[ "$TOTAL" -eq 0 ]] 2>/dev/null; then
			
 
				+    # No checks found — commit may not be pushed yet
			
 
				+    MAIN_SHA=$(git rev-parse origin/main 2>/dev/null || echo "")
			
 
				+    if [[ "$COMMIT" != "$MAIN_SHA" ]]; then
			
 
				+      echo ""
			
 
				+      echo "WARNING: No CI runs found for $COMMIT and it's not the tip of origin/main."
			
 
				+      echo "Push the commit to main first and wait for CI to pass."
			
 
				+      read -p "Push tag anyway? [y/N] " -n 1 -r </dev/tty
			
 
				+      echo ""
			
 
				+      [[ $REPLY =~ ^[Yy]$ ]] || exit 1
			
 
				+    else
			
 
				+      echo "  CI check: no runs found (commit is on main)"
			
 
				+    fi
			
 
				+  else
			
 
				+    FAILED=$(echo "$CHECK_JSON" | jq '[.check_runs // [] | .[] | select(.conclusion == "failure")] | length' 2>/dev/null || echo "0")
			
 
				+    PENDING=$(echo "$CHECK_JSON" | jq '[.check_runs // [] | .[] | select(.status != "completed")] | length' 2>/dev/null || echo "0")
			
 
				+
			
 
				+    if [[ "$FAILED" -gt 0 ]] 2>/dev/null; then
			
 
				+      echo ""
			
 
				+      echo "ABORT: CI failed for commit $COMMIT"
			
 
				+      echo "Check: https://github.com/tobi/qmd/commit/$COMMIT"
			
 
				+      exit 1
			
 
				+    fi
			
 
				+
			
 
				+    if [[ "$PENDING" -gt 0 ]] 2>/dev/null; then
			
 
				+      echo ""
			
 
				+      echo "WARNING: CI still running for commit $COMMIT ($PENDING pending)"
			
 
				+      read -p "Push tag anyway? [y/N] " -n 1 -r </dev/tty
			
 
				+      echo ""
			
 
				+      [[ $REPLY =~ ^[Yy]$ ]] || exit 1
			
 
				+    else
			
 
				+      echo "  CI check: all passed"
			
 
				+    fi
			
 
				+  fi
			
 
				+
			
 
				+  echo "All checks passed for $TAG"
			
 
				+done
			
 
				+
			
 
				+exit 0
			
--- a/scripts/release.sh
+++ b/scripts/release.sh
@@ -2,6 +2,11 @@
 
				 set -euo pipefail
			
 
				 
			
 
				 # QMD Release Script
			
 
				+#
			
 
				+# Renames the [Unreleased] section in CHANGELOG.md to the new version,
			
 
				+# bumps package.json, commits, and creates a tag. The actual publish
			
 
				+# happens via GitHub Actions when the tag is pushed.
			
 
				+#
			
 
				 # Usage: ./scripts/release.sh [patch|minor|major|<version>]
			
 
				 # Examples:
			
 
				 #   ./scripts/release.sh patch     # 0.9.0 -> 0.9.1
			
@@ -41,74 +46,60 @@ bump_version() {
 
				 }
			
 
				 
			
 
				 NEW=$(bump_version "$CURRENT" "$BUMP")
			
 
				+DATE=$(date +%Y-%m-%d)
			
 
				 echo "New version:     $NEW"
			
 
				 echo ""
			
 
				 
			
 
				-# Confirm
			
 
				-read -p "Release v$NEW? [y/N] " -n 1 -r
			
 
				-echo ""
			
 
				-[[ $REPLY =~ ^[Yy]$ ]] || { echo "Aborted."; exit 1; }
			
 
				+# --- Validate CHANGELOG.md ---
			
 
				 
			
 
				-# Gather commits since last tag (or all if no tags)
			
 
				-LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "")
			
 
				-if [[ -n "$LAST_TAG" ]]; then
			
 
				-  RANGE="$LAST_TAG..HEAD"
			
 
				-else
			
 
				-  RANGE="HEAD"
			
 
				+if [[ ! -f CHANGELOG.md ]]; then
			
 
				+  echo "Error: CHANGELOG.md not found" >&2
			
 
				+  exit 1
			
 
				 fi
			
 
				 
			
 
				-echo ""
			
 
				-echo "Commits since ${LAST_TAG:-beginning}:"
			
 
				-git log "$RANGE" --oneline --no-decorate
			
 
				-echo ""
			
 
				-
			
 
				-# Generate changelog entry
			
 
				-DATE=$(date +%Y-%m-%d)
			
 
				-ENTRY="## [$NEW] - $DATE"$'\n'$'\n'
			
 
				-
			
 
				-# Collect conventional commits
			
 
				-FEATS=$(git log "$RANGE" --oneline --no-decorate --grep="^feat" | sed 's/^[a-f0-9]* feat[:(]/- /' | sed 's/)$//' || true)
			
 
				-FIXES=$(git log "$RANGE" --oneline --no-decorate --grep="^fix" | sed 's/^[a-f0-9]* fix[:(]/- /' | sed 's/)$//' || true)
			
 
				-OTHER=$(git log "$RANGE" --oneline --no-decorate --grep="^feat" --grep="^fix" --grep="^docs" --grep="^chore" --grep="^refactor" --invert-grep | sed 's/^[a-f0-9]* /- /' || true)
			
 
				-
			
 
				-if [[ -n "$FEATS" ]]; then
			
 
				-  ENTRY+="### Features"$'\n'$'\n'"$FEATS"$'\n'$'\n'
			
 
				-fi
			
 
				-if [[ -n "$FIXES" ]]; then
			
 
				-  ENTRY+="### Fixes"$'\n'$'\n'"$FIXES"$'\n'$'\n'
			
 
				-fi
			
 
				-if [[ -n "$OTHER" ]]; then
			
 
				-  ENTRY+="### Other"$'\n'$'\n'"$OTHER"$'\n'$'\n'
			
 
				+# The [Unreleased] section must have content
			
 
				+if ! grep -q "^## \[Unreleased\]" CHANGELOG.md; then
			
 
				+  echo "Error: no [Unreleased] section in CHANGELOG.md" >&2
			
 
				+  echo "" >&2
			
 
				+  echo "Add your changes under an [Unreleased] heading first:" >&2
			
 
				+  echo "" >&2
			
 
				+  echo "  ## [Unreleased]" >&2
			
 
				+  echo "" >&2
			
 
				+  echo "  ### Changes" >&2
			
 
				+  echo "  - Your change here" >&2
			
 
				+  exit 1
			
 
				 fi
			
 
				 
			
 
				-# Add link reference
			
 
				-LINK="[$NEW]: https://github.com/tobi/qmd/compare/v$CURRENT...v$NEW"
			
 
				+# --- Preview release notes ---
			
 
				 
			
 
				-# Show what will be added
			
 
				-echo "--- Changelog entry ---"
			
 
				-echo "$ENTRY"
			
 
				-echo "$LINK"
			
 
				+echo "--- Release notes (will appear on GitHub) ---"
			
 
				+./scripts/extract-changelog.sh "$NEW"
			
 
				 echo "--- End ---"
			
 
				 echo ""
			
 
				-read -p "Looks good? [y/N] " -n 1 -r
			
 
				+
			
 
				+# --- Confirm ---
			
 
				+
			
 
				+read -p "Release v$NEW? [y/N] " -n 1 -r
			
 
				 echo ""
			
 
				 [[ $REPLY =~ ^[Yy]$ ]] || { echo "Aborted."; exit 1; }
			
 
				 
			
 
				-# Update package.json version
			
 
				-jq --arg v "$NEW" '.version = $v' package.json > package.json.tmp && mv package.json.tmp package.json
			
 
				+# --- Rename [Unreleased] -> [X.Y.Z] - date, add fresh [Unreleased] ---
			
 
				 
			
 
				-# Prepend changelog entry (after the header line)
			
 
				-if [[ -f CHANGELOG.md ]]; then
			
 
				-  # Insert after "# Changelog" header and any blank lines
			
 
				-  awk -v entry="$ENTRY$LINK" '
			
 
				-    /^# Changelog/ { print; getline; print; print ""; print entry; print ""; next }
			
 
				-    { print }
			
 
				-  ' CHANGELOG.md > CHANGELOG.md.tmp && mv CHANGELOG.md.tmp CHANGELOG.md
			
 
				-else
			
 
				-  echo "# Changelog"$'\n'$'\n'"$ENTRY$LINK" > CHANGELOG.md
			
 
				-fi
			
 
				+sed -i '' "s/^## \[Unreleased\].*/## [$NEW] - $DATE/" CHANGELOG.md
			
 
				+
			
 
				+# Insert a new empty [Unreleased] section after the header
			
 
				+awk '
			
 
				+  /^## \['"$NEW"'\]/ && !done {
			
 
				+    print "## [Unreleased]\n"
			
 
				+    done = 1
			
 
				+  }
			
 
				+  { print }
			
 
				+' CHANGELOG.md > CHANGELOG.md.tmp && mv CHANGELOG.md.tmp CHANGELOG.md
			
 
				+
			
 
				+# --- Bump version and commit ---
			
 
				+
			
 
				+jq --arg v "$NEW" '.version = $v' package.json > package.json.tmp && mv package.json.tmp package.json
			
 
				 
			
 
				-# Commit and tag
			
 
				 git add package.json CHANGELOG.md
			
 
				 git commit -m "release: v$NEW"
			
 
				 git tag -a "v$NEW" -m "v$NEW"
			
@@ -116,9 +107,6 @@ git tag -a "v$NEW" -m "v$NEW"
 
				 echo ""
			
 
				 echo "Created commit and tag v$NEW"
			
 
				 echo ""
			
 
				-echo "Next steps:"
			
 
				-echo "  git push origin main --tags   # push to GitHub"
			
 
				-echo "  npm publish                   # publish to npm"
			
 
				+echo "Next: push to trigger the publish workflow"
			
 
				 echo ""
			
 
				-echo "Or both at once:"
			
 
				-echo "  git push origin main --tags && npm publish"
			
 
				+echo "  git push origin main --tags"
			
--- a/skills/release/SKILL.md
+++ b/skills/release/SKILL.md
@@ -0,0 +1,114 @@
 
				+---
			
 
				+name: release
			
 
				+description: Manage releases for this project. Validates changelog, installs git hooks, and cuts releases. Use when user says "/release", "release 1.0.5", "cut a release", or asks about the release process. NOT auto-invoked by the model.
			
 
				+disable-model-invocation: true
			
 
				+---
			
 
				+
			
 
				+# Release
			
 
				+
			
 
				+Cut a release, validate the changelog, and ensure git hooks are installed.
			
 
				+
			
 
				+## Usage
			
 
				+
			
 
				+`/release 1.0.5` or `/release patch` (bumps patch from current version).
			
 
				+
			
 
				+## Process
			
 
				+
			
 
				+When the user triggers `/release <version>`:
			
 
				+
			
 
				+1. **Install hooks** — run `scripts/install-hooks.sh` (idempotent)
			
 
				+2. **Validate changelog** — confirm `## [Unreleased]` has content
			
 
				+3. **Preview** — show the user what will be released (unreleased content + minor series rollup via `scripts/extract-changelog.sh`)
			
 
				+4. **Ask for confirmation** — do NOT proceed without explicit user approval
			
 
				+5. **Run `scripts/release.sh <version>`** — renames `[Unreleased]`, bumps version, commits, tags
			
 
				+6. **Remind** — tell the user to `git push origin main --tags`
			
 
				+
			
 
				+If any step fails, stop and explain. Never force-push or skip validation.
			
 
				+
			
 
				+## Changelog Standard
			
 
				+
			
 
				+The changelog lives in `CHANGELOG.md` and follows [Keep a Changelog](https://keepachangelog.com/) conventions.
			
 
				+
			
 
				+### Heading format
			
 
				+
			
 
				+- `## [Unreleased]` — accumulates entries between releases
			
 
				+- `## [X.Y.Z] - YYYY-MM-DD` — released versions
			
 
				+
			
 
				+The release script renames `[Unreleased]` → `[X.Y.Z] - date` and inserts a
			
 
				+fresh empty `[Unreleased]` section automatically.
			
 
				+
			
 
				+### Structure of a release entry
			
 
				+
			
 
				+Each version entry has two parts:
			
 
				+
			
 
				+**1. Highlights (optional, 1-4 sentences of prose)**
			
 
				+
			
 
				+Immediately after the version heading, before any `###` section. This is the
			
 
				+elevator pitch — what would you tell someone in 30 seconds? Only include for
			
 
				+releases with significant changes. Skip for small patches.
			
 
				+
			
 
				+```markdown
			
 
				+## [1.1.0] - 2026-03-01
			
 
				+
			
 
				+QMD now runs on both Node.js and Bun, with up to 2.7x faster reranking
			
 
				+through parallel contexts. GPU auto-detection replaces the unreliable
			
 
				+`gpu: "auto"` with explicit CUDA/Metal/Vulkan probing.
			
 
				+```
			
 
				+
			
 
				+**2. Detailed changelog (`### Changes` and `### Fixes`)**
			
 
				+
			
 
				+```markdown
			
 
				+### Changes
			
 
				+
			
 
				+- Runtime: support Node.js (>=22) alongside Bun. The `qmd` wrapper
			
 
				+  auto-detects a suitable install via PATH. #149 (thanks @igrigorik)
			
 
				+- Performance: parallel embedding & reranking — up to 2.7x faster on
			
 
				+  multi-core machines.
			
 
				+
			
 
				+### Fixes
			
 
				+
			
 
				+- Prevent VRAM waste from duplicate context creation during concurrent
			
 
				+  `embedBatch` calls. #152 (thanks @jkrems)
			
 
				+```
			
 
				+
			
 
				+### Writing guidelines
			
 
				+
			
 
				+- **Explain the why, not just the what.** The changelog is for users.
			
 
				+- **Include numbers.** "2.7x faster", "17x less memory".
			
 
				+- **Group by theme, not by file.** "Performance" not "Changes to llm.ts".
			
 
				+- **Don't list every commit.** Aggregate related changes.
			
 
				+- **Credit contributors:** end bullets with `#NNN (thanks @username)` for
			
 
				+  external PRs. No need to credit the repo owner.
			
 
				+
			
 
				+### What not to include
			
 
				+
			
 
				+- Internal refactors with no user-visible effect
			
 
				+- Dependency bumps (unless fixing a user-facing bug)
			
 
				+- CI/tooling changes (unless affecting the release artifact)
			
 
				+- Test additions (unless validating a fix worth mentioning)
			
 
				+
			
 
				+## GitHub Release Notes
			
 
				+
			
 
				+Each GitHub release includes the full changelog for the **minor series** back
			
 
				+to x.x.0. Releasing v1.2.3 includes entries for 1.2.3, 1.2.2, 1.2.1, and
			
 
				+1.2.0. The `scripts/extract-changelog.sh` script handles this, and the
			
 
				+publish workflow (`publish.yml`) calls it to populate the GitHub release.
			
 
				+
			
 
				+## Git Hooks
			
 
				+
			
 
				+The pre-push hook (`scripts/pre-push`) blocks `v*` tag pushes unless:
			
 
				+
			
 
				+1. `package.json` version matches the tag
			
 
				+2. `CHANGELOG.md` has a `## [X.Y.Z] - date` entry for the version
			
 
				+3. CI passed on GitHub for the tagged commit
			
 
				+
			
 
				+Run `skills/release/scripts/install-hooks.sh` to install (also runs
			
 
				+automatically via `bun install` prepare script).
			
 
				+
			
 
				+## Scripts
			
 
				+
			
 
				+- [`scripts/install-hooks.sh`](scripts/install-hooks.sh) — install/update git hooks
			
 
				+- Project scripts used during release:
			
 
				+  - `scripts/release.sh` — rename [Unreleased], bump version, commit, tag
			
 
				+  - `scripts/extract-changelog.sh` — extract minor series notes for GitHub release
			
 
				+  - `scripts/pre-push` — pre-push validation hook
			
--- a/skills/release/scripts/install-hooks.sh
+++ b/skills/release/scripts/install-hooks.sh
@@ -0,0 +1,38 @@
 
				+#!/usr/bin/env bash
			
 
				+set -euo pipefail
			
 
				+
			
 
				+# Install git hooks for release validation.
			
 
				+# Idempotent — safe to run multiple times.
			
 
				+
			
 
				+REPO_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
			
 
				+if [[ -z "$REPO_ROOT" ]]; then
			
 
				+  echo "Error: not in a git repository" >&2
			
 
				+  exit 1
			
 
				+fi
			
 
				+
			
 
				+HOOKS_DIR="$REPO_ROOT/.git/hooks"
			
 
				+SOURCE="$REPO_ROOT/scripts/pre-push"
			
 
				+
			
 
				+if [[ ! -f "$SOURCE" ]]; then
			
 
				+  echo "Error: scripts/pre-push not found at $SOURCE" >&2
			
 
				+  exit 1
			
 
				+fi
			
 
				+
			
 
				+# Install pre-push hook
			
 
				+if [[ -L "$HOOKS_DIR/pre-push" ]] && [[ "$(readlink "$HOOKS_DIR/pre-push")" == "$SOURCE" ]]; then
			
 
				+  echo "pre-push hook: already installed (symlink)"
			
 
				+elif [[ -f "$HOOKS_DIR/pre-push" ]]; then
			
 
				+  # Existing hook that isn't our symlink — back it up
			
 
				+  BACKUP="$HOOKS_DIR/pre-push.backup.$(date +%s)"
			
 
				+  echo "pre-push hook: backing up existing hook to $(basename "$BACKUP")"
			
 
				+  mv "$HOOKS_DIR/pre-push" "$BACKUP"
			
 
				+  ln -sf "$SOURCE" "$HOOKS_DIR/pre-push"
			
 
				+  echo "pre-push hook: installed (symlink → scripts/pre-push)"
			
 
				+else
			
 
				+  ln -sf "$SOURCE" "$HOOKS_DIR/pre-push"
			
 
				+  echo "pre-push hook: installed (symlink → scripts/pre-push)"
			
 
				+fi
			
 
				+
			
 
				+# Ensure the source is executable
			
 
				+chmod +x "$SOURCE"
			
 
				+echo "Done."
			
--- a/src/llm.ts
+++ b/src/llm.ts
@@ -494,6 +494,7 @@ export class LlamaCpp implements LLM {
 
				       // Detect available GPU types and use the best one.
			
 
				       // We can't rely on gpu:"auto" — it returns false even when CUDA is available
			
 
				       // (likely a binary/build config issue in node-llama-cpp).
			
 
				+      // @ts-expect-error node-llama-cpp API compat
			
 
				       const gpuTypes = await getLlamaGpuTypes();
			
 
				       // Prefer CUDA > Metal > Vulkan > CPU
			
 
				       const preferred = (["cuda", "metal", "vulkan"] as const).find(g => gpuTypes.includes(g));
			
@@ -733,7 +734,7 @@ export class LlamaCpp implements LLM {
 
				             contextSize: LlamaCpp.RERANK_CONTEXT_SIZE,
			
 
				             flashAttention: true,
			
 
				             ...(threads > 0 ? { threads } : {}),
			
 
				-          }));
			
 
				+          } as any));
			
 
				         } catch {
			
 
				           if (this.rerankContexts.length === 0) {
			
 
				             // Flash attention might not be supported — retry without it
			
@@ -828,7 +829,7 @@ export class LlamaCpp implements LLM {
 
				       if (n === 1) {
			
 
				         // Single context: sequential (no point splitting)
			
 
				         const context = contexts[0]!;
			
 
				-        const embeddings = [];
			
 
				+        const embeddings: ({ embedding: number[]; model: string } | null)[] = [];
			
 
				         for (const text of texts) {
			
 
				           try {
			
 
				             const embedding = await context.getEmbeddingFor(text);
			
--- a/src/mcp.ts
+++ b/src/mcp.ts
@@ -682,6 +682,6 @@ export async function startMcpHttpServer(port: number, options?: { quiet?: boole
 
				 }
			
 
				 
			
 
				 // Run if this is the main module
			
 
				-if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/mcp.ts")) {
			
 
				+if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/mcp.ts") || process.argv[1]?.endsWith("/mcp.js")) {
			
 
				   startMcpServer().catch(console.error);
			
 
				 }
			
--- a/src/qmd.ts
+++ b/src/qmd.ts
@@ -2207,7 +2207,7 @@ async function showVersion(): Promise<void> {
 
				 }
			
 
				 
			
 
				 // Main CLI - only run if this is the main module
			
 
				-if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/qmd.ts")) {
			
 
				+if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsWith("/qmd.ts") || process.argv[1]?.endsWith("/qmd.js")) {
			
 
				   const cli = parseCLI();
			
 
				 
			
 
				   if (cli.values.version) {
			
@@ -2489,8 +2489,11 @@ if (fileURLToPath(import.meta.url) === process.argv[1] || process.argv[1]?.endsW
 
				           mkdirSync(cacheDir, { recursive: true });
			
 
				           const logPath = resolve(cacheDir, "mcp.log");
			
 
				           const logFd = openSync(logPath, "w"); // truncate — fresh log per daemon run
			
 
				-          const tsxLoader = pathJoin(dirname(fileURLToPath(import.meta.url)), "..", "node_modules", "tsx", "dist", "esm", "index.mjs");
			
 
				-          const child = nodeSpawn(process.execPath, ["--import", tsxLoader, fileURLToPath(import.meta.url), "mcp", "--http", "--port", String(port)], {
			
 
				+          const selfPath = fileURLToPath(import.meta.url);
			
 
				+          const spawnArgs = selfPath.endsWith(".ts")
			
 
				+            ? ["--import", pathJoin(dirname(selfPath), "..", "node_modules", "tsx", "dist", "esm", "index.mjs"), selfPath, "mcp", "--http", "--port", String(port)]
			
 
				+            : [selfPath, "mcp", "--http", "--port", String(port)];
			
 
				+          const child = nodeSpawn(process.execPath, spawnArgs, {
			
 
				             stdio: ["ignore", logFd, logFd],
			
 
				             detached: true,
			
 
				           });
			
--- a/src/store.ts
+++ b/src/store.ts
@@ -23,7 +23,7 @@ import {
 
				   formatDocForEmbedding,
			
 
				   type RerankDocument,
			
 
				   type ILLMSession,
			
 
				-} from "./llm";
			
 
				+} from "./llm.js";
			
 
				 import {
			
 
				   findContextForPath as collectionsFindContextForPath,
			
 
				   addContext as collectionsAddContext,
			
@@ -37,7 +37,7 @@ import {
 
				   setGlobalContext,
			
 
				   loadConfig as collectionsLoadConfig,
			
 
				   type NamedCollection,
			
 
				-} from "./collections";
			
 
				+} from "./collections.js";
			
 
				 
			
 
				 // =============================================================================
			
 
				 // Configuration
			
--- a/tsconfig.build.json
+++ b/tsconfig.build.json
@@ -0,0 +1,11 @@
 
				+{
			
 
				+  "extends": "./tsconfig.json",
			
 
				+  "compilerOptions": {
			
 
				+    "noEmit": false,
			
 
				+    "outDir": "dist",
			
 
				+    "declaration": true,
			
 
				+    "noImplicitAny": false
			
 
				+  },
			
 
				+  "include": ["src/**/*.ts"],
			
 
				+  "exclude": ["src/**/*.test.ts", "src/test-preload.ts", "src/bench-*.ts"]
			
 
				+}