Internal fork of @tobilu/qmd — local hybrid search (BM25 + vector). Upstream: github.com/tobi/qmd. Vendored into /srv/vendor/qmd, consumed by Oivo CLI via file: dep as @oivo/qmd.

Tobi Lutke e963555ff8 Add status command, fix collections, improve CLI output il y a 5 mois
.gitignore 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois
CLAUDE.md e963555ff8 Add status command, fix collections, improve CLI output il y a 5 mois
README.md 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois
bun.lock 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois
package.json 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois
qmd 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois
qmd.ts e963555ff8 Add status command, fix collections, improve CLI output il y a 5 mois
tsconfig.json 39193ea252 Initial commit: QMD - Quick Markdown Search il y a 5 mois

README.md

QMD - Quick Markdown Search

A CLI tool for searching markdown knowledge bases using hybrid retrieval: combining BM25 full-text search, vector semantic search, and LLM re-ranking.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              QMD Search Pipeline                            │
└─────────────────────────────────────────────────────────────────────────────┘

                              ┌─────────────────┐
                              │   User Query    │
                              └────────┬────────┘
                                       │
                        ┌──────────────┴──────────────┐
                        ▼                             ▼
               ┌────────────────┐            ┌────────────────┐
               │ Query Expansion│            │  Direct Query  │
               │  (qwen3:0.6b)  │            │    (×2 weight) │
               └───────┬────────┘            └───────┬────────┘
                       │                             │
                       │ 1 alternative query         │
                       └──────────────┬──────────────┘
                                      │
                    ┌─────────────────┼─────────────────┐
                    ▼                 ▼                 ▼
           ┌────────────────┐ ┌────────────────┐ ┌────────────────┐
           │   FTS Search   │ │   FTS Search   │ │   FTS Search   │
           │    (BM25)      │ │    (BM25)      │ │    (BM25)      │
           └───────┬────────┘ └───────┬────────┘ └───────┬────────┘
                   │                  │                  │
           ┌───────┴────────┐ ┌───────┴────────┐ ┌───────┴────────┐
           │ Vector Search  │ │ Vector Search  │ │ Vector Search  │
           │(embeddinggemma)│ │(embeddinggemma)│ │(embeddinggemma)│
           └───────┬────────┘ └───────┬────────┘ └───────┬────────┘
                   │                  │                  │
                   └──────────────────┼──────────────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │   RRF Fusion + Bonus  │
                          │  (Top-rank preserved) │
                          │     Top 30 Kept       │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │    LLM Re-ranking     │
                          │  (qwen3-reranker)     │
                          │  Yes/No + logprobs    │
                          └───────────┬───────────┘
                                      │
                                      ▼
                          ┌───────────────────────┐
                          │  Position-Aware Blend │
                          │  (RRF + Reranker)     │
                          └───────────────────────┘

Score Normalization & Fusion

Search Backends

Backend Raw Score Conversion Range
FTS (BM25) SQLite FTS5 BM25 Math.abs(score) 0 to ~25+
Vector Cosine distance 1 / (1 + distance) 0.0 to 1.0
Reranker LLM 0-10 rating score / 10 0.0 to 1.0

Fusion Strategy

The query command uses Reciprocal Rank Fusion (RRF) with position-aware blending:

  1. Query Expansion: Original query (×2 for weighting) + 1 LLM variation
  2. Parallel Retrieval: Each query searches both FTS and vector indexes
  3. RRF Fusion: Combine all result lists using score = Σ(1/(k+rank+1)) where k=60
  4. Top-Rank Bonus: Documents ranking #1 in any list get +0.05, #2-3 get +0.02
  5. Top-K Selection: Take top 30 candidates for reranking
  6. Re-ranking: LLM scores each document (yes/no with logprobs confidence)
  7. Position-Aware Blending:
    • RRF rank 1-3: 75% retrieval, 25% reranker (preserves exact matches)
    • RRF rank 4-10: 60% retrieval, 40% reranker
    • RRF rank 11+: 40% retrieval, 60% reranker (trust reranker more)

Why this approach: Pure RRF can dilute exact matches when expanded queries don't match. The top-rank bonus preserves documents that score #1 for the original query. Position-aware blending prevents the reranker from destroying high-confidence retrieval results.

Score Interpretation

Score Meaning
0.8 - 1.0 Highly relevant
0.5 - 0.8 Moderately relevant
0.2 - 0.5 Somewhat relevant
0.0 - 0.2 Low relevance

Requirements

System Requirements

  • Bun >= 1.0.0
  • macOS: Homebrew SQLite (for extension support)

    brew install sqlite
    
  • Ollama running locally (default: http://localhost:11434)

Ollama Models

QMD uses three models (auto-pulled if missing):

Model Purpose Size
embeddinggemma Vector embeddings ~1.6GB
ExpedientFalcon/qwen3-reranker:0.6b-q8_0 Re-ranking (trained) ~640MB
qwen3:0.6b Query expansion ~400MB
# Pre-pull models (optional)
ollama pull embeddinggemma
ollama pull ExpedientFalcon/qwen3-reranker:0.6b-q8_0
ollama pull qwen3:0.6b

Installation

bun install

Usage

Index Markdown Files

# Index all .md files in current directory
qmd index

# Index with custom glob pattern
qmd index "**/*.md"

# Index specific directory
qmd index "docs/**/*.md"

Generate Vector Embeddings

# Embed all indexed documents
qmd embed

Search Commands

┌──────────────────────────────────────────────────────────────────┐
│                        Search Modes                              │
├──────────┬───────────────────────────────────────────────────────┤
│ search   │ BM25 full-text search only                           │
│ vsearch  │ Vector semantic search only                          │
│ query    │ Hybrid: FTS + Vector + Query Expansion + Re-ranking  │
└──────────┴───────────────────────────────────────────────────────┘
# Full-text search (fast, keyword-based)
qmd search "authentication flow"

# Vector search (semantic similarity)
qmd vsearch "how to login"

# Hybrid search with re-ranking (best quality)
qmd query "user authentication"

Options

-n <num>           # Number of results (default: 5)
--min-score <num>  # Minimum score threshold (default: 0)
--full             # Show full document content
-csv               # CSV output (for piping/scripting)
-md                # Output as markdown
-xml               # Output as XML
--index <name>     # Use named index

Output Format

Default output is colorized CLI format (respects NO_COLOR env):

 93%  docs/guide.md:42
  │ This section covers the **craftsmanship** of building
  │ quality software with attention to detail.
  │ See also: engineering principles

 67%  notes/meeting.md:15
  │ Discussion about code quality and craftsmanship
  │ in the development process.
  • Score: Color-coded (green >70%, yellow >40%, dim otherwise)
  • Path: Shortened relative to current directory
  • Line: Line number where match was found (omitted for vector-only results)
  • Snippet: Context around match with query terms highlighted

Examples

# Get 10 results with minimum score 0.3
qmd query -n 10 --min-score 0.3 "API design patterns"

# Output as markdown for LLM context
qmd search -md --full "error handling"

# Use separate index for different knowledge base
qmd --index work search "quarterly reports"

Manage Collections

# List all indexed collections
qmd list

# Show database statistics
qmd stats

# Forget a collection
qmd forget

Data Storage

Index stored in: ~/.cache/qmd/index.sqlite

Schema

collections     -- Indexed directories and glob patterns
documents       -- Markdown content with metadata
documents_fts   -- FTS5 full-text index
content_vectors -- Embedding cache (by content hash)
vectors_vec     -- sqlite-vec vector index

Environment Variables

Variable Default Description
OLLAMA_URL http://localhost:11434 Ollama API endpoint
XDG_CACHE_HOME ~/.cache Cache directory location

How It Works

Indexing Flow

Markdown Files ──► Parse Title ──► Hash Content ──► Store in SQLite
                      │                                    │
                      └─► FTS5 Index ◄─────────────────────┘

Embedding Flow

Document ──► Format for EmbeddingGemma ──► Ollama API ──► Store Vector
              "title: X | text: Y"           /api/embed

Query Flow (Hybrid)

Query ──► Expand (3 variations) ──► FTS + Vector (per variation)
                                            │
                                            ▼
                                   Merge (max score)
                                            │
                                            ▼
                                   Top 25 candidates
                                            │
                                            ▼
                                   LLM Re-rank (0-10)
                                            │
                                            ▼
                                   Final ranked results

Model Configuration

Models are configured as constants in qmd.ts:

const DEFAULT_EMBED_MODEL = "embeddinggemma";
const DEFAULT_RERANK_MODEL = "ExpedientFalcon/qwen3-reranker:0.6b-q8_0";
const DEFAULT_QUERY_MODEL = "qwen3:0.6b";

EmbeddingGemma Prompt Format

// For queries
"task: search result | query: {query}"

// For documents
"title: {title} | text: {content}"

Qwen3-Reranker

A dedicated reranker model trained on relevance classification:

System: Judge whether the Document meets the requirements based on the Query
        and the Instruct provided. Note that the answer can only be "yes" or "no".

User: <Instruct>: Given a search query, determine if the document is relevant...
      <Query>: {query}
      <Document>: {doc}
  • Uses logprobs: true to extract token probabilities
  • Outputs yes/no with confidence score (0.0 - 1.0)
  • num_predict: 1 - Only need the yes/no token

Qwen3 (Query Expansion)

  • num_predict: 150 - For generating query variations

License

MIT