Transform a random typed query into a great set of retrieval-optimized expansions.
Input: "auth config"
Output:
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options
hyde: Authentication can be configured by setting the AUTH_SECRET environment variable and enabling the auth middleware in your application's config file.
| Prefix | Purpose | Required | Count |
|---|---|---|---|
lex: |
BM25 keyword variations (shorter, keyword-focused) | Yes | 1-3 |
vec: |
Semantic reformulations (natural language) | Yes | 1-3 |
hyde: |
Hypothetical document passage | Optional | 0-1 |
| Criterion | Points | Deduction |
|---|---|---|
Has at least one lex: line |
+10 | -10 if missing |
Has at least one vec: line |
+10 | -10 if missing |
All lines have valid prefix (lex:, vec:, hyde:) |
+10 | -5 per invalid line |
| No garbage/prose outside of prefixed lines | - | -10 if present |
| Criterion | Points | Deduction |
|---|---|---|
| 2+ different types present (lex + vec) | +10 | -10 if only one type |
| 2+ total expansions | +5 | -5 if only one |
| Multiple lex: lines are diverse (edit distance > 3) | +5 | -2 per duplicate pair |
| Multiple vec: lines are diverse (edit distance > 5) | +5 | -2 per duplicate pair |
| lex/vec not identical to original query | +5 | -5 per line that equals query |
| Criterion | Points | Deduction |
|---|---|---|
| Hyde present and well-formed | +5 | - |
| Hyde is concise (50-200 chars) | +5 | -3 if too short, -5 if too long |
| Hyde has no newlines | +5 | -5 if contains newlines |
| Hyde has no excessive repetition | +5 | -3 if word repeats 3+ times |
| Criterion | Points | Deduction |
|---|---|---|
| Base relevance | +10 | Subjective |
| Lex lines preserve key terms from query | +5 | -5 if lex is generic (e.g., "features" without "shop.app") |
| Lex lines are keyword-focused (shorter) | +5 | -2 if lex is longer than vec |
| Vec lines are natural language (complete phrases) | +5 | -2 if vec is just keywords |
Key Term Preservation Rule: lex: lines MUST contain at least one significant word from the original query (excluding stopwords like "what", "is", "how", "the", etc.).
Bad: what is shop.app → lex: features and benefits
Good: what is shop.app → lex: shop.app features
Total Score = Format + Diversity + Hyde + Quality
Max Score = 100 (80 without hyde)
Rating:
Query: "react hooks"
Output:
lex: react hooks tutorial
lex: usestate useeffect
vec: how to use react hooks in functional components
vec: react hooks best practices guide
hyde: React Hooks allow you to use state and lifecycle features in functional components without writing a class.
Scoring:
Query: "auth"
Output:
auth is an important concept that relates to authentication.
The answer should be in Chinese.
The answer should be in Chinese.
Scoring:
Query: "docker networking"
Output:
lex: docker networking
vec: docker networking
hyde: Docker networking is an important concept. Docker networking is used for container communication. Docker networking configuration is essential.
Scoring:
def word_repetition_score(text):
words = text.lower().split()
counts = Counter(words)
# Deduct for words appearing 3+ times (excluding stopwords)
stopwords = {'the', 'a', 'an', 'is', 'are', 'to', 'for', 'of', 'in', 'and', 'or'}
repeated = sum(1 for w, c in counts.items() if c >= 3 and w not in stopwords)
return max(0, 5 - repeated * 2)
def is_diverse(a, b, min_distance=3):
"""Check if two strings are sufficiently different."""
a, b = a.lower().strip(), b.lower().strip()
if a == b:
return False
# Simple: check if one is not a substring of the other
if a in b or b in a:
return False
# Check edit distance (simplified)
return len(set(a.split()) ^ set(b.split())) >= min_distance
def echoes_query(expansion, query):
"""Check if expansion is just echoing the query."""
exp = expansion.lower().strip()
q = query.lower().strip()
return exp == q or exp in q or q in exp