QMD Query Expansion Scoring

Goal

Transform an arbitrary user-typed query into a set of retrieval-optimized expansions.

Input: "auth config"

Output:

hyde: Authentication can be configured by setting the AUTH_SECRET environment variable and enabling the auth middleware in your application's config file.
lex: authentication configuration
lex: auth settings setup
vec: how to configure authentication settings
vec: authentication configuration options

Output Format

| Prefix | Purpose | Required | Count |
|---|---|---|---|
| lex: | BM25 keyword variations (shorter, keyword-focused) | Yes | 1-3 |
| vec: | Semantic reformulations (natural language) | Yes | 1-3 |
| hyde: | Hypothetical document passage | Optional | 0-1 |
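
A minimal sketch of how output in this format could be parsed before scoring. The names `parse_expansions` and `counts_in_range` are hypothetical, and ignoring blank lines is an assumption rather than a rule stated here.

```python
VALID_PREFIXES = ("lex", "vec", "hyde")  # prefixes from the Output Format table


def parse_expansions(output: str) -> tuple[dict[str, list[str]], list[str]]:
    """Split raw model output into prefixed expansions and invalid lines."""
    parsed: dict[str, list[str]] = {p: [] for p in VALID_PREFIXES}
    invalid: list[str] = []
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored, not penalized
        prefix, sep, text = line.partition(":")
        text = text.strip()
        if sep and prefix in VALID_PREFIXES and text:
            parsed[prefix].append(text)
        else:
            invalid.append(line)  # no prefix, unknown prefix, or empty body
    return parsed, invalid


def counts_in_range(parsed: dict[str, list[str]]) -> bool:
    """Check the Count column: 1-3 lex, 1-3 vec, 0-1 hyde."""
    return (
        1 <= len(parsed["lex"]) <= 3
        and 1 <= len(parsed["vec"]) <= 3
        and len(parsed["hyde"]) <= 1
    )
```

The parser keeps invalid lines instead of dropping them so that a later format check can count them.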

Scoring Criteria

1. Format Compliance (0-30 points)

| Criterion | Points | Deduction |
|---|---|---|
| Has at least one lex: line | +10 | -10 if missing |
| Has at least one vec: line | +10 | -10 if missing |
| All lines have valid prefix (lex:, vec:, hyde:) | +10 | -5 per invalid line |
| No garbage/prose outside of prefixed lines | - | -10 if present |
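
One possible reading of this table as code. How the rows combine is not spelled out above, so this sketch makes two assumptions: a line with an unrecognized `word:` prefix counts as an "invalid line", a line with no prefix at all counts as "garbage", and the result is clamped to 0-30. `format_score` is a hypothetical name.

```python
import re

VALID_PREFIXES = {"lex", "vec", "hyde"}
# A prefixed line: a single word, a colon, then non-empty content.
PREFIX_RE = re.compile(r"^([A-Za-z]+):\s*(\S.*)$")


def format_score(output: str) -> int:
    """Score format compliance on a 0-30 scale (one interpretation)."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if not lines:
        return 0  # assumption: empty output earns nothing

    kinds: set[str] = set()
    invalid = 0      # lines with an unrecognized prefix, e.g. "query: ..."
    garbage = False  # lines with no prefix at all (prose)
    for line in lines:
        match = PREFIX_RE.match(line)
        if match is None:
            garbage = True
        elif match.group(1) in VALID_PREFIXES:
            kinds.add(match.group(1))
        else:
            invalid += 1

    score = 0
    score += 10 if "lex" in kinds else 0
    score += 10 if "vec" in kinds else 0
    score += max(0, 10 - 5 * invalid)  # -5 per invalid line
    if garbage:
        score -= 10
    return max(0, min(30, score))
```

Under these assumptions, output made entirely of unprefixed prose scores 0, and a well-formed output with both lex: and vec: lines scores 30.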

2. Diversity & Coverage (0-30 points)

| Criterion | Points | Deduction |
|---|---|---|
| 2+ different types present (lex + vec) | +10 | -10 if only one type |
| 2+ total expansions | +5 | -5 if only one |
| Multiple lex: lines are diverse (edit distance > 3) | +5 | -2 per duplicate pair |
| Multiple vec: lines are diverse (edit distance > 5) | +5 | -2 per duplicate pair |
| lex/vec not identical to original query | +5 | -5 per line that equals query |
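
The thresholds above refer to edit distance. As a sketch, a character-level Levenshtein distance can be used to count duplicate pairs. Treating the thresholds as character-level, and comparing lines after lowercasing and trimming, are both assumptions; `edit_distance` and `near_duplicate_pairs` are hypothetical names.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def near_duplicate_pairs(lines: list[str], threshold: int) -> int:
    """Count pairs whose distance is NOT greater than the threshold."""
    normalized = [ln.lower().strip() for ln in lines]
    return sum(
        1 for x, y in combinations(normalized, 2)
        if edit_distance(x, y) <= threshold
    )
```

With `threshold=3` for lex: lines and `threshold=5` for vec: lines, each counted pair would cost 2 points under the table above.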

3. Hyde Quality (0-20 points, optional bonus)

| Criterion | Points | Deduction |
|---|---|---|
| Hyde present and well-formed | +5 | - |
| Hyde is concise (50-200 chars) | +5 | -3 if too short, -5 if too long |
| Hyde has no newlines | +5 | -5 if contains newlines |
| Hyde has no excessive repetition | +5 | -3 if word repeats 3+ times |
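
The three measurable hyde checks can be sketched as boolean tests, leaving the point arithmetic to the caller. `hyde_checks` is a hypothetical helper. Stripping surrounding punctuation before counting words and ignoring a small stopword list are assumptions.

```python
from collections import Counter

# Assumption: common function words are not counted as repetition.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "for", "of", "in", "and", "or"}


def hyde_checks(hyde: str) -> dict[str, bool]:
    """Evaluate the length, newline, and repetition criteria for a hyde passage."""
    text = hyde.strip()
    words = [w.strip('.,!?:;()[]"\'') for w in text.lower().split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return {
        "concise": 50 <= len(text) <= 200,
        "single_line": "\n" not in text,
        "no_repetition": all(c < 3 for c in counts.values()),
    }
```

A passage that repeats "Docker networking" in three consecutive sentences would pass the length and newline checks but fail `no_repetition`.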

4. Content Quality (0-20 points)

| Criterion | Points | Deduction |
|---|---|---|
| Base relevance | +5 | Subjective |
| Lex lines preserve key terms from query | +5 | -5 if lex is generic |
| Lex lines are keyword-focused (shorter) | +5 | -2 if lex is longer than vec |
| Vec lines are natural language (complete phrases) | +5 | -2 if vec is just keywords |

5. Named Entity Preservation (0-20 points, CRITICAL)

Named entities are proper nouns, brand names, technical terms, and acronyms that MUST appear in lex queries. This prevents generic expansions that lose the specific topic.

| Criterion | Points | Deduction |
|---|---|---|
| All lex lines contain at least one entity | +15 | - |
| Some lex lines contain entities | +5 | - |
| NO lex lines contain entities | - | -30 (HEAVY PENALTY) |
| Generic filler phrases in lex | - | -15 per phrase |
| Entities also in vec lines | +5 | - |

Named Entity Detection:

  • All-caps acronyms: TDS, API, GPU, AWS
  • Capitalized proper nouns: React, Docker, Kubernetes
  • Technical terms: node.js, C++, .NET
  • CamelCase: JavaScript, TypeScript
  • Compound names: TDS motorsports → both words are entities

Generic Filler Phrases (BANNED in lex):

  • "find information about"
  • "search for", "look up"
  • "get information", "learn about"
  • "details about", "guide to"

Examples:

| Query | Bad Lex (Score: 0.30) | Good Lex (Score: 1.00) |
|---|---|---|
| who is TDS motorsports | lex: find information about | lex: TDS motorsports history |
| | lex: company details | lex: TDS motorsports founders |
| how to use React hooks | lex: programming tutorial | lex: React hooks tutorial |
| | lex: how to code | lex: useEffect useState hooks |

Key Rule: If a query mentions a specific entity (brand, product, technology), EVERY lex line should include that entity or a direct variation of it.
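
A sketch of how the entity criteria might be applied once the query's entities are known. `entity_score` is a hypothetical helper that takes the lowercased entity set as input. It assumes the vec bonus applies when any vec line contains an entity, and that a query with no entities is handled elsewhere.

```python
# Banned filler phrases from the list above.
GENERIC_FILLERS = (
    "find information about", "search for", "look up", "get information",
    "learn about", "details about", "guide to",
)


def _words(line: str) -> set[str]:
    """Lowercased words with surrounding punctuation removed."""
    return {w.strip('.,!?:;()[]"\'') for w in line.lower().split()}


def entity_score(entities: set[str], lex_lines: list[str],
                 vec_lines: list[str]) -> int:
    """Apply the entity table; the result may be negative."""
    if not entities:
        raise ValueError("query has no named entities; criterion not applicable")

    lex_hits = sum(1 for ln in lex_lines if _words(ln) & entities)
    if lex_lines and lex_hits == len(lex_lines):
        score = 15   # all lex lines contain an entity
    elif lex_hits > 0:
        score = 5    # only some lex lines contain an entity
    else:
        score = -30  # no lex line contains an entity

    fillers = sum(ln.lower().count(p) for ln in lex_lines for p in GENERIC_FILLERS)
    score -= 15 * fillers

    if any(_words(ln) & entities for ln in vec_lines):
        score += 5   # assumption: one matching vec line earns the bonus
    return score
```

For the "who is TDS motorsports" rows above, the good lex lines earn the full +15, while the bad ones take both the -30 penalty and one -15 filler deduction.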

Score Calculation

Total Score = Format + Diversity + Hyde + Quality + Entity
Max Score = 120 (100 without hyde)
Normalized = Total / Max (0.0 - 1.0)

Rating:

  • 0.80-1.00: Excellent
  • 0.60-0.79: Good
  • 0.40-0.59: Acceptable
  • 0.20-0.39: Poor
  • 0.00-0.19: Failed

Note: Entity score can go negative, heavily penalizing outputs that miss named entities.
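
The calculation and rating bands can be sketched as follows. Clamping the normalized value to 0.0-1.0 is an assumption, made because the entity score can push the total below zero. `normalized_score` and `rating` are hypothetical names, and the category points in the usage lines are illustrative inputs rather than scores taken from this document.

```python
def normalized_score(format_pts: int, diversity_pts: int, hyde_pts: int,
                     quality_pts: int, entity_pts: int, has_hyde: bool) -> float:
    """Total / Max, where Max is 120 with a hyde line and 100 without."""
    total = format_pts + diversity_pts + quality_pts + entity_pts
    if has_hyde:
        total += hyde_pts
    max_score = 120 if has_hyde else 100
    # Assumption: clamp so a negative entity score cannot go below 0.0.
    return min(1.0, max(0.0, total / max_score))


def rating(score: float) -> str:
    """Map a normalized score to the rating bands listed above."""
    if score >= 0.80:
        return "Excellent"
    if score >= 0.60:
        return "Good"
    if score >= 0.40:
        return "Acceptable"
    if score >= 0.20:
        return "Poor"
    return "Failed"


# Illustrative inputs: 30 + 28 + 17 + 10 + 20 = 105 of 120.
print(rating(normalized_score(30, 28, 17, 10, 20, has_hyde=True)))
# A missing-entity penalty can drive the total negative before clamping.
print(rating(normalized_score(0, 0, 0, 15, -30, has_hyde=False)))
```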

Examples

The breakdowns below total the first four categories only (maximum 100 points); they do not include the Named Entity Preservation score.

Good Example (Score: 85/100)

Query: "react hooks"

Output:

hyde: React Hooks allow you to use state and lifecycle features in functional components without writing a class.
lex: react hooks tutorial
lex: usestate useeffect
vec: how to use react hooks in functional components
vec: react hooks best practices guide

Scoring:

  • Format: 30/30 (all prefixes valid)
  • Diversity: 28/30 (2 types, 4 expansions, good diversity)
  • Hyde: 17/20 (present, good length, no repetition)
  • Quality: 10/20 (relevant, good structure)

Bad Example (Score: 15/100)

Query: "auth"

Output:

auth is an important concept that relates to authentication.
The answer should be in Chinese.
The answer should be in Chinese.

Scoring:

  • Format: 0/30 (no valid prefixes)
  • Diversity: 0/30 (no valid expansions)
  • Hyde: 0/20 (N/A)
  • Quality: 15/20 (some relevance but wrong format)

Medium Example (Score: 55/100)

Query: "docker networking"

Output:

hyde: Docker networking is an important concept. Docker networking is used for container communication. Docker networking configuration is essential.
lex: docker networking
vec: docker networking

Scoring:

  • Format: 30/30 (valid prefixes)
  • Diversity: 10/30 (lex=vec=query, no diversity)
  • Hyde: 5/20 (too repetitive - "docker networking" 3x)
  • Quality: 10/20 (relevant but low effort)

Heuristics

Repetition Detection

from collections import Counter

def word_repetition_score(text):
    """Return 0-5 points, losing 2 for each non-stopword that appears 3+ times."""
    words = text.lower().split()
    counts = Counter(words)
    # Deduct for words appearing 3+ times (excluding stopwords)
    stopwords = {'the', 'a', 'an', 'is', 'are', 'to', 'for', 'of', 'in', 'and', 'or'}
    repeated = sum(1 for w, c in counts.items() if c >= 3 and w not in stopwords)
    return max(0, 5 - repeated * 2)

Diversity Check (Simple)

def is_diverse(a, b, min_distance=3):
    """Check if two strings are sufficiently different."""
    a, b = a.lower().strip(), b.lower().strip()
    if a == b:
        return False
    # Reject pairs where one string contains the other
    if a in b or b in a:
        return False
    # Word-level proxy for edit distance: count words found in only one string
    return len(set(a.split()) ^ set(b.split())) >= min_distance

Query Echo Detection

def echoes_query(expansion, query):
    """Check if expansion echoes the query: equal to it, contained in it, or containing it."""
    exp = expansion.lower().strip()
    q = query.lower().strip()
    return exp == q or exp in q or q in exp

Named Entity Extraction

KEY_TERM_STOPWORDS = {'what', 'is', 'how', 'to', 'the', 'a', 'an', 'in', 'on', 'for', 'of',
                      'and', 'or', 'with', 'my', 'your', 'do', 'does', 'can', 'i', 'me', 'we',
                      'who', 'where', 'when', 'why', 'which', 'find', 'get', 'show', 'tell'}

def extract_named_entities(query: str) -> set:
    """Extract named entities using simple heuristics."""
    entities = set()
    words = query.split()
    prev_was_entity = False

    for i, word in enumerate(words):
        clean = word.strip('.,!?:;()[]"\'')
        if not clean:
            prev_was_entity = False
            continue

        is_entity = False

        # All-caps acronyms: TDS, API, GPU
        if clean.isupper() and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # Capitalized proper nouns (not first word)
        elif i > 0 and clean[0].isupper() and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True
        # Technical terms: node.js, C++
        elif any(c in clean for c in '.+-#@') and len(clean) >= 2:
            entities.add(clean.lower())
            is_entity = True
        # CamelCase: JavaScript
        elif len(clean) > 1 and any(c.isupper() for c in clean[1:]) and clean[0].isupper():
            entities.add(clean.lower())
            is_entity = True
        # Word following an entity (compound names: TDS motorsports)
        elif prev_was_entity and clean.lower() not in KEY_TERM_STOPWORDS:
            entities.add(clean.lower())
            is_entity = True

        prev_was_entity = is_entity

    return entities

Generic Phrase Detection

GENERIC_LEX_PHRASES = {
    'find information about', 'search for', 'look up', 'get information',
    'learn about', 'information on', 'details about', 'find out about',
    'what is', 'how to', 'guide to', 'help with'
}

def lex_is_generic(lex_line: str) -> bool:
    """Check if lex line is a useless generic filler."""
    lex_lower = lex_line.lower().strip()
    for phrase in GENERIC_LEX_PHRASES:
        if phrase in lex_lower:
            # Check if there's specific content beyond the generic phrase
            remaining = lex_lower
            for word in phrase.split():
                remaining = remaining.replace(word, '', 1).strip()
            if len(remaining) < 3:  # Nothing specific left
                return True
    return False

Training Data Requirements

  1. EOM tokens: Ensure training examples end with proper end-of-message tokens
  2. Diverse examples: Include varied query types (short, long, technical, casual)
  3. Quality hyde: Hyde passages should be informative, not template-y
  4. No repetition: Avoid "This is important. This is very important." patterns