Yichuan Wang 3b8dc6368e Ast fork (#92)
2025-09-08 18:43:31 -07:00


LEANN Grep Search Usage Guide

Overview

LEANN's grep search provides exact, case-insensitive text matching for finding specific code patterns, error messages, function names, or exact phrases in your indexed documents.

Basic Usage

from leann.api import LeannSearcher

searcher = LeannSearcher("your_index_path")

# Exact text search
results = searcher.search("def authenticate_user", use_grep=True, top_k=5)

for result in results:
    print(f"Score: {result.score}")
    print(f"Text: {result.text[:100]}...")
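    print("-" * 40)

Grep search complements the default semantic search. The same search() call handles both, and use_grep=True switches modes:

# Semantic search - finds conceptually similar content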
semantic_results = searcher.search("machine learning algorithms", top_k=3)

# Grep search - finds exact text matches
grep_results = searcher.search("def train_model", use_grep=True, top_k=3)

Use Cases

  • Code Search: Finding specific function definitions, class names, or variable references
  • Error Debugging: Locating exact error messages or stack traces
  • Documentation: Finding specific API endpoints or exact terminology

Examples

# Find function definitions
functions = searcher.search("def __init__", use_grep=True)

# Find import statements
imports = searcher.search("from sklearn import", use_grep=True)

# Find specific error types
errors = searcher.search("FileNotFoundError", use_grep=True)

# Find TODO comments
todos = searcher.search("TODO:", use_grep=True)

# Find configuration entries
configs = searcher.search("server_port=", use_grep=True)

Technical Details

How It Works

  1. File Location: Grep search operates on the raw text stored in .jsonl files
  2. Command Execution: Runs the system grep command with the -i (case-insensitive) and -n (line number) flags
  3. Result Processing: Parses the matching JSON lines and extracts text and metadata
  4. Scoring: Counts case-insensitive occurrences of the query string in each matching passage

Search Process

Query: "def train_model"
  ↓
grep -i -n "def train_model" documents.leann.passages.jsonl
  ↓
Parse matching JSON lines
  ↓
Calculate scores based on term frequency
  ↓
Return top_k results
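
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not LEANN's actual implementation: the function names run_grep and rank_matches are hypothetical, the "text" and "metadata" field names are assumed from the description of the .jsonl passages, and the "--" separator is an addition that keeps queries starting with "-" from being read as grep options.

```python
import json
import subprocess


def run_grep(query, passages_path):
    """Run a case-insensitive grep and return its raw output lines."""
    proc = subprocess.run(
        ["grep", "-i", "-n", "--", query, passages_path],
        capture_output=True,
        text=True,
        check=False,
    )
    # grep exits with 1 when nothing matched; 2 or higher signals a real error.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.splitlines()


def rank_matches(grep_lines, query, top_k=5):
    """Parse `lineno:json` lines, score each passage, and return the top_k."""
    needle = query.lower()
    hits = []
    for line in grep_lines:
        # Drop the line number that -n puts in front of each match.
        _, _, payload = line.partition(":")
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # not a JSON line, skip it
        if not isinstance(record, dict):
            continue
        text = record.get("text", "")  # assumed field name
        score = text.lower().count(needle)
        # Choice made in this sketch: ignore lines where grep matched
        # something other than the passage text (for example metadata).
        if score > 0:
            hits.append(
                {"text": text, "metadata": record.get("metadata", {}), "score": score}
            )
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:top_k]
```

Calling rank_matches(run_grep("def train_model", "documents.leann.passages.jsonl"), "def train_model") would walk through the same four steps. Keeping the parsing separate from the subprocess call makes the ranking logic easy to test without grep installed.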

Scoring Algorithm

# Term frequency in document
score = text.lower().count(query.lower())

Results are ranked by score (highest first), with higher scores indicating more occurrences of the search term.
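
As a small worked example of this formula, the snippet below scores three made-up passages for the query "TODO:" and sorts them. The helper name grep_score and the sample passages are illustrative only.

```python
def grep_score(text, query):
    """Count case-insensitive, non-overlapping occurrences of query in text."""
    return text.lower().count(query.lower())


passages = [
    "TODO: refactor. todo: add tests. Todo: docs.",
    "TODO: remove this",
    "nothing to do here",
]

# Highest score first, as in the ranking described above.
ranked = sorted(passages, key=lambda p: grep_score(p, "TODO:"), reverse=True)
scores = [grep_score(p, "TODO:") for p in ranked]
print(scores)  # [3, 1, 0]
```

Because str.count counts non-overlapping occurrences, a query such as "aa" scores 2 against "aaaa", not 3.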

Error Handling

Common Issues

Grep Command Not Found

RuntimeError: grep command not found. Please install grep or use semantic search.

Solution: Install grep on your system:

  • Ubuntu/Debian: sudo apt-get install grep
  • macOS: grep is pre-installed
  • Windows: Use WSL or install grep via Git Bash/MSYS2
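
If installing grep is not an option, one possible workaround is to fall back to semantic search when grep is unavailable. The helper below is a sketch under that assumption: search_with_fallback is a hypothetical name, and it relies only on the search() call and the RuntimeError shown above.

```python
import shutil


def search_with_fallback(searcher, query, top_k=5):
    """Try grep search first; use semantic search if grep is unavailable."""
    # shutil.which returns None when no grep executable is on PATH.
    if shutil.which("grep") is not None:
        try:
            return searcher.search(query, use_grep=True, top_k=top_k)
        except RuntimeError:
            # For example "grep command not found"; fall through to semantic search.
            pass
    return searcher.search(query, top_k=top_k)
```

Note that this swallows every RuntimeError raised by the grep path, so logging the exception before falling back may be worthwhile in real code.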

No Results Found

# Check if your query exists in the raw data
results = searcher.search("your_query", use_grep=True)
if not results:
    print("No exact matches found. Try:")
    print("1. Check spelling (matching is case-insensitive)")
    print("2. Use partial terms")
    print("3. Switch to semantic search")

Complete Example

#!/usr/bin/env python3
"""
Grep Search Example
Demonstrates grep search for exact text matching.
"""

from leann.api import LeannSearcher

def demonstrate_grep_search():
    # Initialize searcher
    searcher = LeannSearcher("my_index")

    print("=== Function Search ===")
    functions = searcher.search("def __init__", use_grep=True, top_k=5)
    for i, result in enumerate(functions, 1):
        print(f"{i}. Score: {result.score}")
        print(f"   Preview: {result.text[:60]}...")
        print()

    print("=== Error Search ===")
    errors = searcher.search("FileNotFoundError", use_grep=True, top_k=3)
    for result in errors:
        print(f"Content: {result.text.strip()}")
        print("-" * 40)

if __name__ == "__main__":
    demonstrate_grep_search()