# LEANN Grep Search Usage Guide

## Overview

LEANN's grep search functionality provides exact text matching for finding specific code patterns, error messages, function names, or exact phrases in your indexed documents.

## Basic Usage

### Simple Grep Search

```python
from leann.api import LeannSearcher

searcher = LeannSearcher("your_index_path")

# Exact text search
results = searcher.search("def authenticate_user", use_grep=True, top_k=5)

for result in results:
    print(f"Score: {result.score}")
    print(f"Text: {result.text[:100]}...")
    print("-" * 40)
```

### Comparison: Semantic vs Grep Search

```python
# Semantic search - finds conceptually similar content
semantic_results = searcher.search("machine learning algorithms", top_k=3)

# Grep search - finds exact text matches
grep_results = searcher.search("def train_model", use_grep=True, top_k=3)
```

## When to Use Grep Search

### Use Cases

- **Code Search**: Finding specific function definitions, class names, or variable references
- **Error Debugging**: Locating exact error messages or stack traces
- **Documentation**: Finding specific API endpoints or exact terminology

### Examples

```python
# Find function definitions
functions = searcher.search("def __init__", use_grep=True)

# Find import statements
imports = searcher.search("from sklearn import", use_grep=True)

# Find specific error types
errors = searcher.search("FileNotFoundError", use_grep=True)

# Find TODO comments
todos = searcher.search("TODO:", use_grep=True)

# Find configuration entries
configs = searcher.search("server_port=", use_grep=True)
```

## Technical Details

### How It Works

1. **File Location**: Grep search operates on the raw passage text stored in the index's `.jsonl` file (for example `documents.leann.passages.jsonl`)
2. **Command Execution**: Uses the system `grep` command with case-insensitive search
3. **Result Processing**: Parses the matching JSON lines and extracts text and metadata
4. **Scoring**: Scores each match by how many times the query string occurs in its text (case-insensitive)

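The four steps above can be approximated in plain Python. This is a sketch for illustration, not LEANN's implementation: it stands in for the system `grep` call with a case-insensitive literal substring check on the passage text (rather than a grep pattern matched against the raw line), and the `grep_passages` helper and the `text` and `metadata` field names are assumptions.

```python
import json
from pathlib import Path


def grep_passages(jsonl_path, query, top_k=5):
    """Return up to top_k (score, text, metadata) tuples for passages containing query.

    Hypothetical helper: field names "text" and "metadata" are assumed.
    """
    needle = query.lower()
    hits = []
    # Step 1: read the raw passages file line by line.
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            # Step 3: parse the JSON line and pull out the text.
            record = json.loads(line)
            text = record.get("text", "")
            # Step 2 stand-in: case-insensitive literal match instead of `grep -i`.
            score = text.lower().count(needle)
            if score > 0:
                hits.append((score, text, record.get("metadata", {})))
    # Step 4: rank by occurrence count, highest first, and keep top_k.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]
```

Calling `grep_passages("documents.leann.passages.jsonl", "def train_model", top_k=3)` would return at most three tuples, ordered from most to fewest occurrences.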
### Search Process

```
Query: "def train_model"
↓
grep -i -n "def train_model" documents.leann.passages.jsonl
↓
Parse matching JSON lines
↓
Calculate scores based on term frequency
↓
Return top_k results
```

### Scoring Algorithm

```python
# Number of case-insensitive occurrences of the query string in the passage text
score = text.lower().count(query.lower())
```

Results are ranked by score, highest first; a higher score means the query string occurs more often in that passage.

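As a worked example of the scoring expression above, the following sketch ranks three made-up passages for the query `TODO:`. The `term_frequency_score` helper and the sample passages are hypothetical and only wrap the one-line formula.

```python
def term_frequency_score(text, query):
    """Count non-overlapping, case-insensitive occurrences of query in text."""
    return text.lower().count(query.lower())


passages = [
    "TODO: refactor. todo: add tests.",
    "Nothing to do here.",
    "TODO: remove this block",
]

# Sort passages so the one with the most occurrences comes first.
ranked = sorted(passages, key=lambda p: term_frequency_score(p, "TODO:"), reverse=True)
scores = [term_frequency_score(p, "TODO:") for p in ranked]
# scores == [2, 1, 0]
```

Because `str.count` looks for the whole query string, a multi-word query such as `def train_model` is counted only where the full phrase appears, not once per word.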
## Error Handling

### Common Issues

#### Grep Command Not Found
```
RuntimeError: grep command not found. Please install grep or use semantic search.
```

**Solution**: Install grep on your system:
- **Ubuntu/Debian**: `sudo apt-get install grep`
- **macOS**: grep is pre-installed
- **Windows**: Use WSL or install grep via Git Bash/MSYS2

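If installing grep is not an option, one way to handle this error is to check for the executable up front and fall back to semantic search. The sketch below assumes only the `search(query, use_grep=..., top_k=...)` call shown in this guide; `grep_available` and `search_prefer_grep` are hypothetical helper names, not part of LEANN.

```python
import shutil


def grep_available():
    """Return True if a `grep` executable can be found on PATH."""
    return shutil.which("grep") is not None


def search_prefer_grep(searcher, query, top_k=5):
    """Use grep search when possible, otherwise fall back to semantic search."""
    if grep_available():
        try:
            return searcher.search(query, use_grep=True, top_k=top_k)
        except RuntimeError:
            # grep could not be run after all; fall through to semantic search.
            pass
    return searcher.search(query, top_k=top_k)
```

Note that the fallback returns conceptually similar results rather than exact matches, so callers that require exact text should treat it as a degraded mode.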
#### No Results Found
```python
# Check if your query exists in the raw data
results = searcher.search("your_query", use_grep=True)
if not results:
    print("No exact matches found. Try:")
    print("1. Check spelling (matching is case-insensitive)")
    print("2. Use partial terms")
    print("3. Switch to semantic search")
```

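The suggestions printed above can also be automated. The following is one possible fallback order, offered as a sketch: try the full phrase, then the longest single word of the query as a partial term, then semantic search. `search_with_fallback` is a hypothetical helper, and choosing the longest word is an arbitrary heuristic rather than anything LEANN prescribes.

```python
def search_with_fallback(searcher, query, top_k=5):
    """Try the exact phrase, then its longest word, then semantic search.

    Returns a (mode, results) tuple so callers know which strategy matched.
    """
    results = searcher.search(query, use_grep=True, top_k=top_k)
    if results:
        return "grep", results

    # Partial term: retry with the longest whitespace-separated word.
    words = query.split()
    if len(words) > 1:
        partial = max(words, key=len)
        results = searcher.search(partial, use_grep=True, top_k=top_k)
        if results:
            return "grep-partial", results

    # Last resort: semantic search on the original query.
    return "semantic", searcher.search(query, top_k=top_k)
```

Returning the mode alongside the results lets the caller tell an exact match from a looser one.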
## Complete Example

```python
#!/usr/bin/env python3
"""
Grep Search Example
Demonstrates grep search for exact text matching.
"""

from leann.api import LeannSearcher


def demonstrate_grep_search():
    # Initialize searcher
    searcher = LeannSearcher("my_index")

    print("=== Function Search ===")
    functions = searcher.search("def __init__", use_grep=True, top_k=5)
    for i, result in enumerate(functions, 1):
        print(f"{i}. Score: {result.score}")
        print(f"   Preview: {result.text[:60]}...")
        print()

    print("=== Error Search ===")
    errors = searcher.search("FileNotFoundError", use_grep=True, top_k=3)
    for result in errors:
        print(f"Content: {result.text.strip()}")
        print("-" * 40)


if __name__ == "__main__":
    demonstrate_grep_search()
```