Andy Lee db3c63c441 Docs/Core: Low-Resource Setups, SkyPilot Option, and No-Recompute (#45)

* docs: add SkyPilot template and instructions for running embeddings/index build on cloud GPU

* docs: add low-resource note in README; point to config guide; suggest OpenAI embeddings, SkyPilot remote build, and --no-recompute

* docs: consolidate low-resource guidance into config guide; README points to it

* cli: add --no-recompute and --no-recompute-embeddings flags; docs: clarify HNSW requires --no-compact when disabling recompute

* docs: dedupe recomputation guidance; keep single Low-resource setups section

* sky: expand leann-build.yaml with configurable params and flags (backend, recompute, compact, embedding options)

* hnsw: auto-disable compact when --no-recompute is used; docs: expand SkyPilot with -e overrides and copy-back example

* docs+sky: simplify SkyPilot flow (auto-build on launch, rsync copy-back); clarify HNSW auto non-compact when no-recompute

* feat: auto compact for hnsw when recompute

* reader: non-destructive portability (relative hints + fallback); fix comments; sky: refine yaml

* cli: unify flags to --recompute/--no-recompute for build/search/ask; docs: update references

* chore: remove

* hnsw: move pruned/no-recompute assertion into backend; api: drop global assertion; docs: will adjust after benchmarking

* cli: use argparse.BooleanOptionalAction for paired flags (--recompute/--compact) across build/search/ask

* docs: a real example on recompute

* benchmarks: fix and extend HNSW+DiskANN recompute vs no-recompute; docs: add fresh numbers and DiskANN notes

* benchmarks: unify HNSW & DiskANN into one clean script; isolate groups, fixed ports, warm-up, param complexity

* docs: diskann recompute

* core: auto-cleanup for LeannSearcher/LeannChat (__enter__/__exit__/__del__); ensure server terminate/kill robustness; benchmarks: use searcher.cleanup(); docs: suggest uv run

* fix: hang on warnings

* docs: boolean flags

* docs: leann help

2025-08-15 12:03:19 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| data | docs: config guidance (#17) | 2025-08-04 22:50:32 -07:00 |
| benchmark_embeddings.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |
| benchmark_no_recompute.py | Docs/Core: Low-Resource Setups, SkyPilot Option, and No-Recompute (#45) | 2025-08-15 12:03:19 -07:00 |
| compare_faiss_vs_leann.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |
| diskann_vs_hnsw_speed_comparison.py | Docs/Core: Low-Resource Setups, SkyPilot Option, and No-Recompute (#45) | 2025-08-15 12:03:19 -07:00 |
| faiss_only.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |
| micro_tpt.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |
| README.md | feat(core,diskann): robust embedding server (no-hang) + DiskANN fast mode (graph partition) (#29) | 2025-08-14 01:02:24 -07:00 |
| run_evaluation.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |
| simple_mac_tpt_test.py | refactor: Unify examples interface with BaseRAGExample (#12) | 2025-08-03 23:06:24 -07:00 |

README.md

🧪 LEANN Benchmarks & Testing

This directory contains performance benchmarks and comprehensive tests for the LEANN system, including backend comparisons and sanity checks across different configurations.

📁 Test Files

`diskann_vs_hnsw_speed_comparison.py`

Performance comparison between DiskANN and HNSW backends:

✅ Search latency comparison with both backends using recompute
✅ Index size and build time measurements
✅ Score validity testing (ensures no -inf scores)
✅ Configurable dataset sizes for different scales

# Quick comparison with 500 docs, 10 queries
python benchmarks/diskann_vs_hnsw_speed_comparison.py

# Large-scale comparison with 2000 docs, 20 queries
python benchmarks/diskann_vs_hnsw_speed_comparison.py 2000 20
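To make the latency and score-validity checks concrete, here is a minimal sketch of how such a measurement could be structured. It is not the code in `diskann_vs_hnsw_speed_comparison.py`: `time_queries` and `fake_search` are hypothetical names, and the fake backend stands in for a real search call.

```python
import math
import statistics
import time
from typing import Callable, Dict, List, Sequence, Tuple

# A search function takes a query string and returns (doc_id, score) pairs.
SearchFn = Callable[[str], List[Tuple[str, float]]]


def time_queries(search_fn: SearchFn, queries: Sequence[str], warmup: int = 1) -> Dict[str, float]:
    """Return latency statistics (in seconds) for search_fn over queries."""
    # Warm-up calls are not timed, so one-off costs such as model loading
    # or cold caches do not skew the measured latencies.
    for query in queries[:warmup]:
        search_fn(query)

    latencies = []
    for query in queries:
        start = time.perf_counter()
        results = search_fn(query)
        latencies.append(time.perf_counter() - start)
        # Score validity: fail fast on -inf, +inf, or NaN scores.
        for doc_id, score in results:
            if not math.isfinite(score):
                raise ValueError(f"non-finite score {score!r} for {doc_id!r}")

    return {
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
        "count": len(latencies),
    }


def fake_search(query: str) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a real backend search call."""
    return [("doc-0", 0.9), ("doc-1", 0.4)]


stats = time_queries(fake_search, ["what is LEANN?", "how does recompute work?"])
print(stats["count"])
```

Passing the search call in as a plain function keeps the timing loop identical for both backends, so any difference in the numbers comes from the backend rather than the harness.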

`test_distance_functions.py`

Tests all supported distance functions on the DiskANN backend:

✅ MIPS (Maximum Inner Product Search)
✅ L2 (Euclidean Distance)
✅ Cosine (Cosine Similarity)

uv run python tests/sanity_checks/test_distance_functions.py
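For readers unfamiliar with the three metrics, the following pure-Python sketch computes each one on toy 2-D vectors and shows that they can disagree about the best match. The vectors and the `top_match` helper are illustrative assumptions and are unrelated to how the DiskANN backend computes distances internally.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """MIPS score: larger is better."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance: smaller is better."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b: larger is better."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def top_match(query: Vector, docs: Dict[str, Vector], metric: str) -> str:
    """Return the name of the best-scoring document under the given metric."""
    if metric == "mips":
        return max(docs, key=lambda name: inner_product(query, docs[name]))
    if metric == "l2":
        return min(docs, key=lambda name: l2_distance(query, docs[name]))
    if metric == "cosine":
        return max(docs, key=lambda name: cosine_similarity(query, docs[name]))
    raise ValueError(f"unknown metric: {metric}")


query = [1.0, 0.0]
docs = {
    "long": [3.0, 3.0],     # large magnitude, 45 degrees off the query
    "near": [1.0, 0.5],     # closest point in Euclidean terms
    "aligned": [0.2, 0.0],  # same direction as the query, small magnitude
}

winners = {metric: top_match(query, docs, metric) for metric in ("mips", "l2", "cosine")}
print(winners)
```

On this data MIPS favours the large-magnitude vector, L2 the nearest point, and cosine the vector pointing in the same direction, so each metric returns a different winner.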

`test_l2_verification.py`

Specifically verifies that L2 distance is correctly implemented by:

Building indices with L2 vs Cosine metrics
Comparing search results and score ranges
Validating that different metrics produce expected score patterns

uv run python tests/sanity_checks/test_l2_verification.py
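One property that helps when reasoning about expected score patterns: for unit-length vectors, squared L2 distance and cosine similarity are linked by |a - b|^2 = 2 - 2·cos(a, b), so the two metrics only rank neighbours identically once vectors are normalized. The sketch below checks that identity numerically; it illustrates the math and is not taken from `test_l2_verification.py`.

```python
import math
import random
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the check is repeatable
max_error = 0.0
for _ in range(100):
    a = normalize([rng.gauss(0, 1) for _ in range(8)])
    b = normalize([rng.gauss(0, 1) for _ in range(8)])
    # For unit vectors: |a - b|^2 = |a|^2 + |b|^2 - 2 a.b = 2 - 2 cos(a, b)
    error = abs(squared_l2(a, b) - (2 - 2 * cosine(a, b)))
    max_error = max(max_error, error)

print(max_error < 1e-9)
```

If embeddings are not normalized, the identity no longer holds, which is one reason L2 and cosine indices can return different rankings for the same data.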

`test_sanity_check.py`

Comprehensive end-to-end verification including:

Distance function testing
Embedding model compatibility
Search result correctness validation
Backend integration testing

uv run python tests/sanity_checks/test_sanity_check.py

🎯 What These Tests Verify

✅ Distance Function Support

All three distance metrics (MIPS, L2, Cosine) work correctly
Score ranges are appropriate for each metric type
Different metrics can produce different rankings (as expected)

✅ Backend Integration

DiskANN backend properly initializes and builds indices
Graph construction completes without errors
Search operations return valid results

✅ Embedding Pipeline

Real-time embedding computation works
Multiple embedding models are supported
ZMQ server communication functions correctly

✅ End-to-End Functionality

Index building → searching → result retrieval pipeline
Metadata preservation through the entire flow
Error handling and graceful degradation

🔍 Expected Output

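When all tests pass, you should see the summary below. The output is shown as documented, in Chinese: 测试结果总结 means "Test results summary", 通过 means "passed", and 测试完成 means "Tests complete".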

📊 测试结果总结:
  mips      : ✅ 通过
  l2        : ✅ 通过
  cosine    : ✅ 通过

🎉 测试完成!

🐛 Troubleshooting

Common Issues

Import Errors: Ensure you're running from the project root:

cd /path/to/leann
uv run python tests/sanity_checks/test_distance_functions.py

Memory Issues: Reduce graph complexity for resource-constrained systems:

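from leann import LeannBuilder

# Smaller graph parameters lower memory use, typically trading away some recall
builder = LeannBuilder(
    backend_name="diskann",
    graph_degree=8,  # Reduced from 16
    complexity=16    # Reduced from 32
)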

ZMQ Port Conflicts: The tests use different ports to avoid conflicts, but you may need to kill existing processes:

pkill -f "embedding_server"
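Before killing processes, it can be useful to confirm that a port really is occupied. The sketch below uses only the standard library; `is_port_free` is a hypothetical helper, and the demo binds an OS-assigned port rather than any port the tests are known to use.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something holds the port.
            return False
    return True


# Demo: occupy an OS-assigned port, then probe it while it is still held.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
    holder.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    holder.listen()
    busy_port = holder.getsockname()[1]
    busy_while_held = not is_port_free(busy_port)

print(busy_while_held)
```

A bind probe only reports whether the port is taken at that instant; another process can still claim it between the check and the server start.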

📊 Performance Expectations

Typical Timing (3 documents, consumer hardware):

Index Building: 2-5 seconds per distance function
Search Query: 50-200ms
Recompute Mode: 5-15 seconds (higher accuracy)

Memory Usage:

Index Storage: ~1-2 MB per distance function
Runtime Memory: ~500MB (including model loading)
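To compare your own index footprint against these figures, on-disk size can be measured by summing file sizes under the index directory. This is a minimal sketch: `dir_size_mb` is a hypothetical helper, and the file names in the demo are placeholders, not the files LEANN actually writes.

```python
import tempfile
from pathlib import Path


def dir_size_mb(path: str) -> float:
    """Total size of all regular files under path, in MiB."""
    root = Path(path)
    total_bytes = sum(f.stat().st_size for f in root.rglob("*") if f.is_file())
    return total_bytes / (1024 * 1024)


# Demo with placeholder files standing in for an index directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example.index").write_bytes(b"\0" * (2 * 1024 * 1024))
    (Path(tmp) / "example.meta.json").write_text("{}")
    size_mb = dir_size_mb(tmp)

print(f"{size_mb:.2f} MB")
```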

🔗 Integration with CI/CD

These tests are designed to be run in automated environments:

# GitHub Actions example
- name: Run Sanity Checks
  run: |
    uv run python tests/sanity_checks/test_distance_functions.py
    uv run python tests/sanity_checks/test_l2_verification.py

The tests are deterministic and should produce consistent results across different platforms.
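Outside of GitHub Actions, the same sequence can be driven from a small Python runner that reports which checks failed. This is a sketch under assumptions: `run_checks` is a hypothetical helper, and the demo uses inline scripts in place of the real test commands so that it runs anywhere.

```python
import subprocess
import sys
from typing import List, Sequence


def run_checks(commands: Sequence[Sequence[str]]) -> List[Sequence[str]]:
    """Run each command in order and return those that exited non-zero."""
    failed = []
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            failed.append(command)
    return failed


# In CI these would be the `uv run python tests/sanity_checks/...` commands
# from the workflow above; two inline scripts stand in for them here.
demo_commands = [
    [sys.executable, "-c", "raise SystemExit(0)"],  # simulates a passing check
    [sys.executable, "-c", "raise SystemExit(3)"],  # simulates a failing check
]

failed = run_checks(demo_commands)
print(len(failed))
```

Running every check before reporting, rather than stopping at the first failure, gives a complete picture of what broke in a single CI run.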