Commit 1ef9cba7de: Feature/prompt templates and lmstudio sdk (#171)
* Add prompt template support and LM Studio SDK integration

Features:
- Prompt template support for embedding models (via --embedding-prompt-template)
- LM Studio SDK integration for automatic context length detection
- Hybrid token limit discovery (Ollama → LM Studio → Registry → Default)
- Client-side token truncation to prevent silent failures
- Automatic persistence of embedding_options to .meta.json

Implementation:
- Added _query_lmstudio_context_limit() with Node.js subprocess bridge
- Modified compute_embeddings_openai() to apply prompt templates before truncation
- Extended CLI with --embedding-prompt-template flag for build and search
- URL detection for LM Studio (port 1234 or lmstudio/lm.studio keywords)
- HTTP→WebSocket URL conversion for SDK compatibility
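
In outline, the discovery chain behaves like this (only _query_lmstudio_context_limit() is a real name from this change; the other helpers are illustrative and assumed to return None when their service is unavailable):

DEFAULT_TOKEN_LIMIT = 512                          # illustrative default
MODEL_TOKEN_REGISTRY = {"nomic-embed-text": 2048}  # illustrative registry entry

def is_lmstudio_url(base_url: str) -> bool:
    # LM Studio detection: port 1234 or lmstudio/lm.studio keywords
    return ":1234" in base_url or "lmstudio" in base_url or "lm.studio" in base_url

def to_ws_url(base_url: str) -> str:
    # The SDK speaks WebSocket, so http(s):// must become ws(s)://
    return base_url.replace("https://", "wss://").replace("http://", "ws://")

def discover_token_limit(model_name: str, base_url: str) -> int:
    limit = query_ollama_context_limit(model_name, base_url)        # 1. Ollama
    if limit is None and is_lmstudio_url(base_url):
        limit = _query_lmstudio_context_limit(model_name, to_ws_url(base_url))  # 2. LM Studio
    if limit is None:
        limit = MODEL_TOKEN_REGISTRY.get(model_name)                # 3. Registry
    return limit if limit is not None else DEFAULT_TOKEN_LIMIT      # 4. Default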

Tests:
- 60 passing tests across 5 test files
- Comprehensive coverage of prompt templates, LM Studio integration, and token handling
- Parametrized tests for maintainability and clarity

* Add integration tests and fix LM Studio SDK bridge

Features:
- End-to-end integration tests for prompt template with EmbeddingGemma
- Integration tests for hybrid token limit discovery mechanism
- Tests verify real-world functionality with live services (LM Studio, Ollama)

Fixes:
- LM Studio SDK bridge now uses client.embedding.load() for embedding models
- Fixed NODE_PATH resolution to include npm global modules
- Fixed integration test to use WebSocket URL (ws://) for SDK bridge

Tests:
- test_prompt_template_e2e.py: 8 integration tests covering:
  - Prompt template prepending with LM Studio (EmbeddingGemma)
  - LM Studio SDK bridge for context length detection
  - Ollama dynamic token limit detection
  - Hybrid discovery fallback mechanism (registry, default)
- All tests marked with @pytest.mark.integration for selective execution
- Tests gracefully skip when services unavailable

Documentation:
- Updated tests/README.md with integration test section
- Added prerequisites and running instructions
- Documented that prompt templates are ONLY for EmbeddingGemma
- Added integration marker to pyproject.toml

Test Results:
- All 8 integration tests passing with live services
- Confirmed prompt templates work correctly with EmbeddingGemma
- Verified LM Studio SDK bridge auto-detects context length (2048)
- Validated hybrid token limit discovery across all backends

* Add prompt template support to Ollama mode

Extends prompt template functionality from OpenAI mode to Ollama for backend consistency.

Changes:
- Add provider_options parameter to compute_embeddings_ollama()
- Apply prompt template before token truncation (lines 1005-1011)
- Pass provider_options through compute_embeddings() call chain
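
The ordering matters: the template is prepended before truncation, so template tokens can never be cut off. A minimal sketch of that step (the tokenizer is a stand-in with a HuggingFace-style encode/decode interface):

def apply_template_and_truncate(texts, template, tokenizer, token_limit):
    prepared = []
    for text in texts:
        text = (template or "") + text       # prepend the prompt template first
        ids = tokenizer.encode(text)
        if len(ids) > token_limit:           # then truncate client-side
            text = tokenizer.decode(ids[:token_limit])
        prepared.append(text)
    return prepared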

Tests:
- test_ollama_embedding_with_prompt_template: Verifies templates work with Ollama
- test_ollama_prompt_template_affects_embeddings: Confirms embeddings differ with/without template
- Both tests pass with live Ollama service (2/2 passing)

Usage:
leann build --embedding-mode ollama --embedding-prompt-template "query: " ...

* Fix LM Studio SDK bridge to respect JIT auto-evict settings

Problem: SDK bridge called client.embedding.load() which loaded models into
LM Studio memory and bypassed JIT auto-evict settings, causing duplicate
model instances to accumulate.

Root cause analysis (from Perplexity research):
- Explicit SDK load() commands are treated as "pinned" models
- JIT auto-evict only applies to models loaded reactively via API requests
- SDK-loaded models remain in memory until explicitly unloaded

Solutions implemented:

1. Add model.unload() after metadata query (line 243)
   - Load model temporarily to get context length
   - Unload immediately to hand control back to JIT system
   - Subsequent API requests trigger JIT load with auto-evict

2. Add token limit caching to prevent repeated SDK calls
   - Cache discovered limits in _token_limit_cache dict (line 48)
   - Key: (model_name, base_url), Value: token_limit
   - Prevents duplicate load/unload cycles within same process
   - Cache shared across all discovery methods (Ollama, SDK, registry)
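
A sketch of the Python-side cache, reusing the discovery chain outlined earlier (names other than _token_limit_cache are illustrative):

_token_limit_cache: dict[tuple[str, str], int] = {}  # (model_name, base_url) -> token_limit

def get_token_limit(model_name: str, base_url: str) -> int:
    key = (model_name, base_url)
    if key not in _token_limit_cache:
        # The first call may trigger a temporary SDK load()/unload() cycle in
        # the Node.js bridge; later calls in the same process hit the cache
        _token_limit_cache[key] = discover_token_limit(model_name, base_url)
    return _token_limit_cache[key]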

Tests:
- TestTokenLimitCaching: 5 tests for cache behavior (integrated into test_token_truncation.py)
- Manual testing confirmed no duplicate models in LM Studio after fix
- All existing tests pass

Impact:
- Respects user's LM Studio JIT and auto-evict settings
- Reduces model memory footprint
- Faster subsequent builds (cached limits)

* Document prompt template and LM Studio SDK features

Added comprehensive documentation for new optional embedding features:

Configuration Guide (docs/configuration-guide.md):
- New section: "Optional Embedding Features"
- Task-Specific Prompt Templates subsection:
  - Explains EmbeddingGemma use case with document/query prompts
  - CLI and Python API examples
  - Clear warnings about compatible vs incompatible models
  - References to GitHub issue #155 and HuggingFace blog
- LM Studio Auto-Detection subsection:
  - Prerequisites (Node.js + @lmstudio/sdk)
  - How auto-detection works (4-step process)
  - Benefits and optional nature clearly stated

FAQ (docs/faq.md):
- FAQ #2: When should I use prompt templates?
  - DO/DON'T guidance with examples
  - Links to detailed configuration guide
- FAQ #3: Why is LM Studio loading multiple copies?
  - Explains the JIT auto-evict fix
  - Troubleshooting steps if still seeing issues
- FAQ #4: Do I need Node.js and @lmstudio/sdk?
  - Clarifies it's completely optional
  - Lists benefits if installed
  - Installation instructions

Added cross-references between the two documents for easy navigation from the quick FAQ answers to the detailed configuration guide.

* Add separate build/query template support for task-specific models

Task-specific models like EmbeddingGemma require different templates for indexing vs searching. Store both templates at build time and auto-apply query template during search with backward compatibility.
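
Illustratively, the persisted .meta.json might carry both templates like this (the key names are hypothetical; the values are EmbeddingGemma's document and query prompts):

{
  "embedding_options": {
    "build_prompt_template": "title: none | text: ",
    "query_prompt_template": "task: search result | query: "
  }
}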

* Consolidate prompt template tests from 44 to 37 tests

Merged redundant no-op tests, removed low-value implementation tests, consolidated parameterized CLI tests, and removed hanging over-mocked test. All tests pass with improved focus on behavioral testing.

* Fix query template application in compute_query_embedding

Query templates were only applied in the fallback code path, not when using the embedding server (default path). This meant stored query templates in index metadata were ignored during MCP and CLI searches.

Changes:
- Move template application to before any computation path (searcher_base.py:109-110)
- Add comprehensive tests for both server and fallback paths
- Consolidate tests into test_prompt_template_persistence.py

Tests verify:
- Template applied when using embedding server
- Template applied in fallback path
- Consistent behavior between both paths
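
Roughly (the method names below are stand-ins for the real searcher internals):

def compute_query_embedding(self, query: str):
    # Apply the stored query template before choosing a computation path,
    # so the embedding-server (default) and fallback paths behave the same
    options = self.meta.get("embedding_options") or {}
    template = options.get("query_prompt_template")
    if template:
        query = template + query
    if self._server_available():
        return self._embed_via_server(query)  # default path
    return self._embed_locally(query)         # fallback path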

* Apply ruff formatting and fix linting issues

- Remove unused imports
- Fix import ordering
- Remove unused variables
- Apply code formatting

* Fix CI test failures: mock OPENAI_API_KEY in tests

Tests were failing in CI because compute_embeddings_openai() checks for OPENAI_API_KEY before using the mocked client. Added monkeypatch to set fake API key in test fixture.
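
The fix amounts to a fixture along these lines (the fixture name is illustrative):

import pytest

@pytest.fixture(autouse=True)
def fake_openai_key(monkeypatch):
    # compute_embeddings_openai() checks for the key before the mocked
    # client is ever used, so CI needs a placeholder value present
    monkeypatch.setenv("OPENAI_API_KEY", "sk-test-placeholder")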

LEANN Tests

This directory contains automated tests for the LEANN project using pytest.

Test Files

test_readme_examples.py

Tests the examples shown in README.md:

  • The basic example code that users see first (parametrized for both HNSW and DiskANN backends)
  • Import statements work correctly
  • Different backend options (HNSW, DiskANN)
  • Different LLM configuration options (parametrized for both backends)
  • All main README examples are tested with both HNSW and DiskANN backends using pytest parametrization

test_basic.py

Basic functionality tests that verify:

  • All packages can be imported correctly
  • C++ extensions (FAISS, DiskANN) load properly
  • Basic index building and searching works for both HNSW and DiskANN backends
  • Uses parametrized tests to cover both backends (see the sketch below)
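
A sketch of that parametrized shape, mirroring the README's basic example (assumed API names; adjust to the current leann package):

import pytest

@pytest.mark.parametrize("backend", ["hnsw", "diskann"])
def test_backend_basic(backend, tmp_path):
    from leann import LeannBuilder, LeannSearcher

    index_path = str(tmp_path / "demo.leann")
    builder = LeannBuilder(backend_name=backend)
    builder.add_text("LEANN is a low-storage vector index.")
    builder.build_index(index_path)

    results = LeannSearcher(index_path).search("vector index", top_k=1)
    assert results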

test_document_rag.py

Tests the document RAG example functionality:

  • Tests with facebook/contriever embeddings
  • Tests with OpenAI embeddings (if API key is available)
  • Tests error handling with invalid parameters
  • Verifies that normalized embeddings are detected and cosine distance is used

test_diskann_partition.py

Tests DiskANN graph partitioning functionality:

  • Tests DiskANN index building without partitioning (baseline)
  • Tests automatic graph partitioning with is_recompute=True
  • Verifies that partition files are created and large files are cleaned up to save storage
  • Tests search functionality with partitioned indices
  • Validates medoid and max_base_norm file generation and usage
  • Includes performance comparison between DiskANN (with partition) and HNSW
  • Note: These tests are skipped in CI due to hardware requirements and computation time

test_prompt_template_e2e.py

Integration tests for prompt template feature with live embedding services:

  • Tests prompt template prepending with EmbeddingGemma (OpenAI-compatible API via LM Studio)
  • Tests hybrid token limit discovery (Ollama dynamic detection, registry fallback, default)
  • Tests LM Studio SDK bridge for automatic context length detection (requires Node.js + @lmstudio/sdk)
  • Note: These tests require live services (LM Studio, Ollama) and are marked with @pytest.mark.integration (the marker and skip pattern are sketched after this list)
  • Important: Prompt templates are ONLY for EmbeddingGemma and similar task-specific models, NOT regular embedding models
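
Each integration test follows the same pattern: mark it for selective runs and probe the service up front so it skips instead of failing (the URL and helper below are illustrative; assumes the requests package):

import pytest
import requests

LMSTUDIO_URL = "http://localhost:1234"

def _lmstudio_available() -> bool:
    try:
        requests.get(f"{LMSTUDIO_URL}/v1/models", timeout=2)
        return True
    except requests.RequestException:
        return False

@pytest.mark.integration
@pytest.mark.skipif(not _lmstudio_available(), reason="LM Studio is not running")
def test_prompt_template_prepending():
    ...  # embed with and without the template and compare results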

Running Tests

Install test dependencies:

# Using uv dependency groups (tools only)
uv sync --only-group test

Run all tests:

pytest tests/

# Or with coverage
pytest tests/ --cov=leann --cov-report=html

# Run in parallel (faster)
pytest tests/ -n auto

Run specific tests:

# Only basic tests
pytest tests/test_basic.py

# Only tests that don't require OpenAI
pytest tests/ -m "not openai"

# Skip slow tests
pytest tests/ -m "not slow"

# Skip integration tests (that require live services)
pytest tests/ -m "not integration"

# Run only integration tests (requires LM Studio or Ollama running)
pytest tests/test_prompt_template_e2e.py -v -s

# Run DiskANN partition tests (requires local machine, not CI)
pytest tests/test_diskann_partition.py

Run with specific backend:

# Test only HNSW backend
pytest tests/test_basic.py::test_backend_basic[hnsw]
pytest tests/test_readme_examples.py::test_readme_basic_example[hnsw]

# Test only DiskANN backend
pytest tests/test_basic.py::test_backend_basic[diskann]
pytest tests/test_readme_examples.py::test_readme_basic_example[diskann]

# All DiskANN tests (parametrized + specialized partition tests)
pytest tests/ -k diskann

CI/CD Integration

Tests are automatically run in GitHub Actions:

  1. After building wheel packages
  2. On multiple Python versions (3.9 - 3.13)
  3. On both Ubuntu and macOS
  4. Using pytest with appropriate markers and flags

pytest.ini Configuration

The pytest.ini file configures:

  • Test discovery paths
  • Default timeout (600 seconds)
  • Environment variables (HF_HUB_DISABLE_SYMLINKS, TOKENIZERS_PARALLELISM)
  • Custom markers for slow and OpenAI tests
  • Verbose output with short tracebacks
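
For reference, a configuration with those pieces looks roughly like this (the timeout and env keys assume the pytest-timeout and pytest-env plugins; exact values may differ):

[pytest]
testpaths = tests
timeout = 600
env =
    HF_HUB_DISABLE_SYMLINKS=1
    TOKENIZERS_PARALLELISM=false
markers =
    slow: long-running tests
    openai: tests that require an OpenAI API key
addopts = -v --tb=short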

Integration Test Prerequisites

Integration tests (test_prompt_template_e2e.py) require live services:

Required:

  • LM Studio running at http://localhost:1234 with EmbeddingGemma model loaded

Optional:

  • Ollama running at http://localhost:11434 for token limit detection tests
  • Node.js + @lmstudio/sdk installed (npm install -g @lmstudio/sdk) for SDK bridge tests

Tests gracefully skip if services are unavailable.

Known Issues

  • OpenAI tests are automatically skipped if no API key is provided
  • Integration tests require live embedding services and may fail due to proxy settings (unset ALL_PROXY and all_proxy if needed)