LEANN/tests/test_prompt_template_persistence.py
ww26 1ef9cba7de Feature/prompt templates and lmstudio sdk (#171)
* Add prompt template support and LM Studio SDK integration

Features:

- Prompt template support for embedding models (via --embedding-prompt-template)

- LM Studio SDK integration for automatic context length detection

- Hybrid token limit discovery (Ollama → LM Studio → Registry → Default; sketched below)

- Client-side token truncation to prevent silent failures

- Automatic persistence of embedding_options to .meta.json
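
A minimal sketch of the hybrid discovery order, assuming hypothetical probe functions (the real helpers live in the embedding compute module and may be named differently):

    DEFAULT_TOKEN_LIMIT = 512  # assumed fallback value, not taken from the code

    def discover_token_limit(model_name: str, base_url: str, probes) -> int:
        # probes: callables tried in order (Ollama, LM Studio SDK, registry),
        # each returning an int limit or None; the first hit wins.
        for probe in probes:
            limit = probe(model_name, base_url)
            if limit is not None:
                return limit
        return DEFAULT_TOKEN_LIMIT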

Implementation:

- Added _query_lmstudio_context_limit() with Node.js subprocess bridge

- Modified compute_embeddings_openai() to apply prompt templates before truncation

- Extended CLI with --embedding-prompt-template flag for build and search

- URL detection for LM Studio (port 1234 or lmstudio/lm.studio keywords)

- HTTP→WebSocket URL conversion for SDK compatibility
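
The URL handling from the last two points can be sketched like this (function names are illustrative; the behavior follows the bullets above):

    from urllib.parse import urlparse

    def is_lmstudio_url(base_url: str) -> bool:
        # Port 1234 is LM Studio's default; keyword match covers named hosts.
        parsed = urlparse(base_url)
        lowered = base_url.lower()
        return parsed.port == 1234 or "lmstudio" in lowered or "lm.studio" in lowered

    def to_websocket_url(base_url: str) -> str:
        # The LM Studio SDK connects over WebSocket, so http(s) becomes ws(s).
        return base_url.replace("https://", "wss://", 1).replace("http://", "ws://", 1)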

Tests:

- 60 passing tests across 5 test files

- Comprehensive coverage of prompt templates, LM Studio integration, and token handling

- Parametrized tests for maintainability and clarity

* Add integration tests and fix LM Studio SDK bridge

Features:
- End-to-end integration tests for prompt template with EmbeddingGemma
- Integration tests for hybrid token limit discovery mechanism
- Tests verify real-world functionality with live services (LM Studio, Ollama)

Fixes:
- LM Studio SDK bridge now uses client.embedding.load() for embedding models
- Fixed NODE_PATH resolution to include npm global modules
- Fixed integration test to use WebSocket URL (ws://) for SDK bridge

Tests:
- test_prompt_template_e2e.py: 8 integration tests covering:
  - Prompt template prepending with LM Studio (EmbeddingGemma)
  - LM Studio SDK bridge for context length detection
  - Ollama dynamic token limit detection
  - Hybrid discovery fallback mechanism (registry, default)
- All tests marked with @pytest.mark.integration for selective execution
- Tests gracefully skip when services unavailable
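
The marker-and-skip pattern is roughly (illustrative test name and probe):

    import urllib.error
    import urllib.request

    import pytest

    def _service_up(url: str) -> bool:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except (urllib.error.URLError, OSError):
            return False

    @pytest.mark.integration
    def test_lmstudio_context_detection():
        if not _service_up("http://localhost:1234/v1/models"):
            pytest.skip("LM Studio is not running")
        # ... exercise the SDK bridge against the live service ...

Run selectively with `pytest -m integration`.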

Documentation:
- Updated tests/README.md with integration test section
- Added prerequisites and running instructions
- Documented that prompt templates are ONLY for EmbeddingGemma
- Added integration marker to pyproject.toml

Test Results:
- All 8 integration tests passing with live services
- Confirmed prompt templates work correctly with EmbeddingGemma
- Verified LM Studio SDK bridge auto-detects context length (2048)
- Validated hybrid token limit discovery across all backends

* Add prompt template support to Ollama mode

Extends prompt template functionality from OpenAI mode to Ollama for backend consistency.

Changes:
- Add provider_options parameter to compute_embeddings_ollama()
- Apply prompt template before token truncation (lines 1005-1011)
- Pass provider_options through compute_embeddings() call chain

Tests:
- test_ollama_embedding_with_prompt_template: Verifies templates work with Ollama
- test_ollama_prompt_template_affects_embeddings: Confirms embeddings differ with/without template
- Both tests pass with live Ollama service (2/2 passing)

Usage:
leann build --embedding-mode ollama --embedding-prompt-template "query: " ...
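
The equivalent Python API call, as a sketch based on the LeannBuilder signature used in the tests below:

    from leann.api import LeannBuilder

    builder = LeannBuilder(
        backend_name="hnsw",
        embedding_model="nomic-embed-text",
        embedding_mode="ollama",
        embedding_options={"prompt_template": "query: "},
    )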

* Fix LM Studio SDK bridge to respect JIT auto-evict settings

Problem: SDK bridge called client.embedding.load() which loaded models into
LM Studio memory and bypassed JIT auto-evict settings, causing duplicate
model instances to accumulate.

Root cause analysis (from Perplexity research):
- Explicit SDK load() commands are treated as "pinned" models
- JIT auto-evict only applies to models loaded reactively via API requests
- SDK-loaded models remain in memory until explicitly unloaded

Solutions implemented:

1. Add model.unload() after metadata query (line 243)
   - Load model temporarily to get context length
   - Unload immediately to hand control back to JIT system
   - Subsequent API requests trigger JIT load with auto-evict

2. Add token limit caching to prevent repeated SDK calls
   - Cache discovered limits in _token_limit_cache dict (line 48)
   - Key: (model_name, base_url), Value: token_limit
   - Prevents duplicate load/unload cycles within same process
   - Cache shared across all discovery methods (Ollama, SDK, registry)
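
Building on the discovery sketch above, the cache is roughly (the dict name matches the commit; the wrapper name is illustrative):

    _token_limit_cache = {}  # (model_name, base_url) -> token_limit

    def cached_token_limit(model_name: str, base_url: str, probes) -> int:
        key = (model_name, base_url)
        if key not in _token_limit_cache:
            # Only the first call per key pays the SDK load/unload round-trip.
            _token_limit_cache[key] = discover_token_limit(model_name, base_url, probes)
        return _token_limit_cache[key]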

Tests:
- TestTokenLimitCaching: 5 tests for cache behavior (integrated into test_token_truncation.py)
- Manual testing confirmed no duplicate models in LM Studio after fix
- All existing tests pass

Impact:
- Respects user's LM Studio JIT and auto-evict settings
- Reduces model memory footprint
- Faster subsequent builds (cached limits)

* Document prompt template and LM Studio SDK features

Added comprehensive documentation for new optional embedding features:

Configuration Guide (docs/configuration-guide.md):
- New section: "Optional Embedding Features"
- Task-Specific Prompt Templates subsection:
  - Explains EmbeddingGemma use case with document/query prompts
  - CLI and Python API examples
  - Clear warnings about compatible vs incompatible models
  - References to GitHub issue #155 and HuggingFace blog
- LM Studio Auto-Detection subsection:
  - Prerequisites (Node.js + @lmstudio/sdk)
  - How auto-detection works (4-step process)
  - Benefits and optional nature clearly stated

FAQ (docs/faq.md):
- FAQ #2: When should I use prompt templates?
  - DO/DON'T guidance with examples
  - Links to detailed configuration guide
- FAQ #3: Why is LM Studio loading multiple copies?
  - Explains the JIT auto-evict fix
  - Troubleshooting steps if still seeing issues
- FAQ #4: Do I need Node.js and @lmstudio/sdk?
  - Clarifies it's completely optional
  - Lists benefits if installed
  - Installation instructions

Cross-references between documents for easy navigation between quick reference and detailed guides.

* Add separate build/query template support for task-specific models

Task-specific models like EmbeddingGemma require different templates for indexing vs searching. Store both templates at build time and auto-apply the query template during search, with backward compatibility for older single-template indexes.
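
For example, a sketch using the option keys and model name exercised by the tests in this file:

    from leann.api import LeannBuilder

    builder = LeannBuilder(
        backend_name="hnsw",
        embedding_model="text-embedding-embeddinggemma-300m-qat",
        embedding_mode="openai",
        embedding_options={
            "build_prompt_template": "title: none | text: ",
            "query_prompt_template": "task: search result | query: ",
        },
    )
    # At search time, LeannSearcher reads query_prompt_template from .meta.json
    # and prepends it to the query; older single-template indexes still work.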

* Consolidate prompt template tests from 44 to 37 tests

Merged redundant no-op tests, removed low-value implementation tests, consolidated parameterized CLI tests, and removed a hanging, over-mocked test. All tests pass, with improved focus on behavioral testing.

* Fix query template application in compute_query_embedding

Query templates were only applied in the fallback code path, not when using the embedding server (default path). This meant stored query templates in index metadata were ignored during MCP and CLI searches.

Changes:

- Move template application to before any computation path (searcher_base.py:109-110; sketched below)

- Add comprehensive tests for both server and fallback paths

- Consolidate tests into test_prompt_template_persistence.py
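
The shape of the fix, sketched (the compute_query_embedding signature matches the tests later in this file; the attribute names used for the server path are illustrative):

    def compute_query_embedding(self, query, use_server_if_available=True, query_template=None):
        # Apply the template once, before choosing a computation path,
        # so the server and fallback paths embed identical text.
        if query_template:
            query = f"{query_template}{query}"
        if use_server_if_available:
            # self.passages_file / self.port are illustrative attribute names.
            port = self._ensure_server_running(self.passages_file, self.port)
            return self._compute_embedding_via_server([query], port)
        from leann.embedding_compute import compute_embeddings
        return compute_embeddings(
            [query],
            self.embedding_model,
            self.embedding_mode,
            provider_options=self.embedding_options,
        )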

Tests verify:

- Template applied when using embedding server

- Template applied in fallback path

- Consistent behavior between both paths

* Apply ruff formatting and fix linting issues

- Remove unused imports

- Fix import ordering

- Remove unused variables

- Apply code formatting

* Fix CI test failures: mock OPENAI_API_KEY in tests

Tests were failing in CI because compute_embeddings_openai() checks for OPENAI_API_KEY before using the mocked client. Added a monkeypatch to set a fake API key in the test fixture.
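
The fixture change is roughly (a sketch; the key value is arbitrary):

    @pytest.fixture
    def mock_embeddings(monkeypatch):
        # Fake key so compute_embeddings_openai()'s API-key check passes
        # before the mocked client is ever reached.
        monkeypatch.setenv("OPENAI_API_KEY", "fake-key-for-ci")
        with patch("leann.api.compute_embeddings") as mock_compute:
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
            yield mock_compute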
2025-11-14 15:25:17 -08:00


"""
Integration tests for prompt template metadata persistence and reuse.
These tests verify the complete lifecycle of prompt template persistence:
1. Template is saved to .meta.json during index build
2. Template is automatically loaded during search operations
3. Template can be overridden with explicit flag during search
4. Template is reused during chat/ask operations
These are integration tests that:
- Use real file system with temporary directories
- Run actual build and search operations
- Inspect .meta.json file contents directly
- Mock embedding servers to avoid external dependencies
- Use small test codebases for fast execution
Expected to FAIL in Red Phase because metadata persistence verification is not yet implemented.
"""
import json
import tempfile
from pathlib import Path
from unittest.mock import Mock, patch
import numpy as np
import pytest
from leann.api import LeannBuilder, LeannSearcher
class TestPromptTemplateMetadataPersistence:
    """Tests for prompt template storage in .meta.json during build."""

    @pytest.fixture
    def temp_index_dir(self):
        """Create temporary directory for test indexes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            yield Path(tmpdir)

    @pytest.fixture
    def mock_embeddings(self):
        """Mock compute_embeddings to return dummy embeddings."""
        with patch("leann.api.compute_embeddings") as mock_compute:
            # Return dummy embeddings as numpy array
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
            yield mock_compute

    def test_prompt_template_saved_to_metadata(self, temp_index_dir, mock_embeddings):
        """
        Verify that when build is run with embedding_options containing prompt_template,
        the template value is saved to .meta.json file.

        This is the core persistence requirement - templates must be saved to allow
        reuse in subsequent search operations without re-specifying the flag.

        Expected failure: .meta.json exists but doesn't contain embedding_options
        with prompt_template, or the value is not persisted correctly.
        """
        # Setup test data
        index_path = temp_index_dir / "test_index.leann"
        template = "search_document: "

        # Build index with prompt template in embedding_options
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            embedding_options={"prompt_template": template},
        )

        # Add a simple document
        builder.add_text("This is a test document for indexing")

        # Build the index
        builder.build_index(str(index_path))

        # Verify .meta.json was created and contains the template
        meta_path = temp_index_dir / "test_index.leann.meta.json"
        assert meta_path.exists(), ".meta.json file should be created during build"

        # Read and parse metadata
        with open(meta_path, encoding="utf-8") as f:
            meta_data = json.load(f)

        # Verify embedding_options exists in metadata
        assert "embedding_options" in meta_data, (
            "embedding_options should be saved to .meta.json when provided"
        )

        # Verify prompt_template is in embedding_options
        embedding_options = meta_data["embedding_options"]
        assert "prompt_template" in embedding_options, (
            "prompt_template should be saved within embedding_options"
        )

        # Verify the template value matches what we provided
        assert embedding_options["prompt_template"] == template, (
            f"Template should be '{template}', got '{embedding_options.get('prompt_template')}'"
        )

    def test_prompt_template_absent_when_not_provided(self, temp_index_dir, mock_embeddings):
        """
        Verify that when no prompt template is provided during build,
        .meta.json either doesn't have embedding_options or prompt_template key.

        This ensures clean metadata without unnecessary keys when features aren't used.

        Expected behavior: Build succeeds, .meta.json doesn't contain prompt_template.
        """
        index_path = temp_index_dir / "test_no_template.leann"

        # Build index WITHOUT prompt template
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            # No embedding_options provided
        )
        builder.add_text("Document without template")
        builder.build_index(str(index_path))

        # Verify metadata
        meta_path = temp_index_dir / "test_no_template.leann.meta.json"
        assert meta_path.exists()
        with open(meta_path, encoding="utf-8") as f:
            meta_data = json.load(f)

        # If embedding_options exists, it should not contain prompt_template
        if "embedding_options" in meta_data:
            embedding_options = meta_data["embedding_options"]
            assert "prompt_template" not in embedding_options, (
                "prompt_template should not be in metadata when not provided"
            )
class TestPromptTemplateAutoLoadOnSearch:
    """Tests for automatic loading of prompt template during search operations.

    NOTE: Over-mocked test removed (test_prompt_template_auto_loaded_on_search).
    This functionality is now comprehensively tested by TestQueryPromptTemplateAutoLoad
    which uses simpler mocking and doesn't hang.
    """

    @pytest.fixture
    def temp_index_dir(self):
        """Create temporary directory for test indexes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            yield Path(tmpdir)

    @pytest.fixture
    def mock_embeddings(self):
        """Mock compute_embeddings to capture calls and return dummy embeddings."""
        with patch("leann.api.compute_embeddings") as mock_compute:
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
            yield mock_compute

    def test_search_without_template_in_metadata(self, temp_index_dir, mock_embeddings):
        """
        Verify that searching an index built WITHOUT a prompt template
        works correctly (backward compatibility).

        The searcher should handle missing prompt_template gracefully.

        Expected behavior: Search succeeds, no template is used.
        """
        # Build index without template
        index_path = temp_index_dir / "no_template.leann"
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
        )
        builder.add_text("Document without template")
        builder.build_index(str(index_path))

        # Reset mocks
        mock_embeddings.reset_mock()

        # Create searcher and search
        searcher = LeannSearcher(index_path=str(index_path))

        # Verify no template in embedding_options
        assert "prompt_template" not in searcher.embedding_options, (
            "Searcher should not have prompt_template when not in metadata"
        )
class TestQueryPromptTemplateAutoLoad:
    """Tests for automatic loading of separate query_prompt_template during search (R2).

    These tests verify the new two-template system where:
    - build_prompt_template: Applied during index building
    - query_prompt_template: Applied during search operations

    Expected to FAIL in Red Phase (R2) because query template extraction
    and application is not yet implemented in LeannSearcher.search().
    """

    @pytest.fixture
    def temp_index_dir(self):
        """Create temporary directory for test indexes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            yield Path(tmpdir)

    @pytest.fixture
    def mock_compute_embeddings(self):
        """Mock compute_embeddings to capture calls and return dummy embeddings."""
        with patch("leann.embedding_compute.compute_embeddings") as mock_compute:
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
            yield mock_compute

    def test_search_auto_loads_query_template(self, temp_index_dir, mock_compute_embeddings):
        """
        Verify that search() automatically loads and applies query_prompt_template from .meta.json.

        Given: Index built with separate build_prompt_template and query_prompt_template
        When: LeannSearcher.search("my query") is called
        Then: Query embedding is computed with "query: my query" (query template applied)

        This is the core R2 requirement - query templates must be auto-loaded and applied
        during search without user intervention.

        Expected failure: compute_embeddings called with raw "my query" instead of
        "query: my query" because query template extraction is not implemented.
        """
        # Setup: Build index with separate templates in new format
        index_path = temp_index_dir / "query_template.leann"
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            embedding_options={
                "build_prompt_template": "doc: ",
                "query_prompt_template": "query: ",
            },
        )
        builder.add_text("Test document")
        builder.build_index(str(index_path))

        # Reset mock to ignore build calls
        mock_compute_embeddings.reset_mock()

        # Act: Search with query
        searcher = LeannSearcher(index_path=str(index_path))

        # Mock the backend search to avoid actual search
        with patch.object(searcher.backend_impl, "search") as mock_backend_search:
            mock_backend_search.return_value = {
                "labels": [["test_id_0"]],  # IDs (nested list for batch support)
                "distances": [[0.9]],  # Distances (nested list for batch support)
            }
            searcher.search("my query", top_k=1, recompute_embeddings=False)

        # Assert: compute_embeddings was called with query template applied
        assert mock_compute_embeddings.called, "compute_embeddings should be called during search"

        # Get the actual text passed to compute_embeddings
        call_args = mock_compute_embeddings.call_args
        texts_arg = call_args[0][0]  # First positional arg (list of texts)
        assert len(texts_arg) == 1, "Should compute embedding for one query"
        assert texts_arg[0] == "query: my query", (
            f"Query template should be applied: expected 'query: my query', got '{texts_arg[0]}'"
        )

    def test_search_backward_compat_single_template(self, temp_index_dir, mock_compute_embeddings):
        """
        Verify backward compatibility with old single prompt_template format.

        Given: Index with old format (single prompt_template, no query_prompt_template)
        When: LeannSearcher.search("my query") is called
        Then: Query embedding computed with "doc: my query" (old template applied)

        This ensures indexes built with the old single-template system continue
        to work correctly with the new search implementation.

        Expected failure: Old template not recognized/applied because backward
        compatibility logic is not implemented.
        """
        # Setup: Build index with old single-template format
        index_path = temp_index_dir / "old_template.leann"
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            embedding_options={"prompt_template": "doc: "},  # Old format
        )
        builder.add_text("Test document")
        builder.build_index(str(index_path))

        # Reset mock
        mock_compute_embeddings.reset_mock()

        # Act: Search
        searcher = LeannSearcher(index_path=str(index_path))
        with patch.object(searcher.backend_impl, "search") as mock_backend_search:
            mock_backend_search.return_value = {"labels": [["test_id_0"]], "distances": [[0.9]]}
            searcher.search("my query", top_k=1, recompute_embeddings=False)

        # Assert: Old template was applied
        call_args = mock_compute_embeddings.call_args
        texts_arg = call_args[0][0]
        assert texts_arg[0] == "doc: my query", (
            f"Old prompt_template should be applied for backward compatibility: "
            f"expected 'doc: my query', got '{texts_arg[0]}'"
        )

    def test_search_backward_compat_no_template(self, temp_index_dir, mock_compute_embeddings):
        """
        Verify backward compatibility when no template is present in .meta.json.

        Given: Index with no template in .meta.json (very old indexes)
        When: LeannSearcher.search("my query") is called
        Then: Query embedding computed with "my query" (no template, raw query)

        This ensures the most basic backward compatibility - indexes without
        any template support continue to work as before.

        Expected failure: May fail if default template is incorrectly applied,
        or if missing template causes error.
        """
        # Setup: Build index without any template
        index_path = temp_index_dir / "no_template.leann"
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            # No embedding_options at all
        )
        builder.add_text("Test document")
        builder.build_index(str(index_path))

        # Reset mock
        mock_compute_embeddings.reset_mock()

        # Act: Search
        searcher = LeannSearcher(index_path=str(index_path))
        with patch.object(searcher.backend_impl, "search") as mock_backend_search:
            mock_backend_search.return_value = {"labels": [["test_id_0"]], "distances": [[0.9]]}
            searcher.search("my query", top_k=1, recompute_embeddings=False)

        # Assert: No template applied (raw query)
        call_args = mock_compute_embeddings.call_args
        texts_arg = call_args[0][0]
        assert texts_arg[0] == "my query", (
            f"No template should be applied when missing from metadata: "
            f"expected 'my query', got '{texts_arg[0]}'"
        )

    def test_search_override_via_provider_options(self, temp_index_dir, mock_compute_embeddings):
        """
        Verify that explicit provider_options can override stored query template.

        Given: Index with query_prompt_template: "query: "
        When: search() called with provider_options={"prompt_template": "override: "}
        Then: Query embedding computed with "override: test" (override takes precedence)

        This enables users to experiment with different query templates without
        rebuilding the index, or to handle special query types differently.

        Expected failure: provider_options parameter is accepted via **kwargs but
        not used. Query embedding computed with raw "test" instead of "override: test"
        because override logic is not implemented.
        """
        # Setup: Build index with query template
        index_path = temp_index_dir / "override_template.leann"
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            embedding_options={
                "build_prompt_template": "doc: ",
                "query_prompt_template": "query: ",
            },
        )
        builder.add_text("Test document")
        builder.build_index(str(index_path))

        # Reset mock
        mock_compute_embeddings.reset_mock()

        # Act: Search with override
        searcher = LeannSearcher(index_path=str(index_path))
        with patch.object(searcher.backend_impl, "search") as mock_backend_search:
            mock_backend_search.return_value = {"labels": [["test_id_0"]], "distances": [[0.9]]}
            # This should accept provider_options parameter
            searcher.search(
                "test",
                top_k=1,
                recompute_embeddings=False,
                provider_options={"prompt_template": "override: "},
            )

        # Assert: Override template was applied
        call_args = mock_compute_embeddings.call_args
        texts_arg = call_args[0][0]
        assert texts_arg[0] == "override: test", (
            f"Override template should take precedence: "
            f"expected 'override: test', got '{texts_arg[0]}'"
        )
class TestPromptTemplateReuseInChat:
    """Tests for prompt template reuse in chat/ask operations."""

    @pytest.fixture
    def temp_index_dir(self):
        """Create temporary directory for test indexes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            yield Path(tmpdir)

    @pytest.fixture
    def mock_embeddings(self):
        """Mock compute_embeddings to return dummy embeddings."""
        with patch("leann.api.compute_embeddings") as mock_compute:
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
            yield mock_compute

    @pytest.fixture
    def mock_embedding_server_manager(self):
        """Mock EmbeddingServerManager for chat tests."""
        with patch("leann.searcher_base.EmbeddingServerManager") as mock_manager_class:
            mock_manager = Mock()
            mock_manager.start_server.return_value = (True, 5557)
            mock_manager_class.return_value = mock_manager
            yield mock_manager

    @pytest.fixture
    def index_with_template(self, temp_index_dir, mock_embeddings):
        """Build an index with a prompt template."""
        index_path = temp_index_dir / "chat_template_index.leann"
        template = "document_query: "
        builder = LeannBuilder(
            backend_name="hnsw",
            embedding_model="text-embedding-3-small",
            embedding_mode="openai",
            embedding_options={"prompt_template": template},
        )
        builder.add_text("Test document for chat")
        builder.build_index(str(index_path))
        return str(index_path), template
class TestPromptTemplateIntegrationWithEmbeddingModes:
    """Tests for prompt template compatibility with different embedding modes."""

    @pytest.fixture
    def temp_index_dir(self):
        """Create temporary directory for test indexes."""
        with tempfile.TemporaryDirectory() as tmpdir:
            yield Path(tmpdir)

    @pytest.mark.parametrize(
        "mode,model,template,filename_prefix",
        [
            (
                "openai",
                "text-embedding-3-small",
                "Represent this for searching: ",
                "openai_template",
            ),
            ("ollama", "nomic-embed-text", "search_query: ", "ollama_template"),
            ("sentence-transformers", "facebook/contriever", "query: ", "st_template"),
        ],
    )
    def test_prompt_template_metadata_with_embedding_modes(
        self, temp_index_dir, mode, model, template, filename_prefix
    ):
        """Verify prompt template is saved correctly across different embedding modes.

        Tests that prompt templates are persisted to .meta.json for:
        - OpenAI mode (primary use case)
        - Ollama mode (also supports templates)
        - Sentence-transformers mode (saved for forward compatibility)

        Expected behavior: Template is saved to .meta.json regardless of mode.
        """
        with patch("leann.api.compute_embeddings") as mock_compute:
            mock_compute.return_value = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)

            index_path = temp_index_dir / f"{filename_prefix}.leann"
            builder = LeannBuilder(
                backend_name="hnsw",
                embedding_model=model,
                embedding_mode=mode,
                embedding_options={"prompt_template": template},
            )
            builder.add_text(f"{mode.capitalize()} test document")
            builder.build_index(str(index_path))

            # Verify metadata
            meta_path = temp_index_dir / f"{filename_prefix}.leann.meta.json"
            with open(meta_path, encoding="utf-8") as f:
                meta_data = json.load(f)

            assert meta_data["embedding_mode"] == mode
            # Template should be saved for all modes (even if not used by some)
            if "embedding_options" in meta_data:
                assert meta_data["embedding_options"]["prompt_template"] == template
class TestQueryTemplateApplicationInComputeEmbedding:
    """Tests for query template application in compute_query_embedding() (Bug Fix).

    These tests verify that query templates are applied consistently in BOTH
    code paths (server and fallback) when computing query embeddings.

    This addresses the bug where query templates were only applied in the
    fallback path, not when using the embedding server (the default path).

    Bug Context:
    - Issue: Query templates were stored in metadata but only applied during
      fallback (direct) computation, not when using embedding server
    - Fix: Move template application to BEFORE any computation path in
      compute_query_embedding() (searcher_base.py:107-110)
    - Impact: Critical for models like EmbeddingGemma that require task-specific
      templates for optimal performance

    These tests ensure the fix works correctly and prevent regression.
    """

    @pytest.fixture
    def temp_index_with_template(self):
        """Create a temporary index with query template in metadata"""
        with tempfile.TemporaryDirectory() as tmpdir:
            index_dir = Path(tmpdir)
            index_file = index_dir / "test.leann"
            meta_file = index_dir / "test.leann.meta.json"

            # Create minimal metadata with query template
            metadata = {
                "version": "1.0",
                "backend_name": "hnsw",
                "embedding_model": "text-embedding-embeddinggemma-300m-qat",
                "dimensions": 768,
                "embedding_mode": "openai",
                "backend_kwargs": {
                    "graph_degree": 32,
                    "complexity": 64,
                    "distance_metric": "cosine",
                },
                "embedding_options": {
                    "base_url": "http://localhost:1234/v1",
                    "api_key": "test-key",
                    "build_prompt_template": "title: none | text: ",
                    "query_prompt_template": "task: search result | query: ",
                },
            }
            meta_file.write_text(json.dumps(metadata, indent=2))

            # Create minimal HNSW index file (empty is okay for this test)
            index_file.write_bytes(b"")

            yield str(index_file)

    def test_query_template_applied_in_fallback_path(self, temp_index_with_template):
        """Test that query template is applied when using fallback (direct) path"""
        from leann.searcher_base import BaseSearcher

        # Create a concrete implementation for testing
        class TestSearcher(BaseSearcher):
            def search(self, query_vectors, top_k, complexity, beam_width=1, **kwargs):
                return {"labels": [], "distances": []}

        searcher = object.__new__(TestSearcher)
        searcher.index_path = Path(temp_index_with_template)
        searcher.index_dir = searcher.index_path.parent

        # Load metadata
        meta_file = searcher.index_dir / f"{searcher.index_path.name}.meta.json"
        with open(meta_file) as f:
            searcher.meta = json.load(f)
        searcher.embedding_model = searcher.meta["embedding_model"]
        searcher.embedding_mode = searcher.meta.get("embedding_mode", "sentence-transformers")
        searcher.embedding_options = searcher.meta.get("embedding_options", {})

        # Mock compute_embeddings to capture the query text
        captured_queries = []

        def mock_compute_embeddings(texts, model, mode, provider_options=None):
            captured_queries.extend(texts)
            return np.random.rand(len(texts), 768).astype(np.float32)

        with patch(
            "leann.embedding_compute.compute_embeddings", side_effect=mock_compute_embeddings
        ):
            # Call compute_query_embedding with template (fallback path)
            result = searcher.compute_query_embedding(
                query="vector database",
                use_server_if_available=False,  # Force fallback path
                query_template="task: search result | query: ",
            )

        # Verify template was applied
        assert len(captured_queries) == 1
        assert captured_queries[0] == "task: search result | query: vector database"
        assert result.shape == (1, 768)

    def test_query_template_applied_in_server_path(self, temp_index_with_template):
        """Test that query template is applied when using server path"""
        from leann.searcher_base import BaseSearcher

        # Create a concrete implementation for testing
        class TestSearcher(BaseSearcher):
            def search(self, query_vectors, top_k, complexity, beam_width=1, **kwargs):
                return {"labels": [], "distances": []}

        searcher = object.__new__(TestSearcher)
        searcher.index_path = Path(temp_index_with_template)
        searcher.index_dir = searcher.index_path.parent

        # Load metadata
        meta_file = searcher.index_dir / f"{searcher.index_path.name}.meta.json"
        with open(meta_file) as f:
            searcher.meta = json.load(f)
        searcher.embedding_model = searcher.meta["embedding_model"]
        searcher.embedding_mode = searcher.meta.get("embedding_mode", "sentence-transformers")
        searcher.embedding_options = searcher.meta.get("embedding_options", {})

        # Mock the server methods to capture the query text
        captured_queries = []

        def mock_ensure_server_running(passages_file, port):
            return port

        def mock_compute_embedding_via_server(chunks, port):
            captured_queries.extend(chunks)
            return np.random.rand(len(chunks), 768).astype(np.float32)

        searcher._ensure_server_running = mock_ensure_server_running
        searcher._compute_embedding_via_server = mock_compute_embedding_via_server

        # Call compute_query_embedding with template (server path)
        result = searcher.compute_query_embedding(
            query="vector database",
            use_server_if_available=True,  # Use server path
            query_template="task: search result | query: ",
        )

        # Verify template was applied BEFORE calling server
        assert len(captured_queries) == 1
        assert captured_queries[0] == "task: search result | query: vector database"
        assert result.shape == (1, 768)

    def test_query_template_without_template_parameter(self, temp_index_with_template):
        """Test that query is unchanged when no template is provided"""
        from leann.searcher_base import BaseSearcher

        class TestSearcher(BaseSearcher):
            def search(self, query_vectors, top_k, complexity, beam_width=1, **kwargs):
                return {"labels": [], "distances": []}

        searcher = object.__new__(TestSearcher)
        searcher.index_path = Path(temp_index_with_template)
        searcher.index_dir = searcher.index_path.parent

        meta_file = searcher.index_dir / f"{searcher.index_path.name}.meta.json"
        with open(meta_file) as f:
            searcher.meta = json.load(f)
        searcher.embedding_model = searcher.meta["embedding_model"]
        searcher.embedding_mode = searcher.meta.get("embedding_mode", "sentence-transformers")
        searcher.embedding_options = searcher.meta.get("embedding_options", {})

        captured_queries = []

        def mock_compute_embeddings(texts, model, mode, provider_options=None):
            captured_queries.extend(texts)
            return np.random.rand(len(texts), 768).astype(np.float32)

        with patch(
            "leann.embedding_compute.compute_embeddings", side_effect=mock_compute_embeddings
        ):
            searcher.compute_query_embedding(
                query="vector database",
                use_server_if_available=False,
                query_template=None,  # No template
            )

        # Verify query is unchanged
        assert len(captured_queries) == 1
        assert captured_queries[0] == "vector database"

    def test_query_template_consistency_between_paths(self, temp_index_with_template):
        """Test that both paths apply template identically"""
        from leann.searcher_base import BaseSearcher

        class TestSearcher(BaseSearcher):
            def search(self, query_vectors, top_k, complexity, beam_width=1, **kwargs):
                return {"labels": [], "distances": []}

        searcher = object.__new__(TestSearcher)
        searcher.index_path = Path(temp_index_with_template)
        searcher.index_dir = searcher.index_path.parent

        meta_file = searcher.index_dir / f"{searcher.index_path.name}.meta.json"
        with open(meta_file) as f:
            searcher.meta = json.load(f)
        searcher.embedding_model = searcher.meta["embedding_model"]
        searcher.embedding_mode = searcher.meta.get("embedding_mode", "sentence-transformers")
        searcher.embedding_options = searcher.meta.get("embedding_options", {})

        query_template = "task: search result | query: "
        original_query = "vector database"

        # Capture queries from fallback path
        fallback_queries = []

        def mock_compute_embeddings(texts, model, mode, provider_options=None):
            fallback_queries.extend(texts)
            return np.random.rand(len(texts), 768).astype(np.float32)

        with patch(
            "leann.embedding_compute.compute_embeddings", side_effect=mock_compute_embeddings
        ):
            searcher.compute_query_embedding(
                query=original_query,
                use_server_if_available=False,
                query_template=query_template,
            )

        # Capture queries from server path
        server_queries = []

        def mock_ensure_server_running(passages_file, port):
            return port

        def mock_compute_embedding_via_server(chunks, port):
            server_queries.extend(chunks)
            return np.random.rand(len(chunks), 768).astype(np.float32)

        searcher._ensure_server_running = mock_ensure_server_running
        searcher._compute_embedding_via_server = mock_compute_embedding_via_server

        searcher.compute_query_embedding(
            query=original_query,
            use_server_if_available=True,
            query_template=query_template,
        )

        # Verify both paths produced identical templated queries
        assert len(fallback_queries) == 1
        assert len(server_queries) == 1
        assert fallback_queries[0] == server_queries[0]
        assert fallback_queries[0] == f"{query_template}{original_query}"

    def test_query_template_with_empty_string(self, temp_index_with_template):
        """Test behavior with empty template string"""
        from leann.searcher_base import BaseSearcher

        class TestSearcher(BaseSearcher):
            def search(self, query_vectors, top_k, complexity, beam_width=1, **kwargs):
                return {"labels": [], "distances": []}

        searcher = object.__new__(TestSearcher)
        searcher.index_path = Path(temp_index_with_template)
        searcher.index_dir = searcher.index_path.parent

        meta_file = searcher.index_dir / f"{searcher.index_path.name}.meta.json"
        with open(meta_file) as f:
            searcher.meta = json.load(f)
        searcher.embedding_model = searcher.meta["embedding_model"]
        searcher.embedding_mode = searcher.meta.get("embedding_mode", "sentence-transformers")
        searcher.embedding_options = searcher.meta.get("embedding_options", {})

        captured_queries = []

        def mock_compute_embeddings(texts, model, mode, provider_options=None):
            captured_queries.extend(texts)
            return np.random.rand(len(texts), 768).astype(np.float32)

        with patch(
            "leann.embedding_compute.compute_embeddings", side_effect=mock_compute_embeddings
        ):
            searcher.compute_query_embedding(
                query="vector database",
                use_server_if_available=False,
                query_template="",  # Empty string
            )

        # Empty string is falsy, so no template should be applied
        assert captured_queries[0] == "vector database"