Compare commits

...

7 Commits

Author SHA1 Message Date
GitHub Actions
738f1dbab8 chore: release v0.3.1 2025-08-19 05:56:45 +00:00
yichuan520030910320
37d990d51c [feature] fix cli 2025-08-18 22:55:43 -07:00
Andy Lee
a6f07a54f1 fix: Use uv venv for Arch Linux CI wheel installation (#69)
- Use astral-sh/setup-uv@v4 action for consistency with other jobs
- Create virtual environment with uv venv to bypass PEP 668 restrictions
- Install wheels using uv pip install for faster dependency resolution
- Maintain tool consistency across the entire CI pipeline

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-16 21:32:19 -07:00
Andy Lee
46905e0687 feat: Improve DiskANN cross-platform compatibility and add Arch Linux support (#66)
* feat: Enhance CLI with improved list and smart remove commands

##  New Features

### 🏠 Enhanced `leann list` command
- **Better UX**: Current project shown first with clear separation
- **Visual improvements**: Icons (🏠/📂), better formatting, size info
- **Smart guidance**: Context-aware usage examples and getting started tips

### 🛡️ Smart `leann remove` command
- **Safety first**: Always shows ALL matching indexes across projects
- **Intelligent handling**:
  - Single match: Clear location display with cross-project warnings
  - Multiple matches: Interactive selection with final confirmation
- **Prevents accidents**: No more deleting wrong indexes due to name conflicts
- **User-friendly**: 'c' to cancel, clear visual hierarchy, detailed info

### 🔧 Technical improvements
- **Clean logging**: Hide debug messages for better CLI experience
- **Comprehensive search**: Always scan all projects for transparency
- **Error handling**: Graceful handling of edge cases and user input

## 🎯 Impact
- **Safer**: Eliminates risk of accidental index deletion
- **Clearer**: Users always know what they're operating on
- **Smarter**: Automatic detection and handling of common scenarios

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: vscode ruff, and format

* fix: Update DiskANN submodule with MKL linking improvements

Updates DiskANN submodule to include fix for MKL linking issues:
- Replaces global link_libraries() with target-specific linking
- Uses dynamic MKL linking (mkl_rt) for better cross-platform compatibility
- Prevents MKL contamination of unrelated targets (like zlib tests)
- Resolves build failures on strict linkers (Arch Linux) while maintaining Ubuntu compatibility

DiskANN commit: c593831 - fix: Replace global MKL linking with target-specific approach

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: all linux deps

* fix: Update Intel MKL download link to avoid 403 error

- Replace problematic Intel download URL that returns 403 Forbidden
- Use general Intel oneAPI MKL page instead of specific download parameters
- This fixes the lychee link checker CI failure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Configure lychee to use browser User-Agent for Intel links

- Replace domain exclusion with browser User-Agent to properly check Intel links
- Intel website blocks automated tools but allows browser-like requests
- This enables proper link validation while avoiding 403 Forbidden errors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: Use curl User-Agent for lychee link checking

Intel website has specific anti-bot logic:
- Blocks browser User-Agents (returns 403)
- Blocks lychee default User-Agent (returns 403)
- Allows curl User-Agent (returns 200)

This enables proper link validation for Intel documentation.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-16 14:42:20 -07:00
Andy Lee
838ade231e 🔗 Auto-register apps: Universal index discovery (#64)
* feat: Enhance CLI with improved list and smart remove commands

##  New Features

### 🏠 Enhanced `leann list` command
- **Better UX**: Current project shown first with clear separation
- **Visual improvements**: Icons (🏠/📂), better formatting, size info
- **Smart guidance**: Context-aware usage examples and getting started tips

### 🛡️ Smart `leann remove` command
- **Safety first**: Always shows ALL matching indexes across projects
- **Intelligent handling**:
  - Single match: Clear location display with cross-project warnings
  - Multiple matches: Interactive selection with final confirmation
- **Prevents accidents**: No more deleting wrong indexes due to name conflicts
- **User-friendly**: 'c' to cancel, clear visual hierarchy, detailed info

### 🔧 Technical improvements
- **Clean logging**: Hide debug messages for better CLI experience
- **Comprehensive search**: Always scan all projects for transparency
- **Error handling**: Graceful handling of edge cases and user input

## 🎯 Impact
- **Safer**: Eliminates risk of accidental index deletion
- **Clearer**: Users always know what they're operating on
- **Smarter**: Automatic detection and handling of common scenarios

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* chore: vscode ruff, and format

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-16 11:50:25 -07:00
Andy Lee
da6540decd feat: Enhance CLI with improved list and smart remove commands (#63)
- **Better UX**: Current project shown first with clear separation
- **Visual improvements**: Icons (🏠/📂), better formatting, size info
- **Smart guidance**: Context-aware usage examples and getting started tips

- **Safety first**: Always shows ALL matching indexes across projects
- **Intelligent handling**:
  - Single match: Clear location display with cross-project warnings
  - Multiple matches: Interactive selection with final confirmation
- **Prevents accidents**: No more deleting wrong indexes due to name conflicts
- **User-friendly**: 'c' to cancel, clear visual hierarchy, detailed info

- **Clean logging**: Hide debug messages for better CLI experience
- **Comprehensive search**: Always scan all projects for transparency
- **Error handling**: Graceful handling of edge cases and user input

- **Safer**: Eliminates risk of accidental index deletion
- **Clearer**: Users always know what they're operating on
- **Smarter**: Automatic detection and handling of common scenarios
2025-08-15 23:49:47 -07:00
yichuan520030910320
39e18a7c11 [chore] remove gitattribute 2025-08-15 23:12:24 -07:00
20 changed files with 4291 additions and 3885 deletions

1
.gitattributes vendored
View File

@@ -1 +0,0 @@
paper_plot/data/big_graph_degree_data.npz filter=lfs diff=lfs merge=lfs -text

View File

@@ -87,7 +87,7 @@ jobs:
runs-on: ${{ matrix.os }} runs-on: ${{ matrix.os }}
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v5
with: with:
ref: ${{ inputs.ref }} ref: ${{ inputs.ref }}
submodules: recursive submodules: recursive
@@ -98,21 +98,23 @@ jobs:
python-version: ${{ matrix.python }} python-version: ${{ matrix.python }}
- name: Install uv - name: Install uv
uses: astral-sh/setup-uv@v4 uses: astral-sh/setup-uv@v6
- name: Install system dependencies (Ubuntu) - name: Install system dependencies (Ubuntu)
if: runner.os == 'Linux' if: runner.os == 'Linux'
run: | run: |
sudo apt-get update sudo apt-get update
sudo apt-get install -y libomp-dev libboost-all-dev protobuf-compiler libzmq3-dev \ sudo apt-get install -y libomp-dev libboost-all-dev protobuf-compiler libzmq3-dev \
pkg-config libopenblas-dev patchelf libabsl-dev libaio-dev libprotobuf-dev pkg-config libabsl-dev libaio-dev libprotobuf-dev \
patchelf
# Install Intel MKL for DiskANN # Install Intel MKL for DiskANN
wget -q https://registrationcenter-download.intel.com/akdlm/IRC_NAS/79153e0f-74d7-45af-b8c2-258941adf58a/intel-onemkl-2025.0.0.940.sh wget -q https://registrationcenter-download.intel.com/akdlm/IRC_NAS/79153e0f-74d7-45af-b8c2-258941adf58a/intel-onemkl-2025.0.0.940.sh
sudo sh intel-onemkl-2025.0.0.940.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s sudo sh intel-onemkl-2025.0.0.940.sh -a --components intel.oneapi.lin.mkl.devel --action install --eula accept -s
source /opt/intel/oneapi/setvars.sh source /opt/intel/oneapi/setvars.sh
echo "MKLROOT=/opt/intel/oneapi/mkl/latest" >> $GITHUB_ENV echo "MKLROOT=/opt/intel/oneapi/mkl/latest" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=/opt/intel/oneapi/mkl/latest/lib/intel64:$LD_LIBRARY_PATH" >> $GITHUB_ENV echo "LD_LIBRARY_PATH=/opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/intel/oneapi/mkl/latest/lib/intel64" >> $GITHUB_ENV
- name: Install system dependencies (macOS) - name: Install system dependencies (macOS)
if: runner.os == 'macOS' if: runner.os == 'macOS'
@@ -304,3 +306,53 @@ jobs:
with: with:
name: packages-${{ matrix.os }}-py${{ matrix.python }} name: packages-${{ matrix.os }}-py${{ matrix.python }}
path: packages/*/dist/ path: packages/*/dist/
arch-smoke:
name: Arch Linux smoke test (install & import)
needs: build
runs-on: ubuntu-latest
container:
image: archlinux:latest
steps:
- name: Prepare system
run: |
pacman -Syu --noconfirm
pacman -S --noconfirm python python-pip gcc git zlib openssl
- name: Download ALL wheel artifacts from this run
uses: actions/download-artifact@v5
with:
# Don't specify name, download all artifacts
path: ./wheels
- name: Install uv
uses: astral-sh/setup-uv@v6
- name: Create virtual environment and install wheels
run: |
uv venv
source .venv/bin/activate || source .venv/Scripts/activate
uv pip install --find-links wheels leann-core
uv pip install --find-links wheels leann-backend-hnsw
uv pip install --find-links wheels leann-backend-diskann
uv pip install --find-links wheels leann
- name: Import & tiny runtime check
env:
OMP_NUM_THREADS: 1
MKL_NUM_THREADS: 1
run: |
source .venv/bin/activate || source .venv/Scripts/activate
python - <<'PY'
import leann
import leann_backend_hnsw as h
import leann_backend_diskann as d
from leann import LeannBuilder, LeannSearcher
b = LeannBuilder(backend_name="hnsw")
b.add_text("hello arch")
b.build_index("arch_demo.leann")
s = LeannSearcher("arch_demo.leann")
print("search:", s.search("hello", top_k=1))
PY

View File

@@ -14,6 +14,6 @@ jobs:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
- uses: lycheeverse/lychee-action@v2 - uses: lycheeverse/lychee-action@v2
with: with:
args: --no-progress --insecure README.md docs/ apps/ examples/ benchmarks/ args: --no-progress --insecure --user-agent 'curl/7.68.0' README.md docs/ apps/ examples/ benchmarks/
env: env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

1
.gitignore vendored
View File

@@ -18,6 +18,7 @@ demo/experiment_results/**/*.json
*.eml *.eml
*.emlx *.emlx
*.json *.json
!.vscode/*.json
*.sh *.sh
*.txt *.txt
!CMakeLists.txt !CMakeLists.txt

5
.vscode/extensions.json vendored Normal file
View File

@@ -0,0 +1,5 @@
{
"recommendations": [
"charliermarsh.ruff",
]
}

22
.vscode/settings.json vendored Normal file
View File

@@ -0,0 +1,22 @@
{
"python.defaultInterpreterPath": ".venv/bin/python",
"python.terminal.activateEnvironment": true,
"[python]": {
"editor.defaultFormatter": "charliermarsh.ruff",
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.organizeImports": "explicit",
"source.fixAll": "explicit"
},
"editor.insertSpaces": true,
"editor.tabSize": 4
},
"ruff.enable": true,
"files.watcherExclude": {
"**/.venv/**": true,
"**/__pycache__/**": true,
"**/*.egg-info/**": true,
"**/build/**": true,
"**/dist/**": true
}
}

View File

@@ -87,17 +87,60 @@ git submodule update --init --recursive
``` ```
**macOS:** **macOS:**
Note: DiskANN requires MacOS 13.3 or later.
```bash ```bash
brew install llvm libomp boost protobuf zeromq pkgconf brew install libomp boost protobuf zeromq pkgconf
CC=$(brew --prefix llvm)/bin/clang CXX=$(brew --prefix llvm)/bin/clang++ uv sync uv sync --extra diskann
``` ```
**Linux:** **Linux (Ubuntu/Debian):**
```bash
# Ubuntu/Debian (For Arch Linux: sudo pacman -S blas lapack openblas libaio boost protobuf abseil-cpp zeromq)
sudo apt-get update && sudo apt-get install -y libomp-dev libboost-all-dev protobuf-compiler libabsl-dev libmkl-full-dev libaio-dev libzmq3-dev
uv sync Note: On Ubuntu 20.04, you may need to build a newer Abseil and pin Protobuf (e.g., v3.20.x) for building DiskANN. See [Issue #30](https://github.com/yichuan-w/LEANN/issues/30) for a step-by-step note.
You can manually install [Intel oneAPI MKL](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html) instead of `libmkl-full-dev` for DiskANN. You can also use `libopenblas-dev` for building HNSW only, by removing `--extra diskann` in the command below.
```bash
sudo apt-get update && sudo apt-get install -y \
libomp-dev libboost-all-dev protobuf-compiler libzmq3-dev \
pkg-config libabsl-dev libaio-dev libprotobuf-dev \
libmkl-full-dev
uv sync --extra diskann
```
**Linux (Arch Linux):**
```bash
sudo pacman -Syu && sudo pacman -S --needed base-devel cmake pkgconf git gcc \
boost boost-libs protobuf abseil-cpp libaio zeromq
# For MKL in DiskANN
sudo pacman -S --needed base-devel git
git clone https://aur.archlinux.org/paru-bin.git
cd paru-bin && makepkg -si
paru -S intel-oneapi-mkl intel-oneapi-compiler
source /opt/intel/oneapi/setvars.sh
uv sync --extra diskann
```
**Linux (RHEL / CentOS Stream / Oracle / Rocky / AlmaLinux):**
See [Issue #50](https://github.com/yichuan-w/LEANN/issues/50) for more details.
```bash
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y libomp-devel boost-devel protobuf-compiler protobuf-devel \
abseil-cpp-devel libaio-devel zeromq-devel pkgconf-pkg-config
# For MKL in DiskANN
sudo dnf install -y intel-oneapi-mkl intel-oneapi-mkl-devel \
intel-oneapi-openmp || sudo dnf install -y intel-oneapi-compiler
source /opt/intel/oneapi/setvars.sh
uv sync --extra diskann
``` ```
</details> </details>
@@ -484,6 +527,9 @@ leann ask my-docs --interactive
# List all your indexes # List all your indexes
leann list leann list
# Remove an index
leann remove my-docs
``` ```
**Key CLI features:** **Key CLI features:**
@@ -496,7 +542,7 @@ leann list
<details> <details>
<summary><strong>📋 Click to expand: Complete CLI Reference</strong></summary> <summary><strong>📋 Click to expand: Complete CLI Reference</strong></summary>
You can use `leann --help`, or `leann build --help`, `leann search --help`, `leann ask --help` to get the complete CLI reference. You can use `leann --help`, or `leann build --help`, `leann search --help`, `leann ask --help`, `leann list --help`, `leann remove --help` to get the complete CLI reference.
**Build Command:** **Build Command:**
```bash ```bash
@@ -534,6 +580,31 @@ Options:
--top-k N Retrieval count (default: 20) --top-k N Retrieval count (default: 20)
``` ```
**List Command:**
```bash
leann list
# Lists all indexes across all projects with status indicators:
# ✅ - Index is complete and ready to use
# ❌ - Index is incomplete or corrupted
# 📁 - CLI-created index (in .leann/indexes/)
# 📄 - App-created index (*.leann.meta.json files)
```
**Remove Command:**
```bash
leann remove INDEX_NAME [OPTIONS]
Options:
--force, -f Force removal without confirmation
# Smart removal: automatically finds and safely removes indexes
# - Shows all matching indexes across projects
# - Requires confirmation for cross-project removal
# - Interactive selection when multiple matches found
# - Supports both CLI and app-created indexes
```
</details> </details>
## 🏗️ Architecture & How It Works ## 🏗️ Architecture & How It Works

View File

@@ -10,6 +10,7 @@ from typing import Any
import dotenv import dotenv
from leann.api import LeannBuilder, LeannChat from leann.api import LeannBuilder, LeannChat
from leann.registry import register_project_directory
from llama_index.core.node_parser import SentenceSplitter from llama_index.core.node_parser import SentenceSplitter
dotenv.load_dotenv() dotenv.load_dotenv()
@@ -214,6 +215,11 @@ class BaseRAGExample(ABC):
builder.build_index(index_path) builder.build_index(index_path)
print(f"Index saved to: {index_path}") print(f"Index saved to: {index_path}")
# Register project directory so leann list can discover this index
# The index is saved as args.index_dir/index_name.leann
# We want to register the current working directory where the app is run
register_project_directory(Path.cwd())
return index_path return index_path
async def run_interactive_chat(self, args, index_path: str): async def run_interactive_chat(self, args, index_path: str):

View File

@@ -183,6 +183,9 @@ class Benchmark:
start_time = time.time() start_time = time.time()
with torch.no_grad(): with torch.no_grad():
self.model(input_ids=input_ids, attention_mask=attention_mask) self.model(input_ids=input_ids, attention_mask=attention_mask)
# mps sync
if torch.backends.mps.is_available():
torch.mps.synchronize()
end_time = time.time() end_time = time.time()
return end_time - start_time return end_time - start_time

View File

@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"
[project] [project]
name = "leann-backend-diskann" name = "leann-backend-diskann"
version = "0.3.0" version = "0.3.1"
dependencies = ["leann-core==0.3.0", "numpy", "protobuf>=3.19.0"] dependencies = ["leann-core==0.3.1", "numpy", "protobuf>=3.19.0"]
[tool.scikit-build] [tool.scikit-build]
# Key: simplified CMake path # Key: simplified CMake path

View File

@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"
[project] [project]
name = "leann-backend-hnsw" name = "leann-backend-hnsw"
version = "0.3.0" version = "0.3.1"
description = "Custom-built HNSW (Faiss) backend for the Leann toolkit." description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
dependencies = [ dependencies = [
"leann-core==0.3.0", "leann-core==0.3.1",
"numpy", "numpy",
"pyzmq>=23.0.0", "pyzmq>=23.0.0",
"msgpack>=1.0.0", "msgpack>=1.0.0",

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "leann-core" name = "leann-core"
version = "0.3.0" version = "0.3.1"
description = "Core API and plugin system for LEANN" description = "Core API and plugin system for LEANN"
readme = "README.md" readme = "README.md"
requires-python = ">=3.9" requires-python = ">=3.9"

View File

@@ -614,7 +614,7 @@ class LeannSearcher:
zmq_port=zmq_port, zmq_port=zmq_port,
) )
# logger.info(f" Generated embedding shape: {query_embedding.shape}") # logger.info(f" Generated embedding shape: {query_embedding.shape}")
time.time() - start_time # time.time() - start_time
# logger.info(f" Embedding time: {embedding_time} seconds") # logger.info(f" Embedding time: {embedding_time} seconds")
start_time = time.time() start_time = time.time()
@@ -680,8 +680,9 @@ class LeannSearcher:
This method should be called after you're done using the searcher, This method should be called after you're done using the searcher,
especially in test environments or batch processing scenarios. especially in test environments or batch processing scenarios.
""" """
if hasattr(self.backend_impl, "embedding_server_manager"): backend = getattr(self.backend_impl, "embedding_server_manager", None)
self.backend_impl.embedding_server_manager.stop_server() if backend is not None:
backend.stop_server()
# Enable automatic cleanup patterns # Enable automatic cleanup patterns
def __enter__(self): def __enter__(self):

View File

@@ -707,20 +707,28 @@ class GeminiChat(LLMInterface):
logger.info(f"Sending request to Gemini with model {self.model}") logger.info(f"Sending request to Gemini with model {self.model}")
try: try:
# Set generation configuration from google.genai.types import GenerateContentConfig
generation_config = {
"temperature": kwargs.get("temperature", 0.7), generation_config = GenerateContentConfig(
"max_output_tokens": kwargs.get("max_tokens", 1000), temperature=kwargs.get("temperature", 0.7),
} max_output_tokens=kwargs.get("max_tokens", 1000),
)
# Handle top_p parameter # Handle top_p parameter
if "top_p" in kwargs: if "top_p" in kwargs:
generation_config["top_p"] = kwargs["top_p"] generation_config.top_p = kwargs["top_p"]
response = self.client.models.generate_content( response = self.client.models.generate_content(
model=self.model, contents=prompt, config=generation_config model=self.model,
contents=prompt,
config=generation_config,
) )
return response.text.strip() # Handle potential None response text
response_text = response.text
if response_text is None:
logger.warning("Gemini returned None response text")
return ""
return response_text.strip()
except Exception as e: except Exception as e:
logger.error(f"Error communicating with Gemini: {e}") logger.error(f"Error communicating with Gemini: {e}")
return f"Error: Could not get a response from Gemini. Details: {e}" return f"Error: Could not get a response from Gemini. Details: {e}"

View File

@@ -1,13 +1,14 @@
import argparse import argparse
import asyncio import asyncio
from pathlib import Path from pathlib import Path
from typing import Union from typing import Optional, Union
from llama_index.core import SimpleDirectoryReader from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter from llama_index.core.node_parser import SentenceSplitter
from tqdm import tqdm from tqdm import tqdm
from .api import LeannBuilder, LeannChat, LeannSearcher from .api import LeannBuilder, LeannChat, LeannSearcher
from .registry import register_project_directory
def extract_pdf_text_with_pymupdf(file_path: str) -> str: def extract_pdf_text_with_pymupdf(file_path: str) -> str:
@@ -84,6 +85,7 @@ Examples:
leann search my-docs "query" # Search in my-docs index leann search my-docs "query" # Search in my-docs index
leann ask my-docs "question" # Ask my-docs index leann ask my-docs "question" # Ask my-docs index
leann list # List all stored indexes leann list # List all stored indexes
leann remove my-docs # Remove an index (local first, then global)
""", """,
) )
@@ -251,35 +253,18 @@ Examples:
# List command # List command
subparsers.add_parser("list", help="List all indexes") subparsers.add_parser("list", help="List all indexes")
# Remove command
remove_parser = subparsers.add_parser("remove", help="Remove an index")
remove_parser.add_argument("index_name", help="Index name to remove")
remove_parser.add_argument(
"--force", "-f", action="store_true", help="Force removal without confirmation"
)
return parser return parser
def register_project_dir(self): def register_project_dir(self):
"""Register current project directory in global registry""" """Register current project directory in global registry"""
global_registry = Path.home() / ".leann" / "projects.json" register_project_directory()
global_registry.parent.mkdir(exist_ok=True)
current_dir = str(Path.cwd())
# Load existing registry
projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
projects = json.load(f)
except Exception:
projects = []
# Add current directory if not already present
if current_dir not in projects:
projects.append(current_dir)
# Save registry
import json
with open(global_registry, "w") as f:
json.dump(projects, f, indent=2)
def _build_gitignore_parser(self, docs_dir: str): def _build_gitignore_parser(self, docs_dir: str):
"""Build gitignore parser using gitignore-parser library.""" """Build gitignore parser using gitignore-parser library."""
@@ -339,8 +324,6 @@ Examples:
return False return False
def list_indexes(self): def list_indexes(self):
print("Stored LEANN indexes:")
# Get all project directories with .leann # Get all project directories with .leann
global_registry = Path.home() / ".leann" / "projects.json" global_registry = Path.home() / ".leann" / "projects.json"
all_projects = [] all_projects = []
@@ -366,55 +349,479 @@ Examples:
if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects: if (current_path / ".leann" / "indexes").exists() and current_path not in valid_projects:
valid_projects.append(current_path) valid_projects.append(current_path)
if not valid_projects: # Separate current and other projects
print( other_projects = []
"No indexes found. Use 'leann build <name> --docs <dir> [<dir2> ...]' to create one."
)
return
total_indexes = 0
current_dir = Path.cwd()
for project_path in valid_projects: for project_path in valid_projects:
indexes_dir = project_path / ".leann" / "indexes" if project_path != current_path:
if not indexes_dir.exists(): other_projects.append(project_path)
continue
index_dirs = [d for d in indexes_dir.iterdir() if d.is_dir()] print("📚 LEANN Indexes")
if not index_dirs: print("=" * 50)
continue
# Show project header total_indexes = 0
if project_path == current_dir: current_indexes_count = 0
print(f"\n📁 Current project ({project_path}):")
else:
print(f"\n📂 {project_path}:")
for index_dir in index_dirs: # Show current project first (most important)
print("\n🏠 Current Project")
print(f" {current_path}")
print(" " + "" * 45)
current_indexes = self._discover_indexes_in_project(current_path)
if current_indexes:
for idx in current_indexes:
total_indexes += 1 total_indexes += 1
index_name = index_dir.name current_indexes_count += 1
meta_file = index_dir / "documents.leann.meta.json" type_icon = "📁" if idx["type"] == "cli" else "📄"
status = "" if meta_file.exists() else "" print(f" {current_indexes_count}. {type_icon} {idx['name']} {idx['status']}")
if idx["size_mb"] > 0:
print(f" 📦 Size: {idx['size_mb']:.1f} MB")
else:
print(" 📭 No indexes in current project")
print(f" {total_indexes}. {index_name} [{status}]") # Show other projects (reference information)
if status == "": if other_projects:
size_mb = sum(f.stat().st_size for f in index_dir.iterdir() if f.is_file()) / ( print("\n\n🗂️ Other Projects")
1024 * 1024 print(" " + "" * 45)
for project_path in other_projects:
project_indexes = self._discover_indexes_in_project(project_path)
if not project_indexes:
continue
print(f"\n 📂 {project_path.name}")
print(f" {project_path}")
for idx in project_indexes:
total_indexes += 1
type_icon = "📁" if idx["type"] == "cli" else "📄"
print(f"{type_icon} {idx['name']} {idx['status']}")
if idx["size_mb"] > 0:
print(f" 📦 {idx['size_mb']:.1f} MB")
# Summary and usage info
print("\n" + "=" * 50)
if total_indexes == 0:
print("💡 Get started:")
print(" leann build my-docs --docs ./documents")
else:
# Count only projects that have at least one discoverable index
projects_count = sum(
1 for p in valid_projects if len(self._discover_indexes_in_project(p)) > 0
)
print(f"📊 Total: {total_indexes} indexes across {projects_count} projects")
if current_indexes_count > 0:
print("\n💫 Quick start (current project):")
# Get first index from current project for example
current_indexes_dir = current_path / ".leann" / "indexes"
if current_indexes_dir.exists():
current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()]
if current_index_dirs:
example_name = current_index_dirs[0].name
print(f' leann search {example_name} "your query"')
print(f" leann ask {example_name} --interactive")
else:
print("\n💡 Create your first index:")
print(" leann build my-docs --docs ./documents")
def _discover_indexes_in_project(self, project_path: Path):
"""Discover all indexes in a project directory (both CLI and apps formats)"""
indexes = []
# 1. CLI format: .leann/indexes/index_name/
cli_indexes_dir = project_path / ".leann" / "indexes"
if cli_indexes_dir.exists():
for index_dir in cli_indexes_dir.iterdir():
if index_dir.is_dir():
meta_file = index_dir / "documents.leann.meta.json"
status = "" if meta_file.exists() else ""
size_mb = 0
if meta_file.exists():
try:
size_mb = sum(
f.stat().st_size for f in index_dir.iterdir() if f.is_file()
) / (1024 * 1024)
except (OSError, PermissionError):
pass
indexes.append(
{
"name": index_dir.name,
"type": "cli",
"status": status,
"size_mb": size_mb,
"path": index_dir,
}
) )
print(f" Size: {size_mb:.1f} MB")
if total_indexes > 0: # 2. Apps format: *.leann.meta.json files anywhere in the project
print(f"\nTotal: {total_indexes} indexes across {len(valid_projects)} projects") cli_indexes_dir = project_path / ".leann" / "indexes"
print("\nUsage (current project only):") for meta_file in project_path.rglob("*.leann.meta.json"):
if meta_file.is_file():
# Skip CLI-built indexes (which store meta under .leann/indexes/<name>/)
try:
if cli_indexes_dir.exists() and cli_indexes_dir in meta_file.parents:
continue
except Exception:
pass
# Use the parent directory name as the app index display name
display_name = meta_file.parent.name
# Extract file base used to store files
file_base = meta_file.name.replace(".leann.meta.json", "")
# Show example from current project # Apps indexes are considered complete if the .leann.meta.json file exists
current_indexes_dir = current_dir / ".leann" / "indexes" status = ""
if current_indexes_dir.exists():
current_index_dirs = [d for d in current_indexes_dir.iterdir() if d.is_dir()] # Calculate total size of all related files (use file base)
if current_index_dirs: size_mb = 0
example_name = current_index_dirs[0].name try:
print(f' leann search {example_name} "your query"') index_dir = meta_file.parent
print(f" leann ask {example_name} --interactive") for related_file in index_dir.glob(f"{file_base}.leann*"):
size_mb += related_file.stat().st_size / (1024 * 1024)
except (OSError, PermissionError):
pass
indexes.append(
{
"name": display_name,
"type": "app",
"status": status,
"size_mb": size_mb,
"path": meta_file,
}
)
return indexes
def remove_index(self, index_name: str, force: bool = False):
"""Safely remove an index - always show all matches for transparency"""
# Always do a comprehensive search for safety
print(f"🔍 Searching for all indexes named '{index_name}'...")
all_matches = self._find_all_matching_indexes(index_name)
if not all_matches:
print(f"❌ Index '{index_name}' not found in any project.")
return False
if len(all_matches) == 1:
return self._remove_single_match(all_matches[0], index_name, force)
else:
return self._remove_from_multiple_matches(all_matches, index_name, force)
def _find_all_matching_indexes(self, index_name: str):
"""Find all indexes with the given name across all projects"""
matches = []
# Get all registered projects
global_registry = Path.home() / ".leann" / "projects.json"
all_projects = []
if global_registry.exists():
try:
import json
with open(global_registry) as f:
all_projects = json.load(f)
except Exception:
pass
# Always include current project
current_path = Path.cwd()
if str(current_path) not in all_projects:
all_projects.append(str(current_path))
# Search across all projects
for project_dir in all_projects:
project_path = Path(project_dir)
if not project_path.exists():
continue
# 1) CLI-format index under .leann/indexes/<name>
index_dir = project_path / ".leann" / "indexes" / index_name
if index_dir.exists():
is_current = project_path == current_path
matches.append(
{
"project_path": project_path,
"index_dir": index_dir,
"is_current": is_current,
"kind": "cli",
}
)
# 2) App-format indexes
# We support two ways of addressing apps:
# a) by the file base (e.g., `pdf_documents`)
# b) by the parent directory name (e.g., `new_txt`)
seen_app_meta = set()
# 2a) by file base
for meta_file in project_path.rglob(f"{index_name}.leann.meta.json"):
if meta_file.is_file():
# Skip CLI-built indexes' meta under .leann/indexes
try:
cli_indexes_dir = project_path / ".leann" / "indexes"
if cli_indexes_dir.exists() and cli_indexes_dir in meta_file.parents:
continue
except Exception:
pass
is_current = project_path == current_path
key = (str(project_path), str(meta_file))
if key in seen_app_meta:
continue
seen_app_meta.add(key)
matches.append(
{
"project_path": project_path,
"files_dir": meta_file.parent,
"meta_file": meta_file,
"is_current": is_current,
"kind": "app",
"display_name": meta_file.parent.name,
"file_base": meta_file.name.replace(".leann.meta.json", ""),
}
)
# 2b) by parent directory name
for meta_file in project_path.rglob("*.leann.meta.json"):
if meta_file.is_file() and meta_file.parent.name == index_name:
# Skip CLI-built indexes' meta under .leann/indexes
try:
cli_indexes_dir = project_path / ".leann" / "indexes"
if cli_indexes_dir.exists() and cli_indexes_dir in meta_file.parents:
continue
except Exception:
pass
is_current = project_path == current_path
key = (str(project_path), str(meta_file))
if key in seen_app_meta:
continue
seen_app_meta.add(key)
matches.append(
{
"project_path": project_path,
"files_dir": meta_file.parent,
"meta_file": meta_file,
"is_current": is_current,
"kind": "app",
"display_name": meta_file.parent.name,
"file_base": meta_file.name.replace(".leann.meta.json", ""),
}
)
# Sort: current project first, then by project name
matches.sort(key=lambda x: (not x["is_current"], x["project_path"].name))
return matches
def _remove_single_match(self, match, index_name: str, force: bool):
"""Handle removal when only one match is found"""
project_path = match["project_path"]
is_current = match["is_current"]
kind = match.get("kind", "cli")
if is_current:
location_info = "current project"
emoji = "🏠"
else:
location_info = f"other project '{project_path.name}'"
emoji = "📂"
print(f"✅ Found 1 index named '{index_name}':")
print(f" {emoji} Location: {location_info}")
if kind == "cli":
print(f" 📍 Path: {project_path / '.leann' / 'indexes' / index_name}")
else:
print(f" 📍 Meta: {match['meta_file']}")
if not force:
if not is_current:
print("\n⚠️ CROSS-PROJECT REMOVAL!")
print(" This will delete the index from another project.")
response = input(f" ❓ Confirm removal from {location_info}? (y/N): ").strip().lower()
if response not in ["y", "yes"]:
print(" ❌ Removal cancelled.")
return False
if kind == "cli":
return self._delete_index_directory(
match["index_dir"],
index_name,
project_path if not is_current else None,
is_app=False,
)
else:
return self._delete_index_directory(
match["files_dir"],
match.get("display_name", index_name),
project_path if not is_current else None,
is_app=True,
meta_file=match.get("meta_file"),
app_file_base=match.get("file_base"),
)
def _remove_from_multiple_matches(self, matches, index_name: str, force: bool):
"""Handle removal when multiple matches are found"""
print(f"⚠️ Found {len(matches)} indexes named '{index_name}':")
print(" " + "" * 50)
for i, match in enumerate(matches, 1):
project_path = match["project_path"]
is_current = match["is_current"]
kind = match.get("kind", "cli")
if is_current:
print(f" {i}. 🏠 Current project ({'CLI' if kind == 'cli' else 'APP'})")
else:
print(f" {i}. 📂 {project_path.name} ({'CLI' if kind == 'cli' else 'APP'})")
# Show path details
if kind == "cli":
print(f" 📍 {project_path / '.leann' / 'indexes' / index_name}")
else:
print(f" 📍 {match['meta_file']}")
# Show size info
try:
if kind == "cli":
size_mb = sum(
f.stat().st_size for f in match["index_dir"].iterdir() if f.is_file()
) / (1024 * 1024)
else:
file_base = match.get("file_base")
size_mb = 0.0
if file_base:
size_mb = sum(
f.stat().st_size
for f in match["files_dir"].glob(f"{file_base}.leann*")
if f.is_file()
) / (1024 * 1024)
print(f" 📦 Size: {size_mb:.1f} MB")
except (OSError, PermissionError):
pass
print(" " + "" * 50)
if force:
print(" ❌ Multiple matches found, but --force specified.")
print(" Please run without --force to choose which one to remove.")
return False
try:
choice = input(
f" ❓ Which one to remove? (1-{len(matches)}, or 'c' to cancel): "
).strip()
if choice.lower() == "c":
print(" ❌ Removal cancelled.")
return False
choice_idx = int(choice) - 1
if 0 <= choice_idx < len(matches):
selected_match = matches[choice_idx]
project_path = selected_match["project_path"]
is_current = selected_match["is_current"]
kind = selected_match.get("kind", "cli")
location = "current project" if is_current else f"'{project_path.name}' project"
print(f" 🎯 Selected: Remove from {location}")
# Final confirmation for safety
confirm = input(
f" ❓ FINAL CONFIRMATION - Type '{index_name}' to proceed: "
).strip()
if confirm != index_name:
print(" ❌ Confirmation failed. Removal cancelled.")
return False
if kind == "cli":
return self._delete_index_directory(
selected_match["index_dir"],
index_name,
project_path if not is_current else None,
is_app=False,
)
else:
return self._delete_index_directory(
selected_match["files_dir"],
selected_match.get("display_name", index_name),
project_path if not is_current else None,
is_app=True,
meta_file=selected_match.get("meta_file"),
app_file_base=selected_match.get("file_base"),
)
else:
print(" ❌ Invalid choice. Removal cancelled.")
return False
except (ValueError, KeyboardInterrupt):
print("\n ❌ Invalid input. Removal cancelled.")
return False
def _delete_index_directory(
self,
index_dir: Path,
index_display_name: str,
project_path: Optional[Path] = None,
is_app: bool = False,
meta_file: Optional[Path] = None,
app_file_base: Optional[str] = None,
):
"""Delete a CLI index directory or APP index files safely."""
try:
if is_app:
removed = 0
errors = 0
# Delete only files that belong to this app index (based on file base)
pattern_base = app_file_base or ""
for f in index_dir.glob(f"{pattern_base}.leann*"):
try:
f.unlink()
removed += 1
except Exception:
errors += 1
# Best-effort: also remove the meta file if specified and still exists
if meta_file and meta_file.exists():
try:
meta_file.unlink()
removed += 1
except Exception:
errors += 1
if removed > 0 and errors == 0:
if project_path:
print(
f"✅ App index '{index_display_name}' removed from {project_path.name}"
)
else:
print(f"✅ App index '{index_display_name}' removed successfully")
return True
elif removed > 0 and errors > 0:
print(
f"⚠️ App index '{index_display_name}' partially removed (some files couldn't be deleted)"
)
return True
else:
print(
f"❌ No files found to remove for app index '{index_display_name}' in {index_dir}"
)
return False
else:
import shutil
shutil.rmtree(index_dir)
if project_path:
print(f"✅ Index '{index_display_name}' removed from {project_path.name}")
else:
print(f"✅ Index '{index_display_name}' removed successfully")
return True
except Exception as e:
print(f"❌ Error removing index '{index_display_name}': {e}")
return False
def load_documents( def load_documents(
self, self,
@@ -942,6 +1349,8 @@ Examples:
if args.command == "list": if args.command == "list":
self.list_indexes() self.list_indexes()
elif args.command == "remove":
self.remove_index(args.index_name, args.force)
elif args.command == "build": elif args.command == "build":
await self.build_index(args) await self.build_index(args)
elif args.command == "search": elif args.command == "search":
@@ -953,10 +1362,15 @@ Examples:
def main(): def main():
import logging
import dotenv import dotenv
dotenv.load_dotenv() dotenv.load_dotenv()
# Set clean logging for CLI usage
logging.getLogger().setLevel(logging.WARNING) # Only show warnings and errors
cli = LeannCLI() cli = LeannCLI()
asyncio.run(cli.run()) asyncio.run(cli.run())

View File

@@ -2,11 +2,17 @@
import importlib import importlib
import importlib.metadata import importlib.metadata
from typing import TYPE_CHECKING import json
import logging
from pathlib import Path
from typing import TYPE_CHECKING, Optional, Union
if TYPE_CHECKING: if TYPE_CHECKING:
from leann.interface import LeannBackendFactoryInterface from leann.interface import LeannBackendFactoryInterface
# Set up logger for this module
logger = logging.getLogger(__name__)
BACKEND_REGISTRY: dict[str, "LeannBackendFactoryInterface"] = {} BACKEND_REGISTRY: dict[str, "LeannBackendFactoryInterface"] = {}
@@ -14,7 +20,7 @@ def register_backend(name: str):
"""A decorator to register a new backend class.""" """A decorator to register a new backend class."""
def decorator(cls): def decorator(cls):
print(f"INFO: Registering backend '{name}'") logger.debug(f"Registering backend '{name}'")
BACKEND_REGISTRY[name] = cls BACKEND_REGISTRY[name] = cls
return cls return cls
@@ -39,3 +45,54 @@ def autodiscover_backends():
# print(f"WARN: Could not import backend module '{backend_module_name}': {e}") # print(f"WARN: Could not import backend module '{backend_module_name}': {e}")
pass pass
# print("INFO: Backend auto-discovery finished.") # print("INFO: Backend auto-discovery finished.")
def register_project_directory(project_dir: Optional[Union[str, Path]] = None):
"""
Register a project directory in the global LEANN registry.
This allows `leann list` to discover indexes created by apps or other tools.
Args:
project_dir: Directory to register. If None, uses current working directory.
"""
if project_dir is None:
project_dir = Path.cwd()
else:
project_dir = Path(project_dir)
# Only register directories that have some kind of LEANN content
# Either .leann/indexes/ (CLI format) or *.leann.meta.json files (apps format)
has_cli_indexes = (project_dir / ".leann" / "indexes").exists()
has_app_indexes = any(project_dir.rglob("*.leann.meta.json"))
if not (has_cli_indexes or has_app_indexes):
# Don't register if there are no LEANN indexes
return
global_registry = Path.home() / ".leann" / "projects.json"
global_registry.parent.mkdir(exist_ok=True)
project_str = str(project_dir.resolve())
# Load existing registry
projects = []
if global_registry.exists():
try:
with open(global_registry) as f:
projects = json.load(f)
except Exception:
logger.debug("Could not load existing project registry")
projects = []
# Add project if not already present
if project_str not in projects:
projects.append(project_str)
# Save updated registry
try:
with open(global_registry, "w") as f:
json.dump(projects, f, indent=2)
logger.debug(f"Registered project directory: {project_str}")
except Exception as e:
logger.warning(f"Could not save project registry: {e}")

View File

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project] [project]
name = "leann" name = "leann"
version = "0.3.0" version = "0.3.1"
description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!" description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
readme = "README.md" readme = "README.md"
requires-python = ">=3.9" requires-python = ">=3.9"

View File

@@ -14,8 +14,6 @@ dependencies = [
"numpy>=1.26.0", "numpy>=1.26.0",
"torch", "torch",
"tqdm", "tqdm",
"flask",
"flask_compress",
"datasets>=2.15.0", "datasets>=2.15.0",
"evaluate", "evaluate",
"colorama", "colorama",
@@ -66,9 +64,7 @@ test = [
"pytest>=7.0", "pytest>=7.0",
"pytest-timeout>=2.0", "pytest-timeout>=2.0",
"llama-index-core>=0.12.0", "llama-index-core>=0.12.0",
"llama-index-readers-file>=0.4.0",
"python-dotenv>=1.0.0", "python-dotenv>=1.0.0",
"sentence-transformers>=2.2.0",
] ]
diskann = [ diskann = [
@@ -100,13 +96,8 @@ leann-backend-hnsw = { path = "packages/leann-backend-hnsw", editable = true }
[tool.ruff] [tool.ruff]
target-version = "py39" target-version = "py39"
line-length = 100 line-length = 100
extend-exclude = [ extend-exclude = ["third_party"]
"third_party",
"*.egg-info",
"__pycache__",
".git",
".venv",
]
[tool.ruff.lint] [tool.ruff.lint]
select = [ select = [
@@ -129,21 +120,12 @@ ignore = [
"RUF012", # mutable class attributes should be annotated with typing.ClassVar "RUF012", # mutable class attributes should be annotated with typing.ClassVar
] ]
[tool.ruff.lint.per-file-ignores]
"test/**/*.py" = ["E402"] # module level import not at top of file (common in tests)
"examples/**/*.py" = ["E402"] # module level import not at top of file (common in examples)
[tool.ruff.format] [tool.ruff.format]
quote-style = "double" quote-style = "double"
indent-style = "space" indent-style = "space"
skip-magic-trailing-comma = false skip-magic-trailing-comma = false
line-ending = "auto" line-ending = "auto"
[dependency-groups]
dev = [
"ruff>=0.12.4",
]
[tool.lychee] [tool.lychee]
accept = ["200", "403", "429", "503"] accept = ["200", "403", "429", "503"]
timeout = 20 timeout = 20

7313
uv.lock generated
View File

File diff suppressed because it is too large Load Diff