Compare commits: `feature/cl` ... `feat/confi` (2 commits)

| Author | SHA1 | Date |
|---|---|---|
|  | effeb47e94 |  |
|  | 4115613b10 |  |

Files changed: README.md (76)
@@ -176,7 +176,7 @@ response = chat.ask("How much storage does LEANN save?", top_k=1)

## RAG on Everything!

LEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, Claude conversations, and more.

LEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, and more.
@@ -477,80 +477,6 @@ Once the index is built, you can ask questions like:

</details>

### 🤖 Claude Chat History: Your Personal AI Conversation Archive!

Transform your Claude conversations into a searchable knowledge base! Search through all your Claude discussions about coding, research, brainstorming, and more.

```bash
python -m apps.claude_rag --export-path claude_export.json --query "What did I ask about Python dictionaries?"
```

**Unlock your AI conversation history.** Never lose track of valuable insights from your Claude discussions again.

<details>
<summary><strong>📋 Click to expand: How to Export Claude Data</strong></summary>

**Step-by-step export process:**

1. **Open Claude** in your browser
2. **Navigate to Settings** (look for the gear icon or settings menu)
3. **Find Export/Download** options in your account settings
4. **Download conversation data** (usually in JSON format)
5. **Place the file** in your project directory

*Note: Claude export methods may vary depending on the interface you're using. Check Claude's help documentation for the most current export instructions.*

**Supported formats:**
- `.json` files (recommended)
- `.zip` archives containing JSON data
- Directories with multiple export files

</details>
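For orientation, the `ClaudeReader` shown later in this comparison accepts either a top-level list of conversations or a `{"conversations": [...]}` object, where each message carries `role` and `content` fields (plus optional titles and timestamps). Below is a minimal sketch of an export shaped that way; every value is made up for illustration.

```python
# Illustrative only: a tiny export that the reader in this comparison would
# accept. Real Claude exports may contain many more fields per conversation.
example_export = {
    "conversations": [
        {
            "title": "Python dictionaries",
            "created_at": "2024-01-15",
            "messages": [
                {"role": "human", "content": "How do I merge two dicts?"},
                {"role": "assistant", "content": "You can use the | operator or dict.update()."},
            ],
        }
    ]
}
```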
<details>
<summary><strong>📋 Click to expand: Claude-Specific Arguments</strong></summary>

#### Parameters

```bash
--export-path PATH    # Path to Claude export file (.json/.zip) or directory (default: ./claude_export)
--separate-messages   # Process each message separately instead of concatenated conversations
--chunk-size N        # Text chunk size (default: 512)
--chunk-overlap N     # Overlap between chunks (default: 128)
```

#### Example Commands

```bash
# Basic usage with JSON export
python -m apps.claude_rag --export-path my_claude_conversations.json

# Process ZIP archive from Claude
python -m apps.claude_rag --export-path claude_export.zip

# Search with specific query
python -m apps.claude_rag --export-path claude_data.json --query "machine learning advice"

# Process individual messages for fine-grained search
python -m apps.claude_rag --separate-messages --export-path claude_export.json

# Process directory containing multiple exports
python -m apps.claude_rag --export-path ./claude_exports/ --max-items 1000
```

</details>

<details>
<summary><strong>💡 Click to expand: Example queries you can try</strong></summary>

Once your Claude conversations are indexed, you can search with queries like:
- "What did I ask Claude about Python programming?"
- "Show me conversations about machine learning algorithms"
- "Find discussions about software architecture patterns"
- "What debugging advice did Claude give me?"
- "Search for conversations about data structures"
- "Find Claude's recommendations for learning resources"

</details>

### 🚀 Claude Code Integration: Transform Your Development Workflow!

<details>
@@ -11,6 +11,7 @@ from typing import Any

import dotenv
from leann.api import LeannBuilder, LeannChat
from leann.registry import register_project_directory
from leann.settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url

dotenv.load_dotenv()

@@ -78,6 +79,24 @@ class BaseRAGExample(ABC):
            choices=["sentence-transformers", "openai", "mlx", "ollama"],
            help="Embedding backend mode (default: sentence-transformers), we provide sentence-transformers, openai, mlx, or ollama",
        )
        embedding_group.add_argument(
            "--embedding-host",
            type=str,
            default=None,
            help="Override Ollama-compatible embedding host",
        )
        embedding_group.add_argument(
            "--embedding-api-base",
            type=str,
            default=None,
            help="Base URL for OpenAI-compatible embedding services",
        )
        embedding_group.add_argument(
            "--embedding-api-key",
            type=str,
            default=None,
            help="API key for embedding service (defaults to OPENAI_API_KEY)",
        )

        # LLM parameters
        llm_group = parser.add_argument_group("LLM Parameters")

@@ -97,8 +116,8 @@ class BaseRAGExample(ABC):
        llm_group.add_argument(
            "--llm-host",
            type=str,
            default="http://localhost:11434",
            help="Host for Ollama API (default: http://localhost:11434)",
            default=None,
            help="Host for Ollama-compatible APIs (defaults to LEANN_OLLAMA_HOST/OLLAMA_HOST)",
        )
        llm_group.add_argument(
            "--thinking-budget",

@@ -107,6 +126,18 @@ class BaseRAGExample(ABC):
            default=None,
            help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
        )
        llm_group.add_argument(
            "--llm-api-base",
            type=str,
            default=None,
            help="Base URL for OpenAI-compatible APIs",
        )
        llm_group.add_argument(
            "--llm-api-key",
            type=str,
            default=None,
            help="API key for OpenAI-compatible APIs (defaults to OPENAI_API_KEY)",
        )

        # AST Chunking parameters
        ast_group = parser.add_argument_group("AST Chunking Parameters")

@@ -205,9 +236,13 @@ class BaseRAGExample(ABC):

        if args.llm == "openai":
            config["model"] = args.llm_model or "gpt-4o"
            config["base_url"] = resolve_openai_base_url(args.llm_api_base)
            resolved_key = resolve_openai_api_key(args.llm_api_key)
            if resolved_key:
                config["api_key"] = resolved_key
        elif args.llm == "ollama":
            config["model"] = args.llm_model or "llama3.2:1b"
            config["host"] = args.llm_host
            config["host"] = resolve_ollama_host(args.llm_host)
        elif args.llm == "hf":
            config["model"] = args.llm_model or "Qwen/Qwen2.5-1.5B-Instruct"
        elif args.llm == "simulated":

@@ -223,10 +258,20 @@ class BaseRAGExample(ABC):
        print(f"\n[Building Index] Creating {self.name} index...")
        print(f"Total text chunks: {len(texts)}")

        embedding_options: dict[str, Any] = {}
        if args.embedding_mode == "ollama":
            embedding_options["host"] = resolve_ollama_host(args.embedding_host)
        elif args.embedding_mode == "openai":
            embedding_options["base_url"] = resolve_openai_base_url(args.embedding_api_base)
            resolved_embedding_key = resolve_openai_api_key(args.embedding_api_key)
            if resolved_embedding_key:
                embedding_options["api_key"] = resolved_embedding_key

        builder = LeannBuilder(
            backend_name=args.backend_name,
            embedding_model=args.embedding_model,
            embedding_mode=args.embedding_mode,
            embedding_options=embedding_options or None,
            graph_degree=args.graph_degree,
            complexity=args.build_complexity,
            is_compact=not args.no_compact,
@@ -1,413 +0,0 @@
|
||||
"""
|
||||
ChatGPT export data reader.
|
||||
|
||||
Reads and processes ChatGPT export data from chat.html files.
|
||||
"""
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from zipfile import ZipFile
|
||||
|
||||
from bs4 import BeautifulSoup
|
||||
from llama_index.core import Document
|
||||
from llama_index.core.readers.base import BaseReader
|
||||
|
||||
|
||||
class ChatGPTReader(BaseReader):
|
||||
"""
|
||||
ChatGPT export data reader.
|
||||
|
||||
Reads ChatGPT conversation data from exported chat.html files or zip archives.
|
||||
Processes conversations into structured documents with metadata.
|
||||
"""
|
||||
|
||||
def __init__(self, concatenate_conversations: bool = True) -> None:
|
||||
"""
|
||||
Initialize.
|
||||
|
||||
Args:
|
||||
concatenate_conversations: Whether to concatenate messages within conversations for better context
|
||||
"""
|
||||
try:
|
||||
from bs4 import BeautifulSoup # noqa
|
||||
except ImportError:
|
||||
raise ImportError("`beautifulsoup4` package not found: `pip install beautifulsoup4`")
|
||||
|
||||
self.concatenate_conversations = concatenate_conversations
|
||||
|
||||
def _extract_html_from_zip(self, zip_path: Path) -> str | None:
|
||||
"""
|
||||
Extract chat.html from ChatGPT export zip file.
|
||||
|
||||
Args:
|
||||
zip_path: Path to the ChatGPT export zip file
|
||||
|
||||
Returns:
|
||||
HTML content as string, or None if not found
|
||||
"""
|
||||
try:
|
||||
with ZipFile(zip_path, "r") as zip_file:
|
||||
# Look for chat.html or conversations.html
|
||||
html_files = [
|
||||
f
|
||||
for f in zip_file.namelist()
|
||||
if f.endswith(".html") and ("chat" in f.lower() or "conversation" in f.lower())
|
||||
]
|
||||
|
||||
if not html_files:
|
||||
print(f"No HTML chat file found in {zip_path}")
|
||||
return None
|
||||
|
||||
# Use the first HTML file found
|
||||
html_file = html_files[0]
|
||||
print(f"Found HTML file: {html_file}")
|
||||
|
||||
with zip_file.open(html_file) as f:
|
||||
return f.read().decode("utf-8", errors="ignore")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error extracting HTML from zip {zip_path}: {e}")
|
||||
return None
|
||||
|
||||
def _parse_chatgpt_html(self, html_content: str) -> list[dict]:
|
||||
"""
|
||||
Parse ChatGPT HTML export to extract conversations.
|
||||
|
||||
Args:
|
||||
html_content: HTML content from ChatGPT export
|
||||
|
||||
Returns:
|
||||
List of conversation dictionaries
|
||||
"""
|
||||
soup = BeautifulSoup(html_content, "html.parser")
|
||||
conversations = []
|
||||
|
||||
# Try different possible structures for ChatGPT exports
|
||||
# Structure 1: Look for conversation containers
|
||||
conversation_containers = soup.find_all(
|
||||
["div", "section"], class_=re.compile(r"conversation|chat", re.I)
|
||||
)
|
||||
|
||||
if not conversation_containers:
|
||||
# Structure 2: Look for message containers directly
|
||||
conversation_containers = [soup] # Use the entire document as one conversation
|
||||
|
||||
for container in conversation_containers:
|
||||
conversation = self._extract_conversation_from_container(container)
|
||||
if conversation and conversation.get("messages"):
|
||||
conversations.append(conversation)
|
||||
|
||||
# If no structured conversations found, try to extract all text as one conversation
|
||||
if not conversations:
|
||||
all_text = soup.get_text(separator="\n", strip=True)
|
||||
if all_text:
|
||||
conversations.append(
|
||||
{
|
||||
"title": "ChatGPT Conversation",
|
||||
"messages": [{"role": "mixed", "content": all_text, "timestamp": None}],
|
||||
"timestamp": None,
|
||||
}
|
||||
)
|
||||
|
||||
return conversations
|
||||
|
||||
def _extract_conversation_from_container(self, container) -> dict | None:
|
||||
"""
|
||||
Extract conversation data from a container element.
|
||||
|
||||
Args:
|
||||
container: BeautifulSoup element containing conversation
|
||||
|
||||
Returns:
|
||||
Dictionary with conversation data or None
|
||||
"""
|
||||
messages = []
|
||||
|
||||
# Look for message elements with various possible structures
|
||||
message_selectors = ['[class*="message"]', '[class*="chat"]', "[data-message]", "p", "div"]
|
||||
|
||||
for selector in message_selectors:
|
||||
message_elements = container.select(selector)
|
||||
if message_elements:
|
||||
break
|
||||
else:
|
||||
message_elements = []
|
||||
|
||||
# If no structured messages found, treat the entire container as one message
|
||||
if not message_elements:
|
||||
text_content = container.get_text(separator="\n", strip=True)
|
||||
if text_content:
|
||||
messages.append({"role": "mixed", "content": text_content, "timestamp": None})
|
||||
else:
|
||||
for element in message_elements:
|
||||
message = self._extract_message_from_element(element)
|
||||
if message:
|
||||
messages.append(message)
|
||||
|
||||
if not messages:
|
||||
return None
|
||||
|
||||
# Try to extract conversation title
|
||||
title_element = container.find(["h1", "h2", "h3", "title"])
|
||||
title = title_element.get_text(strip=True) if title_element else "ChatGPT Conversation"
|
||||
|
||||
# Try to extract timestamp from various possible locations
|
||||
timestamp = self._extract_timestamp_from_container(container)
|
||||
|
||||
return {"title": title, "messages": messages, "timestamp": timestamp}
|
||||
|
||||
def _extract_message_from_element(self, element) -> dict | None:
|
||||
"""
|
||||
Extract message data from an element.
|
||||
|
||||
Args:
|
||||
element: BeautifulSoup element containing message
|
||||
|
||||
Returns:
|
||||
Dictionary with message data or None
|
||||
"""
|
||||
text_content = element.get_text(separator=" ", strip=True)
|
||||
|
||||
# Skip empty or very short messages
|
||||
if not text_content or len(text_content.strip()) < 3:
|
||||
return None
|
||||
|
||||
# Try to determine role (user/assistant) from class names or content
|
||||
role = "mixed" # Default role
|
||||
|
||||
class_names = " ".join(element.get("class", [])).lower()
|
||||
if "user" in class_names or "human" in class_names:
|
||||
role = "user"
|
||||
elif "assistant" in class_names or "ai" in class_names or "gpt" in class_names:
|
||||
role = "assistant"
|
||||
elif text_content.lower().startswith(("you:", "user:", "me:")):
|
||||
role = "user"
|
||||
text_content = re.sub(r"^(you|user|me):\s*", "", text_content, flags=re.IGNORECASE)
|
||||
elif text_content.lower().startswith(("chatgpt:", "assistant:", "ai:")):
|
||||
role = "assistant"
|
||||
text_content = re.sub(
|
||||
r"^(chatgpt|assistant|ai):\s*", "", text_content, flags=re.IGNORECASE
|
||||
)
|
||||
|
||||
# Try to extract timestamp
|
||||
timestamp = self._extract_timestamp_from_element(element)
|
||||
|
||||
return {"role": role, "content": text_content, "timestamp": timestamp}
|
||||
|
||||
def _extract_timestamp_from_element(self, element) -> str | None:
|
||||
"""Extract timestamp from element."""
|
||||
# Look for timestamp in various attributes and child elements
|
||||
timestamp_attrs = ["data-timestamp", "timestamp", "datetime"]
|
||||
for attr in timestamp_attrs:
|
||||
if element.get(attr):
|
||||
return element.get(attr)
|
||||
|
||||
# Look for time elements
|
||||
time_element = element.find("time")
|
||||
if time_element:
|
||||
return time_element.get("datetime") or time_element.get_text(strip=True)
|
||||
|
||||
# Look for date-like text patterns
|
||||
text = element.get_text()
|
||||
date_patterns = [r"\d{4}-\d{2}-\d{2}", r"\d{1,2}/\d{1,2}/\d{4}", r"\w+ \d{1,2}, \d{4}"]
|
||||
|
||||
for pattern in date_patterns:
|
||||
match = re.search(pattern, text)
|
||||
if match:
|
||||
return match.group()
|
||||
|
||||
return None
|
||||
|
||||
def _extract_timestamp_from_container(self, container) -> str | None:
|
||||
"""Extract timestamp from conversation container."""
|
||||
return self._extract_timestamp_from_element(container)
|
||||
|
||||
def _create_concatenated_content(self, conversation: dict) -> str:
|
||||
"""
|
||||
Create concatenated content from conversation messages.
|
||||
|
||||
Args:
|
||||
conversation: Dictionary containing conversation data
|
||||
|
||||
Returns:
|
||||
Formatted concatenated content
|
||||
"""
|
||||
title = conversation.get("title", "ChatGPT Conversation")
|
||||
messages = conversation.get("messages", [])
|
||||
timestamp = conversation.get("timestamp", "Unknown")
|
||||
|
||||
# Build message content
|
||||
message_parts = []
|
||||
for message in messages:
|
||||
role = message.get("role", "mixed")
|
||||
content = message.get("content", "")
|
||||
msg_timestamp = message.get("timestamp", "")
|
||||
|
||||
if role == "user":
|
||||
prefix = "[You]"
|
||||
elif role == "assistant":
|
||||
prefix = "[ChatGPT]"
|
||||
else:
|
||||
prefix = "[Message]"
|
||||
|
||||
# Add timestamp if available
|
||||
if msg_timestamp:
|
||||
prefix += f" ({msg_timestamp})"
|
||||
|
||||
message_parts.append(f"{prefix}: {content}")
|
||||
|
||||
concatenated_text = "\n\n".join(message_parts)
|
||||
|
||||
# Create final document content
|
||||
doc_content = f"""Conversation: {title}
|
||||
Date: {timestamp}
|
||||
Messages ({len(messages)} messages):
|
||||
|
||||
{concatenated_text}
|
||||
"""
|
||||
return doc_content
|
||||
|
||||
def load_data(self, input_dir: str | None = None, **load_kwargs: Any) -> list[Document]:
|
||||
"""
|
||||
Load ChatGPT export data.
|
||||
|
||||
Args:
|
||||
input_dir: Directory containing ChatGPT export files or path to specific file
|
||||
**load_kwargs:
|
||||
max_count (int): Maximum number of conversations to process
|
||||
chatgpt_export_path (str): Specific path to ChatGPT export file/directory
|
||||
include_metadata (bool): Whether to include metadata in documents
|
||||
"""
|
||||
docs: list[Document] = []
|
||||
max_count = load_kwargs.get("max_count", -1)
|
||||
chatgpt_export_path = load_kwargs.get("chatgpt_export_path", input_dir)
|
||||
include_metadata = load_kwargs.get("include_metadata", True)
|
||||
|
||||
if not chatgpt_export_path:
|
||||
print("No ChatGPT export path provided")
|
||||
return docs
|
||||
|
||||
export_path = Path(chatgpt_export_path)
|
||||
|
||||
if not export_path.exists():
|
||||
print(f"ChatGPT export path not found: {export_path}")
|
||||
return docs
|
||||
|
||||
html_content = None
|
||||
|
||||
# Handle different input types
|
||||
if export_path.is_file():
|
||||
if export_path.suffix.lower() == ".zip":
|
||||
# Extract HTML from zip file
|
||||
html_content = self._extract_html_from_zip(export_path)
|
||||
elif export_path.suffix.lower() == ".html":
|
||||
# Read HTML file directly
|
||||
try:
|
||||
with open(export_path, encoding="utf-8", errors="ignore") as f:
|
||||
html_content = f.read()
|
||||
except Exception as e:
|
||||
print(f"Error reading HTML file {export_path}: {e}")
|
||||
return docs
|
||||
else:
|
||||
print(f"Unsupported file type: {export_path.suffix}")
|
||||
return docs
|
||||
|
||||
elif export_path.is_dir():
|
||||
# Look for HTML files in directory
|
||||
html_files = list(export_path.glob("*.html"))
|
||||
zip_files = list(export_path.glob("*.zip"))
|
||||
|
||||
if html_files:
|
||||
# Use first HTML file found
|
||||
html_file = html_files[0]
|
||||
print(f"Found HTML file: {html_file}")
|
||||
try:
|
||||
with open(html_file, encoding="utf-8", errors="ignore") as f:
|
||||
html_content = f.read()
|
||||
except Exception as e:
|
||||
print(f"Error reading HTML file {html_file}: {e}")
|
||||
return docs
|
||||
|
||||
elif zip_files:
|
||||
# Use first zip file found
|
||||
zip_file = zip_files[0]
|
||||
print(f"Found zip file: {zip_file}")
|
||||
html_content = self._extract_html_from_zip(zip_file)
|
||||
|
||||
else:
|
||||
print(f"No HTML or zip files found in {export_path}")
|
||||
return docs
|
||||
|
||||
if not html_content:
|
||||
print("No HTML content found to process")
|
||||
return docs
|
||||
|
||||
# Parse conversations from HTML
|
||||
print("Parsing ChatGPT conversations from HTML...")
|
||||
conversations = self._parse_chatgpt_html(html_content)
|
||||
|
||||
if not conversations:
|
||||
print("No conversations found in HTML content")
|
||||
return docs
|
||||
|
||||
print(f"Found {len(conversations)} conversations")
|
||||
|
||||
# Process conversations into documents
|
||||
count = 0
|
||||
for conversation in conversations:
|
||||
if max_count > 0 and count >= max_count:
|
||||
break
|
||||
|
||||
if self.concatenate_conversations:
|
||||
# Create one document per conversation with concatenated messages
|
||||
doc_content = self._create_concatenated_content(conversation)
|
||||
|
||||
metadata = {}
|
||||
if include_metadata:
|
||||
metadata = {
|
||||
"title": conversation.get("title", "ChatGPT Conversation"),
|
||||
"timestamp": conversation.get("timestamp", "Unknown"),
|
||||
"message_count": len(conversation.get("messages", [])),
|
||||
"source": "ChatGPT Export",
|
||||
}
|
||||
|
||||
doc = Document(text=doc_content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
count += 1
|
||||
|
||||
else:
|
||||
# Create separate documents for each message
|
||||
for message in conversation.get("messages", []):
|
||||
if max_count > 0 and count >= max_count:
|
||||
break
|
||||
|
||||
role = message.get("role", "mixed")
|
||||
content = message.get("content", "")
|
||||
msg_timestamp = message.get("timestamp", "")
|
||||
|
||||
if not content.strip():
|
||||
continue
|
||||
|
||||
# Create document content with context
|
||||
doc_content = f"""Conversation: {conversation.get("title", "ChatGPT Conversation")}
|
||||
Role: {role}
|
||||
Timestamp: {msg_timestamp or conversation.get("timestamp", "Unknown")}
|
||||
Message: {content}
|
||||
"""
|
||||
|
||||
metadata = {}
|
||||
if include_metadata:
|
||||
metadata = {
|
||||
"conversation_title": conversation.get("title", "ChatGPT Conversation"),
|
||||
"role": role,
|
||||
"timestamp": msg_timestamp or conversation.get("timestamp", "Unknown"),
|
||||
"source": "ChatGPT Export",
|
||||
}
|
||||
|
||||
doc = Document(text=doc_content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
count += 1
|
||||
|
||||
print(f"Created {len(docs)} documents from ChatGPT export")
|
||||
return docs
|
||||
@@ -1,186 +0,0 @@
|
||||
"""
|
||||
ChatGPT RAG example using the unified interface.
|
||||
Supports ChatGPT export data from chat.html files.
|
||||
"""
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Add parent directory to path for imports
|
||||
sys.path.insert(0, str(Path(__file__).parent))
|
||||
|
||||
from base_rag_example import BaseRAGExample
|
||||
from chunking import create_text_chunks
|
||||
|
||||
from .chatgpt_data.chatgpt_reader import ChatGPTReader
|
||||
|
||||
|
||||
class ChatGPTRAG(BaseRAGExample):
|
||||
"""RAG example for ChatGPT conversation data."""
|
||||
|
||||
def __init__(self):
|
||||
# Set default values BEFORE calling super().__init__
|
||||
self.max_items_default = -1 # Process all conversations by default
|
||||
self.embedding_model_default = (
|
||||
"sentence-transformers/all-MiniLM-L6-v2" # Fast 384-dim model
|
||||
)
|
||||
|
||||
super().__init__(
|
||||
name="ChatGPT",
|
||||
description="Process and query ChatGPT conversation exports with LEANN",
|
||||
default_index_name="chatgpt_conversations_index",
|
||||
)
|
||||
|
||||
def _add_specific_arguments(self, parser):
|
||||
"""Add ChatGPT-specific arguments."""
|
||||
chatgpt_group = parser.add_argument_group("ChatGPT Parameters")
|
||||
chatgpt_group.add_argument(
|
||||
"--export-path",
|
||||
type=str,
|
||||
default="./chatgpt_export",
|
||||
help="Path to ChatGPT export file (.zip or .html) or directory containing exports (default: ./chatgpt_export)",
|
||||
)
|
||||
chatgpt_group.add_argument(
|
||||
"--concatenate-conversations",
|
||||
action="store_true",
|
||||
default=True,
|
||||
help="Concatenate messages within conversations for better context (default: True)",
|
||||
)
|
||||
chatgpt_group.add_argument(
|
||||
"--separate-messages",
|
||||
action="store_true",
|
||||
help="Process each message as a separate document (overrides --concatenate-conversations)",
|
||||
)
|
||||
chatgpt_group.add_argument(
|
||||
"--chunk-size", type=int, default=512, help="Text chunk size (default: 512)"
|
||||
)
|
||||
chatgpt_group.add_argument(
|
||||
"--chunk-overlap", type=int, default=128, help="Text chunk overlap (default: 128)"
|
||||
)
|
||||
|
||||
def _find_chatgpt_exports(self, export_path: Path) -> list[Path]:
|
||||
"""
|
||||
Find ChatGPT export files in the given path.
|
||||
|
||||
Args:
|
||||
export_path: Path to search for exports
|
||||
|
||||
Returns:
|
||||
List of paths to ChatGPT export files
|
||||
"""
|
||||
export_files = []
|
||||
|
||||
if export_path.is_file():
|
||||
if export_path.suffix.lower() in [".zip", ".html"]:
|
||||
export_files.append(export_path)
|
||||
elif export_path.is_dir():
|
||||
# Look for zip and html files
|
||||
export_files.extend(export_path.glob("*.zip"))
|
||||
export_files.extend(export_path.glob("*.html"))
|
||||
|
||||
return export_files
|
||||
|
||||
async def load_data(self, args) -> list[str]:
|
||||
"""Load ChatGPT export data and convert to text chunks."""
|
||||
export_path = Path(args.export_path)
|
||||
|
||||
if not export_path.exists():
|
||||
print(f"ChatGPT export path not found: {export_path}")
|
||||
print(
|
||||
"Please ensure you have exported your ChatGPT data and placed it in the correct location."
|
||||
)
|
||||
print("\nTo export your ChatGPT data:")
|
||||
print("1. Sign in to ChatGPT")
|
||||
print("2. Click on your profile icon → Settings → Data Controls")
|
||||
print("3. Click 'Export' under Export Data")
|
||||
print("4. Download the zip file from the email link")
|
||||
print("5. Extract or place the file/directory at the specified path")
|
||||
return []
|
||||
|
||||
# Find export files
|
||||
export_files = self._find_chatgpt_exports(export_path)
|
||||
|
||||
if not export_files:
|
||||
print(f"No ChatGPT export files (.zip or .html) found in: {export_path}")
|
||||
return []
|
||||
|
||||
print(f"Found {len(export_files)} ChatGPT export files")
|
||||
|
||||
# Create reader with appropriate settings
|
||||
concatenate = args.concatenate_conversations and not args.separate_messages
|
||||
reader = ChatGPTReader(concatenate_conversations=concatenate)
|
||||
|
||||
# Process each export file
|
||||
all_documents = []
|
||||
total_processed = 0
|
||||
|
||||
for i, export_file in enumerate(export_files):
|
||||
print(f"\nProcessing export file {i + 1}/{len(export_files)}: {export_file.name}")
|
||||
|
||||
try:
|
||||
# Apply max_items limit per file
|
||||
max_per_file = -1
|
||||
if args.max_items > 0:
|
||||
remaining = args.max_items - total_processed
|
||||
if remaining <= 0:
|
||||
break
|
||||
max_per_file = remaining
|
||||
|
||||
# Load conversations
|
||||
documents = reader.load_data(
|
||||
chatgpt_export_path=str(export_file),
|
||||
max_count=max_per_file,
|
||||
include_metadata=True,
|
||||
)
|
||||
|
||||
if documents:
|
||||
all_documents.extend(documents)
|
||||
total_processed += len(documents)
|
||||
print(f"Processed {len(documents)} conversations from this file")
|
||||
else:
|
||||
print(f"No conversations loaded from {export_file}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error processing {export_file}: {e}")
|
||||
continue
|
||||
|
||||
if not all_documents:
|
||||
print("No conversations found to process!")
|
||||
print("\nTroubleshooting:")
|
||||
print("- Ensure the export file is a valid ChatGPT export")
|
||||
print("- Check that the HTML file contains conversation data")
|
||||
print("- Try extracting the zip file and pointing to the HTML file directly")
|
||||
return []
|
||||
|
||||
print(f"\nTotal conversations processed: {len(all_documents)}")
|
||||
print("Now starting to split into text chunks... this may take some time")
|
||||
|
||||
# Convert to text chunks
|
||||
all_texts = create_text_chunks(
|
||||
all_documents, chunk_size=args.chunk_size, chunk_overlap=args.chunk_overlap
|
||||
)
|
||||
|
||||
print(f"Created {len(all_texts)} text chunks from {len(all_documents)} conversations")
|
||||
return all_texts
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import asyncio
|
||||
|
||||
# Example queries for ChatGPT RAG
|
||||
print("\n🤖 ChatGPT RAG Example")
|
||||
print("=" * 50)
|
||||
print("\nExample queries you can try:")
|
||||
print("- 'What did I ask about Python programming?'")
|
||||
print("- 'Show me conversations about machine learning'")
|
||||
print("- 'Find discussions about travel planning'")
|
||||
print("- 'What advice did ChatGPT give me about career development?'")
|
||||
print("- 'Search for conversations about cooking recipes'")
|
||||
print("\nTo get started:")
|
||||
print("1. Export your ChatGPT data from Settings → Data Controls → Export")
|
||||
print("2. Place the downloaded zip file or extracted HTML in ./chatgpt_export/")
|
||||
print("3. Run this script to build your personal ChatGPT knowledge base!")
|
||||
print("\nOr run without --query for interactive mode\n")
|
||||
|
||||
rag = ChatGPTRAG()
|
||||
asyncio.run(rag.run())
|
||||
@@ -1,420 +0,0 @@
|
||||
"""
|
||||
Claude export data reader.
|
||||
|
||||
Reads and processes Claude conversation data from exported JSON files.
|
||||
"""
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
from zipfile import ZipFile
|
||||
|
||||
from llama_index.core import Document
|
||||
from llama_index.core.readers.base import BaseReader
|
||||
|
||||
|
||||
class ClaudeReader(BaseReader):
|
||||
"""
|
||||
Claude export data reader.
|
||||
|
||||
Reads Claude conversation data from exported JSON files or zip archives.
|
||||
Processes conversations into structured documents with metadata.
|
||||
"""
|
||||
|
||||
def __init__(self, concatenate_conversations: bool = True) -> None:
|
||||
"""
|
||||
Initialize.
|
||||
|
||||
Args:
|
||||
concatenate_conversations: Whether to concatenate messages within conversations for better context
|
||||
"""
|
||||
self.concatenate_conversations = concatenate_conversations
|
||||
|
||||
def _extract_json_from_zip(self, zip_path: Path) -> list[str]:
|
||||
"""
|
||||
Extract JSON files from Claude export zip file.
|
||||
|
||||
Args:
|
||||
zip_path: Path to the Claude export zip file
|
||||
|
||||
Returns:
|
||||
List of JSON content strings, or empty list if not found
|
||||
"""
|
||||
json_contents = []
|
||||
try:
|
||||
with ZipFile(zip_path, "r") as zip_file:
|
||||
# Look for JSON files
|
||||
json_files = [f for f in zip_file.namelist() if f.endswith(".json")]
|
||||
|
||||
if not json_files:
|
||||
print(f"No JSON files found in {zip_path}")
|
||||
return []
|
||||
|
||||
print(f"Found {len(json_files)} JSON files in archive")
|
||||
|
||||
for json_file in json_files:
|
||||
with zip_file.open(json_file) as f:
|
||||
content = f.read().decode("utf-8", errors="ignore")
|
||||
json_contents.append(content)
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error extracting JSON from zip {zip_path}: {e}")
|
||||
|
||||
return json_contents
|
||||
|
||||
def _parse_claude_json(self, json_content: str) -> list[dict]:
|
||||
"""
|
||||
Parse Claude JSON export to extract conversations.
|
||||
|
||||
Args:
|
||||
json_content: JSON content from Claude export
|
||||
|
||||
Returns:
|
||||
List of conversation dictionaries
|
||||
"""
|
||||
try:
|
||||
data = json.loads(json_content)
|
||||
except json.JSONDecodeError as e:
|
||||
print(f"Error parsing JSON: {e}")
|
||||
return []
|
||||
|
||||
conversations = []
|
||||
|
||||
# Handle different possible JSON structures
|
||||
if isinstance(data, list):
|
||||
# If data is a list of conversations
|
||||
for item in data:
|
||||
conversation = self._extract_conversation_from_json(item)
|
||||
if conversation:
|
||||
conversations.append(conversation)
|
||||
elif isinstance(data, dict):
|
||||
# Check for common structures
|
||||
if "conversations" in data:
|
||||
# Structure: {"conversations": [...]}
|
||||
for item in data["conversations"]:
|
||||
conversation = self._extract_conversation_from_json(item)
|
||||
if conversation:
|
||||
conversations.append(conversation)
|
||||
elif "messages" in data:
|
||||
# Single conversation with messages
|
||||
conversation = self._extract_conversation_from_json(data)
|
||||
if conversation:
|
||||
conversations.append(conversation)
|
||||
else:
|
||||
# Try to treat the whole object as a conversation
|
||||
conversation = self._extract_conversation_from_json(data)
|
||||
if conversation:
|
||||
conversations.append(conversation)
|
||||
|
||||
return conversations
|
||||
|
||||
def _extract_conversation_from_json(self, conv_data: dict) -> dict | None:
|
||||
"""
|
||||
Extract conversation data from a JSON object.
|
||||
|
||||
Args:
|
||||
conv_data: Dictionary containing conversation data
|
||||
|
||||
Returns:
|
||||
Dictionary with conversation data or None
|
||||
"""
|
||||
if not isinstance(conv_data, dict):
|
||||
return None
|
||||
|
||||
messages = []
|
||||
|
||||
# Look for messages in various possible structures
|
||||
message_sources = []
|
||||
if "messages" in conv_data:
|
||||
message_sources = conv_data["messages"]
|
||||
elif "chat" in conv_data:
|
||||
message_sources = conv_data["chat"]
|
||||
elif "conversation" in conv_data:
|
||||
message_sources = conv_data["conversation"]
|
||||
else:
|
||||
# If no clear message structure, try to extract from the object itself
|
||||
if "content" in conv_data and "role" in conv_data:
|
||||
message_sources = [conv_data]
|
||||
|
||||
for msg_data in message_sources:
|
||||
message = self._extract_message_from_json(msg_data)
|
||||
if message:
|
||||
messages.append(message)
|
||||
|
||||
if not messages:
|
||||
return None
|
||||
|
||||
# Extract conversation metadata
|
||||
title = self._extract_title_from_conversation(conv_data, messages)
|
||||
timestamp = self._extract_timestamp_from_conversation(conv_data)
|
||||
|
||||
return {"title": title, "messages": messages, "timestamp": timestamp}
|
||||
|
||||
def _extract_message_from_json(self, msg_data: dict) -> dict | None:
|
||||
"""
|
||||
Extract message data from a JSON message object.
|
||||
|
||||
Args:
|
||||
msg_data: Dictionary containing message data
|
||||
|
||||
Returns:
|
||||
Dictionary with message data or None
|
||||
"""
|
||||
if not isinstance(msg_data, dict):
|
||||
return None
|
||||
|
||||
# Extract content from various possible fields
|
||||
content = ""
|
||||
content_fields = ["content", "text", "message", "body"]
|
||||
for field in content_fields:
|
||||
if msg_data.get(field):
|
||||
content = str(msg_data[field])
|
||||
break
|
||||
|
||||
if not content or len(content.strip()) < 3:
|
||||
return None
|
||||
|
||||
# Extract role (user/assistant/human/ai/claude)
|
||||
role = "mixed" # Default role
|
||||
role_fields = ["role", "sender", "from", "author", "type"]
|
||||
for field in role_fields:
|
||||
if msg_data.get(field):
|
||||
role_value = str(msg_data[field]).lower()
|
||||
if role_value in ["user", "human", "person"]:
|
||||
role = "user"
|
||||
elif role_value in ["assistant", "ai", "claude", "bot"]:
|
||||
role = "assistant"
|
||||
break
|
||||
|
||||
# Extract timestamp
|
||||
timestamp = self._extract_timestamp_from_message(msg_data)
|
||||
|
||||
return {"role": role, "content": content, "timestamp": timestamp}
|
||||
|
||||
def _extract_timestamp_from_message(self, msg_data: dict) -> str | None:
|
||||
"""Extract timestamp from message data."""
|
||||
timestamp_fields = ["timestamp", "created_at", "date", "time"]
|
||||
for field in timestamp_fields:
|
||||
if msg_data.get(field):
|
||||
return str(msg_data[field])
|
||||
return None
|
||||
|
||||
def _extract_timestamp_from_conversation(self, conv_data: dict) -> str | None:
|
||||
"""Extract timestamp from conversation data."""
|
||||
timestamp_fields = ["timestamp", "created_at", "date", "updated_at", "last_updated"]
|
||||
for field in timestamp_fields:
|
||||
if conv_data.get(field):
|
||||
return str(conv_data[field])
|
||||
return None
|
||||
|
||||
def _extract_title_from_conversation(self, conv_data: dict, messages: list) -> str:
|
||||
"""Extract or generate title for conversation."""
|
||||
# Try to find explicit title
|
||||
title_fields = ["title", "name", "subject", "topic"]
|
||||
for field in title_fields:
|
||||
if conv_data.get(field):
|
||||
return str(conv_data[field])
|
||||
|
||||
# Generate title from first user message
|
||||
for message in messages:
|
||||
if message.get("role") == "user":
|
||||
content = message.get("content", "")
|
||||
if content:
|
||||
# Use first 50 characters as title
|
||||
title = content[:50].strip()
|
||||
if len(content) > 50:
|
||||
title += "..."
|
||||
return title
|
||||
|
||||
return "Claude Conversation"
|
||||
|
||||
def _create_concatenated_content(self, conversation: dict) -> str:
|
||||
"""
|
||||
Create concatenated content from conversation messages.
|
||||
|
||||
Args:
|
||||
conversation: Dictionary containing conversation data
|
||||
|
||||
Returns:
|
||||
Formatted concatenated content
|
||||
"""
|
||||
title = conversation.get("title", "Claude Conversation")
|
||||
messages = conversation.get("messages", [])
|
||||
timestamp = conversation.get("timestamp", "Unknown")
|
||||
|
||||
# Build message content
|
||||
message_parts = []
|
||||
for message in messages:
|
||||
role = message.get("role", "mixed")
|
||||
content = message.get("content", "")
|
||||
msg_timestamp = message.get("timestamp", "")
|
||||
|
||||
if role == "user":
|
||||
prefix = "[You]"
|
||||
elif role == "assistant":
|
||||
prefix = "[Claude]"
|
||||
else:
|
||||
prefix = "[Message]"
|
||||
|
||||
# Add timestamp if available
|
||||
if msg_timestamp:
|
||||
prefix += f" ({msg_timestamp})"
|
||||
|
||||
message_parts.append(f"{prefix}: {content}")
|
||||
|
||||
concatenated_text = "\n\n".join(message_parts)
|
||||
|
||||
# Create final document content
|
||||
doc_content = f"""Conversation: {title}
|
||||
Date: {timestamp}
|
||||
Messages ({len(messages)} messages):
|
||||
|
||||
{concatenated_text}
|
||||
"""
|
||||
return doc_content
|
||||
|
||||
def load_data(self, input_dir: str | None = None, **load_kwargs: Any) -> list[Document]:
|
||||
"""
|
||||
Load Claude export data.
|
||||
|
||||
Args:
|
||||
input_dir: Directory containing Claude export files or path to specific file
|
||||
**load_kwargs:
|
||||
max_count (int): Maximum number of conversations to process
|
||||
claude_export_path (str): Specific path to Claude export file/directory
|
||||
include_metadata (bool): Whether to include metadata in documents
|
||||
"""
|
||||
docs: list[Document] = []
|
||||
max_count = load_kwargs.get("max_count", -1)
|
||||
claude_export_path = load_kwargs.get("claude_export_path", input_dir)
|
||||
include_metadata = load_kwargs.get("include_metadata", True)
|
||||
|
||||
if not claude_export_path:
|
||||
print("No Claude export path provided")
|
||||
return docs
|
||||
|
||||
export_path = Path(claude_export_path)
|
||||
|
||||
if not export_path.exists():
|
||||
print(f"Claude export path not found: {export_path}")
|
||||
return docs
|
||||
|
||||
json_contents = []
|
||||
|
||||
# Handle different input types
|
||||
if export_path.is_file():
|
||||
if export_path.suffix.lower() == ".zip":
|
||||
# Extract JSON from zip file
|
||||
json_contents = self._extract_json_from_zip(export_path)
|
||||
elif export_path.suffix.lower() == ".json":
|
||||
# Read JSON file directly
|
||||
try:
|
||||
with open(export_path, encoding="utf-8", errors="ignore") as f:
|
||||
json_contents.append(f.read())
|
||||
except Exception as e:
|
||||
print(f"Error reading JSON file {export_path}: {e}")
|
||||
return docs
|
||||
else:
|
||||
print(f"Unsupported file type: {export_path.suffix}")
|
||||
return docs
|
||||
|
||||
elif export_path.is_dir():
|
||||
# Look for JSON files in directory
|
||||
json_files = list(export_path.glob("*.json"))
|
||||
zip_files = list(export_path.glob("*.zip"))
|
||||
|
||||
if json_files:
|
||||
print(f"Found {len(json_files)} JSON files in directory")
|
||||
for json_file in json_files:
|
||||
try:
|
||||
with open(json_file, encoding="utf-8", errors="ignore") as f:
|
||||
json_contents.append(f.read())
|
||||
except Exception as e:
|
||||
print(f"Error reading JSON file {json_file}: {e}")
|
||||
continue
|
||||
|
||||
if zip_files:
|
||||
print(f"Found {len(zip_files)} ZIP files in directory")
|
||||
for zip_file in zip_files:
|
||||
zip_contents = self._extract_json_from_zip(zip_file)
|
||||
json_contents.extend(zip_contents)
|
||||
|
||||
if not json_files and not zip_files:
|
||||
print(f"No JSON or ZIP files found in {export_path}")
|
||||
return docs
|
||||
|
||||
if not json_contents:
|
||||
print("No JSON content found to process")
|
||||
return docs
|
||||
|
||||
# Parse conversations from JSON content
|
||||
print("Parsing Claude conversations from JSON...")
|
||||
all_conversations = []
|
||||
for json_content in json_contents:
|
||||
conversations = self._parse_claude_json(json_content)
|
||||
all_conversations.extend(conversations)
|
||||
|
||||
if not all_conversations:
|
||||
print("No conversations found in JSON content")
|
||||
return docs
|
||||
|
||||
print(f"Found {len(all_conversations)} conversations")
|
||||
|
||||
# Process conversations into documents
|
||||
count = 0
|
||||
for conversation in all_conversations:
|
||||
if max_count > 0 and count >= max_count:
|
||||
break
|
||||
|
||||
if self.concatenate_conversations:
|
||||
# Create one document per conversation with concatenated messages
|
||||
doc_content = self._create_concatenated_content(conversation)
|
||||
|
||||
metadata = {}
|
||||
if include_metadata:
|
||||
metadata = {
|
||||
"title": conversation.get("title", "Claude Conversation"),
|
||||
"timestamp": conversation.get("timestamp", "Unknown"),
|
||||
"message_count": len(conversation.get("messages", [])),
|
||||
"source": "Claude Export",
|
||||
}
|
||||
|
||||
doc = Document(text=doc_content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
count += 1
|
||||
|
||||
else:
|
||||
# Create separate documents for each message
|
||||
for message in conversation.get("messages", []):
|
||||
if max_count > 0 and count >= max_count:
|
||||
break
|
||||
|
||||
role = message.get("role", "mixed")
|
||||
content = message.get("content", "")
|
||||
msg_timestamp = message.get("timestamp", "")
|
||||
|
||||
if not content.strip():
|
||||
continue
|
||||
|
||||
# Create document content with context
|
||||
doc_content = f"""Conversation: {conversation.get("title", "Claude Conversation")}
|
||||
Role: {role}
|
||||
Timestamp: {msg_timestamp or conversation.get("timestamp", "Unknown")}
|
||||
Message: {content}
|
||||
"""
|
||||
|
||||
metadata = {}
|
||||
if include_metadata:
|
||||
metadata = {
|
||||
"conversation_title": conversation.get("title", "Claude Conversation"),
|
||||
"role": role,
|
||||
"timestamp": msg_timestamp or conversation.get("timestamp", "Unknown"),
|
||||
"source": "Claude Export",
|
||||
}
|
||||
|
||||
doc = Document(text=doc_content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
count += 1
|
||||
|
||||
print(f"Created {len(docs)} documents from Claude export")
|
||||
return docs
|
||||
@@ -1,189 +0,0 @@
|
||||
"""
|
||||
Claude RAG example using the unified interface.
|
||||
Supports Claude export data from JSON files.
|
||||
"""
|
||||
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# Add parent directory to path for imports
|
||||
sys.path.insert(0, str(Path(__file__).parent))
|
||||
|
||||
from base_rag_example import BaseRAGExample
|
||||
from chunking import create_text_chunks
|
||||
|
||||
from .claude_data.claude_reader import ClaudeReader
|
||||
|
||||
|
||||
class ClaudeRAG(BaseRAGExample):
|
||||
"""RAG example for Claude conversation data."""
|
||||
|
||||
def __init__(self):
|
||||
# Set default values BEFORE calling super().__init__
|
||||
self.max_items_default = -1 # Process all conversations by default
|
||||
self.embedding_model_default = (
|
||||
"sentence-transformers/all-MiniLM-L6-v2" # Fast 384-dim model
|
||||
)
|
||||
|
||||
super().__init__(
|
||||
name="Claude",
|
||||
description="Process and query Claude conversation exports with LEANN",
|
||||
default_index_name="claude_conversations_index",
|
||||
)
|
||||
|
||||
def _add_specific_arguments(self, parser):
|
||||
"""Add Claude-specific arguments."""
|
||||
claude_group = parser.add_argument_group("Claude Parameters")
|
||||
claude_group.add_argument(
|
||||
"--export-path",
|
||||
type=str,
|
||||
default="./claude_export",
|
||||
help="Path to Claude export file (.json or .zip) or directory containing exports (default: ./claude_export)",
|
||||
)
|
||||
claude_group.add_argument(
|
||||
"--concatenate-conversations",
|
||||
action="store_true",
|
||||
default=True,
|
||||
help="Concatenate messages within conversations for better context (default: True)",
|
||||
)
|
||||
claude_group.add_argument(
|
||||
"--separate-messages",
|
||||
action="store_true",
|
||||
help="Process each message as a separate document (overrides --concatenate-conversations)",
|
||||
)
|
||||
claude_group.add_argument(
|
||||
"--chunk-size", type=int, default=512, help="Text chunk size (default: 512)"
|
||||
)
|
||||
claude_group.add_argument(
|
||||
"--chunk-overlap", type=int, default=128, help="Text chunk overlap (default: 128)"
|
||||
)
|
||||
|
||||
def _find_claude_exports(self, export_path: Path) -> list[Path]:
|
||||
"""
|
||||
Find Claude export files in the given path.
|
||||
|
||||
Args:
|
||||
export_path: Path to search for exports
|
||||
|
||||
Returns:
|
||||
List of paths to Claude export files
|
||||
"""
|
||||
export_files = []
|
||||
|
||||
if export_path.is_file():
|
||||
if export_path.suffix.lower() in [".zip", ".json"]:
|
||||
export_files.append(export_path)
|
||||
elif export_path.is_dir():
|
||||
# Look for zip and json files
|
||||
export_files.extend(export_path.glob("*.zip"))
|
||||
export_files.extend(export_path.glob("*.json"))
|
||||
|
||||
return export_files
|
||||
|
||||
async def load_data(self, args) -> list[str]:
|
||||
"""Load Claude export data and convert to text chunks."""
|
||||
export_path = Path(args.export_path)
|
||||
|
||||
if not export_path.exists():
|
||||
print(f"Claude export path not found: {export_path}")
|
||||
print(
|
||||
"Please ensure you have exported your Claude data and placed it in the correct location."
|
||||
)
|
||||
print("\nTo export your Claude data:")
|
||||
print("1. Open Claude in your browser")
|
||||
print("2. Look for export/download options in settings or conversation menu")
|
||||
print("3. Download the conversation data (usually in JSON format)")
|
||||
print("4. Place the file/directory at the specified path")
|
||||
print(
|
||||
"\nNote: Claude export methods may vary. Check Claude's help documentation for current instructions."
|
||||
)
|
||||
return []
|
||||
|
||||
# Find export files
|
||||
export_files = self._find_claude_exports(export_path)
|
||||
|
||||
if not export_files:
|
||||
print(f"No Claude export files (.json or .zip) found in: {export_path}")
|
||||
return []
|
||||
|
||||
print(f"Found {len(export_files)} Claude export files")
|
||||
|
||||
# Create reader with appropriate settings
|
||||
concatenate = args.concatenate_conversations and not args.separate_messages
|
||||
reader = ClaudeReader(concatenate_conversations=concatenate)
|
||||
|
||||
# Process each export file
|
||||
all_documents = []
|
||||
total_processed = 0
|
||||
|
||||
for i, export_file in enumerate(export_files):
|
||||
print(f"\nProcessing export file {i + 1}/{len(export_files)}: {export_file.name}")
|
||||
|
||||
try:
|
||||
# Apply max_items limit per file
|
||||
max_per_file = -1
|
||||
if args.max_items > 0:
|
||||
remaining = args.max_items - total_processed
|
||||
if remaining <= 0:
|
||||
break
|
||||
max_per_file = remaining
|
||||
|
||||
# Load conversations
|
||||
documents = reader.load_data(
|
||||
claude_export_path=str(export_file),
|
||||
max_count=max_per_file,
|
||||
include_metadata=True,
|
||||
)
|
||||
|
||||
if documents:
|
||||
all_documents.extend(documents)
|
||||
total_processed += len(documents)
|
||||
print(f"Processed {len(documents)} conversations from this file")
|
||||
else:
|
||||
print(f"No conversations loaded from {export_file}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error processing {export_file}: {e}")
|
||||
continue
|
||||
|
||||
if not all_documents:
|
||||
print("No conversations found to process!")
|
||||
print("\nTroubleshooting:")
|
||||
print("- Ensure the export file is a valid Claude export")
|
||||
print("- Check that the JSON file contains conversation data")
|
||||
print("- Try using a different export format or method")
|
||||
print("- Check Claude's documentation for current export procedures")
|
||||
return []
|
||||
|
||||
print(f"\nTotal conversations processed: {len(all_documents)}")
|
||||
print("Now starting to split into text chunks... this may take some time")
|
||||
|
||||
# Convert to text chunks
|
||||
all_texts = create_text_chunks(
|
||||
all_documents, chunk_size=args.chunk_size, chunk_overlap=args.chunk_overlap
|
||||
)
|
||||
|
||||
print(f"Created {len(all_texts)} text chunks from {len(all_documents)} conversations")
|
||||
return all_texts
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import asyncio
|
||||
|
||||
# Example queries for Claude RAG
|
||||
print("\n🤖 Claude RAG Example")
|
||||
print("=" * 50)
|
||||
print("\nExample queries you can try:")
|
||||
print("- 'What did I ask Claude about Python programming?'")
|
||||
print("- 'Show me conversations about machine learning'")
|
||||
print("- 'Find discussions about code optimization'")
|
||||
print("- 'What advice did Claude give me about software design?'")
|
||||
print("- 'Search for conversations about debugging techniques'")
|
||||
print("\nTo get started:")
|
||||
print("1. Export your Claude conversation data")
|
||||
print("2. Place the JSON/ZIP file in ./claude_export/")
|
||||
print("3. Run this script to build your personal Claude knowledge base!")
|
||||
print("\nOr run without --query for interactive mode\n")
|
||||
|
||||
rag = ClaudeRAG()
|
||||
asyncio.run(rag.run())
|
||||
@@ -83,6 +83,81 @@ ollama pull nomic-embed-text

</details>

## Local & Remote Inference Endpoints

> Applies to both LLMs (`leann ask`) and embeddings (`leann build`).

LEANN now treats Ollama, LM Studio, and other OpenAI-compatible runtimes as first-class providers. You can point LEANN at any compatible endpoint – either on the same machine or across the network – with a couple of flags or environment variables.

### One-Time Environment Setup

```bash
# Works for OpenAI-compatible runtimes such as LM Studio, vLLM, SGLang, llamafile, etc.
export OPENAI_API_KEY="your-key"  # or leave unset for local servers that do not check keys
export OPENAI_BASE_URL="http://localhost:1234/v1"

# Ollama-compatible runtimes (Ollama, Ollama on another host, llamacpp-server, etc.)
export LEANN_OLLAMA_HOST="http://localhost:11434"  # falls back to OLLAMA_HOST or LOCAL_LLM_ENDPOINT
```

LEANN also recognises `LEANN_LOCAL_LLM_HOST` (highest priority), `LEANN_OPENAI_BASE_URL`, and `LOCAL_OPENAI_BASE_URL`, so existing scripts continue to work.
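The helpers imported in `base_rag_example.py` above (`resolve_ollama_host`, `resolve_openai_api_key`, `resolve_openai_base_url` from `leann.settings`) perform this lookup. The sketch below only illustrates the fallback order described in this section; it is an assumption for illustration, not the actual `leann.settings` implementation.

```python
import os


# Sketch of the documented priority: explicit CLI value, then LEANN-specific
# variables, then generic ones, then a sensible default.
def resolve_ollama_host(cli_value: str | None) -> str:
    if cli_value:
        return cli_value
    for var in ("LEANN_LOCAL_LLM_HOST", "LEANN_OLLAMA_HOST", "OLLAMA_HOST", "LOCAL_LLM_ENDPOINT"):
        if os.getenv(var):
            return os.environ[var]
    return "http://localhost:11434"  # Ollama's default port


def resolve_openai_base_url(cli_value: str | None) -> str | None:
    if cli_value:
        return cli_value
    for var in ("LEANN_OPENAI_BASE_URL", "LOCAL_OPENAI_BASE_URL", "OPENAI_BASE_URL"):
        if os.getenv(var):
            return os.environ[var]
    return None  # fall back to the provider's default endpoint


def resolve_openai_api_key(cli_value: str | None) -> str | None:
    # The --*-api-key help text says keys default to OPENAI_API_KEY.
    return cli_value or os.getenv("OPENAI_API_KEY")
```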
### Passing Hosts Per Command

```bash
# Build an index with a remote embedding server
leann build my-notes \
  --docs ./notes \
  --embedding-mode openai \
  --embedding-model text-embedding-qwen3-embedding-0.6b \
  --embedding-api-base http://192.168.1.50:1234/v1 \
  --embedding-api-key local-dev-key

# Query using a local LM Studio instance via OpenAI-compatible API
leann ask my-notes \
  --llm openai \
  --llm-model qwen3-8b \
  --api-base http://localhost:1234/v1 \
  --api-key local-dev-key

# Query an Ollama instance running on another box
leann ask my-notes \
  --llm ollama \
  --llm-model qwen3:14b \
  --host http://192.168.1.101:11434
```

⚠️ **Make sure the endpoint is reachable**: when your inference server runs on a home machine or workstation and the index/search job runs in the cloud, the cloud side must be able to reach the host you configured. Typical options include:

- Expose a public IP (and open the relevant port) on the machine that hosts LM Studio/Ollama.
- Configure router or cloud provider port forwarding.
- Tunnel traffic through tools like `tailscale`, `cloudflared`, or `ssh -R`.

When you set these options while building an index, LEANN stores them in `meta.json`. Any subsequent `leann ask` or searcher process automatically reuses the same provider settings, even when LEANN spawns background embedding servers. This makes the “server without GPU talking to my local workstation” workflow from [issue #80](https://github.com/yichuan-w/LEANN/issues/80#issuecomment-2287230548) work out of the box.
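As a rough illustration of what gets persisted, the `LeannBuilder` changes later in this comparison copy `embedding_options` into the index metadata. The excerpt below is hypothetical: only the `embedding_options` entry comes from the diff, and the surrounding fields are placeholders.

```python
# Hypothetical excerpt of the metadata written at build time.
meta_data = {
    "embedding_model": "text-embedding-qwen3-embedding-0.6b",
    "embedding_mode": "openai",
    "embedding_options": {
        "base_url": "http://192.168.1.50:1234/v1",
        "api_key": "local-dev-key",
    },
    # ... backend settings, passage sources, etc.
}
```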
**Tip:** If your runtime does not require an API key (many local stacks don’t), leave `--api-key` unset. LEANN will skip injecting credentials.

### Python API Usage

You can pass the same configuration from Python:

```python
from leann.api import LeannBuilder

builder = LeannBuilder(
    backend_name="hnsw",
    embedding_mode="openai",
    embedding_model="text-embedding-qwen3-embedding-0.6b",
    embedding_options={
        "base_url": "http://192.168.1.50:1234/v1",
        "api_key": "local-dev-key",
    },
)
builder.build_index("./indexes/my-notes", chunks)
```

`embedding_options` is persisted to the index `meta.json`, so subsequent `LeannSearcher` or `LeannChat` sessions automatically reuse the same provider settings (the embedding server manager forwards them to the provider for you).

## Index Selection: Matching Your Scale

### HNSW (Hierarchical Navigable Small World)
@@ -10,7 +10,7 @@ import sys
import threading
import time
from pathlib import Path
from typing import Optional
from typing import Any, Optional

import numpy as np
import zmq

@@ -32,6 +32,16 @@ if not logger.handlers:
    logger.propagate = False


_RAW_PROVIDER_OPTIONS = os.getenv("LEANN_EMBEDDING_OPTIONS")
try:
    PROVIDER_OPTIONS: dict[str, Any] = (
        json.loads(_RAW_PROVIDER_OPTIONS) if _RAW_PROVIDER_OPTIONS else {}
    )
except json.JSONDecodeError:
    logger.warning("Failed to parse LEANN_EMBEDDING_OPTIONS; ignoring provider options")
    PROVIDER_OPTIONS = {}


def create_diskann_embedding_server(
    passages_file: Optional[str] = None,
    zmq_port: int = 5555,

@@ -181,7 +191,12 @@ def create_diskann_embedding_server(
            logger.debug(f"Text lengths: {[len(t) for t in texts[:5]]}")  # Show first 5

            # Process embeddings using unified computation
            embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
            embeddings = compute_embeddings(
                texts,
                model_name,
                mode=embedding_mode,
                provider_options=PROVIDER_OPTIONS,
            )
            logger.info(
                f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
            )

@@ -296,7 +311,12 @@ def create_diskann_embedding_server(
                continue

            # Process the request
            embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
            embeddings = compute_embeddings(
                texts,
                model_name,
                mode=embedding_mode,
                provider_options=PROVIDER_OPTIONS,
            )
            logger.info(f"Computed embeddings shape: {embeddings.shape}")

            # Validation

@@ -10,7 +10,7 @@ import sys
import threading
import time
from pathlib import Path
from typing import Optional
from typing import Any, Optional

import msgpack
import numpy as np

@@ -45,6 +45,15 @@ if log_path:

logger.propagate = False

_RAW_PROVIDER_OPTIONS = os.getenv("LEANN_EMBEDDING_OPTIONS")
try:
    PROVIDER_OPTIONS: dict[str, Any] = (
        json.loads(_RAW_PROVIDER_OPTIONS) if _RAW_PROVIDER_OPTIONS else {}
    )
except json.JSONDecodeError:
    logger.warning("Failed to parse LEANN_EMBEDDING_OPTIONS; ignoring provider options")
    PROVIDER_OPTIONS = {}


def create_hnsw_embedding_server(
    passages_file: Optional[str] = None,

@@ -151,7 +160,12 @@ def create_hnsw_embedding_server(
            ):
                last_request_type = "text"
                last_request_length = len(request)
                embeddings = compute_embeddings(request, model_name, mode=embedding_mode)
                embeddings = compute_embeddings(
                    request,
                    model_name,
                    mode=embedding_mode,
                    provider_options=PROVIDER_OPTIONS,
                )
                rep_socket.send(msgpack.packb(embeddings.tolist()))
                e2e_end = time.time()
                logger.info(f"⏱️ Text embedding E2E time: {e2e_end - e2e_start:.6f}s")

@@ -200,7 +214,10 @@ def create_hnsw_embedding_server(
                if texts:
                    try:
                        embeddings = compute_embeddings(
                            texts, model_name, mode=embedding_mode
                            texts,
                            model_name,
                            mode=embedding_mode,
                            provider_options=PROVIDER_OPTIONS,
                        )
                        logger.info(
                            f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"

@@ -265,7 +282,12 @@ def create_hnsw_embedding_server(

                if texts:
                    try:
                        embeddings = compute_embeddings(texts, model_name, mode=embedding_mode)
                        embeddings = compute_embeddings(
                            texts,
                            model_name,
                            mode=embedding_mode,
                            provider_options=PROVIDER_OPTIONS,
                        )
                        logger.info(
                            f"Computed embeddings for {len(texts)} texts, shape: {embeddings.shape}"
                        )
@@ -39,6 +39,7 @@ def compute_embeddings(
use_server: bool = True,
port: Optional[int] = None,
is_build=False,
provider_options: Optional[dict[str, Any]] = None,
) -> np.ndarray:
"""
Computes embeddings using different backends.
@@ -72,6 +73,7 @@ def compute_embeddings(
model_name,
mode=mode,
is_build=is_build,
provider_options=provider_options,
)


@@ -278,6 +280,7 @@ class LeannBuilder:
embedding_model: str = "facebook/contriever",
dimensions: Optional[int] = None,
embedding_mode: str = "sentence-transformers",
embedding_options: Optional[dict[str, Any]] = None,
**backend_kwargs,
):
self.backend_name = backend_name
@@ -300,6 +303,7 @@ class LeannBuilder:
self.embedding_model = embedding_model
self.dimensions = dimensions
self.embedding_mode = embedding_mode
self.embedding_options = embedding_options or {}

# Check if we need to use cosine distance for normalized embeddings
normalized_embeddings_models = {
@@ -407,6 +411,7 @@ class LeannBuilder:
self.embedding_model,
self.embedding_mode,
use_server=False,
provider_options=self.embedding_options,
)[0]
)
path = Path(index_path)
@@ -446,6 +451,7 @@ class LeannBuilder:
self.embedding_mode,
use_server=False,
is_build=True,
provider_options=self.embedding_options,
)
string_ids = [chunk["id"] for chunk in self.chunks]
current_backend_kwargs = {**self.backend_kwargs, "dimensions": self.dimensions}
@@ -472,6 +478,9 @@ class LeannBuilder:
],
}

if self.embedding_options:
meta_data["embedding_options"] = self.embedding_options

# Add storage status flags for HNSW backend
if self.backend_name == "hnsw":
is_compact = self.backend_kwargs.get("is_compact", True)
@@ -592,6 +601,9 @@ class LeannBuilder:
"embeddings_source": str(embeddings_file),
}

if self.embedding_options:
meta_data["embedding_options"] = self.embedding_options

# Add storage status flags for HNSW backend
if self.backend_name == "hnsw":
is_compact = self.backend_kwargs.get("is_compact", True)
@@ -673,6 +685,7 @@ class LeannBuilder:
self.embedding_mode,
use_server=False,
is_build=True,
provider_options=self.embedding_options,
)

embedding_dim = embeddings.shape[1]
@@ -771,6 +784,7 @@ class LeannSearcher:
self.embedding_model = self.meta_data["embedding_model"]
# Support both old and new format
self.embedding_mode = self.meta_data.get("embedding_mode", "sentence-transformers")
self.embedding_options = self.meta_data.get("embedding_options", {})
# Delegate portability handling to PassageManager
self.passage_manager = PassageManager(
self.meta_data.get("passage_sources", []), metadata_file_path=self.meta_path_str
@@ -782,6 +796,8 @@ class LeannSearcher:
raise ValueError(f"Backend '{backend_name}' not found.")
final_kwargs = {**self.meta_data.get("backend_kwargs", {}), **backend_kwargs}
final_kwargs["enable_warmup"] = enable_warmup
if self.embedding_options:
final_kwargs.setdefault("embedding_options", self.embedding_options)
self.backend_impl: LeannBackendSearcherInterface = backend_factory.searcher(
index_path, **final_kwargs
)

@@ -12,6 +12,8 @@ from typing import Any, Optional

import torch

from .settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@@ -310,11 +312,12 @@ def search_hf_models(query: str, limit: int = 10) -> list[str]:


def validate_model_and_suggest(
model_name: str, llm_type: str, host: str = "http://localhost:11434"
model_name: str, llm_type: str, host: Optional[str] = None
) -> Optional[str]:
"""Validate model name and provide suggestions if invalid"""
if llm_type == "ollama":
available_models = check_ollama_models(host)
resolved_host = resolve_ollama_host(host)
available_models = check_ollama_models(resolved_host)
if available_models and model_name not in available_models:
error_msg = f"Model '{model_name}' not found in your local Ollama installation."

@@ -457,19 +460,19 @@ class LLMInterface(ABC):
class OllamaChat(LLMInterface):
"""LLM interface for Ollama models."""

def __init__(self, model: str = "llama3:8b", host: str = "http://localhost:11434"):
def __init__(self, model: str = "llama3:8b", host: Optional[str] = None):
self.model = model
self.host = host
logger.info(f"Initializing OllamaChat with model='{model}' and host='{host}'")
self.host = resolve_ollama_host(host)
logger.info(f"Initializing OllamaChat with model='{model}' and host='{self.host}'")
try:
import requests

# Check if the Ollama server is responsive
if host:
requests.get(host)
if self.host:
requests.get(self.host)

# Pre-check model availability with helpful suggestions
model_error = validate_model_and_suggest(model, "ollama", host)
model_error = validate_model_and_suggest(model, "ollama", self.host)
if model_error:
raise ValueError(model_error)

@@ -478,9 +481,11 @@ class OllamaChat(LLMInterface):
"The 'requests' library is required for Ollama. Please install it with 'pip install requests'."
)
except requests.exceptions.ConnectionError:
logger.error(f"Could not connect to Ollama at {host}. Please ensure Ollama is running.")
logger.error(
f"Could not connect to Ollama at {self.host}. Please ensure Ollama is running."
)
raise ConnectionError(
f"Could not connect to Ollama at {host}. Please ensure Ollama is running."
f"Could not connect to Ollama at {self.host}. Please ensure Ollama is running."
)

def ask(self, prompt: str, **kwargs) -> str:
@@ -737,21 +742,31 @@ class GeminiChat(LLMInterface):
class OpenAIChat(LLMInterface):
"""LLM interface for OpenAI models."""

def __init__(self, model: str = "gpt-4o", api_key: Optional[str] = None):
def __init__(
self,
model: str = "gpt-4o",
api_key: Optional[str] = None,
base_url: Optional[str] = None,
):
self.model = model
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
self.base_url = resolve_openai_base_url(base_url)
self.api_key = resolve_openai_api_key(api_key)

if not self.api_key:
raise ValueError(
"OpenAI API key is required. Set OPENAI_API_KEY environment variable or pass api_key parameter."
)

logger.info(f"Initializing OpenAI Chat with model='{model}'")
logger.info(
"Initializing OpenAI Chat with model='%s' and base_url='%s'",
model,
self.base_url,
)

try:
import openai

self.client = openai.OpenAI(api_key=self.api_key)
self.client = openai.OpenAI(api_key=self.api_key, base_url=self.base_url)
except ImportError:
raise ImportError(
"The 'openai' library is required for OpenAI models. Please install it with 'pip install openai'."
@@ -841,12 +856,16 @@ def get_llm(llm_config: Optional[dict[str, Any]] = None) -> LLMInterface:
if llm_type == "ollama":
return OllamaChat(
model=model or "llama3:8b",
host=llm_config.get("host", "http://localhost:11434"),
host=llm_config.get("host"),
)
elif llm_type == "hf":
return HFChat(model_name=model or "deepseek-ai/deepseek-llm-7b-chat")
elif llm_type == "openai":
return OpenAIChat(model=model or "gpt-4o", api_key=llm_config.get("api_key"))
return OpenAIChat(
model=model or "gpt-4o",
api_key=llm_config.get("api_key"),
base_url=llm_config.get("base_url"),
)
elif llm_type == "gemini":
return GeminiChat(model=model or "gemini-2.5-flash", api_key=llm_config.get("api_key"))
elif llm_type == "simulated":

@@ -9,6 +9,7 @@ from tqdm import tqdm

from .api import LeannBuilder, LeannChat, LeannSearcher
from .registry import register_project_directory
from .settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url


def extract_pdf_text_with_pymupdf(file_path: str) -> str:
@@ -123,6 +124,24 @@ Examples:
choices=["sentence-transformers", "openai", "mlx", "ollama"],
help="Embedding backend mode (default: sentence-transformers)",
)
build_parser.add_argument(
"--embedding-host",
type=str,
default=None,
help="Override Ollama-compatible embedding host",
)
build_parser.add_argument(
"--embedding-api-base",
type=str,
default=None,
help="Base URL for OpenAI-compatible embedding services",
)
build_parser.add_argument(
"--embedding-api-key",
type=str,
default=None,
help="API key for embedding service (defaults to OPENAI_API_KEY)",
)
build_parser.add_argument(
"--force", "-f", action="store_true", help="Force rebuild existing index"
)
@@ -248,7 +267,12 @@ Examples:
ask_parser.add_argument(
"--model", type=str, default="qwen3:8b", help="Model name (default: qwen3:8b)"
)
ask_parser.add_argument("--host", type=str, default="http://localhost:11434")
ask_parser.add_argument(
"--host",
type=str,
default=None,
help="Override Ollama-compatible host (defaults to LEANN_OLLAMA_HOST/OLLAMA_HOST)",
)
ask_parser.add_argument(
"--interactive", "-i", action="store_true", help="Interactive chat mode"
)
@@ -277,6 +301,18 @@ Examples:
default=None,
help="Thinking budget for reasoning models (low/medium/high). Supported by GPT-Oss:20b and other reasoning models.",
)
ask_parser.add_argument(
"--api-base",
type=str,
default=None,
help="Base URL for OpenAI-compatible APIs (e.g., http://localhost:10000/v1)",
)
ask_parser.add_argument(
"--api-key",
type=str,
default=None,
help="API key for OpenAI-compatible APIs (defaults to OPENAI_API_KEY)",
)

# List command
subparsers.add_parser("list", help="List all indexes")
@@ -1325,10 +1361,20 @@ Examples:

print(f"Building index '{index_name}' with {args.backend} backend...")

embedding_options: dict[str, Any] = {}
if args.embedding_mode == "ollama":
embedding_options["host"] = resolve_ollama_host(args.embedding_host)
elif args.embedding_mode == "openai":
embedding_options["base_url"] = resolve_openai_base_url(args.embedding_api_base)
resolved_embedding_key = resolve_openai_api_key(args.embedding_api_key)
if resolved_embedding_key:
embedding_options["api_key"] = resolved_embedding_key

builder = LeannBuilder(
backend_name=args.backend,
embedding_model=args.embedding_model,
embedding_mode=args.embedding_mode,
embedding_options=embedding_options or None,
graph_degree=args.graph_degree,
complexity=args.complexity,
is_compact=args.compact,
@@ -1476,7 +1522,12 @@ Examples:

llm_config = {"type": args.llm, "model": args.model}
if args.llm == "ollama":
llm_config["host"] = args.host
llm_config["host"] = resolve_ollama_host(args.host)
elif args.llm == "openai":
llm_config["base_url"] = resolve_openai_base_url(args.api_base)
resolved_api_key = resolve_openai_api_key(args.api_key)
if resolved_api_key:
llm_config["api_key"] = resolved_api_key

chat = LeannChat(index_path=index_path, llm_config=llm_config)


@@ -7,11 +7,13 @@ Preserves all optimization parameters to ensure performance
import logging
import os
import time
from typing import Any
from typing import Any, Optional

import numpy as np
import torch

from .settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url

# Set up logger with proper level
logger = logging.getLogger(__name__)
LOG_LEVEL = os.getenv("LEANN_LOG_LEVEL", "WARNING").upper()
@@ -31,6 +33,7 @@ def compute_embeddings(
adaptive_optimization: bool = True,
manual_tokenize: bool = False,
max_length: int = 512,
provider_options: Optional[dict[str, Any]] = None,
) -> np.ndarray:
"""
Unified embedding computation entry point
@@ -46,6 +49,8 @@ def compute_embeddings(
Returns:
Normalized embeddings array, shape: (len(texts), embedding_dim)
"""
provider_options = provider_options or {}

if mode == "sentence-transformers":
return compute_embeddings_sentence_transformers(
texts,
@@ -57,11 +62,21 @@ def compute_embeddings(
max_length=max_length,
)
elif mode == "openai":
return compute_embeddings_openai(texts, model_name)
return compute_embeddings_openai(
texts,
model_name,
base_url=provider_options.get("base_url"),
api_key=provider_options.get("api_key"),
)
elif mode == "mlx":
return compute_embeddings_mlx(texts, model_name)
elif mode == "ollama":
return compute_embeddings_ollama(texts, model_name, is_build=is_build)
return compute_embeddings_ollama(
texts,
model_name,
is_build=is_build,
host=provider_options.get("host"),
)
elif mode == "gemini":
return compute_embeddings_gemini(texts, model_name, is_build=is_build)
else:
@@ -353,12 +368,15 @@ def compute_embeddings_sentence_transformers(
return embeddings


def compute_embeddings_openai(texts: list[str], model_name: str) -> np.ndarray:
def compute_embeddings_openai(
texts: list[str],
model_name: str,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> np.ndarray:
# TODO: @yichuan-w add progress bar only in build mode
"""Compute embeddings using OpenAI API"""
try:
import os

import openai
except ImportError as e:
raise ImportError(f"OpenAI package not installed: {e}")
@@ -373,16 +391,18 @@ def compute_embeddings_openai(texts: list[str], model_name: str) -> np.ndarray:
f"Found {invalid_count} empty/invalid text(s) in input. Upstream should filter before calling OpenAI."
)

api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
resolved_base_url = resolve_openai_base_url(base_url)
resolved_api_key = resolve_openai_api_key(api_key)

if not resolved_api_key:
raise RuntimeError("OPENAI_API_KEY environment variable not set")

# Cache OpenAI client
cache_key = "openai_client"
cache_key = f"openai_client::{resolved_base_url}"
if cache_key in _model_cache:
client = _model_cache[cache_key]
else:
client = openai.OpenAI(api_key=api_key)
client = openai.OpenAI(api_key=resolved_api_key, base_url=resolved_base_url)
_model_cache[cache_key] = client
logger.info("OpenAI client cached")

@@ -507,7 +527,10 @@ def compute_embeddings_mlx(chunks: list[str], model_name: str, batch_size: int =


def compute_embeddings_ollama(
texts: list[str], model_name: str, is_build: bool = False, host: str = "http://localhost:11434"
texts: list[str],
model_name: str,
is_build: bool = False,
host: Optional[str] = None,
) -> np.ndarray:
"""
Compute embeddings using Ollama API with simplified batch processing.
@@ -518,7 +541,7 @@ def compute_embeddings_ollama(
texts: List of texts to compute embeddings for
model_name: Ollama model name (e.g., "nomic-embed-text", "mxbai-embed-large")
is_build: Whether this is a build operation (shows progress bar)
host: Ollama host URL (default: http://localhost:11434)
host: Ollama host URL (defaults to environment or http://localhost:11434)

Returns:
Normalized embeddings array, shape: (len(texts), embedding_dim)
@@ -533,17 +556,19 @@ def compute_embeddings_ollama(
if not texts:
raise ValueError("Cannot compute embeddings for empty text list")

resolved_host = resolve_ollama_host(host)

logger.info(
f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}'"
f"Computing embeddings for {len(texts)} texts using Ollama API, model: '{model_name}', host: '{resolved_host}'"
)

# Check if Ollama is running
try:
response = requests.get(f"{host}/api/version", timeout=5)
response = requests.get(f"{resolved_host}/api/version", timeout=5)
response.raise_for_status()
except requests.exceptions.ConnectionError:
error_msg = (
f"❌ Could not connect to Ollama at {host}.\n\n"
f"❌ Could not connect to Ollama at {resolved_host}.\n\n"
"Please ensure Ollama is running:\n"
" • macOS/Linux: ollama serve\n"
" • Windows: Make sure Ollama is running in the system tray\n\n"
@@ -555,7 +580,7 @@ def compute_embeddings_ollama(

# Check if model exists and provide helpful suggestions
try:
response = requests.get(f"{host}/api/tags", timeout=5)
response = requests.get(f"{resolved_host}/api/tags", timeout=5)
response.raise_for_status()
models = response.json()
model_names = [model["name"] for model in models.get("models", [])]
@@ -618,7 +643,9 @@ def compute_embeddings_ollama(
# Verify the model supports embeddings by testing it
try:
test_response = requests.post(
f"{host}/api/embeddings", json={"model": model_name, "prompt": "test"}, timeout=10
f"{resolved_host}/api/embeddings",
json={"model": model_name, "prompt": "test"},
timeout=10,
)
if test_response.status_code != 200:
error_msg = (
@@ -665,7 +692,7 @@ def compute_embeddings_ollama(
while retry_count < max_retries:
try:
response = requests.post(
f"{host}/api/embeddings",
f"{resolved_host}/api/embeddings",
json={"model": model_name, "prompt": truncated_text},
timeout=30,
)

@@ -8,6 +8,8 @@ import time
from pathlib import Path
from typing import Optional

from .settings import encode_provider_options

# Lightweight, self-contained server manager with no cross-process inspection

# Set up logging based on environment variable
@@ -82,16 +84,40 @@ class EmbeddingServerManager:
) -> tuple[bool, int]:
"""Start the embedding server."""
# passages_file may be present in kwargs for server CLI, but we don't need it here
provider_options = kwargs.pop("provider_options", None)

config_signature = {
"model_name": model_name,
"passages_file": kwargs.get("passages_file", ""),
"embedding_mode": embedding_mode,
"provider_options": provider_options or {},
}

# If this manager already has a live server, just reuse it
if self.server_process and self.server_process.poll() is None and self.server_port:
if (
self.server_process
and self.server_process.poll() is None
and self.server_port
and self._server_config == config_signature
):
logger.info("Reusing in-process server")
return True, self.server_port

# Configuration changed, stop existing server before starting a new one
if self.server_process and self.server_process.poll() is None:
logger.info("Existing server configuration differs; restarting embedding server")
self.stop_server()

# For Colab environment, use a different strategy
if _is_colab_environment():
logger.info("Detected Colab environment, using alternative startup strategy")
return self._start_server_colab(port, model_name, embedding_mode, **kwargs)
return self._start_server_colab(
port,
model_name,
embedding_mode,
provider_options=provider_options,
**kwargs,
)

# Always pick a fresh available port
try:
@@ -101,13 +127,21 @@ class EmbeddingServerManager:
return False, port

# Start a new server
return self._start_new_server(actual_port, model_name, embedding_mode, **kwargs)
return self._start_new_server(
actual_port,
model_name,
embedding_mode,
provider_options=provider_options,
config_signature=config_signature,
**kwargs,
)

def _start_server_colab(
self,
port: int,
model_name: str,
embedding_mode: str = "sentence-transformers",
provider_options: Optional[dict] = None,
**kwargs,
) -> tuple[bool, int]:
"""Start server with Colab-specific configuration."""
@@ -125,8 +159,20 @@ class EmbeddingServerManager:

try:
# In Colab, we'll use a more direct approach
self._launch_server_process_colab(command, actual_port)
return self._wait_for_server_ready_colab(actual_port)
self._launch_server_process_colab(
command,
actual_port,
provider_options=provider_options,
)
started, ready_port = self._wait_for_server_ready_colab(actual_port)
if started:
self._server_config = {
"model_name": model_name,
"passages_file": kwargs.get("passages_file", ""),
"embedding_mode": embedding_mode,
"provider_options": provider_options or {},
}
return started, ready_port
except Exception as e:
logger.error(f"Failed to start embedding server in Colab: {e}")
return False, actual_port
@@ -134,7 +180,13 @@ class EmbeddingServerManager:
# Note: No compatibility check needed; manager is per-searcher and configs are stable per instance

def _start_new_server(
self, port: int, model_name: str, embedding_mode: str, **kwargs
self,
port: int,
model_name: str,
embedding_mode: str,
provider_options: Optional[dict] = None,
config_signature: Optional[dict] = None,
**kwargs,
) -> tuple[bool, int]:
"""Start a new embedding server on the given port."""
logger.info(f"Starting embedding server on port {port}...")
@@ -142,8 +194,20 @@ class EmbeddingServerManager:
command = self._build_server_command(port, model_name, embedding_mode, **kwargs)

try:
self._launch_server_process(command, port)
return self._wait_for_server_ready(port)
self._launch_server_process(
command,
port,
provider_options=provider_options,
)
started, ready_port = self._wait_for_server_ready(port)
if started:
self._server_config = config_signature or {
"model_name": model_name,
"passages_file": kwargs.get("passages_file", ""),
"embedding_mode": embedding_mode,
"provider_options": provider_options or {},
}
return started, ready_port
except Exception as e:
logger.error(f"Failed to start embedding server: {e}")
return False, port
@@ -173,7 +237,12 @@ class EmbeddingServerManager:

return command

def _launch_server_process(self, command: list, port: int) -> None:
def _launch_server_process(
self,
command: list,
port: int,
provider_options: Optional[dict] = None,
) -> None:
"""Launch the server process."""
project_root = Path(__file__).parent.parent.parent.parent.parent
logger.info(f"Command: {' '.join(command)}")
@@ -193,14 +262,20 @@ class EmbeddingServerManager:

# Start embedding server subprocess
logger.info(f"Starting server process with command: {' '.join(command)}")
env = os.environ.copy()
encoded_options = encode_provider_options(provider_options)
if encoded_options:
env["LEANN_EMBEDDING_OPTIONS"] = encoded_options

self.server_process = subprocess.Popen(
command,
cwd=project_root,
stdout=stdout_target,
stderr=stderr_target,
env=env,
)
self.server_port = port
# Record config for in-process reuse
# Record config for in-process reuse (best effort; refined later when ready)
try:
self._server_config = {
"model_name": command[command.index("--model-name") + 1]
@@ -212,12 +287,14 @@ class EmbeddingServerManager:
"embedding_mode": command[command.index("--embedding-mode") + 1]
if "--embedding-mode" in command
else "sentence-transformers",
"provider_options": provider_options or {},
}
except Exception:
self._server_config = {
"model_name": "",
"passages_file": "",
"embedding_mode": "sentence-transformers",
"provider_options": provider_options or {},
}
logger.info(f"Server process started with PID: {self.server_process.pid}")

@@ -322,16 +399,27 @@ class EmbeddingServerManager:
# Removed: cross-process adoption no longer supported
return

def _launch_server_process_colab(self, command: list, port: int) -> None:
def _launch_server_process_colab(
self,
command: list,
port: int,
provider_options: Optional[dict] = None,
) -> None:
"""Launch the server process with Colab-specific settings."""
logger.info(f"Colab Command: {' '.join(command)}")

# In Colab, we need to be more careful about process management
env = os.environ.copy()
encoded_options = encode_provider_options(provider_options)
if encoded_options:
env["LEANN_EMBEDDING_OPTIONS"] = encoded_options

self.server_process = subprocess.Popen(
command,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
env=env,
)
self.server_port = port
logger.info(f"Colab server process started with PID: {self.server_process.pid}")
@@ -345,6 +433,7 @@ class EmbeddingServerManager:
"model_name": "",
"passages_file": "",
"embedding_mode": "sentence-transformers",
"provider_options": provider_options or {},
}

def _wait_for_server_ready_colab(self, port: int) -> tuple[bool, int]:

@@ -41,6 +41,7 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
print("WARNING: embedding_model not found in meta.json. Recompute will fail.")

self.embedding_mode = self.meta.get("embedding_mode", "sentence-transformers")
self.embedding_options = self.meta.get("embedding_options", {})

self.embedding_server_manager = EmbeddingServerManager(
backend_module_name=backend_module_name,
@@ -77,6 +78,7 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
passages_file=passages_source_file,
distance_metric=distance_metric,
enable_warmup=kwargs.get("enable_warmup", False),
provider_options=self.embedding_options,
)
if not server_started:
raise RuntimeError(f"Failed to start embedding server on port {actual_port}")
@@ -125,7 +127,12 @@ class BaseSearcher(LeannBackendSearcherInterface, ABC):
from .embedding_compute import compute_embeddings

embedding_mode = self.meta.get("embedding_mode", "sentence-transformers")
return compute_embeddings([query], self.embedding_model, embedding_mode)
return compute_embeddings(
[query],
self.embedding_model,
embedding_mode,
provider_options=self.embedding_options,
)

def _compute_embedding_via_server(self, chunks: list, zmq_port: int) -> np.ndarray:
"""Compute embeddings using the ZMQ embedding server."""

packages/leann-core/src/leann/settings.py (new file, 74 lines)
@@ -0,0 +1,74 @@
"""Runtime configuration helpers for LEANN."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
from typing import Any
|
||||
|
||||
# Default fallbacks to preserve current behaviour while keeping them in one place.
|
||||
_DEFAULT_OLLAMA_HOST = "http://localhost:11434"
|
||||
_DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"
|
||||
|
||||
|
||||
def _clean_url(value: str) -> str:
|
||||
"""Normalize URL strings by stripping trailing slashes."""
|
||||
|
||||
return value.rstrip("/") if value else value
|
||||
|
||||
|
||||
def resolve_ollama_host(explicit: str | None = None) -> str:
|
||||
"""Resolve the Ollama-compatible endpoint to use."""
|
||||
|
||||
candidates = (
|
||||
explicit,
|
||||
os.getenv("LEANN_LOCAL_LLM_HOST"),
|
||||
os.getenv("LEANN_OLLAMA_HOST"),
|
||||
os.getenv("OLLAMA_HOST"),
|
||||
os.getenv("LOCAL_LLM_ENDPOINT"),
|
||||
)
|
||||
|
||||
for candidate in candidates:
|
||||
if candidate:
|
||||
return _clean_url(candidate)
|
||||
|
||||
return _clean_url(_DEFAULT_OLLAMA_HOST)
|
||||
|
||||
|
||||
def resolve_openai_base_url(explicit: str | None = None) -> str:
|
||||
"""Resolve the base URL for OpenAI-compatible services."""
|
||||
|
||||
candidates = (
|
||||
explicit,
|
||||
os.getenv("LEANN_OPENAI_BASE_URL"),
|
||||
os.getenv("OPENAI_BASE_URL"),
|
||||
os.getenv("LOCAL_OPENAI_BASE_URL"),
|
||||
)
|
||||
|
||||
for candidate in candidates:
|
||||
if candidate:
|
||||
return _clean_url(candidate)
|
||||
|
||||
return _clean_url(_DEFAULT_OPENAI_BASE_URL)
|
||||
|
||||
|
||||
def resolve_openai_api_key(explicit: str | None = None) -> str | None:
|
||||
"""Resolve the API key for OpenAI-compatible services."""
|
||||
|
||||
if explicit:
|
||||
return explicit
|
||||
|
||||
return os.getenv("OPENAI_API_KEY")
|
||||
|
||||
|
||||
def encode_provider_options(options: dict[str, Any] | None) -> str | None:
|
||||
"""Serialize provider options for child processes."""
|
||||
|
||||
if not options:
|
||||
return None
|
||||
|
||||
try:
|
||||
return json.dumps(options)
|
||||
except (TypeError, ValueError):
|
||||
# Fall back to empty payload if serialization fails
|
||||
return None
|
||||
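
The resolution order these helpers implement (explicit argument first, then the environment variables in the order listed above, then the hard-coded default) can be sanity-checked with a small sketch like the one below. It is illustrative only; it assumes the package is importable as `leann` and uses made-up endpoint values.

```python
import os

from leann.settings import resolve_ollama_host, resolve_openai_base_url

# Clear the higher-priority variables so OLLAMA_HOST is the first match.
for var in ("LEANN_LOCAL_LLM_HOST", "LEANN_OLLAMA_HOST"):
    os.environ.pop(var, None)
os.environ["OLLAMA_HOST"] = "http://10.0.0.5:11434/"

print(resolve_ollama_host())                          # http://10.0.0.5:11434 (trailing slash stripped)
print(resolve_ollama_host("http://127.0.0.1:11500"))  # explicit argument wins over the environment

# With no OpenAI-related variables set, the public default is returned.
for var in ("LEANN_OPENAI_BASE_URL", "OPENAI_BASE_URL", "LOCAL_OPENAI_BASE_URL"):
    os.environ.pop(var, None)
print(resolve_openai_base_url())                      # https://api.openai.com/v1
```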