# Thinking Budget Feature Implementation

## Overview

This document describes the implementation of the thinking budget feature for LEANN, which allows users to control the computational effort of reasoning models such as gpt-oss:20b.
## Feature Description

The thinking budget feature provides three levels of computational effort for reasoning models:

- `low`: fast responses, basic reasoning (default for simple queries)
- `medium`: balanced speed and reasoning depth
- `high`: maximum reasoning effort, best for complex analytical questions
## Implementation Details

### 1. Command Line Interface

A `--thinking-budget` parameter was added to both the LEANN CLI and the RAG examples:

```bash
# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# RAG examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget medium
```
### 2. LLM Backend Support

#### Ollama Backend (`packages/leann-core/src/leann/chat.py`)

```python
def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    options = kwargs.copy()
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget:
        options.pop("thinking_budget", None)
        if thinking_budget in ["low", "medium", "high"]:
            options["reasoning"] = {"effort": thinking_budget, "exclude": False}
    # (request construction and the call to Ollama are omitted in this excerpt)
```

API format: uses Ollama's `reasoning` parameter with `effort` and `exclude` fields.
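For illustration, a hedged sketch of calling this backend with a thinking budget. `OllamaChat` is named in the files-modified list below, but the import path and constructor arguments shown here are assumptions:

```python
# Import path and constructor arguments are assumptions for illustration only.
from leann.chat import OllamaChat

chat = OllamaChat(model="gpt-oss:20b")

# thinking_budget is popped from kwargs and mapped to
# options["reasoning"] = {"effort": "high", "exclude": False}, as in the snippet above.
answer = chat.ask(
    "Which retrieval strategy fits a 10M-document corpus, and why?",
    thinking_budget="high",
)
print(answer)
```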
#### OpenAI Backend (`packages/leann-core/src/leann/chat.py`)

```python
def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget and thinking_budget in ["low", "medium", "high"]:
        # Check if this is an o-series model
        o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
        if any(model in self.model for model in o_series_models):
            # params: request-argument dict built earlier in this method (not shown here)
            params["reasoning_effort"] = thinking_budget
```

API format: uses OpenAI's `reasoning_effort` parameter for o-series models.
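For context, `reasoning_effort` is a field of OpenAI's Chat Completions API for o-series models; the standalone request below is a minimal sketch (model and prompt chosen arbitrarily), not LEANN code:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# reasoning_effort carries the thinking budget: "low", "medium", or "high"
response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Briefly compare HNSW and IVF vector indexes."}],
    reasoning_effort="medium",
)
print(response.choices[0].message.content)
```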
### 3. Parameter Propagation

The thinking budget parameter is propagated through the LEANN architecture (a condensed sketch follows the list):

- CLI (`packages/leann-core/src/leann/cli.py`): captures the `--thinking-budget` argument
- Base RAG (`apps/base_rag_example.py`): adds the parameter to the argument parser
- LeannChat (`packages/leann-core/src/leann/api.py`): passes `llm_kwargs` to the LLM
- LLM interface: handles the parameter in the backend-specific implementations
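A condensed sketch of that path, assuming standard argparse wiring; apart from `--thinking-budget`, `thinking_budget`, and `llm_kwargs`, the names below are hypothetical:

```python
import argparse

# CLI / base RAG example: expose the flag (hypothetical parser setup)
parser = argparse.ArgumentParser()
parser.add_argument("--thinking-budget", choices=["low", "medium", "high"], default=None)
args = parser.parse_args()

# Only forward the budget to the LLM backend when it was actually given
llm_kwargs = {}
if args.thinking_budget:
    llm_kwargs["thinking_budget"] = args.thinking_budget

# LeannChat-level call site (hypothetical): llm_kwargs reaches the backend's ask(),
# which maps thinking_budget onto the provider-specific field shown in section 2.
# answer = chat_backend.ask(question, **llm_kwargs)
```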
## Files Modified

### Core Implementation

- `packages/leann-core/src/leann/chat.py`: added thinking budget support to OllamaChat and OpenAIChat
- `packages/leann-core/src/leann/cli.py`: added the `--thinking-budget` argument
- `apps/base_rag_example.py`: added the thinking budget parameter to the RAG examples

### Documentation

- `README.md`: added the thinking budget parameter to the usage examples
- `docs/configuration-guide.md`: added detailed documentation and usage guidelines

### Examples

- `examples/thinking_budget_demo.py`: comprehensive demo script with usage examples
## Usage Examples

### Basic Usage

```bash
# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium

# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low
```

### RAG Examples

```bash
# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high

# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium
```
## Supported Models

### Ollama Models

- gpt-oss:20b: the primary target model with reasoning capabilities
- Other reasoning models: any Ollama model that supports the `reasoning` parameter

### OpenAI Models

- o3, o3-mini, o4-mini, o1: o-series reasoning models that accept the `reasoning_effort` parameter
- GPT-OSS models: models that support reasoning capabilities
## Testing

The implementation includes comprehensive testing:
- Parameter handling verification
- Backend-specific API format validation
- CLI argument parsing tests
- Integration with existing LEANN architecture
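
As a rough illustration of the parameter-handling checks, a standalone pytest-style sketch; the helper re-implements the Ollama mapping from section 2 for clarity and is not LEANN's actual test code:

```python
# Hypothetical helper mirroring the Ollama-backend mapping shown in section 2;
# not LEANN's implementation.
def build_reasoning_options(thinking_budget=None, **kwargs):
    options = dict(kwargs)
    if thinking_budget in ("low", "medium", "high"):
        options["reasoning"] = {"effort": thinking_budget, "exclude": False}
    return options


def test_valid_budget_sets_reasoning_effort():
    options = build_reasoning_options(thinking_budget="high")
    assert options["reasoning"] == {"effort": "high", "exclude": False}


def test_missing_or_invalid_budget_is_ignored():
    assert "reasoning" not in build_reasoning_options()
    assert "reasoning" not in build_reasoning_options(thinking_budget="extreme")
```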