Files
LEANN/docs/THINKING_BUDGET_FEATURE.md
2025-08-05 16:49:52 -07:00

4.4 KiB

Thinking Budget Feature Implementation

Overview

This document describes the implementation of the thinking budget feature for LEANN, which allows users to control the computational effort for reasoning models like GPT-Oss:20b.

Feature Description

The thinking budget feature provides three levels of computational effort for reasoning models:

  • low: Fast responses, basic reasoning (default for simple queries)
  • medium: Balanced speed and reasoning depth
  • high: Maximum reasoning effort, best for complex analytical questions

Implementation Details

1. Command Line Interface

Added --thinking-budget parameter to both CLI and RAG examples:

# LEANN CLI
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# RAG Examples
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high
python apps/document_rag.py --llm openai --llm-model o3 --thinking-budget medium

2. LLM Backend Support

Ollama Backend (packages/leann-core/src/leann/chat.py)

def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    options = kwargs.copy()
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget:
        options.pop("thinking_budget", None)
        if thinking_budget in ["low", "medium", "high"]:
            options["reasoning"] = {"effort": thinking_budget, "exclude": False}

API Format: Uses Ollama's reasoning parameter with effort and exclude fields.

OpenAI Backend (packages/leann-core/src/leann/chat.py)

def ask(self, prompt: str, **kwargs) -> str:
    # Handle thinking budget for reasoning models
    thinking_budget = kwargs.get("thinking_budget")
    if thinking_budget and thinking_budget in ["low", "medium", "high"]:
        # Check if this is an o-series model
        o_series_models = ["o3", "o3-mini", "o4-mini", "o1", "o3-pro", "o3-deep-research"]
        if any(model in self.model for model in o_series_models):
            params["reasoning_effort"] = thinking_budget

API Format: Uses OpenAI's reasoning_effort parameter for o-series models.

3. Parameter Propagation

The thinking budget parameter is properly propagated through the LEANN architecture:

  1. CLI (packages/leann-core/src/leann/cli.py): Captures --thinking-budget argument
  2. Base RAG (apps/base_rag_example.py): Adds parameter to argument parser
  3. LeannChat (packages/leann-core/src/leann/api.py): Passes llm_kwargs to LLM
  4. LLM Interface: Handles the parameter in backend-specific implementations

Files Modified

Core Implementation

  • packages/leann-core/src/leann/chat.py: Added thinking budget support to OllamaChat and OpenAIChat
  • packages/leann-core/src/leann/cli.py: Added --thinking-budget argument
  • apps/base_rag_example.py: Added thinking budget parameter to RAG examples

Documentation

  • README.md: Added thinking budget parameter to usage examples
  • docs/configuration-guide.md: Added detailed documentation and usage guidelines

Examples

  • examples/thinking_budget_demo.py: Comprehensive demo script with usage examples

Usage Examples

Basic Usage

# High reasoning effort for complex questions
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget high

# Medium reasoning for balanced performance
leann ask my-index --llm openai --model gpt-4o --thinking-budget medium

# Low reasoning for fast responses
leann ask my-index --llm ollama --model gpt-oss:20b --thinking-budget low

RAG Examples

# Email RAG with high reasoning
python apps/email_rag.py --llm ollama --llm-model gpt-oss:20b --thinking-budget high

# Document RAG with medium reasoning
python apps/document_rag.py --llm openai --llm-model gpt-4o --thinking-budget medium

Supported Models

Ollama Models

  • GPT-Oss:20b: Primary target model with reasoning capabilities
  • Other reasoning models: Any Ollama model that supports the reasoning parameter

OpenAI Models

  • o3, o3-mini, o4-mini, o1: o-series reasoning models with reasoning_effort parameter
  • GPT-OSS models: Models that support reasoning capabilities

Testing

The implementation includes comprehensive testing:

  • Parameter handling verification
  • Backend-specific API format validation
  • CLI argument parsing tests
  • Integration with existing LEANN architecture