* fix: Fix Twitter bookmarks anchor link
- Convert Twitter Bookmarks from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #twitter-bookmarks-your-personal-tweet-library work correctly
Fixes broken link: https://github.com/yichuan-w/LEANN?tab=readme-ov-file#twitter-bookmarks-your-personal-tweet-library
* fix: Fix Slack messages anchor link as well
- Convert Slack Messages from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #slack-messages-search-your-team-conversations work correctly
Both Twitter and Slack MCP sections now have reliable anchor links.
* fix: Point Slack and Twitter links to main MCP section
- Both Slack and Twitter are subsections under MCP Integration
- Links should point to #mcp-integration-rag-on-live-data-from-any-platform
- Users will land on the MCP section and can find both Slack and Twitter subsections there
This matches the actual document structure where Slack and Twitter are under the MCP Integration section.
* Improve Slack MCP integration with retry logic and comprehensive setup guide
- Add retry mechanism with exponential backoff for cache sync issues
- Handle 'users cache is not ready yet' errors gracefully
- Add max-retries and retry-delay CLI arguments for better control
- Create comprehensive Slack setup guide with troubleshooting
- Update README with link to detailed setup guide
- Improve error messages and user experience
* Fix trailing whitespace in slack setup guide
Pre-commit hooks formatting fixes
* Add comprehensive Slack setup guide with success screenshot
- Create detailed setup guide with step-by-step instructions
- Add troubleshooting section for common issues like cache sync errors
- Include real terminal output example from successful integration
- Add screenshot showing VS Code interface with Slack channel data
- Remove excessive emojis for more professional documentation
- Document retry logic improvements and CLI arguments
* Fix formatting issues in Slack setup guide
- Remove trailing whitespace
- Fix end of file formatting
- Pre-commit hooks formatting fixes
* Add real RAG example showing intelligent Slack query functionality
- Add detailed example of asking 'What is LEANN about?'
- Show retrieved messages from Slack channels
- Demonstrate intelligent answer generation based on context
- Add command example for running real RAG queries
- Explain the 4-step process: retrieve, index, generate, cite
* Update Slack setup guide with bot invitation requirements
- Add important section about inviting bot to channels before RAG queries
- Explain the 'not_in_channel' errors and their meaning
- Provide clear steps for bot invitation process
- Document realistic scenario where bot needs explicit channel access
- Update documentation to be more professional and less cursor-style
* Docs: add real RAG example for Sky Lab #random
- Embed screenshot videos/rag-sky-random.png
- Add step-by-step commands and notes
- Include helper test script tests/test_channel_by_id_or_name.py
- Redact example tokens from docs
* Docs/CI: fix broken image paths and ruff lint\n\n- Move screenshot to docs/videos and update references\n- Remove obsolete rag-query-results image\n- Rename variable to satisfy ruff
* Docs: fix image path for lychee (use videos/ relative under docs/)
* Docs: finalize Slack setup guide with Sky random RAG example and image path fixes\n\n- Redact example tokens from docs
* Fix Slack MCP integration and update documentation
- Fix SlackMCPReader to use conversations_history instead of channels_list
- Add fallback imports for leann.interactive_utils and leann.settings
- Update slack-setup-guide.md with real screenshots and improved text
- Remove old screenshot files
* Add Slack integration screenshots to docs/videos
- Add slack_integration.png showing RAG query results
- Add slack_integration_2.png showing additional demo functionality
- Fixes lychee link checker errors for missing image files
* Update Slack integration screenshot with latest changes
* Remove test_channel_by_id_or_name.py
- Clean up temporary test file that was used for debugging
- Keep only the main slack_rag.py application for production use
* Update Slack RAG example to show LEANN announcement retrieval
- Change query from 'PUBPOL 290' to 'What is LEANN about?' for more challenging retrieval
- Update command to use python -m apps.slack_rag instead of test script
- Add expected response showing Yichuan Wang's LEANN announcement message
- Emphasize this demonstrates ability to find specific announcements in conversation history
- Update description to highlight challenging query capabilities
* Update Slack RAG integration with improved CSV parsing and new screenshots
- Fixed CSV message parsing in slack_mcp_reader.py to properly handle individual messages
- Updated slack_rag.py to filter empty channel strings
- Enhanced slack-setup-guide.md with two new query examples:
- Advisor Models query: 'train black-box models to adopt to your personal data'
- Barbarians at the Gate query: 'AI-driven research systems ADRS'
- Replaced old screenshots with four new ones showing both query examples
- Updated documentation to use User OAuth Token (xoxp-) instead of Bot Token (xoxb-)
- Added proper command examples with --no-concatenate-conversations and --force-rebuild flags
* Update Slack RAG documentation with Ollama integration and new screenshots
- Updated slack-setup-guide.md with comprehensive Ollama setup instructions
- Added 6 new screenshots showing complete RAG workflow:
- Command setup, search results, and LLM responses for both queries
- Removed simulated LLM references, now uses real Ollama with llama3.2:1b
- Enhanced documentation with step-by-step Ollama installation
- Updated troubleshooting checklist to include Ollama-specific checks
- Fixed command syntax and added proper Ollama configuration
- Demonstrates working Slack RAG with real AI-generated responses
* Remove Key Features section from Slack RAG examples
- Simplified documentation by removing the bullet point list
- Keeps the focus on the actual examples and screenshots
* fix: Fix Twitter bookmarks anchor link
- Convert Twitter Bookmarks from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #twitter-bookmarks-your-personal-tweet-library work correctly
Fixes broken link: https://github.com/yichuan-w/LEANN?tab=readme-ov-file#twitter-bookmarks-your-personal-tweet-library
* fix: Fix Slack messages anchor link as well
- Convert Slack Messages from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #slack-messages-search-your-team-conversations work correctly
Both Twitter and Slack MCP sections now have reliable anchor links.
* fix: Point Slack and Twitter links to main MCP section
- Both Slack and Twitter are subsections under MCP Integration
- Links should point to #mcp-integration-rag-on-live-data-from-any-platform
- Users will land on the MCP section and can find both Slack and Twitter subsections there
This matches the actual document structure where Slack and Twitter are under the MCP Integration section.
BREAKING CHANGE: trust_remote_code now defaults to False for security
- Set trust_remote_code=False by default in HFChat class
- Add explicit trust_remote_code parameter to HFChat.__init__()
- Add security warning when trust_remote_code=True is used
- Update get_llm() function to support trust_remote_code parameter
- Update benchmark utilities (load_hf_model, load_vllm_model, load_qwen_vl_model)
- Add comprehensive documentation for security implications
Security Benefits:
- Prevents arbitrary code execution from compromised model repositories
- Requires explicit opt-in for models that need remote code execution
- Shows clear warnings when security is reduced
- Follows security-by-default principle
Migration Guide:
- Most users: No changes needed (more secure by default)
- Users with models requiring remote code: Add trust_remote_code=True explicitly
- Config users: Add 'trust_remote_code': true to LLM config if needed
Fixes#136
* feat: Add MCP integration support for Slack and Twitter
- Implement SlackMCPReader for connecting to Slack MCP servers
- Implement TwitterMCPReader for connecting to Twitter MCP servers
- Add SlackRAG and TwitterRAG applications with full CLI support
- Support live data fetching via Model Context Protocol (MCP)
- Add comprehensive documentation and usage examples
- Include connection testing capabilities with --test-connection flag
- Add standalone tests for core functionality
- Update README with detailed MCP integration guide
- Add Aakash Suresh to Active Contributors
Resolves#36
* fix: Resolve linting issues in MCP integration
- Replace deprecated typing.Dict/List with built-in dict/list
- Fix boolean comparisons (== True/False) to direct checks
- Remove unused variables in demo script
- Update type annotations to use modern Python syntax
All pre-commit hooks should now pass.
* fix: Apply final formatting fixes for pre-commit hooks
- Remove unused imports (asyncio, pathlib.Path)
- Remove unused class imports in demo script
- Ensure all files pass ruff format and pre-commit checks
This should resolve all remaining CI linting issues.
* fix: Apply pre-commit formatting changes
- Fix trailing whitespace in all files
- Apply ruff formatting to match project standards
- Ensure consistent code style across all MCP integration files
This commit applies the exact changes that pre-commit hooks expect.
* fix: Apply pre-commit hooks formatting fixes
- Remove trailing whitespace from all files
- Fix ruff formatting issues (2 errors resolved)
- Apply consistent code formatting across 3 files
- Ensure all files pass pre-commit validation
This resolves all CI formatting failures.
* fix: Update MCP RAG classes to match BaseRAGExample signature
- Fix SlackMCPRAG and TwitterMCPRAG __init__ methods to provide required parameters
- Add name, description, and default_index_name to super().__init__ calls
- Resolves test failures: test_slack_rag_initialization and test_twitter_rag_initialization
This fixes the TypeError caused by BaseRAGExample requiring additional parameters.
* style: Apply ruff formatting - add trailing commas
- Add trailing commas to super().__init__ calls in SlackMCPRAG and TwitterMCPRAG
- Fixes ruff format pre-commit hook requirements
* fix: Resolve SentenceTransformer model_kwargs parameter conflict
- Fix local_files_only parameter conflict in embedding_compute.py
- Create separate copies of model_kwargs and tokenizer_kwargs for local vs network loading
- Prevents parameter conflicts when falling back from local to network loading
- Resolves TypeError in test_readme_examples.py tests
This addresses the SentenceTransformer initialization issues in CI tests.
* fix: Add comprehensive SentenceTransformer version compatibility
- Handle both old and new sentence-transformers versions
- Gracefully fallback from advanced parameters to basic initialization
- Catch TypeError for model_kwargs/tokenizer_kwargs and use basic SentenceTransformer init
- Ensures compatibility across different CI environments and local setups
- Maintains optimization benefits where supported while ensuring broad compatibility
This resolves test failures in CI environments with older sentence-transformers versions.
* style: Apply ruff formatting to embedding_compute.py
- Break long logger.warning lines for better readability
- Fixes pre-commit hook formatting requirements
* docs: Comprehensive documentation improvements for better user experience
- Add clear step-by-step Getting Started Guide for new users
- Add comprehensive CLI Reference with all commands and options
- Improve installation instructions with clear steps and verification
- Add detailed troubleshooting section for common issues (Ollama, OpenAI, etc.)
- Clarify difference between CLI commands and specialized apps
- Add environment variables documentation
- Improve MCP integration documentation with CLI integration examples
- Address user feedback about confusing installation and setup process
This resolves documentation gaps that made LEANN difficult for non-specialists to use.
* style: Remove trailing whitespace from README.md
- Fix trailing whitespace issues found by pre-commit hooks
- Ensures consistent formatting across documentation
* docs: Simplify README by removing excessive documentation
- Remove overly complex CLI reference and getting started sections (lines 61-334)
- Remove emojis from section headers for cleaner appearance
- Keep README simple and focused as requested
- Maintain essential MCP integration documentation
This addresses feedback to keep documentation minimal and avoid auto-generated content.
* docs: Address maintainer feedback on README improvements
- Restore emojis in section headers (Prerequisites and Quick Install)
- Add MCP live data feature mention in line 23 with links to Slack and Twitter
- Add detailed API credential setup instructions for Slack:
- Step-by-step Slack App creation process
- Required OAuth scopes and permissions
- Clear token identification (xoxb- vs xapp-)
- Add detailed API credential setup instructions for Twitter:
- Twitter Developer Account application process
- API v2 requirements for bookmarks access
- Required permissions and scopes
This addresses maintainer feedback to make API setup more user-friendly.
* Add readline support to interactive command line interfaces
- Implement readline history, navigation, and editing for CLI, API, and RAG chat modes
- Create shared InteractiveSession class to consolidate readline functionality
- Add command history persistence across sessions with separate files per context
- Support built-in commands: help, clear, history, quit/exit
- Enable arrow key navigation and command editing in all interactive modes
* Improvements based on feedback
* system wide semantic file search with temporal awareness
* ruff checking passed
* graceful exit for empty dump
* error thrown for time only search
* fixes
* feat: finance bench
* docs: results
* chore: ignroe data README
* feat: fix financebench
* feat: laion, also required idmaps support
* style: format
* style: format
* fix: resolve ruff linting errors
- Remove unused variables in benchmark scripts
- Rename unused loop variables to follow convention
* feat: enron email bench
* experiments for running DiskANN & BM25 on Arch 4090
* style: format
* chore(ci): remove paru-bin submodule and config to fix checkout --recurse-submodules
* docs: data
* docs: data updated
* fix: as package
* fix(ci): only run pre-commit
* chore: use http url of astchunk; use group for some dev deps
* fix(ci): should checkout modules as well since `uv sync` checks
* fix(ci): run with lint only
* fix: find links to install wheels available
* CI: force local wheels in uv install step
* CI: install local wheels via file paths
* CI: pick wheels matching current Python tag
* CI: handle python tag mismatches for local wheels
* CI: use matrix python venv and set macOS deployment target
* CI: revert install step to match main
* CI: use uv group install with local wheel selection
* CI: rely on setup-uv for Python and tighten group install
* CI: install build deps with uv python interpreter
* CI: use temporary uv venv for build deps
* CI: add build venv scripts path for wheel repair
* feat: Add GitHub PR and issue templates for better contributor experience
* simplify: Make templates more concise and user-friendly
* fix: enable is_compact=False, is_recompute=True
* feat: update when recompute
* test
* fix: real recompute
* refactor
* fix: compare with no-recompute
* fix: test
* feat: Add ARM64 Linux wheel support for leann-backend-hnsw
* fix: Use OpenBLAS for ARM64 Linux builds instead of Intel MKL
* fix: Configure Faiss with SVE optimization for ARM64 builds
- Set FAISS_OPT_LEVEL to "sve" for ARM64 architecture
- Disable x86-specific SIMD instructions (AVX2, AVX512, SSE4.1)
- Use ARM64-native SVE optimization as per Faiss conda build scripts
- Add architecture detection and proper configuration messages
Fixes compilation error: "xmmintrin.h: No such file or directory"
on ubuntu-24.04-arm runners.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Apply ARM64 compatibility fix directly to Faiss submodule
- Modify faiss/impl/pq.cpp to use x86-specific preprocessor conditions
- Remove patch file approach in favor of direct submodule modification
- Update CMakeLists.txt to reflect the submodule changes
- Fixes ARM64 Linux compilation by preventing x86 SIMD header inclusion
This resolves the "xmmintrin.h: No such file or directory" error
when building ARM64 Linux wheels for Docker compatibility.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* chore: Update Faiss submodule to include ARM64 compatibility fix
- Points to commit ed96ff7d with x86-specific preprocessor conditions
- Enables successful ARM64 Linux wheel builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* retrigger ci
* fix: Use different optimization levels for ARM64 based on platform
- Use SVE optimization only for ARM64 Linux
- Use generic optimization for ARM64 macOS to avoid clang SVE issues
- Fixes macOS ARM64 compilation errors with SVE instructions
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* feat: Update DiskANN submodule with OpenBLAS fallback support
- Points to commit 5c396c4 with ARM64 Linux OpenBLAS support
- Enables DiskANN to build on ARM64 Linux using standard BLAS libraries
- Resolves Intel MKL dependency issues for Docker ARM64 deployments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with ZeroMQ polling configuration
- Points to commit 3a1016e with explicit polling method setup
- Resolves ZeroMQ autodetection issues on ARM64 Linux
- Ensures stable cross-platform ZeroMQ builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* retrigger ci
* fix: Update DiskANN submodule with ARM64 compiler flags fix
- Points to commit a0dc600 with architecture-specific compiler flags
- Removes x86 SIMD flags on ARM64 Linux to fix compilation errors
- Enables successful ARM64 Linux wheel builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with ARM64 compiler flags fix
- Points to commit 0921664 with architecture-specific compiler flags
- Removes x86 SIMD flags on ARM64 Linux to fix compilation errors
- Enables successful ARM64 Linux wheel builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* retrigger ci
* fix: Update DiskANN submodule with cross-platform prefetch support
- Points to commit 39192d6 with unified prefetch macros
- Replaces all Intel-specific _mm_prefetch calls with cross-platform macros
- Enables ARM64 Linux compatibility while maintaining x86 performance
- Resolves all remaining compilation errors for ARM64 builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with corrected ARM64 compatibility fixes
- Points to commit 3cb87a8 with proper x86 platform detection
- Includes ARM64 fallback for AVXDistanceInnerProductFloat function
- Resolves all remaining '__m256 was not declared' compilation errors
- Enables successful ARM64 Linux wheel builds for Docker compatibility
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with template type handling fix
- Points to commit d396bc3 with corrected template type handling
- Fixes DistanceInnerProduct template instantiation for int8_t/uint8_t types
- Resolves 'cannot convert const signed char* to const float*' error
- Completes ARM64 Linux compilation compatibility
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with DistanceFastL2::norm template fix
- Points to commit 69d9a99 with corrected template type handling
- Fixes DistanceFastL2::norm template instantiation for int8_t/uint8_t types
- Resolves another 'cannot convert const signed char* to const float*' error
- Continues ARM64 Linux compilation compatibility improvements
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with LAPACKE header detection
- Points to commit 64a9e01 with LAPACKE header path configuration
- Adds pkg-config based detection for LAPACKE include directories
- Resolves 'lapacke.h: No such file or directory' compilation error
- Completes OpenBLAS integration for ARM64 Linux builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with enhanced LAPACKE header detection
- Points to commit 18d0721 with fallback LAPACKE header search paths
- Checks multiple standard locations for lapacke.h on various systems
- Improves ARM64 Linux compatibility for OpenBLAS builds
- Should resolve 'lapacke.h: No such file or directory' errors
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Add liblapacke-dev package for ARM64 Linux builds
- Add liblapacke-dev to ARM64 dependencies alongside libopenblas-dev
- Provides lapacke.h header file needed for LAPACK C interface
- Fixes 'lapacke.h: No such file or directory' compilation error
- Enables complete OpenBLAS + LAPACKE support for ARM64 wheel builds
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with cosine_similarity.h x86 intrinsics fix
- Points to commit dbb17eb with corrected conditional compilation
- Fixes immintrin.h inclusion for ARM64 compatibility in cosine_similarity.h
- Resolves 'immintrin.h: No such file or directory' error
- Continues systematic ARM64 Linux compilation fixes
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix: Update DiskANN submodule with LAPACKE library linking fix
- Points to commit 19f9603 with explicit LAPACKE library discovery and linking
- Resolves 'undefined symbol: LAPACKE_sgesdd' runtime error on ARM64 Linux
- Completes ARM64 Linux wheel build compatibility for Docker deployments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Metadata filtering initial version
* Metadata filtering initial version
* Fixes linter issues
* Cleanup code
* Clean up and readme
* Fix after review
* Use UV in example
* Merge main into feature/metadata-filtering
* feat(core): Add AST-aware code chunking with astchunk integration
This PR introduces intelligent code chunking that preserves semantic boundaries
(functions, classes, methods) for better code understanding in RAG applications.
Key Features:
- AST-aware chunking for Python, Java, C#, TypeScript files
- Graceful fallback to traditional chunking for unsupported languages
- New specialized code RAG application for repositories
- Enhanced CLI with --use-ast-chunking flag
- Comprehensive test suite with integration tests
Technical Implementation:
- New chunking_utils.py module with enhanced chunking logic
- Extended base RAG framework with AST chunking arguments
- Updated document RAG with --enable-code-chunking flag
- CLI integration with proper error handling and fallback
Benefits:
- Better semantic understanding of code structure
- Improved search quality for code-related queries
- Maintains backward compatibility with existing workflows
- Supports mixed content (code + documentation) seamlessly
Dependencies:
- Added astchunk and tree-sitter parsers to pyproject.toml
- All dependencies are optional - fallback works without them
Testing:
- Comprehensive test suite in test_astchunk_integration.py
- Integration tests with document RAG
- Error handling and edge case coverage
Documentation:
- Updated README.md with AST chunking highlights
- Added ASTCHUNK_INTEGRATION.md with complete guide
- Updated features.md with new capabilities
* Refactored chunk utils
* Remove useless import
* Update README.md
* Update apps/chunking/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update apps/code_rag.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix issue
* apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fixes after pr review
* Fix tests not passing
* Fix linter error for documentation files
* Update .gitignore with unwanted files
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andy Lee <andylizf@outlook.com>