- Add ty (Astral's fast Python type checker) to GitHub CI workflow
- Fix type annotations across all RAG apps:
- Update load_data return types from list[str] to list[dict[str, Any]]
- Fix base_rag_example.py to properly handle dict format from create_text_chunks
- Fix type errors in leann-core:
- chunking_utils.py: Add explicit type annotations
- cli.py: Fix return type annotations for PDF extraction functions
- interactive_utils.py: Fix readline import type handling
- Fix type errors in apps:
- wechat_history.py: Fix return type annotations
- document_rag.py, code_rag.py: Replace **kwargs with explicit arguments
- Add ty configuration to pyproject.toml
This resolves the bug introduced in PR #157 where create_text_chunks()
changed to return list[dict] but callers were not updated.
* feat(core): Add AST-aware code chunking with astchunk integration
This PR introduces intelligent code chunking that preserves semantic boundaries
(functions, classes, methods) for better code understanding in RAG applications.
Key Features:
- AST-aware chunking for Python, Java, C#, TypeScript files
- Graceful fallback to traditional chunking for unsupported languages
- New specialized code RAG application for repositories
- Enhanced CLI with --use-ast-chunking flag
- Comprehensive test suite with integration tests
Technical Implementation:
- New chunking_utils.py module with enhanced chunking logic
- Extended base RAG framework with AST chunking arguments
- Updated document RAG with --enable-code-chunking flag
- CLI integration with proper error handling and fallback
Benefits:
- Better semantic understanding of code structure
- Improved search quality for code-related queries
- Maintains backward compatibility with existing workflows
- Supports mixed content (code + documentation) seamlessly
Dependencies:
- Added astchunk and tree-sitter parsers to pyproject.toml
- All dependencies are optional - fallback works without them
Testing:
- Comprehensive test suite in test_astchunk_integration.py
- Integration tests with document RAG
- Error handling and edge case coverage
Documentation:
- Updated README.md with AST chunking highlights
- Added ASTCHUNK_INTEGRATION.md with complete guide
- Updated features.md with new capabilities
* Refactored chunk utils
* Remove useless import
* Update README.md
* Update apps/chunking/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update apps/code_rag.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix issue
* apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fixes after pr review
* Fix tests not passing
* Fix linter error for documentation files
* Update .gitignore with unwanted files
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Andy Lee <andylizf@outlook.com>