LEANN

Aakash Suresh e268392d5b Fix: Prevent duplicate PDF processing when using --file-types .pdf (#179)

Fixes #175

Problem:
When --file-types .pdf was specified, PDFs were processed twice:
1. Separately with PyMuPDF/pdfplumber extractors
2. Again in the 'other file types' section via SimpleDirectoryReader

This meant every PDF was loaded twice, with potential conflicts between the two passes.

Solution:
- Exclude .pdf from other_file_extensions when PDFs are already
  processed separately
- Only load other file types if there are extensions to process
- Prevents duplicate PDF processing

Changes:
- Added logic to filter out .pdf from code_extensions when loading
  other file types if PDFs were processed separately
- Updated SimpleDirectoryReader to use filtered extensions
- Added check to skip loading if no other extensions to process
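
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the actual LEANN code: the function names `extensions_for_generic_loader` and `plan_loading`, and the returned dictionary layout, are hypothetical and chosen only to show the idea of dropping .pdf from the generic pass and skipping that pass when nothing is left.

```python
from typing import Dict, List


def extensions_for_generic_loader(
    requested: List[str], pdfs_handled_separately: bool
) -> List[str]:
    """Return the extensions the generic reader should still load.

    Hypothetical helper: if PDFs already went through the dedicated
    PDF extractors, .pdf is removed so it is not loaded a second time.
    """
    remaining = []
    for ext in requested:
        if pdfs_handled_separately and ext.lower() == ".pdf":
            continue  # already covered by the dedicated PDF pass
        remaining.append(ext)
    return remaining


def plan_loading(requested: List[str]) -> Dict[str, object]:
    """Decide which loading passes to run for the requested file types."""
    # Assumption: the dedicated PDF pass runs whenever .pdf is requested.
    pdf_pass = any(ext.lower() == ".pdf" for ext in requested)
    other = extensions_for_generic_loader(requested, pdf_pass)
    return {
        "pdf_pass": pdf_pass,
        # Skip the generic pass entirely when no extensions remain.
        "generic_pass": bool(other),
        "generic_extensions": other,
    }


if __name__ == "__main__":
    print(plan_loading([".pdf"]))
    print(plan_loading([".pdf", ".md"]))
```

With only .pdf requested, the sketch runs the PDF pass and skips the generic pass; with .pdf and .md, the generic pass receives only .md.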

2025-12-01 13:48:44 -08:00

astchunk-leann @ ad9afa07b9

update submodule

2025-09-19 17:03:55 -07:00

leann

chore: release v0.3.5

2025-11-12 06:01:25 +00:00

leann-backend-diskann

chore: release v0.3.5

2025-11-12 06:01:25 +00:00

leann-backend-hnsw

chore: release v0.3.5