docs: update README to use proper module imports for apps

- Change from 'python apps/xxx.py' to 'python -m apps.xxx'
- More professional and pythonic module calling
- Ensures proper module resolution and imports
- Better separation between apps/ (production tools) and examples/ (demos)
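
To illustrate the module-resolution point, here is a minimal sketch that is not taken from this repository. It builds a throwaway `apps/` package (the file names `tool.py` and `helper.py` are hypothetical) and runs it both ways. When a script is run by path, Python puts the script's own directory first on `sys.path`, so `import apps` fails; with `-m`, the current directory comes first and the package resolves.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways() -> tuple[int, int]:
    """Return exit codes for (run by path, run with -m) of a toy apps/ package."""
    # Drop variables that would change how sys.path is initialised.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "apps"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        # Hypothetical app that imports a sibling module via the package name.
        (pkg / "tool.py").write_text("from apps.helper import VALUE\nprint(VALUE)\n")

        as_script = subprocess.run(
            [sys.executable, "apps/tool.py"], cwd=root, env=env, capture_output=True
        )
        as_module = subprocess.run(
            [sys.executable, "-m", "apps.tool"], cwd=root, env=env, capture_output=True
        )
        return as_script.returncode, as_module.returncode


if __name__ == "__main__":
    script_rc, module_rc = run_both_ways()
    print(f"python apps/tool.py   -> exit code {script_rc}")
    print(f"python -m apps.tool   -> exit code {module_rc}")
```

Run from the repository root, the `-m` form is the one that lets files under `apps/` import each other through the package name.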
19 changed files with 3605 additions and 3850 deletions
@@ -38,7 +38,7 @@ data/*
 !data/2501.14312v1 (1).pdf
 !data/2506.08276v1.pdf
 !data/PrideandPrejudice.txt
-!data/huawei_pangu.md
+!data/README.md
 !data/ground_truth/
 !data/indices/
 !data/queries/
@@ -170,8 +170,6 @@ ollama pull llama3.2:1b

 LEANN provides flexible parameters for embedding models, search strategies, and data processing to fit your specific needs.

-📚 **Need configuration best practices?** Check our [Configuration Guide](docs/configuration-guide.md) for detailed optimization tips, model selection advice, and solutions to common issues like slow embeddings or poor search quality.
-
 <details>
 <summary><strong>📋 Click to expand: Common Parameters (Available in All Examples)</strong></summary>

@@ -516,7 +514,7 @@ Options:
 - **Dynamic batching:** Efficiently batch embedding computations for GPU utilization
 - **Two-level search:** Smart graph traversal that prioritizes promising nodes

-**Backends:** HNSW (default) for most use cases, with optional DiskANN support for billion-scale datasets.
+**Backends:** DiskANN or HNSW - pick what works for your data size.

 ## Benchmarks

@@ -536,7 +534,8 @@ Options:

 ```bash
 uv pip install -e ".[dev]"  # Install dev dependencies
-python benchmarks/run_evaluation.py    # Will auto-download evaluation data and run benchmarks
+python benchmarks/run_evaluation.py data/indices/dpr/dpr_diskann      # DPR dataset
+python benchmarks/run_evaluation.py data/indices/rpj_wiki/rpj_wiki.index  # Wikipedia
 ```

 The evaluation script downloads data automatically on first run. The last three results were tested with partial personal data, and you can reproduce them with your own data!
@@ -99,9 +99,7 @@ if __name__ == "__main__":
    print("- 'What are the main techniques LEANN uses?'")
    print("- 'What is the technique DLPM?'")
    print("- 'Who does Elizabeth Bennet marry?'")
-    print(
-        "- 'What is the problem of developing pan gu model Huawei meets? (盘古大模型开发中遇到什么问题?)'"
-    )
+    print("- 'What is the problem of developing pan gu model? (盘古大模型开发中遇到什么问题?)'")
    print("\nOr run without --query for interactive mode\n")

    rag = DocumentRAG()
@@ -0,0 +1,82 @@
+# 盘古之殇：华为诺亚盘古大模型研发历程的心酸与黑暗
+
+各位好，
+
+我是一名盘古大模型团队，华为诺亚方舟实验室的员工。
+
+首先为自证身份，列举一些细节：
+
+1. 现诺亚主任，前算法应用部部长，后改名为小模型实验室的主任王云鹤。前诺亚主任：姚骏（大家称姚老师）。几个实验室主任：唐睿明（明哥，明队，已离职），尚利峰，张维（维哥），郝建业（郝老师），刘武龙（称呼为武龙所）等。其他骨干成员和专家陆续有很多人离职。
+2. 我们隶属于“四野”这个组织。四野下属有许多纵队，基础语言大模型是四纵。王云鹤的小模型是十六纵队。我们参加过苏州的集结，有各种月份的时间节点。在苏州攻关会颁发任务令，需要在节点前达成目标。苏州集结会把各地的人员都集中在苏州研究所，平常住宾馆，比如在甪直的酒店，与家人孩子天各一方。
+3. 在苏州集结的时候周六默认上班，非常辛苦，不过周六有下午茶，有一次还有小龙虾。在苏州研究所的工位搬迁过一次，从一栋楼换到了另一栋。苏州研究所楼栋都是欧式装修，门口有大坡，里面景色很不错。去苏州集结一般至少要去一周，甚至更久，多的人甚至一两个月都回不了家。
+4. 诺亚曾经传说是研究型的，但是来了之后因为在四野做大模型项目，项目成员完全变成了交付型的，且充满了例会，评审，汇报。很多时候做实验都要申请。团队需要对接终端小艺，华为云，ICT等诸多业务线，交付压力不小。
+5. 诺亚研发的盘古模型早期内部代号叫做“盘古智子”，一开始只有内部需要申请试用的网页版，到后续迫于压力在welink上接入和公测开放。
+
+这些天发生关于质疑盘古大模型抄袭千问的事情闹的沸沸扬扬。作为一个盘古团队的成员，我最近夜夜辗转反侧，难以入眠。盘古的品牌受到如此大的影响，一方面，我自私的为我的职业发展担忧，也为自己过去的努力工作感到不值。另一方面，由于有人开始揭露这些事情我内心又感到大快人心。在多少个日日夜夜，我们对内部某些人一次次靠着造假而又获得了无数利益的行为咬牙切齿而又无能为力。这种压抑和羞辱也逐渐消磨了我对华为的感情，让我在这里的时日逐渐浑浑噩噩，迷茫无措，时常怀疑自己的人生和自我价值。
+
+我承认我是一个懦弱的人，作为一个小小的打工人，我不仅不敢和王云鹤等内部手眼通天的人做对，更不敢和华为这样的庞然大物做对。我很怕失去我的工作，毕竟我也有家人和孩子，所以我打心眼里很佩服揭露者。但是，看到内部还在试图洗地掩盖事实，蒙蔽公众的时候，我实在不能容忍了。我也希望勇敢一次，顺从自己本心。就算自损八百，我也希望能伤敌一千。我决定把我在这里的所见所闻（部分来自于同事口述）公布出来，关于盘古大模型的“传奇故事”：
+
+华为确实主要在昇腾卡上训练大模型（小模型实验室有不少英伟达的卡，他们之前也会用来训练，后面转移到昇腾）。曾经我被华为“打造世界第二选择”的决心而折服，我本身也曾经对华为有深厚的感情。我们陪着昇腾一步步摸爬滚打，从充满bug到现在能训出模型，付出了巨大的心血和代价。
+
+最初我们的算力非常有限，在910A上训练模型。那会只支持fp16，训练的稳定性远不如bf16。盘古的moe开始很早，23年就主要是训练38Bmoe模型和后续的71B dense模型。71B的dense模型通过扩增变成了第一代的135Bdense模型，后面主力模型也逐渐在910B上训练。
+
+71B和135B模型都有一个巨大的硬伤就是tokenizer。当时使用的tokenizer编码效率极低，每个单个的符号，数字，空格，乃至汉字都会占用一个token。可想而知这会非常浪费算力，且使得模型的效果很差。这时候小模型实验室正好有个自己训的词表。姚老师当时怀疑是不是模型的tokenizer不好（虽然事后来看，他的怀疑是无疑正确的），于是就决定，让71B和135B换tokenizer，因为小模型实验室曾经尝试过。团队缝合了两个tokenizer，开始了tokenizer的更换。71B模型的更换失败了，而135B因为采用了更精细的embedding初始化策略，续训了至少1T的数据后词表总算更换成功，但可想而知，效果并不会变好。
+
+于此同期，阿里和智谱等国内其他公司在GPU上训练，且已经摸索出了正确的方法，盘古和竞品的差距越来越大。内部一个230B从头训练的dense模型又因为各种原因训练失败，导致项目的状况几乎陷入绝境。面临几个节点的压力以及内部对盘古的强烈质疑时，团队的士气低迷到了极点。团队在算力极其有限的时候，做出了很多努力和挣扎。比如，团队偶然发现当时的38B moe并没有预期moe的效果。于是去掉了moe参数，还原为了13B的dense模型。由于38B的moe源自很早的pangu alpha 13B，架构相对落后，团队进行了一系列的操作，比如切换绝对位置编码到rope，去掉bias，切换为rmsnorm。同时鉴于tokenizer的一些失败和换词表的经验，这个模型的词表也更换为了王云鹤的小模型实验室7B模型所使用的词表。后面这个13B模型进行了扩增续训，变成了第二代38B dense模型（在几个月内这个模型都是主要的盘古中档位模型），曾经具有一定的竞争力。但是，由于更大的135B模型架构落后，且更换词表模型损伤巨大（后续分析发现当时更换的缝合词表有更严重的bug），续训后也与千问等当时国内领先模型存在很大差距。这时由于内部的质疑声和领导的压力也越来越大。团队的状态几乎陷入了绝境。
+
+在这种情况下，王云鹤和他的小模型实验室出手了。他们声称是从旧的135B参数继承改造而来，通过训练短短的几百B数据，各项指标平均提升了十个点左右。实际上，这就是他们套壳应用到大模型的第一次杰作。华为的外行领导内行，使得领导完全对于这种扯淡的事情没有概念，他们只会觉得肯定是有什么算法创新。经过内部的分析，他们实际上是使用Qwen 1.5 110B续训而来，通过加层，扩增ffn维度，添加盘古pi论文的一些机制得来，凑够了大概135B的参数。实际上，旧的135B有107层，而这个模型只有82层，各种配置也都不一样。新的来路不明的135B训练完很多参数的分布也和Qwen 110B几乎一模一样。连模型代码的类名当时都是Qwen，甚至懒得改名。后续这个模型就是所谓的135B V2。而这个模型当时也提供给了很多下游，甚至包括外部客户。
+
+这件事对于我们这些认真诚实做事的同事们带来了巨大的冲击，内部很多人其实都知道这件事，甚至包括终端和华为云。我们都戏称以后别叫盘古模型了，叫千古吧。当时团队成员就想向bcg举报了，毕竟这已经是重大的业务造假了。但是后面据说被领导拦了下来，因为更高级别的领导（比如姚老师，以及可能熊总和查老）其实后面也知道了，但是并不管，因为通过套壳拿出好的结果，对他们也是有利的。这件事使得当时团队几位最强的同事开始心灰意冷，离职跑路也逐渐成为挂在嘴边的事。
+
+此时，盘古似乎迎来了转机。由于前面所述的这些盘古模型基本都是续训和改造而来，当时诺亚完全没有掌握从头训练的技术，何况还是在昇腾的NPU上进行训练。在当时团队的核心成员的极力争取下，盘古开始了第三代模型的训练，付出了巨大的努力后，在数据架构和训练算法方面都与业界逐渐接轨，而这其中的艰辛和小模型实验室的人一点关系都没有。
+
+一开始团队成员毫无信心，只从一个13B的模型开始训练，但是后面发现效果还不错，于是这个模型后续再次进行了一次参数扩增，变成了第三代的38B，代号38B V3。想必很多产品线的兄弟都对这个模型很熟悉。当时这个模型的tokenizer是基于llama的词表进行扩展的（也是业界常见的做法）。而当时王云鹤的实验室做出来了另一个词表（也就是后续pangu系列的词表）。当时两个词表还被迫进行了一次赛马，最终没有明显的好坏结论。于是，领导当即决定，应该统一词表，使用王云鹤他们的。于是，在后续从头训练的135B V3（也就是对外的Pangu Ultra），便是采用了这个tokenizer。这也解释了很多使用我们模型的兄弟的疑惑，为什么当时同为V3代的两个不同档位的模型，会使用不同的tokenizer。
+
+
+我们打心眼里觉得，135B V3是我们四纵团队当时的骄傲。这是第一个真正意义上的，华为全栈自研，正经从头训练的千亿级别的模型，且效果与24年同期竞品可比的。写到这里我已经热泪盈眶，太不容易了。当时为了稳定训练，团队做了大量实验对比，并且多次在模型梯度出现异常的时候进行及时回退重启。这个模型真正做到了后面技术报告所说的训练全程没有一个loss spike。我们克服了不知道多少困难，我们做到了，我们愿用生命和荣誉保证这个模型训练的真实性。多少个凌晨，我们为了它的训练而不眠。在被内部心声骂的一文不值的时候，我们有多么不甘，有多少的委屈，我们挺住了。
+
+我们这帮人是真的在为打磨国产算力底座燃烧自己的青春啊……客居他乡，我们放弃了家庭，放弃了假期，放弃了健康，放弃了娱乐，抛头颅洒热血，其中的艰辛与困苦，寥寥数笔不足以概括其万一。在各种动员大会上，当时口号中喊出的盘古必胜，华为必胜，我们心里是真的深深被感动。
+
+然而，我们的所有辛苦的成果，经常被小模型实验室轻飘飘的拿走了。数据，直接要走。代码，直接要走，还要求我们配合适配到能一键运行。我们当时戏称小模型实验室为点鼠标实验室。我们付出辛苦，他们取得荣耀。果然应了那句话，你在负重前行是因为有人替你岁月静好。在这种情况下，越来越多的战友再也坚持不下去了，选择了离开。看到身边那些优秀的同事一个个离职，我的内心又感叹又难过。在这种作战一样的环境下，我们比起同事来说更像是战友。他们在技术上也有无数值得我学习的地方，堪称良师。看到他们去了诸如字节Seed，Deepseek，月之暗面，腾讯和快手等等很多出色的团队，我打心眼里为他们高兴和祝福，脱离了这个辛苦却肮脏的地方。我至今还对一位离职同事的话记忆犹新，ta说：“来这里是我技术生涯中的耻辱，在这里再呆每一天都是浪费生命”。话虽难听却让我无言以对。我担心我自己技术方面的积累不足，以及没法适应互联网公司高淘汰的环境，让我多次想离职的心始终没有迈出这一步。
+
+盘古除了dense模型，后续也启动了moe的探索。一开始训练的是一个224B的moe模型。而与之平行的，小模型实验室也开启了第二次主要的套壳行动（次要的插曲可能还包括一些别的模型，比如math模型），即这次流传甚广的pangu pro moe 72B。这个模型内部自称是从小模型实验室的7B扩增上来的（就算如此，这也与技术报告不符，何况是套壳qwen 2.5的14b续训）。还记得他们训了没几天，内部的评测就立刻追上了当时的38B V3。AI系统实验室很多兄弟因为需要适配模型，都知道他们的套壳行动，只是迫于各种原因，无法伸张正义。实际上，对于后续训了很久很久的这个模型，Honestagi能够分析出这个量级的相似性我已经很诧异了，因为这个模型为了续训洗参数，所付出的算力甚至早就足够从头训一个同档位的模型了。听同事说他们为了洗掉千问的水印，采取了不少办法，甚至包括故意训了脏数据。这也为学术界研究模型血缘提供了一个前所未有的特殊模范吧。以后新的血缘方法提出可以拿出来溜溜。
+
+24年底和25年初，在Deepseek v3和r1发布之后，由于其惊艳的技术水平，团队受到了巨大的冲击，也受到了更大的质疑。于是为了紧跟潮流，盘古模仿Deepseek的模型尺寸，开启了718B moe的训练。这个时候，小模型实验室再次出手了。他们选择了套壳Deepseekv3续训。他们通过冻住Deepseek加载的参数，进行训练。连任务加载ckpt的目录都是deepseekv3，改都不改，何其嚣张？与之相反，一些有真正技术信仰的同事，在从头训练另一个718B的moe。但其中出现了各种各样的问题。但是很显然，这个模型怎么可能比直接套壳的好呢？如果不是团队leader坚持，早就被叫停了。
+
+华为的流程管理之繁重，严重拖累了大模型的研发节奏，例如版本管理，模型血缘，各种流程化，各种可追溯。讽刺的是，小模型实验室的模型似乎从来不受这些流程的约束，想套壳就套壳，想续训就续训，算力源源不断的伸手拿走。这种强烈到近乎魔幻的对比，说明了当前流程管理的情况：只许州官放火，不许百姓点灯。何其可笑？何其可悲？何其可恶？何其可耻！
+
+HonestAGI的事情出来后，内部让大家不停的研讨分析，如何公关和“回应”。诚然，这个原文的分析也许不够有力，给了王云鹤与小模型实验室他们狡辩和颠倒黑白的机会。为此，这两天我内心感到作呕，时时怀疑自己的人生意义以及苍天无眼。我不奉陪了，我要离职了，同时我也在申请从盘古部分技术报告的作者名单中移除。曾经在这些技术报告上署名是我一生都无法抹除的污点。当时我没想到，他们竟然猖狂到敢开源。我没想到，他们敢如此愚弄世人，大肆宣发。当时，我也许是存了侥幸心理，没有拒绝署名。我相信很多扎实做事的战友，也只是被迫上了贼船，或者不知情。但这件事已经无法挽回，我希望我的余生能够坚持扎实做真正有意义的事，为我当时的软弱和不坚定赎罪。
+
+深夜写到这里，我已经泪流满面，泣不成声。还记得一些出色的同事离职时，我苦笑问他们要不要发个长长的心声惯例帖，揭露一下现状。对方说：不了，浪费时间，而且我也怕揭露出来你们过的更糟。我当时一下黯然神伤，因为曾经共同为了理想奋斗过的战友已经彻底对华为彻底灰心了。当时大家调侃，我们用着当年共产党的小米加步枪，组织却有着堪比当年国民党的作风。
+
+曾几何时，我为我们用着小米加步枪打败洋枪洋炮而自豪。
+
+现在，我累了，我想投降。
+
+其实时至今日，我还是真心希望华为能认真吸取教训，能做好盘古，把盘古做到世界一流，把昇腾变成英伟达的水平。内部的劣币驱逐良币，使得诺亚乃至华为在短时间内急剧流失了大量出色的大模型人才。相信他们也正在如Deepseek等各个团队闪耀着，施展着他们的抱负才华，为中美在AI的激烈竞赛中奉献力量。我时常感叹，华为不是没有人才，而是根本不知道怎么留住人才。如果给这些人合适的环境，合适的资源，更少的枷锁，更少的政治斗争，盘古何愁不成？
+
+最后：我以生命，人格和荣誉发誓，我写的以上所有内容均为真实（至少在我有限的认知范围内）。我没有那么高的技术水平以及机会去做详尽扎实的分析，也不敢直接用内部记录举证，怕因为信息安全抓到。但是我相信我很多曾经的战友，会为我作证。在华为内部的兄弟，包括我们曾经服务过的产品线兄弟们，相信本文的无数细节能和你们的印象对照，印证我的说法。你们可能也曾经被蒙骗，但这些残酷的真相不会被尘封。我们奋战过的痕迹，也不应该被扭曲和埋葬。
+
+写了这么多，某些人肯定想把我找出来，抹杀掉。公司搞不好也想让我噤声乃至追责。如果真的这样，我，乃至我的家人的人身乃至生命安全可能都会受到威胁。为了自我保护，我近期每天会跟大家报平安。
+
+如果我消失了，就当是我为了真理和理想，为了华为乃至中国能够更好地发展算力和AI而牺牲了吧，我愿埋葬于那片曾经奋斗过的地方。
+
+诺亚，再见
+
+2025年7月6日凌晨      写于深圳
+
+---
+
+各位好，
+
+感谢大家的关心与祝福。我目前暂时安全，但公司应该在进行排查与某些名单收集，后续情况未知。
+
+我补充一些细节，以免某些人继续颠倒黑白。
+
+关于135B V2，小模型实验室在迅速地完成套壳并拿完所有套壳带来的好处后（比如任务令表彰和及时激励），因为不想继续支撑下游应用和模型迭代，又把这个烫手山芋甩给了四纵。确实技高一筹，直接把四纵的兄弟们拉下水。同事提供过去一个老旧的模型，最终拿回了一个当时一个魔改的先进的千问。做大模型的人，自己做的模型就像自己孩子一样熟悉，不要把别人都当傻子。就像自家儿子出门一趟，回来个别人家孩子。
+
+盘古report的署名是不符合学术规范的。例如，135B V3有不少有技术贡献的人，因为作者名额数量限制，劳动成果没有得到应有的回报，团队内曾经有不小的意见。这个模型当时是大家智慧和汗水的结晶，甚至是团队当时的精神支柱，支撑着不少兄弟们继续留在诺亚。所谓的名额限制，以及挂名了一些毫无技术贡献的人（如一些小模型实验室的人），让兄弟们何其心寒。
+
+---
+
+暂时平安。另外，支持我勇于说出真相的战友们 https://github.com/HW-whistleblower/True-Story-of-Pangu/issues/317
@@ -1,236 +0,0 @@
-# LEANN Configuration Guide
-
-This guide helps you optimize LEANN for different use cases and understand the trade-offs between various configuration options.
-
-## Getting Started: Simple is Better
-
-When first trying LEANN, start with a small dataset to quickly validate your approach:
-
-**For document RAG**: The default `data/` directory works perfectly - includes 2 AI research papers, Pride and Prejudice literature, and a technical report
-```bash
-python -m apps.document_rag --query "What techniques does LEANN use?"
-```
-
-**For other data sources**: Limit the dataset size for quick testing
-```bash
-# WeChat: Test with recent messages only
-python -m apps.wechat_rag --max-items 100 --query "What did we discuss about the project timeline?"
-
-# Browser history: Last few days
-python -m apps.browser_rag --max-items 500 --query "Find documentation about vector databases"
-
-# Email: Recent inbox
-python -m apps.email_rag --max-items 200 --query "Who sent updates about the deployment status?"
-```
-
-Once validated, scale up gradually:
- 100 documents → 1,000 → 10,000 → full dataset (`--max-items -1`)
- This helps identify issues early before committing to long processing times
-
-## Embedding Model Selection: Understanding the Trade-offs
-
-Based on our experience developing LEANN, embedding models fall into three categories:
-
-### Small Models (< 100M parameters)
-**Example**: `sentence-transformers/all-MiniLM-L6-v2` (22M params)
- **Pros**: Lightweight, fast for both indexing and inference
- **Cons**: Lower semantic understanding, may miss nuanced relationships
- **Use when**: Speed is critical, handling simple queries, interactive mode, or just experimenting with LEANN. If time is not a constraint, consider using a larger/better embedding model
-
-### Medium Models (100M-500M parameters)
-**Example**: `facebook/contriever` (110M params), `BAAI/bge-base-en-v1.5` (110M params)
- **Pros**: Balanced performance, good multilingual support, reasonable speed
- **Cons**: Requires more compute than small models
- **Use when**: Need quality results without extreme compute requirements, general-purpose RAG applications
-
-### Large Models (500M+ parameters)
-**Example**: `Qwen/Qwen3-Embedding-0.6B` (600M params), `intfloat/multilingual-e5-large` (560M params)
- **Pros**: Best semantic understanding, captures complex relationships, excellent multilingual support. **Qwen3-Embedding-0.6B achieves nearly OpenAI API performance!**
- **Cons**: Slower inference, longer index build times
- **Use when**: Quality is paramount and you have sufficient compute resources. **Highly recommended** for production use
-
-### Quick Start: OpenAI Embeddings (Fastest Setup)
-
-For immediate testing without local model downloads:
-```bash
-# Set OpenAI embeddings (requires OPENAI_API_KEY)
--embedding-mode openai --embedding-model text-embedding-3-small
-```
-
-<details>
-<summary><strong>Cloud vs Local Trade-offs</strong></summary>
-
-**OpenAI Embeddings** (`text-embedding-3-small/large`)
- **Pros**: No local compute needed, consistently fast, high quality
- **Cons**: Requires API key, costs money, data leaves your system, [known limitations with certain languages](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- **When to use**: Prototyping, non-sensitive data, need immediate results
-
-**Local Embeddings**
- **Pros**: Complete privacy, no ongoing costs, full control, can sometimes outperform OpenAI embeddings
- **Cons**: Slower than cloud APIs, requires local compute resources
- **When to use**: Production systems, sensitive data, cost-sensitive applications
-
-</details>
-
-## Index Selection: Matching Your Scale
-
-### HNSW (Hierarchical Navigable Small World)
-**Best for**: Small to medium datasets (< 10M vectors) - **Default and recommended for extreme low storage**
- Full recomputation required
- High memory usage during build phase
- Excellent recall (95%+)
-
-```bash
-# Optimal for most use cases
--backend-name hnsw --graph-degree 32 --build-complexity 64
-```
-
-### DiskANN
-**Best for**: Large datasets (> 10M vectors, 10GB+ index size) - **⚠️ Beta version, still in active development**
- Uses Product Quantization (PQ) for coarse filtering during graph traversal
- Novel approach: stores only PQ codes, performs rerank with exact computation in final step
- Implements a corner case of double-queue: prunes all neighbors and recomputes at the end
-
-```bash
-# For billion-scale deployments
--backend-name diskann --graph-degree 64 --build-complexity 128
-```
-
-## LLM Selection: Engine and Model Comparison
-
-### LLM Engines
-
-**OpenAI** (`--llm openai`)
- **Pros**: Best quality, consistent performance, no local resources needed
- **Cons**: Costs money ($0.15-2.5 per million tokens), requires internet, data privacy concerns
- **Models**: `gpt-4o-mini` (fast, cheap), `gpt-4o` (best quality), `o3-mini` (reasoning, not so expensive)
- **Note**: Our current default, but we recommend switching to Ollama for most use cases
-
-**Ollama** (`--llm ollama`)
- **Pros**: Fully local, free, privacy-preserving, good model variety
- **Cons**: Requires local GPU/CPU resources, slower than cloud APIs, need to install extra [ollama app](https://github.com/ollama/ollama?tab=readme-ov-file#ollama) and pre-download models by `ollama pull`
- **Models**: `qwen3:0.6b` (ultra-fast), `qwen3:1.7b` (balanced), `qwen3:4b` (good quality), `qwen3:7b` (high quality), `deepseek-r1:1.5b` (reasoning)
-
-**HuggingFace** (`--llm hf`)
- **Pros**: Free tier available, huge model selection, direct model loading (vs Ollama's server-based approach)
- **Cons**: More complex initial setup
- **Models**: `Qwen/Qwen3-1.7B-FP8`
-
-## Parameter Tuning Guide
-
-### Search Complexity Parameters
-
-**`--build-complexity`** (index building)
- Controls thoroughness during index construction
- Higher = better recall but slower build
- Recommendations:
-  - 32: Quick prototyping
-  - 64: Balanced (default)
-  - 128: Production systems
-  - 256: Maximum quality
-
-**`--search-complexity`** (query time)
- Controls search thoroughness
- Higher = better results but slower
- Recommendations:
-  - 16: Fast/Interactive search
-  - 32: High quality with diversity
-  - 64+: Maximum accuracy
-
-### Top-K Selection
-
-**`--top-k`** (number of retrieved chunks)
- More chunks = better context but slower LLM processing
- Should be always smaller than `--search-complexity`
- Guidelines:
-  - 10-20: General questions (default: 20)
-  - 30+: Complex multi-hop reasoning requiring comprehensive context
-
-**Trade-off formula**:
- Retrieval time ∝ log(n) × search_complexity
- LLM processing time ∝ top_k × chunk_size
- Total context = top_k × chunk_size tokens
-
-### Graph Degree (HNSW/DiskANN)
-
-**`--graph-degree`**
- Number of connections per node in the graph
- Higher = better recall but more memory
- HNSW: 16-32 (default: 32)
- DiskANN: 32-128 (default: 64)
-
-
-## Performance Optimization Checklist
-
-### If Embedding is Too Slow
-
-1. **Switch to smaller model**:
-   ```bash
-   # From large model
-   --embedding-model Qwen/Qwen3-Embedding-0.6B
-   # To small model
-   --embedding-model sentence-transformers/all-MiniLM-L6-v2
-   ```
-
-2. **Limit dataset size for testing**:
-   ```bash
-   --max-items 1000  # Process first 1k items only
-   ```
-
-3. **Use MLX on Apple Silicon** (optional optimization):
-   ```bash
-   --embedding-mode mlx --embedding-model mlx-community/multilingual-e5-base-mlx
-   ```
-
-### If Search Quality is Poor
-
-1. **Increase retrieval count**:
-   ```bash
-   --top-k 30  # Retrieve more candidates
-   ```
-
-2. **Upgrade embedding model**:
-   ```bash
-   # For English
-   --embedding-model BAAI/bge-base-en-v1.5
-   # For multilingual
-   --embedding-model intfloat/multilingual-e5-large
-   ```
-
-## Understanding the Trade-offs
-
-Every configuration choice involves trade-offs:
-
-| Factor | Small/Fast | Large/Quality |
-|--------|------------|---------------|
-| Embedding Model | `all-MiniLM-L6-v2` | `Qwen/Qwen3-Embedding-0.6B` |
-| Chunk Size | 512 tokens | 128 tokens |
-| Index Type | HNSW | DiskANN |
-| LLM | `qwen3:1.7b` | `gpt-4o` |
-
-The key is finding the right balance for your specific use case. Start small and simple, measure performance, then scale up only where needed.
-
-## Deep Dive: Critical Configuration Decisions
-
-### When to Disable Recomputation
-
-LEANN's recomputation feature provides exact distance calculations but can be disabled for extreme QPS requirements:
-
-```bash
--no-recompute  # Disable selective recomputation
-```
-
-**Trade-offs**:
- **With recomputation** (default): Exact distances, best quality, higher latency, minimal storage (only stores metadata, recomputes embeddings on-demand)
- **Without recomputation**: Must store full embeddings, significantly higher memory and storage usage (10-100x more), but faster search
-
-**Disable when**:
- You have abundant storage and memory
- Need extremely low latency (< 100ms)
- Running a read-heavy workload where storage cost is acceptable
-
-## Further Reading
-
- [Lessons Learned Developing LEANN](https://yichuan-w.github.io/blog/lessons_learned_in_dev_leann/)
- [LEANN Technical Paper](https://arxiv.org/abs/2506.08276)
- [DiskANN Original Paper](https://papers.nips.cc/paper/2019/file/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Paper.pdf)
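
The deleted guide's rules of thumb for `--top-k` can be checked with a little arithmetic. The sketch below is illustrative only: `context_budget` is a hypothetical helper, not part of LEANN. It encodes two statements from the guide, that total context is `top_k × chunk_size` tokens and that `--top-k` should be smaller than `--search-complexity`.

```python
def context_budget(top_k: int, chunk_size: int, search_complexity: int) -> int:
    """Return the total context size in tokens implied by the guide's formula."""
    # The guide says --top-k should always be smaller than --search-complexity.
    if top_k >= search_complexity:
        raise ValueError("top_k should be smaller than search_complexity")
    # Total context = top_k x chunk_size tokens.
    return top_k * chunk_size


if __name__ == "__main__":
    # Example values: top_k=20 and search_complexity=32 appear in the guide;
    # chunk_size=256 is an assumed value for illustration.
    print(context_budget(top_k=20, chunk_size=256, search_complexity=32))  # 5120
```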
@@ -5,7 +5,7 @@
 - **🔄 Real-time Embeddings** - Eliminate heavy embedding storage with dynamic computation using optimized ZMQ servers and highly optimized search paradigm (overlapping and batching) with highly optimized embedding engine
 - **📈 Scalable Architecture** - Handles millions of documents on consumer hardware; the larger your dataset, the more LEANN can save
 - **🎯 Graph Pruning** - Advanced techniques to minimize the storage overhead of vector search to a limited footprint
- **🏗️ Pluggable Backends** - HNSW/FAISS (default), with optional DiskANN for large-scale deployments
+- **🏗️ Pluggable Backends** - DiskANN, HNSW/FAISS with unified API

 ## 🛠️ Technical Highlights
 - **🔄 Recompute Mode** - Highest accuracy scenarios while eliminating vector storage overhead
@@ -2,8 +2,8 @@

 ## 🎯 Q2 2025

- [X] HNSW backend integration
 - [X] DiskANN backend with MIPS/L2/Cosine support
+- [X] HNSW backend integration
 - [X] Real-time embedding pipeline
 - [X] Memory-efficient graph pruning

@@ -7,7 +7,6 @@ from pathlib import Path
 from typing import Any, Literal

 import numpy as np
-import psutil
 from leann.interface import (
    LeannBackendBuilderInterface,
    LeannBackendFactoryInterface,
@@ -85,43 +84,6 @@ def _write_vectors_to_bin(data: np.ndarray, file_path: Path):
        f.write(data.tobytes())


-def _calculate_smart_memory_config(data: np.ndarray) -> tuple[float, float]:
-    """
-    Calculate smart memory configuration for DiskANN based on data size and system specs.
-
-    Args:
-        data: The embedding data array
-
-    Returns:
-        tuple: (search_memory_maximum, build_memory_maximum) in GB
-    """
-    num_vectors, dim = data.shape
-
-    # Calculate embedding storage size
-    embedding_size_bytes = num_vectors * dim * 4  # float32 = 4 bytes
-    embedding_size_gb = embedding_size_bytes / (1024**3)
-
-    # search_memory_maximum: 1/10 of embedding size for optimal PQ compression
-    # This controls Product Quantization size - smaller means more compression
-    search_memory_gb = max(0.1, embedding_size_gb / 10)  # At least 100MB
-
-    # build_memory_maximum: Based on available system RAM for sharding control
-    # This controls how much memory DiskANN uses during index construction
-    available_memory_gb = psutil.virtual_memory().available / (1024**3)
-    total_memory_gb = psutil.virtual_memory().total / (1024**3)
-
-    # Use 50% of available memory, but at least 2GB and at most 75% of total
-    build_memory_gb = max(2.0, min(available_memory_gb * 0.5, total_memory_gb * 0.75))
-
-    logger.info(
-        f"Smart memory config - Data: {embedding_size_gb:.2f}GB, "
-        f"Search mem: {search_memory_gb:.2f}GB (PQ control), "
-        f"Build mem: {build_memory_gb:.2f}GB (sharding control)"
-    )
-
-    return search_memory_gb, build_memory_gb
-
-
@register_backend("diskann")
 class DiskannBackend(LeannBackendFactoryInterface):
    @staticmethod
@@ -159,16 +121,6 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                f"Unsupported distance_metric '{build_kwargs.get('distance_metric', 'unknown')}'."
            )

-        # Calculate smart memory configuration if not explicitly provided
-        if (
-            "search_memory_maximum" not in build_kwargs
-            or "build_memory_maximum" not in build_kwargs
-        ):
-            smart_search_mem, smart_build_mem = _calculate_smart_memory_config(data)
-        else:
-            smart_search_mem = build_kwargs.get("search_memory_maximum", 4.0)
-            smart_build_mem = build_kwargs.get("build_memory_maximum", 8.0)
-
        try:
            from . import _diskannpy as diskannpy  # type: ignore

@@ -179,8 +131,8 @@ class DiskannBuilder(LeannBackendBuilderInterface):
                    index_prefix,
                    build_kwargs.get("complexity", 64),
                    build_kwargs.get("graph_degree", 32),
-                    build_kwargs.get("search_memory_maximum", smart_search_mem),
-                    build_kwargs.get("build_memory_maximum", smart_build_mem),
+                    build_kwargs.get("search_memory_maximum", 4.0),
+                    build_kwargs.get("build_memory_maximum", 8.0),
                    build_kwargs.get("num_threads", 8),
                    build_kwargs.get("pq_disk_bytes", 0),
                    "",
@@ -312,8 +264,6 @@ class DiskannSearcher(BaseSearcher):
            use_global_pruning = True

        # Perform search with suppressed C++ output based on log level
-        use_deferred_fetch = kwargs.get("USE_DEFERRED_FETCH", True)
-        recompute_neighors = False
        with suppress_cpp_output_if_needed():
            labels, distances = self._index.batch_search(
                query,
@@ -322,9 +272,9 @@ class DiskannSearcher(BaseSearcher):
                complexity,
                beam_width,
                self.num_threads,
-                use_deferred_fetch,
+                kwargs.get("USE_DEFERRED_FETCH", False),
                kwargs.get("skip_search_reorder", False),
-                recompute_neighors,
+                recompute_embeddings,
                dedup_node_dis,
                prune_ratio,
                batch_recompute,
@@ -4,8 +4,8 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-diskann"
-version = "0.2.1"
-dependencies = ["leann-core==0.2.1", "numpy", "protobuf>=3.19.0"]
+version = "0.1.16"
+dependencies = ["leann-core==0.1.16", "numpy", "protobuf>=3.19.0"]

 [tool.scikit-build]
 # Key: simplified CMake path
@@ -6,10 +6,10 @@ build-backend = "scikit_build_core.build"

 [project]
 name = "leann-backend-hnsw"
-version = "0.2.1"
+version = "0.1.16"
 description = "Custom-built HNSW (Faiss) backend for the Leann toolkit."
 dependencies = [
-    "leann-core==0.2.1",
+    "leann-core==0.1.16",
    "numpy",
    "pyzmq>=23.0.0",
    "msgpack>=1.0.0",
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann-core"
-version = "0.2.1"
+version = "0.1.16"
 description = "Core API and plugin system for LEANN"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -636,10 +636,7 @@ class LeannChat:
            "Please provide the best answer you can based on this context and your knowledge."
        )

-        ask_time = time.time()
        ans = self.llm.ask(prompt, **llm_kwargs)
-        ask_time = time.time() - ask_time
-        logger.info(f"  Ask time: {ask_time} seconds")
        return ans

    def start_interactive(self):
@@ -358,11 +358,7 @@ def validate_model_and_suggest(model_name: str, llm_type: str) -> str | None:
                error_msg += f"\n\nModel '{model_name}' was not found in Ollama's library."

                if suggestions:
-                    error_msg += (
-                        "\n\nDid you mean one of these installed models?\n"
-                        + "\nTry to use ollama pull to install the model you need\n"
-                    )
-
+                    error_msg += "\n\nDid you mean one of these installed models?\n"
                    for i, suggestion in enumerate(suggestions, 1):
                        error_msg += f"  {i}. {suggestion}\n"
                else:
@@ -546,41 +542,14 @@ class HFChat(LLMInterface):
            self.device = "cpu"
            logger.info("No GPU detected. Using CPU.")

-        # Load tokenizer and model with timeout protection
-        try:
-            import signal
-
-            def timeout_handler(signum, frame):
-                raise TimeoutError("Model download/loading timed out")
-
-            # Set timeout for model loading (60 seconds)
-            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
-            signal.alarm(60)
-
-            try:
-                logger.info(f"Loading tokenizer for {model_name}...")
-                self.tokenizer = AutoTokenizer.from_pretrained(model_name)
-
-                logger.info(f"Loading model {model_name}...")
-                self.model = AutoModelForCausalLM.from_pretrained(
-                    model_name,
-                    torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
-                    device_map="auto" if self.device != "cpu" else None,
-                    trust_remote_code=True,
-                )
-                logger.info(f"Successfully loaded {model_name}")
-            finally:
-                signal.alarm(0)  # Cancel the alarm
-                signal.signal(signal.SIGALRM, old_handler)  # Restore old handler
-
-        except TimeoutError:
-            logger.error(f"Model loading timed out for {model_name}")
-            raise RuntimeError(
-                f"Model loading timed out for {model_name}. Please check your internet connection or try a smaller model."
-            )
-        except Exception as e:
-            logger.error(f"Failed to load model {model_name}: {e}")
-            raise
+        # Load tokenizer and model
+        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
+        self.model = AutoModelForCausalLM.from_pretrained(
+            model_name,
+            torch_dtype=torch.float16 if self.device != "cpu" else torch.float32,
+            device_map="auto" if self.device != "cpu" else None,
+            trust_remote_code=True,
+        )

        # Move model to device if not using device_map
        if self.device != "cpu" and "device_map" not in str(self.model):
@@ -354,21 +354,13 @@ class EmbeddingServerManager:
        self.server_process.terminate()

        try:
-            self.server_process.wait(timeout=3)
+            self.server_process.wait(timeout=5)
            logger.info(f"Server process {self.server_process.pid} terminated.")
        except subprocess.TimeoutExpired:
            logger.warning(
-                f"Server process {self.server_process.pid} did not terminate gracefully within 3 seconds, killing it."
+                f"Server process {self.server_process.pid} did not terminate gracefully, killing it."
            )
            self.server_process.kill()
-            try:
-                self.server_process.wait(timeout=2)
-                logger.info(f"Server process {self.server_process.pid} killed successfully.")
-            except subprocess.TimeoutExpired:
-                logger.error(
-                    f"Failed to kill server process {self.server_process.pid} - it may be hung"
-                )
-                # Don't hang indefinitely

        # Clean up process resources to prevent resource tracker warnings
        try:
@@ -5,8 +5,11 @@ LEANN is a revolutionary vector database that democratizes personal AI. Transfor
 ## Installation

 ```bash
-# Default installation (includes both HNSW and DiskANN backends)
+# Default installation (HNSW backend, recommended)
 uv pip install leann
+
+# With DiskANN backend (for large-scale deployments)
+uv pip install leann[diskann]
 ```

 ## Quick Start
@@ -16,8 +19,8 @@ from leann import LeannBuilder, LeannSearcher, LeannChat
 from pathlib import Path
 INDEX_PATH = str(Path("./").resolve() / "demo.leann")

-# Build an index (choose backend: "hnsw" or "diskann")
-builder = LeannBuilder(backend_name="hnsw")  # or "diskann" for large-scale deployments
+# Build an index
+builder = LeannBuilder(backend_name="hnsw")
 builder.add_text("LEANN saves 97% storage compared to traditional vector databases.")
 builder.add_text("Tung Tung Tung Sahur called—they need their banana‑crocodile hybrid back")
 builder.build_index(INDEX_PATH)
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "leann"
-version = "0.2.1"
+version = "0.1.16"
 description = "LEANN - The smallest vector index in the world. RAG Everything with LEANN!"
 readme = "README.md"
 requires-python = ">=3.9"
@@ -24,15 +24,16 @@ classifiers = [
    "Programming Language :: Python :: 3.12",
 ]

-# Default installation: core + hnsw + diskann
+# Default installation: core + hnsw
 dependencies = [
    "leann-core>=0.1.0",
    "leann-backend-hnsw>=0.1.0",
-    "leann-backend-diskann>=0.1.0",
 ]

 [project.optional-dependencies]
-# All backends now included by default
+diskann = [
+    "leann-backend-diskann>=0.1.0",
+]

 [project.urls]
 Repository = "https://github.com/yichuan-w/LEANN"
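
Because this change moves DiskANN into an optional extra, calling code may want to check for it before asking for that backend. A minimal sketch, assuming the backends install under the import names `leann_backend_diskann` and `leann_backend_hnsw` (inferred from the distribution names, not confirmed by this diff); `backend_available` and `pick_backend` are hypothetical helpers, not LEANN APIs.

```python
import importlib.util

# Assumed import names, derived from the distribution names in pyproject.toml.
BACKEND_MODULES = {
    "diskann": "leann_backend_diskann",
    "hnsw": "leann_backend_hnsw",
}


def backend_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_backend(preferred: str = "diskann") -> str:
    """Return the preferred backend name if its module is installed, else 'hnsw'."""
    module_name = BACKEND_MODULES.get(preferred)
    if module_name and backend_available(module_name):
        return preferred
    return "hnsw"


if __name__ == "__main__":
    print(pick_backend("diskann"))
```

The result could then be passed as `LeannBuilder(backend_name=pick_backend())`, so an environment installed without `leann[diskann]` falls back to the default HNSW backend.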
Author	SHA1	Message	Date
Andy Lee	0877960547	docs: update README to use proper module imports for apps - Change from 'python apps/xxx.py' to 'python -m apps.xxx' - More professional and pythonic module calling - Ensures proper module resolution and imports - Better separation between apps/ (production tools) and examples/ (demos)	2025-08-03 23:05:48 -07:00
yichuan520030910320	d68af63d05	merge	2025-08-03 23:02:45 -07:00
yichuan520030910320	b844aca968	Merge branch 'refactor-app' of https://github.com/yichuan-w/LEANN into refactor-app	2025-08-03 23:02:12 -07:00
yichuan520030910320	85277ba67a	fix wechat	2025-08-03 23:02:06 -07:00
Andy Lee	e9562acdc2	fix: handle certificate errors in link checker	2025-08-03 22:42:16 -07:00
Andy Lee	7fd3db1ddb	fix: add init.py	2025-08-03 22:41:20 -07:00
Andy Lee	c1ccc51a75	refactor: reorganize examples and add link checker	2025-08-03 22:40:15 -07:00
Andy Lee	b0239b6e4d	refactor: reorgnize all examples/ and test/	2025-08-03 22:37:45 -07:00
yichuan520030910320	58556ef44c	merge	2025-08-03 22:29:30 -07:00
yichuan520030910320	87c930d705	fix email wrong -1 to process all file	2025-08-03 22:27:04 -07:00
Andy Lee	86f919a6da	fix: WeChat history reader bugs and refactor wechat_rag to use unified architecture	2025-08-03 21:54:40 -07:00
Andy Lee	f8d34663b4	feat: check if k is larger than #docs	2025-08-03 21:41:53 -07:00
yichuan520030910320	568cf597f4	fix some example	2025-08-03 21:19:05 -07:00
yichuan520030910320	baf70dc411	change rebuild logic	2025-08-03 20:54:52 -07:00
yichuan520030910320	7ad2ec39d6	add response highlight	2025-08-03 20:32:07 -07:00
Andy Lee	31fd3c816a	fix: update default embedding models for better performance - Change WeChat, Browser, and Email RAG examples to use all-MiniLM-L6-v2 - Previous Qwen/Qwen3-Embedding-0.6B was too slow for these use cases - all-MiniLM-L6-v2 is a fast 384-dim model, ideal for large-scale personal data	2025-08-02 19:04:59 -07:00
Andy Lee	1f6c7f2f5a	docs: Emphasize diverse data sources in examples/data description	2025-07-30 22:42:34 -07:00
Andy Lee	c1124eb349	feat: Update documentation based on review feedback - Add MLX embedding example to README - Clarify examples/data content description (two papers, Pride and Prejudice, Chinese README) - Move chunk parameters to common parameters section - Remove duplicate chunk parameters from document-specific section	2025-07-30 18:05:39 -07:00
Andy Lee	274bbb19ea	feat: Add chunk-size parameters and improve file type filtering - Add --chunk-size and --chunk-overlap parameters to all RAG examples - Preserve original default values for each data source: - Document: 256/128 (optimized for general documents) - Email: 256/25 (smaller overlap for email threads) - Browser: 256/128 (standard for web content) - WeChat: 192/64 (smaller chunks for chat messages) - Make --file-types optional filter instead of restriction in document_rag - Update README to clarify interactive mode and parameter usage - Fix LLM default model documentation (gpt-4o, not gpt-4o-mini)	2025-07-29 18:31:56 -07:00
Andy Lee	8c152c7a31	feat: Address review comments - Add complexity parameter to LeannChat initialization (default: search_complexity) - Fix chunk-size default in README documentation (256, not 2048) - Add more index building parameters as CLI arguments: - --backend-name (hnsw/diskann) - --graph-degree (default: 32) - --build-complexity (default: 64) - --no-compact (disable compact storage) - --no-recompute (disable embedding recomputation) - Update README to document all new parameters	2025-07-29 16:59:24 -07:00
Andy Lee	ce77eef13a	fix: Fix async/await and add_text issues in unified examples - Remove incorrect await from chat.ask() calls (not async) - Fix add_texts -> add_text method calls - Verify search-complexity correctly maps to efSearch parameter - All examples now run successfully	2025-07-29 16:00:58 -07:00
Andy Lee	9d77175ac8	fix: Fix issues in unified examples - Add smart path detection for data directory - Fix add_texts -> add_text method call - Handle both running from project root and examples directory	2025-07-29 15:55:46 -07:00
Andy Lee	7fbb6c98ef	docs: nit	2025-07-29 14:30:04 -07:00
Andy Lee	914a248c28	docs: Add introduction for Common Parameters section - Add 'Flexible Configuration' heading with descriptive sentence - Create parallel structure with 'Generation Model Setup' section - Improve document flow and readability	2025-07-29 14:16:33 -07:00
Andy Lee	55fc5862f9	docs: Fix collapsible sections - Make Common Parameters collapsible (as it's lengthy reference material) - Keep CLI Installation visible (important for users to see immediately) - Better information hierarchy	2025-07-29 14:14:26 -07:00
Andy Lee	fd97b8dfa8	style: format	2025-07-29 14:11:49 -07:00
Andy Lee	57959947a1	docs: Add collapsible section for CLI installation - Wrap CLI installation instructions in details/summary tags - Keep consistent with other collapsible sections in README - Improve document readability and navigation	2025-07-29 14:10:30 -07:00
Andy Lee	cc0c091ca5	docs: Clarify CLI global installation process - Explain the transition from venv to global installation - Add upgrade command for global installation - Make it clear that global install allows usage without venv activation	2025-07-29 14:06:16 -07:00
Andy Lee	ff389c7d8d	docs: Add CLI installation instructions - Add two installation options: venv and global uv tool - Clearly explain when to use each option - Make CLI more accessible for daily use	2025-07-29 14:05:33 -07:00
Andy Lee	6780a8eaba	docs: polish applications	2025-07-29 14:04:34 -07:00
Andy Lee	984056f126	docs: Reorganize parameter documentation structure - Move common parameters to a dedicated section before all examples - Rename sections to 'X-Specific Arguments' for clarity - Remove duplicate common parameters from individual examples - Better information architecture for users	2025-07-29 14:01:19 -07:00
Andy Lee	bd4451bf50	docs: Make example commands more representative - Add default values to parameter descriptions - Replace generic examples with real-world use cases - Focus on data-source-specific features in examples - Remove redundant demonstrations of common parameters	2025-07-29 13:59:29 -07:00
Andy Lee	34e313f64a	docs: Improve parameter categorization in README - Clearly separate core (shared) vs specific parameters - Move LLM and embedding examples to 'Example Commands' section - Add descriptive comments for all specific parameters - Keep only truly data-source-specific parameters in specific sections	2025-07-29 13:54:47 -07:00
Andy Lee	ddc789b231	fix: Restore embedding-mode parameter to all examples - All examples now have --embedding-mode parameter (unified interface benefit) - Default is 'sentence-transformers' (consistent with original behavior) - Users can now use OpenAI or MLX embeddings with any data source - Maintains functional equivalence with original scripts	2025-07-29 13:33:40 -07:00
Andy Lee	ff1b622bdd	refactor: Remove old example scripts and migration references - Delete old example scripts (mail_reader_leann.py, google_history_reader_leann.py, etc.) - Remove migration hints and backward compatibility - Update tests to use new unified examples directly - Clean up all references to old script names - Users now only see the new unified interface	2025-07-29 12:39:36 -07:00
Andy Lee	3cde4fc7b3	fix: Fix pre-commit issues and update tests - Fix import sorting and unused imports - Update type annotations to use built-in types (list, dict) instead of typing.List/Dict - Fix trailing whitespace and end-of-file issues - Fix Chinese fullwidth comma to regular comma - Update test_main_cli.py to test_document_rag.py - Add backward compatibility test for main_cli_example.py - Pass all pre-commit hooks (ruff, ruff-format, etc.)	2025-07-29 10:19:05 -07:00
Andy Lee	4e3bcda5fa	fix: Update CI tests for new unified examples interface - Rename test_main_cli.py to test_document_rag.py - Update all references from main_cli_example.py to document_rag.py - Update tests/README.md documentation The tests now properly test the new unified interface while maintaining the same test coverage and functionality.	2025-07-28 23:16:51 -07:00
Andy Lee	46f6f76fc3	refactor: Unify examples interface with BaseRAGExample - Create BaseRAGExample base class for all RAG examples - Refactor 4 examples to use unified interface: - document_rag.py (replaces main_cli_example.py) - email_rag.py (replaces mail_reader_leann.py) - browser_rag.py (replaces google_history_reader_leann.py) - wechat_rag.py (replaces wechat_history_reader_leann.py) - Maintain 100% parameter compatibility with original files - Add interactive mode support for all examples - Unify parameter names (--max-items replaces --max-emails/--max-entries) - Update README.md with new examples usage - Add PARAMETER_CONSISTENCY.md documenting all parameter mappings - Keep main_cli_example.py for backward compatibility with migration notice All default values, LeannBuilder parameters, and chunking settings remain identical to ensure full compatibility with existing indexes.	2025-07-28 23:11:16 -07:00