diff --git a/README.md b/README.md
index 88efe41..b74af2d 100755
--- a/README.md
+++ b/README.md
@@ -24,13 +24,13 @@ LEANN achieves this through *graph-based selective recomputation* with *high-deg
 LEANN vs Traditional Vector DB Storage Comparison

-**The numbers speak for themselves:** Index 60 million Wikipedia articles in just 6GB instead of 201GB. Finally, your MacBook can handle enterprise-scale datasets. [See detailed benchmarks below ↓](#storage-usage-comparison)
+**The numbers speak for themselves:** Index 60 million Wikipedia articles in just 6GB instead of 201GB. From emails to browser history, everything fits on your laptop. [See detailed benchmarks below ↓](#storage-usage-comparison)
 ## Why This Matters
 🔒 **Privacy:** Your data never leaves your laptop. No OpenAI, no cloud, no "terms of service".
-🪶 **Lightweight:** Minimal resource requirements - runs smoothly on any laptop without specialized hardware.
+🪶 **Lightweight:** Smart graph pruning means less storage, less memory usage, better performance on your existing hardware.
 📈 **Scalability:** Organize our messy personal data that would crash traditional vector DBs, with performance that gets better as your data grows more personalized.
@@ -93,16 +93,19 @@ Just 3 lines of code. Our declarative API makes RAG as easy as writing a config
 ```python
 from leann.api import LeannBuilder, LeannSearcher
-# Index your entire email history (90K emails = 14MB vs 305MB)
+# 1. Build index (no embeddings stored!)
 builder = LeannBuilder(backend_name="hnsw")
-builder.add_from_mailbox("~/Library/Mail") # Your actual emails
-builder.build_index("my_life.leann")
+builder.add_text("C# is a powerful programming language")
+builder.add_text("Python is a powerful programming language")
+builder.add_text("Machine learning transforms industries")
+builder.add_text("Neural networks process complex data")
+builder.add_text("Leann is a great storage saving engine for RAG on your macbook")
+builder.build_index("knowledge.leann")
-# Ask questions about your own data
-searcher = LeannSearcher("my_life.leann")
-searcher.search("What did my boss say about the deadline?")
-searcher.search("Find emails about vacation requests")
-searcher.search("Show me all conversations with John about the project")
+# 2. Search with real-time embeddings
+searcher = LeannSearcher("knowledge.leann")
+results = searcher.search("C++ programming languages", top_k=2, recompute_beighbor_embeddings=True)
+print(results)
 ```
 **That's it.** No cloud setup, no API keys, no "fine-tuning". Just your data, your questions, your laptop.
@@ -160,6 +163,15 @@ python examples/mail_reader_leann.py --query "What did my boss say about deadlin
+
+📋 Click to expand: Example queries you can try
+
+Once the index is built, you can ask questions like:
+- "Find emails from my boss about deadlines"
+- "What did John say about the project timeline?"
+- "Show me emails about travel expenses"
+
+
 ### 🌐 Time Machine for the Web
 ```bash
 python examples/google_history_reader_leann.py
@@ -187,14 +199,54 @@ python examples/google_history_reader_leann.py --query "What websites did I visi
+
+📋 Click to expand: How to find your Chrome profile
+
+The default Chrome profile path is configured for a typical macOS setup. If you need to find your specific Chrome profile:
+
+1. Open Terminal
+2. Run: `ls ~/Library/Application\ Support/Google/Chrome/`
+3. Look for folders like "Default", "Profile 1", "Profile 2", etc.
+4. Use the full path as your `--chrome-profile` argument
+
+**Common Chrome profile locations:**
+- macOS: `~/Library/Application Support/Google/Chrome/Default`
+- Linux: `~/.config/google-chrome/Default`
+
+ +
+💬 Click to expand: Example queries you can try
+
+Once the index is built, you can ask questions like:
+
+- "What websites did I visit about machine learning?"
+- "Find my search history about programming"
+- "What YouTube videos did I watch recently?"
+- "Show me websites I visited about travel planning"
+
+
 ### 💬 WeChat Detective
+
 ```bash
-python examples/wechat_history_reader_leann.py
-# "我想买魔术师约翰逊的球衣,给我一些对应聊天记录"
+python examples/wechat_history_reader_leann.py # "Show me all group chats about weekend plans"
 ```
 **400K messages → 64MB.** Search years of chat history in any language.
+
+🔧 Click to expand: Installation Requirements
+
+First, you need to install the WeChat exporter:
+
+```bash
+sudo packages/wechat-exporter/wechattweak-cli install
+```
+
+**Troubleshooting**: If you encounter installation issues, check the [WeChatTweak-CLI issues page](https://github.com/sunnyyoung/WeChatTweak-CLI/issues/41).
+
+
 📋 Click to expand: Command Examples
@@ -202,7 +254,7 @@ python examples/wechat_history_reader_leann.py
 # Use default settings (recommended for first run)
 python examples/wechat_history_reader_leann.py
-# Run with custom export directory
+# Run with a custom export directory (on the first run, LEANN exports all chat history automatically)
 python examples/wechat_history_reader_leann.py --export-dir "./my_wechat_exports"
 # Run with custom index directory
@@ -217,21 +269,14 @@ python examples/wechat_history_reader_leann.py --query "Show me conversations ab
-### 📚 Personal Wikipedia
-```bash
-# Index 60M Wikipedia articles in 6GB (not 201GB)
-python examples/build_massive_index.py --source wikipedia
-# "Explain quantum computing like I'm 5"
-# "What are the connections between philosophy and AI?"
-```
+
+💬 Click to expand: Example queries you can try
-**PDF RAG Demo (using LlamaIndex for document parsing and Leann for indexing/search)**
+Once the index is built, you can ask questions like:
-This demo showcases how to build a RAG system for PDF/md documents using Leann.
-
-1. Place your PDF files (and other supported formats like .docx, .pptx, .xlsx) into the `examples/data/` directory.
-2. Ensure you have an `OPENAI_API_KEY` set in your environment variables or in a `.env` file for the LLM to function.
+- "我想买魔术师约翰逊的球衣,给我一些对应聊天记录?" (Chinese: Show me chat records about buying Magic Johnson's jersey)
+
## 🏗️ Architecture & How It Works