Compare commits
4 Commits
feature/ch
...
feature/im
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
f95b344011 | ||
|
|
e88ba0e822 | ||
|
|
9447997a80 | ||
|
|
5219370019 |
88
README.md
88
README.md
@@ -20,7 +20,7 @@ LEANN is an innovative vector database that democratizes personal AI. Transform
|
||||
|
||||
LEANN achieves this through *graph-based selective recomputation* with *high-degree preserving pruning*, computing embeddings on-demand instead of storing them all. [Illustration Fig →](#️-architecture--how-it-works) | [Paper →](https://arxiv.org/abs/2506.08276)
|
||||
|
||||
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantic search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)**, **[codebase](#-claude-code-integration-transform-your-development-workflow)**\* , or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
|
||||
**Ready to RAG Everything?** Transform your laptop into a personal AI assistant that can semantic search your **[file system](#-personal-data-manager-process-any-documents-pdf-txt-md)**, **[emails](#-your-personal-email-secretary-rag-on-apple-mail)**, **[browser history](#-time-machine-for-the-web-rag-your-entire-browser-history)**, **[chat history](#-wechat-detective-unlock-your-golden-memories)** ([WeChat](#-wechat-detective-unlock-your-golden-memories), [iMessage](#-imessage-history-your-personal-conversation-archive)), **[agent memory](#-chatgpt-chat-history-your-personal-ai-conversation-archive)** ([ChatGPT](#-chatgpt-chat-history-your-personal-ai-conversation-archive), [Claude](#-claude-chat-history-your-personal-ai-conversation-archive)), **[codebase](#-claude-code-integration-transform-your-development-workflow)**\* , or external knowledge bases (i.e., 60M documents) - all on your laptop, with zero cloud costs and complete privacy.
|
||||
|
||||
|
||||
\* Claude Code only supports basic `grep`-style keyword search. **LEANN** is a drop-in **semantic search MCP service fully compatible with Claude Code**, unlocking intelligent retrieval without changing your workflow. 🔥 Check out [the easy setup →](packages/leann-mcp/README.md)
|
||||
@@ -176,7 +176,7 @@ response = chat.ask("How much storage does LEANN save?", top_k=1)
|
||||
|
||||
## RAG on Everything!
|
||||
|
||||
LEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, ChatGPT conversations, Claude conversations, and more.
|
||||
LEANN supports RAG on various data sources including documents (`.pdf`, `.txt`, `.md`), Apple Mail, Google Search History, WeChat, ChatGPT conversations, Claude conversations, iMessage conversations, and more.
|
||||
|
||||
|
||||
|
||||
@@ -625,6 +625,90 @@ Once your Claude conversations are indexed, you can search with queries like:
|
||||
|
||||
</details>
|
||||
|
||||
### 💬 iMessage History: Your Personal Conversation Archive!
|
||||
|
||||
Transform your iMessage conversations into a searchable knowledge base! Search through all your text messages, group chats, and conversations with friends, family, and colleagues.
|
||||
|
||||
```bash
|
||||
python -m apps.imessage_rag --query "What did we discuss about the weekend plans?"
|
||||
```
|
||||
|
||||
**Unlock your message history.** Never lose track of important conversations, shared links, or memorable moments from your iMessage history.
|
||||
|
||||
<details>
|
||||
<summary><strong>📋 Click to expand: How to Access iMessage Data</strong></summary>
|
||||
|
||||
**iMessage data location:**
|
||||
|
||||
iMessage conversations are stored in a SQLite database on your Mac at:
|
||||
```
|
||||
~/Library/Messages/chat.db
|
||||
```
|
||||
|
||||
**Important setup requirements:**
|
||||
|
||||
1. **Grant Full Disk Access** to your terminal or IDE:
|
||||
- Open **System Preferences** → **Security & Privacy** → **Privacy**
|
||||
- Select **Full Disk Access** from the left sidebar
|
||||
- Click the **+** button and add your terminal app (Terminal, iTerm2) or IDE (VS Code, etc.)
|
||||
- Restart your terminal/IDE after granting access
|
||||
|
||||
2. **Alternative: Use a backup database**
|
||||
- If you have Time Machine backups or manual copies of the database
|
||||
- Use `--db-path` to specify a custom location
|
||||
|
||||
**Supported formats:**
|
||||
- Direct access to `~/Library/Messages/chat.db` (default)
|
||||
- Custom database path with `--db-path`
|
||||
- Works with backup copies of the database
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><strong>📋 Click to expand: iMessage-Specific Arguments</strong></summary>
|
||||
|
||||
#### Parameters
|
||||
```bash
|
||||
--db-path PATH # Path to chat.db file (default: ~/Library/Messages/chat.db)
|
||||
--concatenate-conversations # Group messages by conversation (default: True)
|
||||
--no-concatenate-conversations # Process each message individually
|
||||
--chunk-size N # Text chunk size (default: 1000)
|
||||
--chunk-overlap N # Overlap between chunks (default: 200)
|
||||
```
|
||||
|
||||
#### Example Commands
|
||||
```bash
|
||||
# Basic usage (requires Full Disk Access)
|
||||
python -m apps.imessage_rag
|
||||
|
||||
# Search with specific query
|
||||
python -m apps.imessage_rag --query "family dinner plans"
|
||||
|
||||
# Use custom database path
|
||||
python -m apps.imessage_rag --db-path /path/to/backup/chat.db
|
||||
|
||||
# Process individual messages instead of conversations
|
||||
python -m apps.imessage_rag --no-concatenate-conversations
|
||||
|
||||
# Limit processing for testing
|
||||
python -m apps.imessage_rag --max-items 100 --query "weekend"
|
||||
```
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><strong>💡 Click to expand: Example queries you can try</strong></summary>
|
||||
|
||||
Once your iMessage conversations are indexed, you can search with queries like:
|
||||
- "What did we discuss about vacation plans?"
|
||||
- "Find messages about restaurant recommendations"
|
||||
- "Show me conversations with John about the project"
|
||||
- "Search for shared links about technology"
|
||||
- "Find group chat discussions about weekend events"
|
||||
- "What did mom say about the family gathering?"
|
||||
|
||||
</details>
|
||||
|
||||
### 🚀 Claude Code Integration: Transform Your Development Workflow!
|
||||
|
||||
<details>
|
||||
|
||||
1
apps/imessage_data/__init__.py
Normal file
1
apps/imessage_data/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
"""iMessage data processing module."""
|
||||
342
apps/imessage_data/imessage_reader.py
Normal file
342
apps/imessage_data/imessage_reader.py
Normal file
@@ -0,0 +1,342 @@
|
||||
"""
|
||||
iMessage data reader.
|
||||
|
||||
Reads and processes iMessage conversation data from the macOS Messages database.
|
||||
"""
|
||||
|
||||
import sqlite3
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from llama_index.core import Document
|
||||
from llama_index.core.readers.base import BaseReader
|
||||
|
||||
|
||||
class IMessageReader(BaseReader):
|
||||
"""
|
||||
iMessage data reader.
|
||||
|
||||
Reads iMessage conversation data from the macOS Messages database (chat.db).
|
||||
Processes conversations into structured documents with metadata.
|
||||
"""
|
||||
|
||||
def __init__(self, concatenate_conversations: bool = True) -> None:
|
||||
"""
|
||||
Initialize.
|
||||
|
||||
Args:
|
||||
concatenate_conversations: Whether to concatenate messages within conversations for better context
|
||||
"""
|
||||
self.concatenate_conversations = concatenate_conversations
|
||||
|
||||
def _get_default_chat_db_path(self) -> Path:
|
||||
"""
|
||||
Get the default path to the iMessage chat database.
|
||||
|
||||
Returns:
|
||||
Path to the chat.db file
|
||||
"""
|
||||
home = Path.home()
|
||||
return home / "Library" / "Messages" / "chat.db"
|
||||
|
||||
def _convert_cocoa_timestamp(self, cocoa_timestamp: int) -> str:
|
||||
"""
|
||||
Convert Cocoa timestamp to readable format.
|
||||
|
||||
Args:
|
||||
cocoa_timestamp: Timestamp in Cocoa format (nanoseconds since 2001-01-01)
|
||||
|
||||
Returns:
|
||||
Formatted timestamp string
|
||||
"""
|
||||
if cocoa_timestamp == 0:
|
||||
return "Unknown"
|
||||
|
||||
try:
|
||||
# Cocoa timestamp is nanoseconds since 2001-01-01 00:00:00 UTC
|
||||
# Convert to seconds and add to Unix epoch
|
||||
cocoa_epoch = datetime(2001, 1, 1)
|
||||
unix_timestamp = cocoa_timestamp / 1_000_000_000 # Convert nanoseconds to seconds
|
||||
message_time = cocoa_epoch.timestamp() + unix_timestamp
|
||||
return datetime.fromtimestamp(message_time).strftime("%Y-%m-%d %H:%M:%S")
|
||||
except (ValueError, OSError):
|
||||
return "Unknown"
|
||||
|
||||
def _get_contact_name(self, handle_id: str) -> str:
|
||||
"""
|
||||
Get a readable contact name from handle ID.
|
||||
|
||||
Args:
|
||||
handle_id: The handle ID (phone number or email)
|
||||
|
||||
Returns:
|
||||
Formatted contact name
|
||||
"""
|
||||
if not handle_id:
|
||||
return "Unknown"
|
||||
|
||||
# Clean up phone numbers and emails for display
|
||||
if "@" in handle_id:
|
||||
return handle_id # Email address
|
||||
elif handle_id.startswith("+"):
|
||||
return handle_id # International phone number
|
||||
else:
|
||||
# Try to format as phone number
|
||||
digits = "".join(filter(str.isdigit, handle_id))
|
||||
if len(digits) == 10:
|
||||
return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
|
||||
elif len(digits) == 11 and digits[0] == "1":
|
||||
return f"+1 ({digits[1:4]}) {digits[4:7]}-{digits[7:]}"
|
||||
else:
|
||||
return handle_id
|
||||
|
||||
def _read_messages_from_db(self, db_path: Path) -> list[dict]:
|
||||
"""
|
||||
Read messages from the iMessage database.
|
||||
|
||||
Args:
|
||||
db_path: Path to the chat.db file
|
||||
|
||||
Returns:
|
||||
List of message dictionaries
|
||||
"""
|
||||
if not db_path.exists():
|
||||
print(f"iMessage database not found at: {db_path}")
|
||||
return []
|
||||
|
||||
try:
|
||||
# Connect to the database
|
||||
conn = sqlite3.connect(str(db_path))
|
||||
cursor = conn.cursor()
|
||||
|
||||
# Query to get messages with chat and handle information
|
||||
query = """
|
||||
SELECT
|
||||
m.ROWID as message_id,
|
||||
m.text,
|
||||
m.date,
|
||||
m.is_from_me,
|
||||
m.service,
|
||||
c.chat_identifier,
|
||||
c.display_name as chat_display_name,
|
||||
h.id as handle_id,
|
||||
c.ROWID as chat_id
|
||||
FROM message m
|
||||
LEFT JOIN chat_message_join cmj ON m.ROWID = cmj.message_id
|
||||
LEFT JOIN chat c ON cmj.chat_id = c.ROWID
|
||||
LEFT JOIN handle h ON m.handle_id = h.ROWID
|
||||
WHERE m.text IS NOT NULL AND m.text != ''
|
||||
ORDER BY c.ROWID, m.date
|
||||
"""
|
||||
|
||||
cursor.execute(query)
|
||||
rows = cursor.fetchall()
|
||||
|
||||
messages = []
|
||||
for row in rows:
|
||||
(
|
||||
message_id,
|
||||
text,
|
||||
date,
|
||||
is_from_me,
|
||||
service,
|
||||
chat_identifier,
|
||||
chat_display_name,
|
||||
handle_id,
|
||||
chat_id,
|
||||
) = row
|
||||
|
||||
message = {
|
||||
"message_id": message_id,
|
||||
"text": text,
|
||||
"timestamp": self._convert_cocoa_timestamp(date),
|
||||
"is_from_me": bool(is_from_me),
|
||||
"service": service or "iMessage",
|
||||
"chat_identifier": chat_identifier or "Unknown",
|
||||
"chat_display_name": chat_display_name or "Unknown Chat",
|
||||
"handle_id": handle_id or "Unknown",
|
||||
"contact_name": self._get_contact_name(handle_id or ""),
|
||||
"chat_id": chat_id,
|
||||
}
|
||||
messages.append(message)
|
||||
|
||||
conn.close()
|
||||
print(f"Found {len(messages)} messages in database")
|
||||
return messages
|
||||
|
||||
except sqlite3.Error as e:
|
||||
print(f"Error reading iMessage database: {e}")
|
||||
return []
|
||||
except Exception as e:
|
||||
print(f"Unexpected error reading iMessage database: {e}")
|
||||
return []
|
||||
|
||||
def _group_messages_by_chat(self, messages: list[dict]) -> dict[int, list[dict]]:
|
||||
"""
|
||||
Group messages by chat ID.
|
||||
|
||||
Args:
|
||||
messages: List of message dictionaries
|
||||
|
||||
Returns:
|
||||
Dictionary mapping chat_id to list of messages
|
||||
"""
|
||||
chats = {}
|
||||
for message in messages:
|
||||
chat_id = message["chat_id"]
|
||||
if chat_id not in chats:
|
||||
chats[chat_id] = []
|
||||
chats[chat_id].append(message)
|
||||
|
||||
return chats
|
||||
|
||||
def _create_concatenated_content(self, chat_id: int, messages: list[dict]) -> str:
|
||||
"""
|
||||
Create concatenated content from chat messages.
|
||||
|
||||
Args:
|
||||
chat_id: The chat ID
|
||||
messages: List of messages in the chat
|
||||
|
||||
Returns:
|
||||
Concatenated text content
|
||||
"""
|
||||
if not messages:
|
||||
return ""
|
||||
|
||||
# Get chat info from first message
|
||||
first_msg = messages[0]
|
||||
chat_name = first_msg["chat_display_name"]
|
||||
chat_identifier = first_msg["chat_identifier"]
|
||||
|
||||
# Build message content
|
||||
message_parts = []
|
||||
for message in messages:
|
||||
timestamp = message["timestamp"]
|
||||
is_from_me = message["is_from_me"]
|
||||
text = message["text"]
|
||||
contact_name = message["contact_name"]
|
||||
|
||||
if is_from_me:
|
||||
prefix = "[You]"
|
||||
else:
|
||||
prefix = f"[{contact_name}]"
|
||||
|
||||
if timestamp != "Unknown":
|
||||
prefix += f" ({timestamp})"
|
||||
|
||||
message_parts.append(f"{prefix}: {text}")
|
||||
|
||||
concatenated_text = "\n\n".join(message_parts)
|
||||
|
||||
doc_content = f"""Chat: {chat_name}
|
||||
Identifier: {chat_identifier}
|
||||
Messages ({len(messages)} messages):
|
||||
|
||||
{concatenated_text}
|
||||
"""
|
||||
return doc_content
|
||||
|
||||
def _create_individual_content(self, message: dict) -> str:
|
||||
"""
|
||||
Create content for individual message.
|
||||
|
||||
Args:
|
||||
message: Message dictionary
|
||||
|
||||
Returns:
|
||||
Formatted message content
|
||||
"""
|
||||
timestamp = message["timestamp"]
|
||||
is_from_me = message["is_from_me"]
|
||||
text = message["text"]
|
||||
contact_name = message["contact_name"]
|
||||
chat_name = message["chat_display_name"]
|
||||
|
||||
sender = "You" if is_from_me else contact_name
|
||||
|
||||
return f"""Message from {sender} in chat "{chat_name}"
|
||||
Time: {timestamp}
|
||||
Content: {text}
|
||||
"""
|
||||
|
||||
def load_data(self, input_dir: str | None = None, **load_kwargs: Any) -> list[Document]:
|
||||
"""
|
||||
Load iMessage data and return as documents.
|
||||
|
||||
Args:
|
||||
input_dir: Optional path to directory containing chat.db file.
|
||||
If not provided, uses default macOS location.
|
||||
**load_kwargs: Additional arguments (unused)
|
||||
|
||||
Returns:
|
||||
List of Document objects containing iMessage data
|
||||
"""
|
||||
docs = []
|
||||
|
||||
# Determine database path
|
||||
if input_dir:
|
||||
db_path = Path(input_dir) / "chat.db"
|
||||
else:
|
||||
db_path = self._get_default_chat_db_path()
|
||||
|
||||
print(f"Reading iMessage database from: {db_path}")
|
||||
|
||||
# Read messages from database
|
||||
messages = self._read_messages_from_db(db_path)
|
||||
if not messages:
|
||||
return docs
|
||||
|
||||
if self.concatenate_conversations:
|
||||
# Group messages by chat and create concatenated documents
|
||||
chats = self._group_messages_by_chat(messages)
|
||||
|
||||
for chat_id, chat_messages in chats.items():
|
||||
if not chat_messages:
|
||||
continue
|
||||
|
||||
content = self._create_concatenated_content(chat_id, chat_messages)
|
||||
|
||||
# Create metadata
|
||||
first_msg = chat_messages[0]
|
||||
last_msg = chat_messages[-1]
|
||||
|
||||
metadata = {
|
||||
"source": "iMessage",
|
||||
"chat_id": chat_id,
|
||||
"chat_name": first_msg["chat_display_name"],
|
||||
"chat_identifier": first_msg["chat_identifier"],
|
||||
"message_count": len(chat_messages),
|
||||
"first_message_date": first_msg["timestamp"],
|
||||
"last_message_date": last_msg["timestamp"],
|
||||
"participants": list(
|
||||
{msg["contact_name"] for msg in chat_messages if not msg["is_from_me"]}
|
||||
),
|
||||
}
|
||||
|
||||
doc = Document(text=content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
|
||||
else:
|
||||
# Create individual documents for each message
|
||||
for message in messages:
|
||||
content = self._create_individual_content(message)
|
||||
|
||||
metadata = {
|
||||
"source": "iMessage",
|
||||
"message_id": message["message_id"],
|
||||
"chat_id": message["chat_id"],
|
||||
"chat_name": message["chat_display_name"],
|
||||
"chat_identifier": message["chat_identifier"],
|
||||
"timestamp": message["timestamp"],
|
||||
"is_from_me": message["is_from_me"],
|
||||
"contact_name": message["contact_name"],
|
||||
"service": message["service"],
|
||||
}
|
||||
|
||||
doc = Document(text=content, metadata=metadata)
|
||||
docs.append(doc)
|
||||
|
||||
print(f"Created {len(docs)} documents from iMessage data")
|
||||
return docs
|
||||
125
apps/imessage_rag.py
Normal file
125
apps/imessage_rag.py
Normal file
@@ -0,0 +1,125 @@
|
||||
"""
|
||||
iMessage RAG Example.
|
||||
|
||||
This example demonstrates how to build a RAG system on your iMessage conversation history.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
from pathlib import Path
|
||||
|
||||
from leann.chunking_utils import create_text_chunks
|
||||
|
||||
from apps.base_rag_example import BaseRAGExample
|
||||
from apps.imessage_data.imessage_reader import IMessageReader
|
||||
|
||||
|
||||
class IMessageRAG(BaseRAGExample):
|
||||
"""RAG example for iMessage conversation history."""
|
||||
|
||||
def __init__(self):
|
||||
super().__init__(
|
||||
name="iMessage",
|
||||
description="RAG on your iMessage conversation history",
|
||||
default_index_name="imessage_index",
|
||||
)
|
||||
|
||||
def _add_specific_arguments(self, parser):
|
||||
"""Add iMessage-specific arguments."""
|
||||
imessage_group = parser.add_argument_group("iMessage Parameters")
|
||||
imessage_group.add_argument(
|
||||
"--db-path",
|
||||
type=str,
|
||||
default=None,
|
||||
help="Path to iMessage chat.db file (default: ~/Library/Messages/chat.db)",
|
||||
)
|
||||
imessage_group.add_argument(
|
||||
"--concatenate-conversations",
|
||||
action="store_true",
|
||||
default=True,
|
||||
help="Concatenate messages within conversations for better context (default: True)",
|
||||
)
|
||||
imessage_group.add_argument(
|
||||
"--no-concatenate-conversations",
|
||||
action="store_true",
|
||||
help="Process each message individually instead of concatenating by conversation",
|
||||
)
|
||||
imessage_group.add_argument(
|
||||
"--chunk-size",
|
||||
type=int,
|
||||
default=1000,
|
||||
help="Maximum characters per text chunk (default: 1000)",
|
||||
)
|
||||
imessage_group.add_argument(
|
||||
"--chunk-overlap",
|
||||
type=int,
|
||||
default=200,
|
||||
help="Overlap between text chunks (default: 200)",
|
||||
)
|
||||
|
||||
async def load_data(self, args) -> list[str]:
|
||||
"""Load iMessage history and convert to text chunks."""
|
||||
print("Loading iMessage conversation history...")
|
||||
|
||||
# Determine concatenation setting
|
||||
concatenate = args.concatenate_conversations and not args.no_concatenate_conversations
|
||||
|
||||
# Initialize iMessage reader
|
||||
reader = IMessageReader(concatenate_conversations=concatenate)
|
||||
|
||||
# Load documents
|
||||
try:
|
||||
if args.db_path:
|
||||
# Use custom database path
|
||||
db_dir = str(Path(args.db_path).parent)
|
||||
documents = reader.load_data(input_dir=db_dir)
|
||||
else:
|
||||
# Use default macOS location
|
||||
documents = reader.load_data()
|
||||
|
||||
except Exception as e:
|
||||
print(f"Error loading iMessage data: {e}")
|
||||
print("\nTroubleshooting tips:")
|
||||
print("1. Make sure you have granted Full Disk Access to your terminal/IDE")
|
||||
print("2. Check that the iMessage database exists at ~/Library/Messages/chat.db")
|
||||
print("3. Try specifying a custom path with --db-path if you have a backup")
|
||||
return []
|
||||
|
||||
if not documents:
|
||||
print("No iMessage conversations found!")
|
||||
return []
|
||||
|
||||
print(f"Loaded {len(documents)} iMessage documents")
|
||||
|
||||
# Show some statistics
|
||||
total_messages = sum(doc.metadata.get("message_count", 1) for doc in documents)
|
||||
print(f"Total messages: {total_messages}")
|
||||
|
||||
if concatenate:
|
||||
# Show chat statistics
|
||||
chat_names = [doc.metadata.get("chat_name", "Unknown") for doc in documents]
|
||||
unique_chats = len(set(chat_names))
|
||||
print(f"Unique conversations: {unique_chats}")
|
||||
|
||||
# Convert to text chunks
|
||||
all_texts = create_text_chunks(
|
||||
documents,
|
||||
chunk_size=args.chunk_size,
|
||||
chunk_overlap=args.chunk_overlap,
|
||||
)
|
||||
|
||||
# Apply max_items limit if specified
|
||||
if args.max_items > 0:
|
||||
all_texts = all_texts[: args.max_items]
|
||||
print(f"Limited to {len(all_texts)} text chunks (max_items={args.max_items})")
|
||||
|
||||
return all_texts
|
||||
|
||||
|
||||
async def main():
|
||||
"""Main entry point."""
|
||||
app = IMessageRAG()
|
||||
await app.run()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
Reference in New Issue
Block a user