* feat: Add MCP integration support for Slack and Twitter

- Implement SlackMCPReader for connecting to Slack MCP servers
- Implement TwitterMCPReader for connecting to Twitter MCP servers
- Add SlackRAG and TwitterRAG applications with full CLI support
- Support live data fetching via Model Context Protocol (MCP)
- Add comprehensive documentation and usage examples
- Include connection testing capabilities with --test-connection flag
- Add standalone tests for core functionality
- Update README with detailed MCP integration guide
- Add Aakash Suresh to Active Contributors

Resolves #36

* fix: Resolve linting issues in MCP integration

- Replace deprecated typing.Dict/List with built-in dict/list
- Fix boolean comparisons (== True/False) to direct checks
- Remove unused variables in demo script
- Update type annotations to use modern Python syntax

All pre-commit hooks should now pass.

* fix: Apply final formatting fixes for pre-commit hooks

- Remove unused imports (asyncio, pathlib.Path)
- Remove unused class imports in demo script
- Ensure all files pass ruff format and pre-commit checks

This should resolve all remaining CI linting issues.

* fix: Apply pre-commit formatting changes

- Fix trailing whitespace in all files
- Apply ruff formatting to match project standards
- Ensure consistent code style across all MCP integration files

This commit applies the exact changes that pre-commit hooks expect.

* fix: Apply pre-commit hooks formatting fixes

- Remove trailing whitespace from all files
- Fix ruff formatting issues (2 errors resolved)
- Apply consistent code formatting across 3 files
- Ensure all files pass pre-commit validation

This resolves all CI formatting failures.

* fix: Update MCP RAG classes to match BaseRAGExample signature

- Fix SlackMCPRAG and TwitterMCPRAG __init__ methods to provide required parameters
- Add name, description, and default_index_name to super().__init__ calls
- Resolves test failures: test_slack_rag_initialization and test_twitter_rag_initialization

This fixes the TypeError caused by BaseRAGExample requiring additional parameters.

* style: Apply ruff formatting - add trailing commas

- Add trailing commas to super().__init__ calls in SlackMCPRAG and TwitterMCPRAG
- Fixes ruff format pre-commit hook requirements

* fix: Resolve SentenceTransformer model_kwargs parameter conflict

- Fix local_files_only parameter conflict in embedding_compute.py
- Create separate copies of model_kwargs and tokenizer_kwargs for local vs network loading
- Prevents parameter conflicts when falling back from local to network loading
- Resolves TypeError in test_readme_examples.py tests

This addresses the SentenceTransformer initialization issues in CI tests.

* fix: Add comprehensive SentenceTransformer version compatibility

- Handle both old and new sentence-transformers versions
- Gracefully fall back from advanced parameters to basic initialization
- Catch TypeError for model_kwargs/tokenizer_kwargs and use basic SentenceTransformer init
- Ensures compatibility across different CI environments and local setups
- Maintains optimization benefits where supported while ensuring broad compatibility

This resolves test failures in CI environments with older sentence-transformers versions.

* style: Apply ruff formatting to embedding_compute.py

- Break long logger.warning lines for better readability
- Fixes pre-commit hook formatting requirements

* docs: Comprehensive documentation improvements for better user experience

- Add clear step-by-step Getting Started Guide for new users
- Add comprehensive CLI Reference with all commands and options
- Improve installation instructions with clear steps and verification
- Add detailed troubleshooting section for common issues (Ollama, OpenAI, etc.)
- Clarify difference between CLI commands and specialized apps
- Add environment variables documentation
- Improve MCP integration documentation with CLI integration examples
- Address user feedback about confusing installation and setup process

This resolves documentation gaps that made LEANN difficult for non-specialists to use.

* style: Remove trailing whitespace from README.md

- Fix trailing whitespace issues found by pre-commit hooks
- Ensures consistent formatting across documentation

* docs: Simplify README by removing excessive documentation

- Remove overly complex CLI reference and getting started sections (lines 61-334)
- Remove emojis from section headers for cleaner appearance
- Keep README simple and focused as requested
- Maintain essential MCP integration documentation

This addresses feedback to keep documentation minimal and avoid auto-generated content.

* docs: Address maintainer feedback on README improvements

- Restore emojis in section headers (Prerequisites and Quick Install)
- Add MCP live data feature mention in line 23 with links to Slack and Twitter
- Add detailed API credential setup instructions for Slack:
  - Step-by-step Slack App creation process
  - Required OAuth scopes and permissions
  - Clear token identification (xoxb- vs xapp-)
- Add detailed API credential setup instructions for Twitter:
  - Twitter Developer Account application process
  - API v2 requirements for bookmarks access
  - Required permissions and scopes

This addresses maintainer feedback to make API setup more user-friendly.
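
The two SentenceTransformer fixes above follow a common loading pattern. One fix makes separate kwargs copies for local and network loading. The other falls back to the basic constructor on a TypeError. The sketch below is not the code in embedding_compute.py, which is not shown here. It uses a hypothetical `load_with_fallback` helper that takes the model constructor as a `factory` argument, so it runs without sentence-transformers installed. It also assumes that a model missing from the local cache surfaces as `OSError`, which is an assumption about the underlying loader.

```python
from __future__ import annotations

from typing import Any, Callable


def load_with_fallback(
    factory: Callable[..., Any],
    model_name: str,
    model_kwargs: dict[str, Any] | None = None,
    tokenizer_kwargs: dict[str, Any] | None = None,
) -> Any:
    """Load a model: local cache first, then network, then the bare constructor."""

    def attempt(local_only: bool) -> Any:
        # Fresh copies on every attempt, so the "local_files_only" flag added
        # for the first try never leaks into the network retry, and the
        # caller's dictionaries are never mutated.
        mk = dict(model_kwargs or {})
        tk = dict(tokenizer_kwargs or {})
        if local_only:
            mk["local_files_only"] = True
            tk["local_files_only"] = True
        return factory(model_name, model_kwargs=mk, tokenizer_kwargs=tk)

    try:
        try:
            return attempt(local_only=True)
        except OSError:
            # Assumed signal that the model is not cached locally: allow downloads.
            return attempt(local_only=False)
    except TypeError:
        # Older versions reject model_kwargs/tokenizer_kwargs: use the basic init.
        return factory(model_name)
```

With the real library, `factory` would presumably be the `SentenceTransformer` class. Catching `TypeError` around the whole call is deliberately broad. The trade-off is that an unrelated `TypeError` raised during a new-style load would also be routed to the basic constructor.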
196 lines
6.9 KiB
Python
#!/usr/bin/env python3
"""
Twitter RAG Application with MCP Support

This application enables RAG (Retrieval-Augmented Generation) on Twitter bookmarks
by connecting to Twitter MCP servers to fetch live data and index it in LEANN.

Usage:
    python -m apps.twitter_rag --mcp-server "twitter-mcp-server" --query "What articles did I bookmark about AI?"
"""

import argparse
import asyncio

from apps.base_rag_example import BaseRAGExample
from apps.twitter_data.twitter_mcp_reader import TwitterMCPReader


class TwitterMCPRAG(BaseRAGExample):
    """
    RAG application for Twitter bookmarks via MCP servers.

    This class provides a complete RAG pipeline for Twitter bookmark data, including
    MCP server connection, data fetching, indexing, and interactive chat.
    """

    def __init__(self):
        super().__init__(
            name="Twitter MCP RAG",
            description="RAG application for Twitter bookmarks via MCP servers",
            default_index_name="twitter_bookmarks",
        )

    def _add_specific_arguments(self, parser: argparse.ArgumentParser):
        """Add Twitter MCP-specific arguments."""
        parser.add_argument(
            "--mcp-server",
            type=str,
            required=True,
            help="Command to start the Twitter MCP server (e.g., 'twitter-mcp-server' or 'npx twitter-mcp-server')",
        )

        parser.add_argument(
            "--username", type=str, help="Twitter username to filter bookmarks (without @)"
        )

        parser.add_argument(
            "--max-bookmarks",
            type=int,
            default=1000,
            help="Maximum number of bookmarks to fetch (default: 1000)",
        )

        parser.add_argument(
            "--no-tweet-content",
            action="store_true",
            help="Exclude tweet content, only include metadata",
        )

        parser.add_argument(
            "--no-metadata",
            action="store_true",
            help="Exclude engagement metadata (likes, retweets, etc.)",
        )

        parser.add_argument(
            "--test-connection",
            action="store_true",
            help="Test MCP server connection and list available tools without indexing",
        )

    async def test_mcp_connection(self, args) -> bool:
        """Test the MCP server connection and display available tools."""
        print(f"Testing connection to MCP server: {args.mcp_server}")

        try:
            reader = TwitterMCPReader(
                mcp_server_command=args.mcp_server,
                username=args.username,
                include_tweet_content=not args.no_tweet_content,
                include_metadata=not args.no_metadata,
                max_bookmarks=args.max_bookmarks,
            )

            async with reader:
                tools = await reader.list_available_tools()

                print("\n✅ Successfully connected to MCP server!")
                print(f"Available tools ({len(tools)}):")

                for i, tool in enumerate(tools, 1):
                    name = tool.get("name", "Unknown")
                    description = tool.get("description", "No description available")
                    print(f"\n{i}. {name}")
                    print(
                        f" Description: {description[:100]}{'...' if len(description) > 100 else ''}"
                    )

                    # Show input schema if available
                    schema = tool.get("inputSchema", {})
                    if schema.get("properties"):
                        props = list(schema["properties"].keys())[:3]  # Show first 3 properties
                        print(
                            f" Parameters: {', '.join(props)}{'...' if len(schema['properties']) > 3 else ''}"
                        )

            return True

        except Exception as e:
            print(f"\n❌ Failed to connect to MCP server: {e}")
            print("\nTroubleshooting tips:")
            print("1. Make sure the Twitter MCP server is installed and accessible")
            print("2. Check if the server command is correct")
            print("3. Ensure you have proper Twitter API credentials configured")
            print("4. Verify your Twitter account has bookmarks to fetch")
            print("5. Try running the MCP server command directly to test it")
            return False

    async def load_data(self, args) -> list[str]:
        """Load Twitter bookmarks via MCP server."""
        print(f"Connecting to Twitter MCP server: {args.mcp_server}")

        if args.username:
            print(f"Username filter: @{args.username}")

        print(f"Max bookmarks: {args.max_bookmarks}")
        print(f"Include tweet content: {not args.no_tweet_content}")
        print(f"Include metadata: {not args.no_metadata}")

        try:
            reader = TwitterMCPReader(
                mcp_server_command=args.mcp_server,
                username=args.username,
                include_tweet_content=not args.no_tweet_content,
                include_metadata=not args.no_metadata,
                max_bookmarks=args.max_bookmarks,
            )

            texts = await reader.read_twitter_bookmarks()

            if not texts:
                print("❌ No bookmarks found! This could mean:")
                print("- You don't have any bookmarks on Twitter")
                print("- The MCP server couldn't access your bookmarks")
                print("- Authentication issues with the Twitter API")
                print("- The username filter didn't match any bookmarks")
                return []

            print(f"✅ Successfully loaded {len(texts)} bookmarks from Twitter")

            # Show a sample of what was loaded (texts is guaranteed non-empty here)
            sample_text = texts[0][:300] + "..." if len(texts[0]) > 300 else texts[0]
            print("\nSample bookmark:")
            print("-" * 50)
            print(sample_text)
            print("-" * 50)

            return texts

        except Exception as e:
            print(f"❌ Error loading Twitter bookmarks: {e}")
            print("\nThis might be due to:")
            print("- MCP server connection issues")
            print("- Twitter API authentication problems")
            print("- Network connectivity issues")
            print("- Rate limiting from Twitter API")
            raise

    async def run(self):
        """Main entry point with MCP connection testing."""
        args = self.parser.parse_args()

        # Test connection if requested
        if args.test_connection:
            success = await self.test_mcp_connection(args)
            if not success:
                return
            print(
                "\n🎉 MCP server is working! You can now run without --test-connection to start indexing."
            )
            return

        # Run the standard RAG pipeline
        await super().run()


async def main():
    """Main entry point for the Twitter MCP RAG application."""
    app = TwitterMCPRAG()
    await app.run()


if __name__ == "__main__":
    asyncio.run(main())
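
For reference, `test_mcp_connection` prints three things for each tool: its name, a description cut to 100 characters, and up to three parameter names taken from the tool's `inputSchema` properties. The following is a minimal sketch that pulls that formatting into a pure, testable function. `summarize_tool` is a hypothetical name. The tool dictionary is assumed to have the same `name` / `description` / `inputSchema` shape that the app already reads from `list_available_tools()`.

```python
from typing import Any


def summarize_tool(
    index: int, tool: dict[str, Any], max_desc: int = 100, max_params: int = 3
) -> str:
    """Format one MCP tool descriptor the way test_mcp_connection prints it."""
    name = tool.get("name", "Unknown")
    description = tool.get("description", "No description available")
    lines = [f"{index}. {name}"]

    # Truncate long descriptions and mark the cut with "...".
    suffix = "..." if len(description) > max_desc else ""
    lines.append(f" Description: {description[:max_desc]}{suffix}")

    # List at most max_params parameter names from the JSON Schema "properties".
    properties = (tool.get("inputSchema") or {}).get("properties") or {}
    if properties:
        shown = list(properties)[:max_params]
        more = "..." if len(properties) > max_params else ""
        lines.append(f" Parameters: {', '.join(shown)}{more}")

    return "\n".join(lines)
```

Keeping the formatting outside the `async with reader:` block would let it be unit-tested without a running MCP server. Unlike the original, this sketch also tolerates a missing or `None` `inputSchema`.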