* feat: Add MCP integration support for Slack and Twitter

- Implement SlackMCPReader for connecting to Slack MCP servers
- Implement TwitterMCPReader for connecting to Twitter MCP servers
- Add SlackRAG and TwitterRAG applications with full CLI support
- Support live data fetching via Model Context Protocol (MCP)
- Add comprehensive documentation and usage examples
- Include connection testing capabilities with --test-connection flag
- Add standalone tests for core functionality
- Update README with detailed MCP integration guide
- Add Aakash Suresh to Active Contributors

Resolves #36

* fix: Resolve linting issues in MCP integration

- Replace deprecated typing.Dict/List with built-in dict/list
- Fix boolean comparisons (== True/False) to direct checks
- Remove unused variables in demo script
- Update type annotations to use modern Python syntax

All pre-commit hooks should now pass.

* fix: Apply final formatting fixes for pre-commit hooks

- Remove unused imports (asyncio, pathlib.Path)
- Remove unused class imports in demo script
- Ensure all files pass ruff format and pre-commit checks

This should resolve all remaining CI linting issues.

* fix: Apply pre-commit formatting changes

- Fix trailing whitespace in all files
- Apply ruff formatting to match project standards
- Ensure consistent code style across all MCP integration files

This commit applies the exact changes that pre-commit hooks expect.

* fix: Apply pre-commit hooks formatting fixes

- Remove trailing whitespace from all files
- Fix ruff formatting issues (2 errors resolved)
- Apply consistent code formatting across 3 files
- Ensure all files pass pre-commit validation

This resolves all CI formatting failures.

* fix: Update MCP RAG classes to match BaseRAGExample signature

- Fix SlackMCPRAG and TwitterMCPRAG __init__ methods to provide required parameters
- Add name, description, and default_index_name to super().__init__ calls
- Resolves test failures: test_slack_rag_initialization and test_twitter_rag_initialization

This fixes the TypeError caused by BaseRAGExample requiring additional parameters.

* style: Apply ruff formatting - add trailing commas

- Add trailing commas to super().__init__ calls in SlackMCPRAG and TwitterMCPRAG
- Fixes ruff format pre-commit hook requirements

* fix: Resolve SentenceTransformer model_kwargs parameter conflict

- Fix local_files_only parameter conflict in embedding_compute.py
- Create separate copies of model_kwargs and tokenizer_kwargs for local vs network loading
- Prevents parameter conflicts when falling back from local to network loading
- Resolves TypeError in test_readme_examples.py tests

This addresses the SentenceTransformer initialization issues in CI tests.

* fix: Add comprehensive SentenceTransformer version compatibility

- Handle both old and new sentence-transformers versions
- Gracefully fallback from advanced parameters to basic initialization
- Catch TypeError for model_kwargs/tokenizer_kwargs and use basic SentenceTransformer init
- Ensures compatibility across different CI environments and local setups
- Maintains optimization benefits where supported while ensuring broad compatibility

This resolves test failures in CI environments with older sentence-transformers versions.

* style: Apply ruff formatting to embedding_compute.py

- Break long logger.warning lines for better readability
- Fixes pre-commit hook formatting requirements

* docs: Comprehensive documentation improvements for better user experience

- Add clear step-by-step Getting Started Guide for new users
- Add comprehensive CLI Reference with all commands and options
- Improve installation instructions with clear steps and verification
- Add detailed troubleshooting section for common issues (Ollama, OpenAI, etc.)
- Clarify difference between CLI commands and specialized apps
- Add environment variables documentation
- Improve MCP integration documentation with CLI integration examples
- Address user feedback about confusing installation and setup process

This resolves documentation gaps that made LEANN difficult for non-specialists to use.

* style: Remove trailing whitespace from README.md

- Fix trailing whitespace issues found by pre-commit hooks
- Ensures consistent formatting across documentation

* docs: Simplify README by removing excessive documentation

- Remove overly complex CLI reference and getting started sections (lines 61-334)
- Remove emojis from section headers for cleaner appearance
- Keep README simple and focused as requested
- Maintain essential MCP integration documentation

This addresses feedback to keep documentation minimal and avoid auto-generated content.

* docs: Address maintainer feedback on README improvements

- Restore emojis in section headers (Prerequisites and Quick Install)
- Add MCP live data feature mention in line 23 with links to Slack and Twitter
- Add detailed API credential setup instructions for Slack:
  - Step-by-step Slack App creation process
  - Required OAuth scopes and permissions
  - Clear token identification (xoxb- vs xapp-)
- Add detailed API credential setup instructions for Twitter:
  - Twitter Developer Account application process
  - API v2 requirements for bookmarks access
  - Required permissions and scopes

This addresses maintainer feedback to make API setup more user-friendly.
207 lines
7.2 KiB
Python
#!/usr/bin/env python3
"""
Slack RAG Application with MCP Support

This application enables RAG (Retrieval-Augmented Generation) on Slack messages
by connecting to Slack MCP servers to fetch live data and index it in LEANN.

Usage:
    python -m apps.slack_rag --mcp-server "slack-mcp-server" --query "What did the team discuss about the project?"
"""

import argparse
import asyncio

from apps.base_rag_example import BaseRAGExample
from apps.slack_data.slack_mcp_reader import SlackMCPReader


class SlackMCPRAG(BaseRAGExample):
    """
    RAG application for Slack messages via MCP servers.

    This class provides a complete RAG pipeline for Slack data, including
    MCP server connection, data fetching, indexing, and interactive chat.
    """

    def __init__(self):
        super().__init__(
            name="Slack MCP RAG",
            description="RAG application for Slack messages via MCP servers",
            default_index_name="slack_messages",
        )

    def _add_specific_arguments(self, parser: argparse.ArgumentParser):
        """Add Slack MCP-specific arguments."""
        parser.add_argument(
            "--mcp-server",
            type=str,
            required=True,
            help="Command to start the Slack MCP server (e.g., 'slack-mcp-server' or 'npx slack-mcp-server')",
        )

        parser.add_argument(
            "--workspace-name",
            type=str,
            help="Slack workspace name for better organization and filtering",
        )

        parser.add_argument(
            "--channels",
            nargs="+",
            help="Specific Slack channels to index (e.g., general random). If not specified, fetches from all available channels",
        )

        parser.add_argument(
            "--concatenate-conversations",
            action="store_true",
            default=True,
            help="Group messages by channel/thread for better context (default: True)",
        )

        parser.add_argument(
            "--no-concatenate-conversations",
            action="store_true",
            help="Process individual messages instead of grouping by channel",
        )

        parser.add_argument(
            "--max-messages-per-channel",
            type=int,
            default=100,
            help="Maximum number of messages to include per channel (default: 100)",
        )

        parser.add_argument(
            "--test-connection",
            action="store_true",
            help="Test MCP server connection and list available tools without indexing",
        )

    async def test_mcp_connection(self, args) -> bool:
        """Test the MCP server connection and display available tools."""
        print(f"Testing connection to MCP server: {args.mcp_server}")

        try:
            reader = SlackMCPReader(
                mcp_server_command=args.mcp_server,
                workspace_name=args.workspace_name,
                concatenate_conversations=not args.no_concatenate_conversations,
                max_messages_per_conversation=args.max_messages_per_channel,
            )

            async with reader:
                tools = await reader.list_available_tools()

                print("\n✅ Successfully connected to MCP server!")
                print(f"Available tools ({len(tools)}):")

                for i, tool in enumerate(tools, 1):
                    name = tool.get("name", "Unknown")
                    description = tool.get("description", "No description available")
                    print(f"\n{i}. {name}")
                    print(
                        f" Description: {description[:100]}{'...' if len(description) > 100 else ''}"
                    )

                    # Show input schema if available
                    schema = tool.get("inputSchema", {})
                    if schema.get("properties"):
                        props = list(schema["properties"].keys())[:3]  # Show first 3 properties
                        print(
                            f" Parameters: {', '.join(props)}{'...' if len(schema['properties']) > 3 else ''}"
                        )

                return True

        except Exception as e:
            print(f"\n❌ Failed to connect to MCP server: {e}")
            print("\nTroubleshooting tips:")
            print("1. Make sure the MCP server is installed and accessible")
            print("2. Check if the server command is correct")
            print("3. Ensure you have proper authentication/credentials configured")
            print("4. Try running the MCP server command directly to test it")
            return False

    async def load_data(self, args) -> list[str]:
        """Load Slack messages via MCP server."""
        print(f"Connecting to Slack MCP server: {args.mcp_server}")

        if args.workspace_name:
            print(f"Workspace: {args.workspace_name}")

        if args.channels:
            print(f"Channels: {', '.join(args.channels)}")
        else:
            print("Fetching from all available channels")

        concatenate = not args.no_concatenate_conversations
        print(
            f"Processing mode: {'Concatenated conversations' if concatenate else 'Individual messages'}"
        )

        try:
            reader = SlackMCPReader(
                mcp_server_command=args.mcp_server,
                workspace_name=args.workspace_name,
                concatenate_conversations=concatenate,
                max_messages_per_conversation=args.max_messages_per_channel,
            )

            texts = await reader.read_slack_data(channels=args.channels)

            if not texts:
                print("❌ No messages found! This could mean:")
                print("- The MCP server couldn't fetch messages")
                print("- The specified channels don't exist or are empty")
                print("- Authentication issues with the Slack workspace")
                return []

            print(f"✅ Successfully loaded {len(texts)} text chunks from Slack")

            # Show sample of what was loaded
            if texts:
                sample_text = texts[0][:200] + "..." if len(texts[0]) > 200 else texts[0]
                print("\nSample content:")
                print("-" * 40)
                print(sample_text)
                print("-" * 40)

            return texts

        except Exception as e:
            print(f"❌ Error loading Slack data: {e}")
            print("\nThis might be due to:")
            print("- MCP server connection issues")
            print("- Authentication problems")
            print("- Network connectivity issues")
            print("- Incorrect channel names")
            raise

    async def run(self):
        """Main entry point with MCP connection testing."""
        args = self.parser.parse_args()

        # Test connection if requested
        if args.test_connection:
            success = await self.test_mcp_connection(args)
            if not success:
                return
            print(
                "\n🎉 MCP server is working! You can now run without --test-connection to start indexing."
            )
            return

        # Run the standard RAG pipeline
        await super().run()


async def main():
    """Main entry point for the Slack MCP RAG application."""
    app = SlackMCPRAG()
    await app.run()


if __name__ == "__main__":
    asyncio.run(main())
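
The Slack-specific flags above can be exercised without an MCP server or the LEANN base class. The following is a minimal sketch using only `argparse` from the standard library; `build_parser` and `resolve_concatenate` are hypothetical helper names introduced for illustration and are not part of the file. The sketch mirrors the argument definitions and the `not args.no_concatenate_conversations` expression the application passes to the reader.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: rebuilds only the Slack-specific flags from the file above."""
    parser = argparse.ArgumentParser(prog="slack_rag_flags_demo")
    parser.add_argument("--mcp-server", type=str, required=True)
    parser.add_argument("--workspace-name", type=str)
    parser.add_argument("--channels", nargs="+")
    parser.add_argument("--concatenate-conversations", action="store_true", default=True)
    parser.add_argument("--no-concatenate-conversations", action="store_true")
    parser.add_argument("--max-messages-per-channel", type=int, default=100)
    parser.add_argument("--test-connection", action="store_true")
    return parser


def resolve_concatenate(args: argparse.Namespace) -> bool:
    """Hypothetical helper: the same expression the application uses for the reader."""
    return not args.no_concatenate_conversations


if __name__ == "__main__":
    parser = build_parser()

    # Default behaviour: all channels, grouped conversations, 100 messages per channel.
    defaults = parser.parse_args(["--mcp-server", "slack-mcp-server"])
    print(defaults.channels, defaults.max_messages_per_channel, resolve_concatenate(defaults))

    # Two named channels, indexed as individual messages.
    custom = parser.parse_args(
        [
            "--mcp-server",
            "slack-mcp-server",
            "--channels",
            "general",
            "random",
            "--no-concatenate-conversations",
        ]
    )
    print(custom.channels, resolve_concatenate(custom))
```

Because `--concatenate-conversations` combines `action="store_true"` with `default=True`, its value is `True` whether or not the flag is passed. Only `--no-concatenate-conversations` changes the processing mode.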