Fix/twitter bookmarks anchor link (#143)

* fix: Fix Twitter bookmarks anchor link
  - Convert Twitter Bookmarks from collapsible details to proper header
  - Update internal link to match new anchor format
  - Ensures external links to #twitter-bookmarks-your-personal-tweet-library work correctly

  Fixes broken link: https://github.com/yichuan-w/LEANN?tab=readme-ov-file#twitter-bookmarks-your-personal-tweet-library

* fix: Fix Slack messages anchor link as well
  - Convert Slack Messages from collapsible details to proper header
  - Update internal link to match new anchor format
  - Ensures external links to #slack-messages-search-your-team-conversations work correctly

  Both Twitter and Slack MCP sections now have reliable anchor links.

* fix: Point Slack and Twitter links to main MCP section
  - Both Slack and Twitter are subsections under MCP Integration
  - Links should point to #mcp-integration-rag-on-live-data-from-any-platform
  - Users will land on the MCP section and can find both Slack and Twitter subsections there

  This matches the actual document structure where Slack and Twitter are under the MCP Integration section.

* Improve Slack MCP integration with retry logic and comprehensive setup guide
  - Add retry mechanism with exponential backoff for cache sync issues
  - Handle 'users cache is not ready yet' errors gracefully
  - Add max-retries and retry-delay CLI arguments for better control
  - Create comprehensive Slack setup guide with troubleshooting
  - Update README with link to detailed setup guide
  - Improve error messages and user experience

* Fix trailing whitespace in slack setup guide

  Pre-commit hooks formatting fixes

* Add comprehensive Slack setup guide with success screenshot
  - Create detailed setup guide with step-by-step instructions
  - Add troubleshooting section for common issues like cache sync errors
  - Include real terminal output example from successful integration
  - Add screenshot showing VS Code interface with Slack channel data
  - Remove excessive emojis for more professional documentation
  - Document retry logic improvements and CLI arguments

* Fix formatting issues in Slack setup guide
  - Remove trailing whitespace
  - Fix end of file formatting
  - Pre-commit hooks formatting fixes

* Add real RAG example showing intelligent Slack query functionality
  - Add detailed example of asking 'What is LEANN about?'
  - Show retrieved messages from Slack channels
  - Demonstrate intelligent answer generation based on context
  - Add command example for running real RAG queries
  - Explain the 4-step process: retrieve, index, generate, cite

* Update Slack setup guide with bot invitation requirements
  - Add important section about inviting bot to channels before RAG queries
  - Explain the 'not_in_channel' errors and their meaning
  - Provide clear steps for bot invitation process
  - Document realistic scenario where bot needs explicit channel access
  - Update documentation to be more professional and less cursor-style

* Docs: add real RAG example for Sky Lab #random
  - Embed screenshot videos/rag-sky-random.png
  - Add step-by-step commands and notes
  - Include helper test script tests/test_channel_by_id_or_name.py
  - Redact example tokens from docs

* Docs/CI: fix broken image paths and ruff lint
  - Move screenshot to docs/videos and update references
  - Remove obsolete rag-query-results image
  - Rename variable to satisfy ruff

* Docs: fix image path for lychee (use videos/ relative under docs/)

* Docs: finalize Slack setup guide with Sky random RAG example and image path fixes
  - Redact example tokens from docs

* Fix Slack MCP integration and update documentation
  - Fix SlackMCPReader to use conversations_history instead of channels_list
  - Add fallback imports for leann.interactive_utils and leann.settings
  - Update slack-setup-guide.md with real screenshots and improved text
  - Remove old screenshot files

* Add Slack integration screenshots to docs/videos
  - Add slack_integration.png showing RAG query results
  - Add slack_integration_2.png showing additional demo functionality
  - Fixes lychee link checker errors for missing image files

* Update Slack integration screenshot with latest changes

* Remove test_channel_by_id_or_name.py
  - Clean up temporary test file that was used for debugging
  - Keep only the main slack_rag.py application for production use

* Update Slack RAG example to show LEANN announcement retrieval
  - Change query from 'PUBPOL 290' to 'What is LEANN about?' for more challenging retrieval
  - Update command to use python -m apps.slack_rag instead of test script
  - Add expected response showing Yichuan Wang's LEANN announcement message
  - Emphasize this demonstrates ability to find specific announcements in conversation history
  - Update description to highlight challenging query capabilities

* Update Slack RAG integration with improved CSV parsing and new screenshots
  - Fixed CSV message parsing in slack_mcp_reader.py to properly handle individual messages
  - Updated slack_rag.py to filter empty channel strings
  - Enhanced slack-setup-guide.md with two new query examples:
    - Advisor Models query: 'train black-box models to adopt to your personal data'
    - Barbarians at the Gate query: 'AI-driven research systems ADRS'
  - Replaced old screenshots with four new ones showing both query examples
  - Updated documentation to use User OAuth Token (xoxp-) instead of Bot Token (xoxb-)
  - Added proper command examples with --no-concatenate-conversations and --force-rebuild flags

* Update Slack RAG documentation with Ollama integration and new screenshots
  - Updated slack-setup-guide.md with comprehensive Ollama setup instructions
  - Added 6 new screenshots showing complete RAG workflow:
    - Command setup, search results, and LLM responses for both queries
  - Removed simulated LLM references, now uses real Ollama with llama3.2:1b
  - Enhanced documentation with step-by-step Ollama installation
  - Updated troubleshooting checklist to include Ollama-specific checks
  - Fixed command syntax and added proper Ollama configuration
  - Demonstrates working Slack RAG with real AI-generated responses

* Remove Key Features section from Slack RAG examples
  - Simplified documentation by removing the bullet point list
  - Keeps the focus on the actual examples and screenshots
2025-10-19 11:47:29 -07:00
parent 28085f6f04
commit ab251ab751
11 changed files with 684 additions and 61 deletions
@@ -10,9 +10,39 @@ from typing import Any

 import dotenv
 from leann.api import LeannBuilder, LeannChat
-from leann.interactive_utils import create_rag_session
+
+# Optional import: older PyPI builds may not include interactive_utils
+try:
+    from leann.interactive_utils import create_rag_session
+except ImportError:
+
+    def create_rag_session(app_name: str, data_description: str):
+        class _SimpleSession:
+            def run_interactive_loop(self, handler):
+                print(f"Interactive session for {app_name}: {data_description}")
+                print("Interactive mode not available in this build")
+
+        return _SimpleSession()
+
+
 from leann.registry import register_project_directory
-from leann.settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url
+
+# Optional import: older PyPI builds may not include settings
+try:
+    from leann.settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url
+except ImportError:
+    # Minimal fallbacks if settings helpers are unavailable
+    import os
+
+    def resolve_ollama_host(value: str | None) -> str | None:
+        return value or os.getenv("LEANN_OLLAMA_HOST") or os.getenv("OLLAMA_HOST")
+
+    def resolve_openai_api_key(value: str | None) -> str | None:
+        return value or os.getenv("OPENAI_API_KEY")
+
+    def resolve_openai_base_url(value: str | None) -> str | None:
+        return value or os.getenv("OPENAI_BASE_URL")
+

 dotenv.load_dotenv()
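
The fallback resolvers in this hunk give an explicit argument priority over environment variables. Below is a minimal standalone sketch of that precedence for the Ollama host. The module name `leann_settings_not_installed` is hypothetical and is used only to force the `ImportError` branch; the environment variable names are the ones that appear in the diff.

```python
import os
from typing import Optional

try:
    # Hypothetical module name: it does not exist, so the fallback below is used.
    from leann_settings_not_installed import resolve_ollama_host
except ImportError:

    def resolve_ollama_host(value: Optional[str] = None) -> Optional[str]:
        """Explicit value wins, then LEANN_OLLAMA_HOST, then OLLAMA_HOST."""
        return value or os.getenv("LEANN_OLLAMA_HOST") or os.getenv("OLLAMA_HOST")
```

Because the fallback uses `or`, an empty string is treated the same as `None` and falls through to the next source.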

@@ -29,6 +29,8 @@ class SlackMCPReader:
        workspace_name: Optional[str] = None,
        concatenate_conversations: bool = True,
        max_messages_per_conversation: int = 100,
+        max_retries: int = 5,
+        retry_delay: float = 2.0,
    ):
        """
        Initialize the Slack MCP Reader.
@@ -38,11 +40,15 @@ class SlackMCPReader:
            workspace_name: Optional workspace name to filter messages
            concatenate_conversations: Whether to group messages by channel/thread
            max_messages_per_conversation: Maximum messages to include per conversation
+            max_retries: Maximum number of retries for failed operations
+            retry_delay: Initial delay between retries in seconds
        """
        self.mcp_server_command = mcp_server_command
        self.workspace_name = workspace_name
        self.concatenate_conversations = concatenate_conversations
        self.max_messages_per_conversation = max_messages_per_conversation
+        self.max_retries = max_retries
+        self.retry_delay = retry_delay
        self.mcp_process = None

    async def start_mcp_server(self):
@@ -110,11 +116,73 @@ class SlackMCPReader:

        return response.get("result", {}).get("tools", [])

+    def _is_cache_sync_error(self, error: dict) -> bool:
+        """Check if the error is related to users cache not being ready."""
+        if isinstance(error, dict):
+            message = error.get("message", "").lower()
+            return (
+                "users cache is not ready" in message or "sync process is still running" in message
+            )
+        return False
+
+    async def _retry_with_backoff(self, func, *args, **kwargs):
+        """Retry a function with exponential backoff, especially for cache sync issues."""
+        last_exception = None
+
+        for attempt in range(self.max_retries + 1):
+            try:
+                return await func(*args, **kwargs)
+            except Exception as e:
+                last_exception = e
+
+                # Check if this is a cache sync error
+                error_dict = {}
+                if hasattr(e, "args") and e.args and isinstance(e.args[0], dict):
+                    error_dict = e.args[0]
+                elif "Failed to fetch messages" in str(e):
+                    # Try to extract error from the exception message
+                    import ast
+                    import re
+                    match = re.search(r"'error':\s*(\{[^}]+\})", str(e))
+                    if match:
+                        try:
+                            # literal_eval only accepts Python literals, unlike eval
+                            error_dict = ast.literal_eval(match.group(1))
+                        except (ValueError, SyntaxError):
+                            pass
+                    else:
+                        match = re.search(r"Failed to fetch messages:\s*(\{[^}]+\})", str(e))
+                        if match:
+                            try:
+                                error_dict = ast.literal_eval(match.group(1))
+                            except (ValueError, SyntaxError):
+                                pass
+
+                if self._is_cache_sync_error(error_dict):
+                    if attempt < self.max_retries:
+                        delay = self.retry_delay * (2**attempt)  # Exponential backoff
+                        logger.info(
+                            f"Cache sync not ready, waiting {delay:.1f}s before retry {attempt + 1}/{self.max_retries}"
+                        )
+                        await asyncio.sleep(delay)
+                        continue
+                    else:
+                        logger.warning(
+                            f"Cache sync still not ready after {self.max_retries} retries, giving up"
+                        )
+                        break
+                else:
+                    # Not a cache sync error, don't retry
+                    break
+
+        # If we get here, all retries failed or it's not a retryable error
+        raise last_exception
+
    async def fetch_slack_messages(
        self, channel: Optional[str] = None, limit: int = 100
    ) -> list[dict[str, Any]]:
        """
-        Fetch Slack messages using MCP tools.
+        Fetch Slack messages using MCP tools with retry logic for cache sync issues.

        Args:
            channel: Optional channel name to filter messages
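
The retry helper in this hunk sleeps for `retry_delay * 2**attempt` seconds before each retry. The following is a simplified sketch of that schedule, not the code from the commit: `CacheNotReady` is a hypothetical stand-in for the "users cache is not ready yet" error, and the `sleep` parameter is added here only so the wait can be replaced in a test.

```python
import asyncio


def backoff_delays(max_retries: int = 5, retry_delay: float = 2.0) -> list:
    """Delays slept before each retry: retry_delay * 2**attempt."""
    return [retry_delay * (2**attempt) for attempt in range(max_retries)]


class CacheNotReady(Exception):
    """Hypothetical stand-in for the cache sync error returned by the server."""


async def retry_with_backoff(func, max_retries=5, retry_delay=2.0, sleep=asyncio.sleep):
    """Call func(); on CacheNotReady wait and retry, re-raise anything else at once."""
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except CacheNotReady:
            if attempt == max_retries:
                raise  # retries exhausted
            await sleep(retry_delay * (2**attempt))
```

With the defaults from the diff (`max_retries=5`, `retry_delay=2.0`) the delays are 2, 4, 8, 16 and 32 seconds, so a call that never succeeds waits 62 seconds in total before giving up.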
@@ -123,32 +191,59 @@ class SlackMCPReader:
        Returns:
            List of message dictionaries
        """
+        return await self._retry_with_backoff(self._fetch_slack_messages_impl, channel, limit)
+
+    async def _fetch_slack_messages_impl(
+        self, channel: Optional[str] = None, limit: int = 100
+    ) -> list[dict[str, Any]]:
+        """
+        Internal implementation of fetch_slack_messages without retry logic.
+        """
        # This is a generic implementation - specific MCP servers may have different tool names
        # Common tool names might be: 'get_messages', 'list_messages', 'fetch_channel_history'

        tools = await self.list_available_tools()
+        logger.info(f"Available tools: {[tool.get('name') for tool in tools]}")
        message_tool = None

-        # Look for a tool that can fetch messages
+        # Look for a tool that can fetch messages - prioritize conversations_history
+        message_tool = None
+
+        # First, try to find conversations_history specifically
        for tool in tools:
            tool_name = tool.get("name", "").lower()
-            if any(
-                keyword in tool_name
-                for keyword in ["message", "history", "channel", "conversation"]
-            ):
+            if "conversations_history" in tool_name:
                message_tool = tool
+                logger.info(f"Found conversations_history tool: {tool}")
                break

+        # If not found, look for other message-fetching tools
+        if not message_tool:
+            for tool in tools:
+                tool_name = tool.get("name", "").lower()
+                if any(
+                    keyword in tool_name
+                    for keyword in ["conversations_search", "message", "history"]
+                ):
+                    message_tool = tool
+                    break
+
        if not message_tool:
            raise RuntimeError("No message fetching tool found in MCP server")

        # Prepare tool call parameters
-        tool_params = {"limit": limit}
+        tool_params = {"limit": "180d"}  # Use 180 days to get older messages
        if channel:
-            # Try common parameter names for channel specification
-            for param_name in ["channel", "channel_id", "channel_name"]:
-                tool_params[param_name] = channel
-                break
+            # For conversations_history, use channel_id parameter
+            if message_tool["name"] == "conversations_history":
+                tool_params["channel_id"] = channel
+            else:
+                # Try common parameter names for channel specification
+                for param_name in ["channel", "channel_id", "channel_name"]:
+                    tool_params[param_name] = channel
+                    break
+
+        logger.info(f"Tool parameters: {tool_params}")

        fetch_request = {
            "jsonrpc": "2.0",
@@ -170,8 +265,8 @@ class SlackMCPReader:
                try:
                    messages = json.loads(content["text"])
                except json.JSONDecodeError:
-                    # If not JSON, treat as plain text
-                    messages = [{"text": content["text"], "channel": channel or "unknown"}]
+                    # If not JSON, try to parse as CSV format (Slack MCP server format)
+                    messages = self._parse_csv_messages(content["text"], channel)
            else:
                messages = result["content"]
        else:
@@ -180,6 +275,56 @@ class SlackMCPReader:

        return messages if isinstance(messages, list) else [messages]

+    def _parse_csv_messages(self, csv_text: str, channel: str) -> list[dict[str, Any]]:
+        """Parse CSV format messages from Slack MCP server."""
+        import csv
+        import io
+
+        messages = []
+        try:
+            # Split by lines and process each line as a CSV row
+            lines = csv_text.strip().split("\n")
+            if not lines:
+                return messages
+
+            # Skip header line if it exists
+            start_idx = 0
+            if lines[0].startswith("MsgID,UserID,UserName"):
+                start_idx = 1
+
+            for line in lines[start_idx:]:
+                if not line.strip():
+                    continue
+
+                # Parse CSV line
+                reader = csv.reader(io.StringIO(line))
+                try:
+                    row = next(reader)
+                    if len(row) >= 7:  # Ensure we have enough columns
+                        message = {
+                            "ts": row[0],
+                            "user": row[1],
+                            "username": row[2],
+                            "real_name": row[3],
+                            "channel": row[4],
+                            "thread_ts": row[5],
+                            "text": row[6],
+                            "time": row[7] if len(row) > 7 else "",
+                            "reactions": row[8] if len(row) > 8 else "",
+                            "cursor": row[9] if len(row) > 9 else "",
+                        }
+                        messages.append(message)
+                except Exception as e:
+                    logger.warning(f"Failed to parse CSV line: {line[:100]}... Error: {e}")
+                    continue
+
+        except Exception as e:
+            logger.warning(f"Failed to parse CSV messages: {e}")
+            # Fallback: treat entire text as one message
+            messages = [{"text": csv_text, "channel": channel or "unknown"}]
+
+        return messages
+
    def _format_message(self, message: dict[str, Any]) -> str:
        """Format a single message for indexing."""
        text = message.get("text", "")
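
`_parse_csv_messages` splits the payload on newlines before handing each line to `csv.reader`, so a quoted message body that itself contains a newline would be cut into two malformed rows. A possible alternative, sketched here under assumptions, is to feed the whole payload to a single `csv.reader`. The field names follow the dictionary keys in the diff; the header check uses only the three column names that are visible there.

```python
import csv
import io

# Positional field names, taken from the keys used in the diff above.
FIELDS = ["ts", "user", "username", "real_name", "channel",
          "thread_ts", "text", "time", "reactions", "cursor"]


def parse_csv_messages(csv_text: str) -> list:
    """Parse the whole payload with one csv.reader so quoted newlines survive."""
    messages = []
    for row in csv.reader(io.StringIO(csv_text.strip())):
        if not row or row[:3] == ["MsgID", "UserID", "UserName"]:
            continue  # skip blank rows and the header row
        if len(row) < 7:
            continue  # too few columns to contain the text field
        padded = row + [""] * (len(FIELDS) - len(row))  # fill missing trailing columns
        messages.append(dict(zip(FIELDS, padded)))
    return messages
```

Rows shorter than seven columns are dropped silently in this sketch, whereas the code in the diff only logs a warning when parsing a line raises an exception.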
@@ -251,6 +396,40 @@ class SlackMCPReader:

        return "\n".join(content_parts)

+    async def get_all_channels(self) -> list[str]:
+        """Get list of all available channels."""
+        try:
+            channels_list_request = {
+                "jsonrpc": "2.0",
+                "id": 4,
+                "method": "tools/call",
+                "params": {"name": "channels_list", "arguments": {}},
+            }
+            channels_response = await self.send_mcp_request(channels_list_request)
+            if "result" in channels_response:
+                result = channels_response["result"]
+                if "content" in result and isinstance(result["content"], list):
+                    content = result["content"][0] if result["content"] else {}
+                    if "text" in content:
+                        # Parse the channels from the response
+                        channels = []
+                        lines = content["text"].split("\n")
+                        for line in lines:
+                            if line.strip() and ("#" in line or "C" in line[:10]):
+                                # Extract channel ID or name
+                                parts = line.split()
+                                for part in parts:
+                                    if part.startswith("C") and len(part) > 5:
+                                        channels.append(part)
+                                    elif part.startswith("#"):
+                                        channels.append(part[1:])  # Remove #
+                        logger.info(f"Found {len(channels)} channels: {channels}")
+                        return channels
+            return []
+        except Exception as e:
+            logger.warning(f"Failed to get channels list: {e}")
+            return []
+
    async def read_slack_data(self, channels: Optional[list[str]] = None) -> list[str]:
        """
        Read Slack data and return formatted text chunks.
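
The token test in `get_all_channels` accepts any whitespace-separated word that starts with `C` and is longer than five characters, so ordinary words such as "Channels" or "Created" in the server output would be collected as channel IDs. A stricter filter might look like the sketch below. It assumes channel IDs consist of an uppercase `C` followed by uppercase letters and digits, as in the `C0GN5BX0F` example in this diff; that pattern is an assumption, not something the commit specifies.

```python
import re

# Assumed shape of a channel ID: "C" plus at least five uppercase letters or digits.
CHANNEL_ID = re.compile(r"C[A-Z0-9]{5,}")


def extract_channels(text: str) -> list:
    """Collect channel IDs and #names from whitespace-separated tokens."""
    channels = []
    for token in text.split():
        if CHANNEL_ID.fullmatch(token):
            channels.append(token)
        elif token.startswith("#") and len(token) > 1:
            channels.append(token[1:])  # strip the leading '#'
    return channels
```

An all-uppercase word such as "CREATED" would still match this pattern, so the filter narrows the false positives without eliminating them.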
@@ -287,36 +466,33 @@ class SlackMCPReader:
                        logger.warning(f"Failed to fetch messages from channel {channel}: {e}")
                        continue
            else:
-                # Fetch from all available channels/conversations
-                # This is a simplified approach - real implementation would need to
-                # discover available channels first
-                try:
-                    messages = await self.fetch_slack_messages(limit=1000)
-                    if messages:
-                        # Group messages by channel if concatenating
-                        if self.concatenate_conversations:
-                            channel_messages = {}
-                            for message in messages:
-                                channel = message.get(
-                                    "channel", message.get("channel_name", "general")
-                                )
-                                if channel not in channel_messages:
-                                    channel_messages[channel] = []
-                                channel_messages[channel].append(message)
+                # Fetch from all available channels
+                logger.info("Fetching from all available channels...")
+                all_channels = await self.get_all_channels()

-                            # Create concatenated content for each channel
-                            for channel, msgs in channel_messages.items():
-                                text_content = self._create_concatenated_content(msgs, channel)
+                if not all_channels:
+                    # Fallback to common channel names if we can't get the list
+                    all_channels = ["general", "random", "announcements", "C0GN5BX0F"]
+                    logger.info(f"Using fallback channels: {all_channels}")
+
+                for channel in all_channels:
+                    try:
+                        logger.info(f"Searching channel: {channel}")
+                        messages = await self.fetch_slack_messages(channel=channel, limit=1000)
+                        if messages:
+                            if self.concatenate_conversations:
+                                text_content = self._create_concatenated_content(messages, channel)
                                if text_content.strip():
                                    all_texts.append(text_content)
-                        else:
-                            # Process individual messages
-                            for message in messages:
-                                formatted_msg = self._format_message(message)
-                                if formatted_msg.strip():
-                                    all_texts.append(formatted_msg)
-                except Exception as e:
-                    logger.error(f"Failed to fetch messages: {e}")
+                            else:
+                                # Process individual messages
+                                for message in messages:
+                                    formatted_msg = self._format_message(message)
+                                    if formatted_msg.strip():
+                                        all_texts.append(formatted_msg)
+                    except Exception as e:
+                        logger.warning(f"Failed to fetch messages from channel {channel}: {e}")
+                        continue

            return all_texts

@@ -78,6 +78,20 @@ class SlackMCPRAG(BaseRAGExample):
            help="Test MCP server connection and list available tools without indexing",
        )

+        parser.add_argument(
+            "--max-retries",
+            type=int,
+            default=5,
+            help="Maximum number of retries for failed operations (default: 5)",
+        )
+
+        parser.add_argument(
+            "--retry-delay",
+            type=float,
+            default=2.0,
+            help="Initial delay between retries in seconds (default: 2.0)",
+        )
+
    async def test_mcp_connection(self, args) -> bool:
        """Test the MCP server connection and display available tools."""
        print(f"Testing connection to MCP server: {args.mcp_server}")
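
The two new flags can be exercised in isolation. The sketch below rebuilds only these two arguments on a bare `argparse.ArgumentParser`; the real application registers them on the parser provided by `BaseRAGExample`, which is not shown in this diff.

```python
import argparse

# Standalone parser for illustration; the real one has many more options.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-retries",
    type=int,
    default=5,
    help="Maximum number of retries for failed operations (default: 5)",
)
parser.add_argument(
    "--retry-delay",
    type=float,
    default=2.0,
    help="Initial delay between retries in seconds (default: 2.0)",
)

# argparse converts the dashes to underscores in attribute names.
args = parser.parse_args(["--max-retries", "3"])
```

Here `args.max_retries` is 3 and `args.retry_delay` keeps its default of 2.0, which are the values later passed to `SlackMCPReader`.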
@@ -88,12 +102,14 @@ class SlackMCPRAG(BaseRAGExample):
                workspace_name=args.workspace_name,
                concatenate_conversations=not args.no_concatenate_conversations,
                max_messages_per_conversation=args.max_messages_per_channel,
+                max_retries=args.max_retries,
+                retry_delay=args.retry_delay,
            )

            async with reader:
                tools = await reader.list_available_tools()

-                print("\n✅ Successfully connected to MCP server!")
+                print("Successfully connected to MCP server!")
                print(f"Available tools ({len(tools)}):")

                for i, tool in enumerate(tools, 1):
@@ -115,7 +131,7 @@ class SlackMCPRAG(BaseRAGExample):
                return True

        except Exception as e:
-            print(f"\n❌ Failed to connect to MCP server: {e}")
+            print(f"Failed to connect to MCP server: {e}")
            print("\nTroubleshooting tips:")
            print("1. Make sure the MCP server is installed and accessible")
            print("2. Check if the server command is correct")
@@ -130,8 +146,11 @@ class SlackMCPRAG(BaseRAGExample):
        if args.workspace_name:
            print(f"Workspace: {args.workspace_name}")

-        if args.channels:
-            print(f"Channels: {', '.join(args.channels)}")
+        # Filter out empty strings from channels
+        channels = [ch for ch in args.channels if ch.strip()] if args.channels else None
+
+        if channels:
+            print(f"Channels: {', '.join(channels)}")
        else:
            print("Fetching from all available channels")

@@ -146,18 +165,20 @@ class SlackMCPRAG(BaseRAGExample):
                workspace_name=args.workspace_name,
                concatenate_conversations=concatenate,
                max_messages_per_conversation=args.max_messages_per_channel,
+                max_retries=args.max_retries,
+                retry_delay=args.retry_delay,
            )

-            texts = await reader.read_slack_data(channels=args.channels)
+            texts = await reader.read_slack_data(channels=channels)

            if not texts:
-                print("❌ No messages found! This could mean:")
+                print("No messages found! This could mean:")
                print("- The MCP server couldn't fetch messages")
                print("- The specified channels don't exist or are empty")
                print("- Authentication issues with the Slack workspace")
                return []

-            print(f"✅ Successfully loaded {len(texts)} text chunks from Slack")
+            print(f"Successfully loaded {len(texts)} text chunks from Slack")

            # Show sample of what was loaded
            if texts:
@@ -170,7 +191,7 @@ class SlackMCPRAG(BaseRAGExample):
            return texts

        except Exception as e:
-            print(f"❌ Error loading Slack data: {e}")
+            print(f"Error loading Slack data: {e}")
            print("\nThis might be due to:")
            print("- MCP server connection issues")
            print("- Authentication problems")
@@ -188,7 +209,7 @@ class SlackMCPRAG(BaseRAGExample):
            if not success:
                return
            print(
-                "\n🎉 MCP server is working! You can now run without --test-connection to start indexing."
+                "MCP server is working! You can now run without --test-connection to start indexing."
            )
            return