Fix/twitter bookmarks anchor link (#143)

* fix: Fix Twitter bookmarks anchor link

- Convert Twitter Bookmarks from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #twitter-bookmarks-your-personal-tweet-library work correctly

Fixes broken link: https://github.com/yichuan-w/LEANN?tab=readme-ov-file#twitter-bookmarks-your-personal-tweet-library

* fix: Fix Slack messages anchor link as well

- Convert Slack Messages from collapsible details to proper header
- Update internal link to match new anchor format
- Ensures external links to #slack-messages-search-your-team-conversations work correctly

Both Twitter and Slack MCP sections now have reliable anchor links.

* fix: Point Slack and Twitter links to main MCP section

- Both Slack and Twitter are subsections under MCP Integration
- Links should point to #mcp-integration-rag-on-live-data-from-any-platform
- Users will land on the MCP section and can find both Slack and Twitter subsections there

This matches the actual document structure where Slack and Twitter are under the MCP Integration section.

* Improve Slack MCP integration with retry logic and comprehensive setup guide

- Add retry mechanism with exponential backoff for cache sync issues
- Handle 'users cache is not ready yet' errors gracefully
- Add max-retries and retry-delay CLI arguments for better control
- Create comprehensive Slack setup guide with troubleshooting
- Update README with link to detailed setup guide
- Improve error messages and user experience

* Fix trailing whitespace in slack setup guide

Pre-commit hooks formatting fixes

* Add comprehensive Slack setup guide with success screenshot

- Create detailed setup guide with step-by-step instructions
- Add troubleshooting section for common issues like cache sync errors
- Include real terminal output example from successful integration
- Add screenshot showing VS Code interface with Slack channel data
- Remove excessive emojis for more professional documentation
- Document retry logic improvements and CLI arguments

* Fix formatting issues in Slack setup guide

- Remove trailing whitespace
- Fix end of file formatting
- Pre-commit hooks formatting fixes

* Add real RAG example showing intelligent Slack query functionality

- Add detailed example of asking 'What is LEANN about?'
- Show retrieved messages from Slack channels
- Demonstrate intelligent answer generation based on context
- Add command example for running real RAG queries
- Explain the 4-step process: retrieve, index, generate, cite

* Update Slack setup guide with bot invitation requirements

- Add important section about inviting bot to channels before RAG queries
- Explain the 'not_in_channel' errors and their meaning
- Provide clear steps for bot invitation process
- Document realistic scenario where bot needs explicit channel access
- Update documentation to be more professional and less cursor-style

* Docs: add real RAG example for Sky Lab #random

- Embed screenshot videos/rag-sky-random.png
- Add step-by-step commands and notes
- Include helper test script tests/test_channel_by_id_or_name.py
- Redact example tokens from docs

* Docs/CI: fix broken image paths and ruff lint\n\n- Move screenshot to docs/videos and update references\n- Remove obsolete rag-query-results image\n- Rename variable to satisfy ruff

* Docs: fix image path for lychee (use videos/ relative under docs/)

* Docs: finalize Slack setup guide with Sky random RAG example and image path fixes\n\n- Redact example tokens from docs

* Fix Slack MCP integration and update documentation

- Fix SlackMCPReader to use conversations_history instead of channels_list
- Add fallback imports for leann.interactive_utils and leann.settings
- Update slack-setup-guide.md with real screenshots and improved text
- Remove old screenshot files

* Add Slack integration screenshots to docs/videos

- Add slack_integration.png showing RAG query results
- Add slack_integration_2.png showing additional demo functionality
- Fixes lychee link checker errors for missing image files

* Update Slack integration screenshot with latest changes

* Remove test_channel_by_id_or_name.py

- Clean up temporary test file that was used for debugging
- Keep only the main slack_rag.py application for production use

* Update Slack RAG example to show LEANN announcement retrieval

- Change query from 'PUBPOL 290' to 'What is LEANN about?' for more challenging retrieval
- Update command to use python -m apps.slack_rag instead of test script
- Add expected response showing Yichuan Wang's LEANN announcement message
- Emphasize this demonstrates ability to find specific announcements in conversation history
- Update description to highlight challenging query capabilities

* Update Slack RAG integration with improved CSV parsing and new screenshots

- Fixed CSV message parsing in slack_mcp_reader.py to properly handle individual messages
- Updated slack_rag.py to filter empty channel strings
- Enhanced slack-setup-guide.md with two new query examples:
  - Advisor Models query: 'train black-box models to adopt to your personal data'
  - Barbarians at the Gate query: 'AI-driven research systems ADRS'
- Replaced old screenshots with four new ones showing both query examples
- Updated documentation to use User OAuth Token (xoxp-) instead of Bot Token (xoxb-)
- Added proper command examples with --no-concatenate-conversations and --force-rebuild flags

* Update Slack RAG documentation with Ollama integration and new screenshots

- Updated slack-setup-guide.md with comprehensive Ollama setup instructions
- Added 6 new screenshots showing complete RAG workflow:
  - Command setup, search results, and LLM responses for both queries
- Removed simulated LLM references, now uses real Ollama with llama3.2:1b
- Enhanced documentation with step-by-step Ollama installation
- Updated troubleshooting checklist to include Ollama-specific checks
- Fixed command syntax and added proper Ollama configuration
- Demonstrates working Slack RAG with real AI-generated responses

* Remove Key Features section from Slack RAG examples

- Simplified documentation by removing the bullet point list
- Keeps the focus on the actual examples and screenshots
This commit is contained in:
Aakash Suresh
2025-10-19 11:47:29 -07:00
committed by GitHub
parent 28085f6f04
commit ab251ab751
11 changed files with 684 additions and 61 deletions

View File

@@ -10,9 +10,39 @@ from typing import Any
import dotenv
from leann.api import LeannBuilder, LeannChat
from leann.interactive_utils import create_rag_session
# Optional import: older PyPI builds may not include interactive_utils
try:
from leann.interactive_utils import create_rag_session
except ImportError:
def create_rag_session(app_name: str, data_description: str):
class _SimpleSession:
def run_interactive_loop(self, handler):
print(f"Interactive session for {app_name}: {data_description}")
print("Interactive mode not available in this build")
return _SimpleSession()
from leann.registry import register_project_directory
from leann.settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url
# Optional import: older PyPI builds may not include settings
try:
from leann.settings import resolve_ollama_host, resolve_openai_api_key, resolve_openai_base_url
except ImportError:
# Minimal fallbacks if settings helpers are unavailable
import os
def resolve_ollama_host(value: str | None) -> str | None:
return value or os.getenv("LEANN_OLLAMA_HOST") or os.getenv("OLLAMA_HOST")
def resolve_openai_api_key(value: str | None) -> str | None:
return value or os.getenv("OPENAI_API_KEY")
def resolve_openai_base_url(value: str | None) -> str | None:
return value or os.getenv("OPENAI_BASE_URL")
dotenv.load_dotenv()

View File

@@ -29,6 +29,8 @@ class SlackMCPReader:
workspace_name: Optional[str] = None,
concatenate_conversations: bool = True,
max_messages_per_conversation: int = 100,
max_retries: int = 5,
retry_delay: float = 2.0,
):
"""
Initialize the Slack MCP Reader.
@@ -38,11 +40,15 @@ class SlackMCPReader:
workspace_name: Optional workspace name to filter messages
concatenate_conversations: Whether to group messages by channel/thread
max_messages_per_conversation: Maximum messages to include per conversation
max_retries: Maximum number of retries for failed operations
retry_delay: Initial delay between retries in seconds
"""
self.mcp_server_command = mcp_server_command
self.workspace_name = workspace_name
self.concatenate_conversations = concatenate_conversations
self.max_messages_per_conversation = max_messages_per_conversation
self.max_retries = max_retries
self.retry_delay = retry_delay
self.mcp_process = None
async def start_mcp_server(self):
@@ -110,11 +116,73 @@ class SlackMCPReader:
return response.get("result", {}).get("tools", [])
def _is_cache_sync_error(self, error: dict) -> bool:
"""Check if the error is related to users cache not being ready."""
if isinstance(error, dict):
message = error.get("message", "").lower()
return (
"users cache is not ready" in message or "sync process is still running" in message
)
return False
async def _retry_with_backoff(self, func, *args, **kwargs):
"""Retry a function with exponential backoff, especially for cache sync issues."""
last_exception = None
for attempt in range(self.max_retries + 1):
try:
return await func(*args, **kwargs)
except Exception as e:
last_exception = e
# Check if this is a cache sync error
error_dict = {}
if hasattr(e, "args") and e.args and isinstance(e.args[0], dict):
error_dict = e.args[0]
elif "Failed to fetch messages" in str(e):
# Try to extract error from the exception message
import re
match = re.search(r"'error':\s*(\{[^}]+\})", str(e))
if match:
try:
error_dict = eval(match.group(1))
except (ValueError, SyntaxError, NameError):
pass
else:
# Try alternative format
match = re.search(r"Failed to fetch messages:\s*(\{[^}]+\})", str(e))
if match:
try:
error_dict = eval(match.group(1))
except (ValueError, SyntaxError, NameError):
pass
if self._is_cache_sync_error(error_dict):
if attempt < self.max_retries:
delay = self.retry_delay * (2**attempt) # Exponential backoff
logger.info(
f"Cache sync not ready, waiting {delay:.1f}s before retry {attempt + 1}/{self.max_retries}"
)
await asyncio.sleep(delay)
continue
else:
logger.warning(
f"Cache sync still not ready after {self.max_retries} retries, giving up"
)
break
else:
# Not a cache sync error, don't retry
break
# If we get here, all retries failed or it's not a retryable error
raise last_exception
async def fetch_slack_messages(
self, channel: Optional[str] = None, limit: int = 100
) -> list[dict[str, Any]]:
"""
Fetch Slack messages using MCP tools.
Fetch Slack messages using MCP tools with retry logic for cache sync issues.
Args:
channel: Optional channel name to filter messages
@@ -123,32 +191,59 @@ class SlackMCPReader:
Returns:
List of message dictionaries
"""
return await self._retry_with_backoff(self._fetch_slack_messages_impl, channel, limit)
async def _fetch_slack_messages_impl(
self, channel: Optional[str] = None, limit: int = 100
) -> list[dict[str, Any]]:
"""
Internal implementation of fetch_slack_messages without retry logic.
"""
# This is a generic implementation - specific MCP servers may have different tool names
# Common tool names might be: 'get_messages', 'list_messages', 'fetch_channel_history'
tools = await self.list_available_tools()
logger.info(f"Available tools: {[tool.get('name') for tool in tools]}")
message_tool = None
# Look for a tool that can fetch messages
# Look for a tool that can fetch messages - prioritize conversations_history
message_tool = None
# First, try to find conversations_history specifically
for tool in tools:
tool_name = tool.get("name", "").lower()
if any(
keyword in tool_name
for keyword in ["message", "history", "channel", "conversation"]
):
if "conversations_history" in tool_name:
message_tool = tool
logger.info(f"Found conversations_history tool: {tool}")
break
# If not found, look for other message-fetching tools
if not message_tool:
for tool in tools:
tool_name = tool.get("name", "").lower()
if any(
keyword in tool_name
for keyword in ["conversations_search", "message", "history"]
):
message_tool = tool
break
if not message_tool:
raise RuntimeError("No message fetching tool found in MCP server")
# Prepare tool call parameters
tool_params = {"limit": limit}
tool_params = {"limit": "180d"} # Use 180 days to get older messages
if channel:
# Try common parameter names for channel specification
for param_name in ["channel", "channel_id", "channel_name"]:
tool_params[param_name] = channel
break
# For conversations_history, use channel_id parameter
if message_tool["name"] == "conversations_history":
tool_params["channel_id"] = channel
else:
# Try common parameter names for channel specification
for param_name in ["channel", "channel_id", "channel_name"]:
tool_params[param_name] = channel
break
logger.info(f"Tool parameters: {tool_params}")
fetch_request = {
"jsonrpc": "2.0",
@@ -170,8 +265,8 @@ class SlackMCPReader:
try:
messages = json.loads(content["text"])
except json.JSONDecodeError:
# If not JSON, treat as plain text
messages = [{"text": content["text"], "channel": channel or "unknown"}]
# If not JSON, try to parse as CSV format (Slack MCP server format)
messages = self._parse_csv_messages(content["text"], channel)
else:
messages = result["content"]
else:
@@ -180,6 +275,56 @@ class SlackMCPReader:
return messages if isinstance(messages, list) else [messages]
def _parse_csv_messages(self, csv_text: str, channel: str) -> list[dict[str, Any]]:
"""Parse CSV format messages from Slack MCP server."""
import csv
import io
messages = []
try:
# Split by lines and process each line as a CSV row
lines = csv_text.strip().split("\n")
if not lines:
return messages
# Skip header line if it exists
start_idx = 0
if lines[0].startswith("MsgID,UserID,UserName"):
start_idx = 1
for line in lines[start_idx:]:
if not line.strip():
continue
# Parse CSV line
reader = csv.reader(io.StringIO(line))
try:
row = next(reader)
if len(row) >= 7: # Ensure we have enough columns
message = {
"ts": row[0],
"user": row[1],
"username": row[2],
"real_name": row[3],
"channel": row[4],
"thread_ts": row[5],
"text": row[6],
"time": row[7] if len(row) > 7 else "",
"reactions": row[8] if len(row) > 8 else "",
"cursor": row[9] if len(row) > 9 else "",
}
messages.append(message)
except Exception as e:
logger.warning(f"Failed to parse CSV line: {line[:100]}... Error: {e}")
continue
except Exception as e:
logger.warning(f"Failed to parse CSV messages: {e}")
# Fallback: treat entire text as one message
messages = [{"text": csv_text, "channel": channel or "unknown"}]
return messages
def _format_message(self, message: dict[str, Any]) -> str:
"""Format a single message for indexing."""
text = message.get("text", "")
@@ -251,6 +396,40 @@ class SlackMCPReader:
return "\n".join(content_parts)
async def get_all_channels(self) -> list[str]:
"""Get list of all available channels."""
try:
channels_list_request = {
"jsonrpc": "2.0",
"id": 4,
"method": "tools/call",
"params": {"name": "channels_list", "arguments": {}},
}
channels_response = await self.send_mcp_request(channels_list_request)
if "result" in channels_response:
result = channels_response["result"]
if "content" in result and isinstance(result["content"], list):
content = result["content"][0] if result["content"] else {}
if "text" in content:
# Parse the channels from the response
channels = []
lines = content["text"].split("\n")
for line in lines:
if line.strip() and ("#" in line or "C" in line[:10]):
# Extract channel ID or name
parts = line.split()
for part in parts:
if part.startswith("C") and len(part) > 5:
channels.append(part)
elif part.startswith("#"):
channels.append(part[1:]) # Remove #
logger.info(f"Found {len(channels)} channels: {channels}")
return channels
return []
except Exception as e:
logger.warning(f"Failed to get channels list: {e}")
return []
async def read_slack_data(self, channels: Optional[list[str]] = None) -> list[str]:
"""
Read Slack data and return formatted text chunks.
@@ -287,36 +466,33 @@ class SlackMCPReader:
logger.warning(f"Failed to fetch messages from channel {channel}: {e}")
continue
else:
# Fetch from all available channels/conversations
# This is a simplified approach - real implementation would need to
# discover available channels first
try:
messages = await self.fetch_slack_messages(limit=1000)
if messages:
# Group messages by channel if concatenating
if self.concatenate_conversations:
channel_messages = {}
for message in messages:
channel = message.get(
"channel", message.get("channel_name", "general")
)
if channel not in channel_messages:
channel_messages[channel] = []
channel_messages[channel].append(message)
# Fetch from all available channels
logger.info("Fetching from all available channels...")
all_channels = await self.get_all_channels()
# Create concatenated content for each channel
for channel, msgs in channel_messages.items():
text_content = self._create_concatenated_content(msgs, channel)
if not all_channels:
# Fallback to common channel names if we can't get the list
all_channels = ["general", "random", "announcements", "C0GN5BX0F"]
logger.info(f"Using fallback channels: {all_channels}")
for channel in all_channels:
try:
logger.info(f"Searching channel: {channel}")
messages = await self.fetch_slack_messages(channel=channel, limit=1000)
if messages:
if self.concatenate_conversations:
text_content = self._create_concatenated_content(messages, channel)
if text_content.strip():
all_texts.append(text_content)
else:
# Process individual messages
for message in messages:
formatted_msg = self._format_message(message)
if formatted_msg.strip():
all_texts.append(formatted_msg)
except Exception as e:
logger.error(f"Failed to fetch messages: {e}")
else:
# Process individual messages
for message in messages:
formatted_msg = self._format_message(message)
if formatted_msg.strip():
all_texts.append(formatted_msg)
except Exception as e:
logger.warning(f"Failed to fetch messages from channel {channel}: {e}")
continue
return all_texts

View File

@@ -78,6 +78,20 @@ class SlackMCPRAG(BaseRAGExample):
help="Test MCP server connection and list available tools without indexing",
)
parser.add_argument(
"--max-retries",
type=int,
default=5,
help="Maximum number of retries for failed operations (default: 5)",
)
parser.add_argument(
"--retry-delay",
type=float,
default=2.0,
help="Initial delay between retries in seconds (default: 2.0)",
)
async def test_mcp_connection(self, args) -> bool:
"""Test the MCP server connection and display available tools."""
print(f"Testing connection to MCP server: {args.mcp_server}")
@@ -88,12 +102,14 @@ class SlackMCPRAG(BaseRAGExample):
workspace_name=args.workspace_name,
concatenate_conversations=not args.no_concatenate_conversations,
max_messages_per_conversation=args.max_messages_per_channel,
max_retries=args.max_retries,
retry_delay=args.retry_delay,
)
async with reader:
tools = await reader.list_available_tools()
print("\nSuccessfully connected to MCP server!")
print("Successfully connected to MCP server!")
print(f"Available tools ({len(tools)}):")
for i, tool in enumerate(tools, 1):
@@ -115,7 +131,7 @@ class SlackMCPRAG(BaseRAGExample):
return True
except Exception as e:
print(f"\nFailed to connect to MCP server: {e}")
print(f"Failed to connect to MCP server: {e}")
print("\nTroubleshooting tips:")
print("1. Make sure the MCP server is installed and accessible")
print("2. Check if the server command is correct")
@@ -130,8 +146,11 @@ class SlackMCPRAG(BaseRAGExample):
if args.workspace_name:
print(f"Workspace: {args.workspace_name}")
if args.channels:
print(f"Channels: {', '.join(args.channels)}")
# Filter out empty strings from channels
channels = [ch for ch in args.channels if ch.strip()] if args.channels else None
if channels:
print(f"Channels: {', '.join(channels)}")
else:
print("Fetching from all available channels")
@@ -146,18 +165,20 @@ class SlackMCPRAG(BaseRAGExample):
workspace_name=args.workspace_name,
concatenate_conversations=concatenate,
max_messages_per_conversation=args.max_messages_per_channel,
max_retries=args.max_retries,
retry_delay=args.retry_delay,
)
texts = await reader.read_slack_data(channels=args.channels)
texts = await reader.read_slack_data(channels=channels)
if not texts:
print("No messages found! This could mean:")
print("No messages found! This could mean:")
print("- The MCP server couldn't fetch messages")
print("- The specified channels don't exist or are empty")
print("- Authentication issues with the Slack workspace")
return []
print(f"Successfully loaded {len(texts)} text chunks from Slack")
print(f"Successfully loaded {len(texts)} text chunks from Slack")
# Show sample of what was loaded
if texts:
@@ -170,7 +191,7 @@ class SlackMCPRAG(BaseRAGExample):
return texts
except Exception as e:
print(f"Error loading Slack data: {e}")
print(f"Error loading Slack data: {e}")
print("\nThis might be due to:")
print("- MCP server connection issues")
print("- Authentication problems")
@@ -188,7 +209,7 @@ class SlackMCPRAG(BaseRAGExample):
if not success:
return
print(
"\n🎉 MCP server is working! You can now run without --test-connection to start indexing."
"MCP server is working! You can now run without --test-connection to start indexing."
)
return