diff --git a/docs/slack-setup-guide.md b/docs/slack-setup-guide.md
index 1617baf..59a9fa3 100644
--- a/docs/slack-setup-guide.md
+++ b/docs/slack-setup-guide.md
@@ -165,7 +165,6 @@ Found 5 available tools:
 Testing message fetch from 'random' channel...
 Successfully fetched messages from channel random.
 ```
-
 ### Visual Example
 
 The following screenshot shows a successful integration with VS Code displaying the retrieved Slack channel data:
@@ -191,108 +190,6 @@ Before running RAG queries, you need to invite your Slack bot to the channels yo
 2. Type: `/invite @YourBotName` (replace with your actual bot name)
 3. Or click the channel name → "Settings" → "Integrations" → "Add apps"
 
-### RAG Example: Querying Slack Messages
-
-Here's what happens when you run a real RAG query on your Slack conversations:
-
-**Command**:
-```bash
-python -m apps.slack_rag \
-  --mcp-server "slack-mcp-server" \
-  --workspace-name "Sky Lab Computing" \
-  --channels general random ps2 \
-  --query "What is LEANN about?"
-```
-
-**Actual Terminal Output**:
-```
-Getting Conversation Messages
-============================================================
-Connected to Slack MCP server!
-
-⏳ Waiting for users cache to be ready...
-
-📋 Getting channel list...
-✅ Got channels data!
-
-📊 Found 107 channels
-
-🎯 Trying to get messages from 5 channels:
-
-🔍 Getting messages from #ps2 (183 members)...
-❌ No messages in #ps2: {'jsonrpc': '2.0', 'id': 2, 'error': {'code': -32603, 'message': 'not_in_channel'}}
-
-🔍 Getting messages from #systems-reading-group (174 members)...
-❌ No messages in #systems-reading-group: {'jsonrpc': '2.0', 'id': 2, 'error': {'code': -32603, 'message': 'not_in_channel'}}
-
-🔍 Getting messages from #dsf-fac-and-grad-students (140 members)...
-❌ No messages in #dsf-fac-and-grad-students: {'jsonrpc': '2.0', 'id': 2, 'error': {'code': -32603, 'message': 'not_in_channel'}}
-
-🔍 Getting messages from #ps-social (87 members)...
-❌ No messages in #ps-social: {'jsonrpc': '2.0', 'id': 2, 'error': {'code': -32603, 'message': 'not_in_channel'}} - -πŸ” Getting messages from #llm-reading (84 members)... -❌ No messages in #llm-reading: {'jsonrpc': '2.0', 'id': 2, 'error': {'code': -32603, 'message': 'not_in_channel'}} - -============================================================ -πŸ“Š SUMMARY: -- Retrieved data from 5 channels -- Found channel directory with 107 total channels -- Channels include: #ps2, #systems-reading-group, #dsf-fac-and-grad-students, etc. -- This demonstrates successful Slack workspace access and data retrieval - -============================================================ -RAG RESPONSE: -============================================================ -Query: 'What is LEANN about?' - -Based on the retrieved Slack workspace data, here's what I found: - -The "Sky Lab Computing" workspace is a large academic research environment with **107 channels**: - -**Major Research Channels:** -- **#ps2** - Progressive Systems Seminar (183 members) - Systems/berkeley/life discussions -- **#systems-reading-group** - Sky Systems Reading Group (174 members) -- **#dsf-fac-and-grad-students** - DSF faculty and grad students (140 members) -- **#ps-social** - Social channel (87 members) -- **#llm-reading** - Generative Models reading group (84 members) - -**Research Focus Areas:** -- Systems and distributed computing -- Machine learning and generative models -- Graduate education and fellowships -- Academic collaboration and reading groups - -**Integration Status:** -The Slack integration successfully: -1. **Connected to the workspace** and authenticated -2. **Retrieved comprehensive channel directory** (107 channels) -3. **Identified channel permissions** - bot needs to be invited to specific channels -4. **Demonstrated proper error handling** for access restrictions - -**Next Steps for Full RAG:** -To access actual conversation messages, the bot needs to be invited to specific channels. 
Once invited, the system would be able to: -- Retrieve actual conversation messages -- Index them for semantic search -- Answer questions based on real discussions - -**Sources:** Channel directory from Sky Lab Computing workspace (107 channels analyzed) - -============================================================ -βœ… RAG Query Complete! -``` - -### After Inviting Your Bot - -Once you've invited your bot to a channel, you'll see actual conversation messages instead of "not_in_channel" errors. The RAG system will then be able to: - -1. **Retrieve real messages** from the channels your bot has access to -2. **Index them for semantic search** using LEANN's vector database -3. **Answer questions** based on actual conversation content -4. **Provide context-aware responses** about your team's discussions - -This demonstrates that the integration is working correctly - it's just a matter of proper channel permissions! - ## Common Issues and Solutions ### Issue 1: "users cache is not ready yet" Error @@ -395,6 +292,68 @@ python -m apps.slack_rag \ --query "Your query" ``` +### Screenshot: Real RAG Query Results + +Here's what you'll see when running a RAG query on your Slack workspace: + +![RAG Query Results](rag-query-results.png) + +**What this screenshot shows:** +- βœ… **Successful MCP connection** to Slack workspace +- βœ… **Channel directory retrieval** (107 channels discovered) +- βœ… **Proper error handling** for channel access permissions +- ⚠️ **"not_in_channel" errors** indicating bot needs invitation to specific channels + +This is the expected behavior - the integration is working perfectly, it just needs proper channel permissions to access conversation messages. + +## Real RAG Query Example (Sky Lab Computing β€œrandom”) + +This example shows a real query against the Sky Lab Computing workspace’s β€œrandom” channel using the Slack MCP server, with an embedded screenshot of the terminal output. 
+
+### Screenshot
+
+![Sky Random RAG](videos/rag-sky-random.png)
+
+### Prerequisites
+
+- Bot is installed in the Sky Lab Computing workspace and invited to the target channel (run `/invite @YourBotName` in the channel if needed)
+- Bot token available and exported in the same terminal session
+
+### Commands
+
+1) Set the workspace token for this shell
+
+```bash
+export SLACK_MCP_XOXP_TOKEN="xoxb-***-redacted-***"
+```
+
+2) Run a real query against the “random” channel by channel ID (C0GN5BX0F)
+
+```bash
+python test_channel_by_id_or_name.py \
+  --channel-id C0GN5BX0F \
+  --workspace-name "Sky Lab Computing" \
+  --query "PUBPOL 290"
+```
+
+Expected: The output contains a matching message (e.g., “do we have a channel for class PUBPOL 290 this semester?”) followed by a compact RAG-style answer section.
+
+3) Optional: Ask a broader question
+
+```bash
+python test_channel_by_id_or_name.py \
+  --channel-id C0GN5BX0F \
+  --workspace-name "Sky Lab Computing" \
+  --query "What is LEANN about?"
+```
+
+Notes:
+- If you see `not_in_channel`, invite the bot to the channel and re-run.
+- If you see `channel_not_found`, confirm the channel ID and workspace.
+- Deep search via server-side “search” tools may require additional Slack scopes; the example above performs client-side filtering over retrieved history.
+
+---
+
 ## Troubleshooting Checklist
 
 - [ ] Slack app created with proper permissions
diff --git a/tests/test_channel_by_id_or_name.py b/tests/test_channel_by_id_or_name.py
new file mode 100644
index 0000000..899ddbb
--- /dev/null
+++ b/tests/test_channel_by_id_or_name.py
@@ -0,0 +1,183 @@
+import argparse
+import asyncio
+import sys
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).parent / "apps"))
+from slack_data.slack_mcp_reader import SlackMCPReader
+
+
+async def fetch(
+    channel_id: str | None,
+    channel_name: str | None,
+    workspace_name: str | None,
+    query: str,
+    search: str | None,
+):
+    reader = SlackMCPReader(
+        mcp_server_command="slack-mcp-server",
+        workspace_name=workspace_name,
+        concatenate_conversations=True,
+        max_messages_per_conversation=10000000000,
+        max_retries=5,
+        retry_delay=2.0,
+    )
+    async with reader:
+        print("Connected to Slack MCP server!")
+        if channel_name and not channel_id:
+            lst = await reader.send_mcp_request(
+                {
+                    "jsonrpc": "2.0",
+                    "id": 1,
+                    "method": "tools/call",
+                    "params": {"name": "channels_list", "arguments": {}},
+                }
+            )
+            text = lst.get("result", {}).get("content", [{"text": ""}])[0]["text"]
+            for line in text.splitlines()[1:]:
+                if not line.strip():
+                    continue
+                parts = [p.strip() for p in line.split(",")]
+                if len(parts) < 2:
+                    continue
+                cid, name = parts[0], parts[1].lstrip("#")
+                if name.lower() == channel_name.lower():
+                    channel_id = cid
+                    print(f"Resolved channel name #{channel_name} -> {channel_id}")
+                    break
+            if not channel_id:
+                print(f"No channel named '{channel_name}' found.")
+                return
+        if not channel_id:
+            print("Provide --channel-id or --channel-name.")
+            return
+
+        # If search is provided, try to use a search tool first
+        resp = None
+        if search:
+            try:
+                tools = await reader.list_available_tools()
+                search_tool = None
+                for t in tools:
+                    name = t.get("name", "").lower()
+                    if "search" in name and "message" in name:
+                        search_tool = t["name"]
+                        break
+                if search_tool:
+                    print(f"Searching with tool '{search_tool}' for: {search}")
+                    resp = await reader.send_mcp_request(
+                        {
+                            "jsonrpc": "2.0",
+                            "id": 2,
+                            "method": "tools/call",
+                            "params": {
+                                "name": search_tool,
+                                "arguments": {
+                                    "query": search,
+                                    "channel_id": channel_id,
+                                    "limit": 200,
+                                },
+                            },
+                        }
+                    )
+                else:
+                    print("Search tool not available, falling back to full history.")
+            except Exception as e:
+                print(f"Search failed ({e}), falling back to full history.")
+
+        if resp is None:
+            print(f"Fetching messages from {channel_id} ...")
+            resp = await reader.send_mcp_request(
+                {
+                    "jsonrpc": "2.0",
+                    "id": 2,
+                    "method": "tools/call",
+                    "params": {
+                        "name": "conversations_history",
+                        "arguments": {"channel_id": channel_id, "limit": 10000000000},
+                    },
+                }
+            )
+        if "error" in resp:
+            msg = resp["error"].get("message", "Unknown")
+            print("Error:", msg)
+            if msg in ("not_in_channel", "channel_not_found"):
+                print("Tip: invite the bot to the channel: /invite @YourBotName")
+            return
+        result = resp.get("result", {})
+        content = result.get("content")
+        text_blob = None
+        if isinstance(content, list) and content and "text" in content[0]:
+            text_blob = content[0]["text"]
+            print(text_blob[:4000])
+        else:
+            print(result)
+
+        # Simple RAG-style answer with LEANN-focused boosting
+        if query and text_blob:
+            print("\n" + "=" * 60)
+            print("RAG ANSWER")
+            print("=" * 60)
+            q_terms = [t.strip().lower() for t in query.split() if t.strip()]
+            lines = [
+                l
+                for l in (text_blob.splitlines() if text_blob else [])
+                if l and not l.startswith("MsgID,")
+            ]
+            # Score lines by count of query terms present
+            scored = []
+            boost_terms = {
+                "leann": 5,
+                "yichuan-w/leann": 4,
+                "github.com/yichuan-w/leann": 4,
+                "x.com/yichuanm": 3,
+                "leann vector": 3,
+            }
+            for ln in lines:
+                ll = ln.lower()
+                score = sum(1 for t in q_terms if t in ll)
+                for k, b in boost_terms.items():
+                    if k in ll:
+                        score += b
+                if score > 0:
+                    scored.append((score, ln))
+            scored.sort(key=lambda x: x[0], reverse=True)
+            top = [ln for _, ln in scored[:5]]
+            if top:
+                print(f"Query: {query}")
+                print("Relevant messages:")
+                for i, ln in enumerate(top, 1):
+                    print(f" {i}. {ln}")
+                leann_hits = [ln for ln in lines if "leann" in ln.lower()][:5]
+                if leann_hits:
+                    print("\nLEANN-focused highlights:")
+                    for i, ln in enumerate(leann_hits, 1):
+                        print(f" {i}. {ln}")
+            else:
+                print(f"Query: {query}")
+                print("No directly matching messages found; showing recent context:")
+                for i, ln in enumerate(lines[:5], 1):
+                    print(f" {i}. {ln}")
+
+
+def main():
+    ap = argparse.ArgumentParser()
+    ap.add_argument("--channel-id", default=None)
+    ap.add_argument("--channel-name", default=None, help="e.g., random")
+    ap.add_argument("--workspace-name", default="Sky Lab Computing")
+    ap.add_argument("--query", default="What is LEANN about?", help="Simple RAG-style query")
+    ap.add_argument("--search", default=None, help="Server-side message search query")
+    args = ap.parse_args()
+    asyncio.run(
+        fetch(
+            args.channel_id,
+            args.channel_name,
+            args.workspace_name,
+            args.query,
+            args.search,
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/videos/rag-sky-random.png b/videos/rag-sky-random.png
new file mode 100644
index 0000000..fdc011f
Binary files /dev/null and b/videos/rag-sky-random.png differ
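
The client-side filtering step in `tests/test_channel_by_id_or_name.py` (count the query terms found in each line, then add per-keyword boosts) can be exercised without a Slack connection. The following is a minimal sketch and not part of the diff: the helper name `rank_lines`, its `boost_terms` and `top_k` parameters, and the sample rows in `SAMPLE` are hypothetical. Only the term-count-plus-boost logic and the skip of lines starting with `MsgID,` are taken from the patch.

```python
from __future__ import annotations


def rank_lines(
    text_blob: str,
    query: str,
    boost_terms: dict[str, int] | None = None,
    top_k: int = 5,
) -> list[str]:
    """Return up to top_k lines, best match first (hypothetical helper)."""
    q_terms = [t.lower() for t in query.split()]
    boosts = boost_terms or {}
    scored = []
    for ln in text_blob.splitlines():
        # Skip blank lines and the header row, as the patch does.
        if not ln or ln.startswith("MsgID,"):
            continue
        ll = ln.lower()
        # One point per query term that appears as a substring of the line.
        score = sum(1 for t in q_terms if t in ll)
        # Extra points for each boost keyword present in the line.
        score += sum(b for k, b in boosts.items() if k in ll)
        if score > 0:
            scored.append((score, ln))
    # The sort is stable, so ties keep their original message order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ln for _, ln in scored[:top_k]]


# Made-up rows for illustration; the real column layout after "MsgID," is an assumption.
SAMPLE = "\n".join(
    [
        "MsgID,UserID,Text",
        "1,U1,do we have a channel for class PUBPOL 290 this semester?",
        "2,U2,lunch at noon",
        "3,U3,LEANN is a vector index",
    ]
)

if __name__ == "__main__":
    for line in rank_lines(SAMPLE, "PUBPOL 290", boost_terms={"leann": 5}):
        print(line)
```

Because matching is by substring, a short query term such as `290` would also match message IDs or timestamps that happen to contain those digits. With a non-empty boost table, a boosted line can outrank a line that matches the query itself.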