fix: prevent test runner hanging on Python 3.9/3.13 due to ZMQ and process cleanup issues

Based on excellent analysis from user, implemented comprehensive fixes:

1. ZMQ Socket Cleanup:
   - Set LINGER=0 on all ZMQ sockets (client and server)
   - Use try-finally blocks to ensure socket.close() and context.term()
   - Prevents blocking on exit when ZMQ contexts have pending operations

2. Global Test Cleanup:
   - Added tests/conftest.py with session-scoped cleanup fixture
   - Cleans up leftover ZMQ contexts and child processes after all tests
   - Lists remaining threads for debugging

3. CI Improvements:
   - Apply timeout to ALL Python versions on Linux (not just 3.13)
   - Increased timeout to 180s for better reliability
   - Added process cleanup (pkill) on timeout

4. Dependencies:
   - Added psutil>=5.9.0 to test dependencies for process management

Root cause: Python 3.9/3.13 are more sensitive to cleanup timing during
interpreter shutdown. ZMQ's default LINGER=-1 was blocking exit, and
atexit handlers were unreliable for cleanup.

This should resolve the 'all tests pass but CI hangs' issue.
This commit is contained in:
Andy Lee
2025-08-08 15:57:22 -07:00
parent 05e1efa00a
commit e3762458fc
7 changed files with 138 additions and 29 deletions

View File

@@ -60,8 +60,9 @@ dev = [
test = [
"pytest>=8.3.0", # Minimum version for Python 3.13 support
"pytest-timeout>=2.3",
"anyio>=4.0", # For async test support (includes pytest plugin)
"pytest-timeout>=2.3",
"anyio>=4.0", # For async test support (includes pytest plugin)
"psutil>=5.9.0", # For process cleanup in tests
"llama-index-core>=0.12.0",
"llama-index-readers-file>=0.4.0",
"python-dotenv>=1.0.0",