Fixed the actual root cause instead of just masking it in tests:
1. Root Problem:
- C++ side's ZmqDistanceComputer creates ZMQ connections but doesn't clean them
- Python 3.9/3.13 are more sensitive to cleanup timing during shutdown
2. Core Fixes in SearcherBase and LeannSearcher:
- Added cleanup() method to BaseSearcher that cleans ZMQ and embedding server
- LeannSearcher.cleanup() now also handles ZMQ context cleanup
- Both HNSW and DiskANN searchers now properly delete C++ index objects
3. Backend-Specific Cleanup:
- HNSWSearcher.cleanup(): Deletes self.index to trigger C++ destructors
- DiskannSearcher.cleanup(): Deletes self._index and resets state
- Both force garbage collection after deletion
4. Test Infrastructure:
- Added auto_cleanup_searcher fixture for explicit resource management
- Global cleanup now more aggressive with ZMQ context destruction
This is the proper fix - cleaning up resources at the source, not just
working around the issue in tests. The hanging was caused by C++ side
ZMQ connections not being properly terminated when is_recompute=True.
Based on excellent analysis from user, implemented comprehensive fixes:
1. ZMQ Socket Cleanup:
- Set LINGER=0 on all ZMQ sockets (client and server)
- Use try-finally blocks to ensure socket.close() and context.term()
- Prevents blocking on exit when ZMQ contexts have pending operations
2. Global Test Cleanup:
- Added tests/conftest.py with session-scoped cleanup fixture
- Cleans up leftover ZMQ contexts and child processes after all tests
- Lists remaining threads for debugging
3. CI Improvements:
- Apply timeout to ALL Python versions on Linux (not just 3.13)
- Increased timeout to 180s for better reliability
- Added process cleanup (pkill) on timeout
4. Dependencies:
- Added psutil>=5.9.0 to test dependencies for process management
Root cause: Python 3.9/3.13 are more sensitive to cleanup timing during
interpreter shutdown. ZMQ's default LINGER=-1 was blocking exit, and
atexit handlers were unreliable for cleanup.
This should resolve the 'all tests pass but CI hangs' issue.