fix: address root cause of test hanging - improper ZMQ/C++ resource cleanup

Fixed the actual root cause instead of just masking it in tests:

1. Root Problem:
   - The C++-side ZmqDistanceComputer creates ZMQ connections but never cleans them up
   - Python 3.9/3.13 are more sensitive to cleanup timing during shutdown

2. Core Fixes in BaseSearcher and LeannSearcher:
   - Added cleanup() method to BaseSearcher that cleans up ZMQ and the embedding server
   - LeannSearcher.cleanup() now also handles ZMQ context cleanup
   - Both HNSW and DiskANN searchers now properly delete C++ index objects

3. Backend-Specific Cleanup:
   - HNSWSearcher.cleanup(): Deletes self.index to trigger C++ destructors
   - DiskannSearcher.cleanup(): Deletes self._index and resets state
   - Both force garbage collection after deletion

4. Test Infrastructure:
   - Added auto_cleanup_searcher fixture for explicit resource management
   - Global cleanup now more aggressive with ZMQ context destruction
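
The backend cleanup described in items 2 and 3 can be sketched as follows. This is a minimal illustration, not the code from this commit: `FakeNativeIndex` and `SketchSearcher` are hypothetical stand-ins, and `weakref.finalize` is used only to make the moment of release observable in pure Python.

```python
import gc
import weakref


class FakeNativeIndex:
    """Hypothetical stand-in for a C++-backed index object."""


class SketchSearcher:
    """Hypothetical searcher, not the actual HNSWSearcher or DiskannSearcher."""

    def __init__(self):
        self.index = FakeNativeIndex()

    def cleanup(self):
        # Drop the only Python reference so the wrapped object can be destroyed.
        if getattr(self, "index", None) is not None:
            self.index = None
        # Collect reference cycles that might otherwise keep the object alive.
        gc.collect()


released = []
searcher = SketchSearcher()
# The callback runs once the index object has been reclaimed.
weakref.finalize(searcher.index, released.append, "index released")

searcher.cleanup()
searcher.cleanup()  # calling cleanup() again is a harmless no-op
```

Making `cleanup()` safe to call twice matters here because both an explicit call and a fixture teardown may reach the same searcher.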

This is the proper fix: cleaning up resources at the source rather than
working around the issue in tests. The hang was caused by C++-side
ZMQ connections not being properly terminated when is_recompute=True.
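
The register-then-teardown flow of the auto_cleanup_searcher fixture from item 4 can be exercised without pytest by driving the generator by hand. The sketch below mirrors the fixture body from the diff with the decorator and the sleep omitted; `DummySearcher` is a hypothetical stand-in for LeannSearcher.

```python
import gc


def auto_cleanup_searcher():
    """Generator mirroring the fixture body (pytest decorator omitted)."""
    searchers = []

    def register(searcher):
        """Register a searcher for cleanup."""
        searchers.append(searcher)
        return searcher

    yield register

    # Teardown: clean up everything the test registered.
    for searcher in searchers:
        try:
            searcher.cleanup()
        except Exception:
            pass
    gc.collect()


class DummySearcher:
    """Hypothetical stand-in for LeannSearcher."""

    def __init__(self):
        self.cleaned = False

    def cleanup(self):
        self.cleaned = True


fixture = auto_cleanup_searcher()
register = next(fixture)      # setup: pytest would pass this value to the test
searcher = register(DummySearcher())
assert not searcher.cleaned   # the test body would run at this point
next(fixture, None)           # teardown: runs the code after `yield`
```

In a real test the fixture is requested by naming it as a test function argument, and pytest performs the two `next` steps itself.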
Andy Lee
2025-08-08 17:53:41 -07:00
parent 131f10b286
commit a6dad47280
5 changed files with 130 additions and 5 deletions


@@ -18,13 +18,37 @@ def global_test_cleanup() -> Generator:
    yield

    # Cleanup after all tests
    print("\n🧹 Running global test cleanup...")

    # 1. Force cleanup of any LeannSearcher instances
    try:
        import gc

        # Force garbage collection to trigger __del__ methods
        gc.collect()
        time.sleep(0.2)
    except Exception:
        pass

    # 2. Terminate ZMQ contexts more aggressively
    try:
        import zmq

        # Set a very short linger on any remaining contexts
        # This prevents blocking on context termination
        # Get the global instance and destroy it
        ctx = zmq.Context.instance()
        ctx.linger = 0

        # Force termination - this is aggressive but needed for CI
        try:
            ctx.destroy(linger=0)
        except Exception:
            pass

        # Also try to terminate the default context
        try:
            zmq.Context.term(zmq.Context.instance())
        except Exception:
            pass
    except Exception:
        pass
@@ -78,6 +102,32 @@ def global_test_cleanup() -> Generator:
pass
@pytest.fixture
def auto_cleanup_searcher():
    """Fixture that automatically cleans up LeannSearcher instances."""
    searchers = []

    def register(searcher):
        """Register a searcher for cleanup."""
        searchers.append(searcher)
        return searcher

    yield register

    # Cleanup all registered searchers
    for searcher in searchers:
        try:
            searcher.cleanup()
        except Exception:
            pass

    # Force garbage collection
    import gc

    gc.collect()
    time.sleep(0.1)


@pytest.fixture(autouse=True)
def cleanup_after_each_test():
    """Cleanup after each test to prevent resource leaks."""