fix: address root cause of test hanging - improper ZMQ/C++ resource cleanup

Fixed the actual root cause instead of just masking it in tests:

1. Root Problem:
   - C++ side's ZmqDistanceComputer creates ZMQ connections but doesn't clean them
   - Python 3.9/3.13 are more sensitive to cleanup timing during shutdown

2. Core Fixes in SearcherBase and LeannSearcher:
   - Added cleanup() method to BaseSearcher that cleans ZMQ and embedding server
   - LeannSearcher.cleanup() now also handles ZMQ context cleanup
   - Both HNSW and DiskANN searchers now properly delete C++ index objects

3. Backend-Specific Cleanup:
   - HNSWSearcher.cleanup(): Deletes self.index to trigger C++ destructors
   - DiskannSearcher.cleanup(): Deletes self._index and resets state
   - Both force garbage collection after deletion

4. Test Infrastructure:
   - Added auto_cleanup_searcher fixture for explicit resource management
   - Global cleanup now more aggressive with ZMQ context destruction

This is the proper fix - cleaning up resources at the source, not just
working around the issue in tests. The hanging was caused by C++ side
ZMQ connections not being properly terminated when is_recompute=True.
This commit is contained in:
Andy Lee
2025-08-08 17:53:41 -07:00
parent 131f10b286
commit a6dad47280
5 changed files with 130 additions and 5 deletions

View File

@@ -459,3 +459,25 @@ class DiskannSearcher(BaseSearcher):
string_labels = [[str(int_label) for int_label in batch_labels] for batch_labels in labels]
return {"labels": string_labels, "distances": distances}
def cleanup(self):
"""Cleanup DiskANN-specific resources including C++ index."""
# Call parent cleanup first
super().cleanup()
# Delete the C++ index to trigger destructors
try:
if hasattr(self, "_index") and self._index is not None:
del self._index
self._index = None
self._current_zmq_port = None
except Exception:
pass
# Force garbage collection to ensure C++ objects are destroyed
try:
import gc
gc.collect()
except Exception:
pass