fix: prevent test runner hanging on Python 3.9/3.13 due to ZMQ and process cleanup issues
Based on excellent analysis from user, implemented comprehensive fixes: 1. ZMQ Socket Cleanup: - Set LINGER=0 on all ZMQ sockets (client and server) - Use try-finally blocks to ensure socket.close() and context.term() - Prevents blocking on exit when ZMQ contexts have pending operations 2. Global Test Cleanup: - Added tests/conftest.py with session-scoped cleanup fixture - Cleans up leftover ZMQ contexts and child processes after all tests - Lists remaining threads for debugging 3. CI Improvements: - Apply timeout to ALL Python versions on Linux (not just 3.13) - Increased timeout to 180s for better reliability - Added process cleanup (pkill) on timeout 4. Dependencies: - Added psutil>=5.9.0 to test dependencies for process management Root cause: Python 3.9/3.13 are more sensitive to cleanup timing during interpreter shutdown. ZMQ's default LINGER=-1 was blocking exit, and atexit handlers were unreliable for cleanup. This should resolve the 'all tests pass but CI hangs' issue.
This commit is contained in:
26
.github/workflows/build-reusable.yml
vendored
26
.github/workflows/build-reusable.yml
vendored
@@ -263,25 +263,25 @@ jobs:
|
||||
# Activate virtual environment
|
||||
source .venv/bin/activate || source .venv/Scripts/activate
|
||||
|
||||
# Run all tests with timeout to debug Python 3.13 hanging issue
|
||||
# Using timeout with INT signal to get full traceback when hanging (Linux only)
|
||||
if [[ "${{ matrix.python }}" == "3.13" ]] && [[ "$RUNNER_OS" == "Linux" ]]; then
|
||||
echo "Running tests with timeout for Python 3.13 debugging (Linux)..."
|
||||
timeout --signal=INT 120 pytest tests/ --full-trace -v || {
|
||||
# Run all tests with timeout on Linux to prevent hanging
|
||||
if [[ "$RUNNER_OS" == "Linux" ]]; then
|
||||
echo "Running tests with timeout (Linux)..."
|
||||
timeout --signal=INT 180 pytest tests/ -v || {
|
||||
EXIT_CODE=$?
|
||||
if [ $EXIT_CODE -eq 124 ]; then
|
||||
echo "⚠️ Tests timed out after 120 seconds - likely hanging during collection"
|
||||
echo "This is a known issue with Python 3.13 - see traceback above for details"
|
||||
echo "⚠️ Tests timed out after 180 seconds - likely process cleanup issue"
|
||||
echo "Check for lingering ZMQ connections or child processes"
|
||||
# Try to clean up any leftover processes
|
||||
pkill -TERM -P $$ || true
|
||||
sleep 1
|
||||
pkill -KILL -P $$ || true
|
||||
fi
|
||||
exit $EXIT_CODE
|
||||
}
|
||||
elif [[ "${{ matrix.python }}" == "3.13" ]]; then
|
||||
# For macOS/Windows, run with verbose output but no timeout
|
||||
echo "Running tests for Python 3.13 (no timeout on $RUNNER_OS)..."
|
||||
pytest tests/ --full-trace -v
|
||||
else
|
||||
# Normal test run for other Python versions
|
||||
pytest tests/
|
||||
# For macOS/Windows, run without GNU timeout
|
||||
echo "Running tests ($RUNNER_OS)..."
|
||||
pytest tests/ -v
|
||||
fi
|
||||
|
||||
- name: Run sanity checks (optional)
|
||||
|
||||
Reference in New Issue
Block a user