Description
Bug report
Bug description:
On my local build of the main branch, one test occasionally fails when I run the test suite. The failure looks like this:
0:01:56 load avg: 2.52 [ 39/498] test.test_concurrent_futures.test_interpreter_pool passed
0:01:56 load avg: 2.52 [ 40/498] test.test_concurrent_futures.test_process_pool
test test.test_concurrent_futures.test_process_pool failed -- Traceback (most recent call last):
File "/Users/a12k/opt/cpython/Lib/test/test_concurrent_futures/test_process_pool.py", line 119, in test_traceback_when_child_process_terminates_abruptly
self.assertIsInstance(cause, futures.process._RemoteTraceback)
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: None is not an instance of <class 'concurrent.futures.process._RemoteTraceback'>
0:02:08 load avg: 2.97 [ 40/498/1] test.test_concurrent_futures.test_process_pool failed (1 failure)
0:02:08 load avg: 2.97 [ 41/498/1] test.test_concurrent_futures.test_shutdown
I tried a few different ways of reproducing this deterministically, mostly by letting this test and the few tests that precede it run in a loop for an hour until one failed (./python.exe -m test -v -F test.test_concurrent_futures.test_process_pool, or ./python -m test -v -F test_concurrent_futures.test_deadlock test_concurrent_futures.test_interpreter_pool test_concurrent_futures.test_process_pool).
I ended up forcing the failure in Lib/concurrent/futures/process.py with the following change (inserted at line 486, right after errors = [], and replacing everything up to # Mark pending tasks as failed.):
            if any(fn == os._exit for fn in [w.fn for w in self.pending_work_items.values()]):
                print("~~~ ARTIFICIAL DELAY ~~~")
                for p in list(self.processes.values()):
                    # set exit code to None to simulate it not ready yet
                    object.__setattr__(p, "_exitcode", None)
            for p in self.processes.values():
                if p.exitcode is not None and p.exitcode != 0:
                    errors.append(f"Process {p.pid} terminated abruptly "
                                  f"with exit code {p.exitcode}")
            if errors:
                cause_str = "\n".join(errors)
        if cause_str and any(fn == os._exit for fn in [w.fn for w in self.pending_work_items.values()]):
            print("~~~ ARTIFICIAL DELAY ~~~ Waiting to set __cause__ for 3 seconds")
            def delayed_set_cause():
                import time
                time.sleep(3)
                print("~~~ ARTIFICIAL DELAY COMPLETE ~~~ setting __cause__")
                nonlocal bpe
                bpe.__cause__ = _RemoteTraceback(f"\n'''\n{cause_str}'''")
            # Set cause after delay
            threading.Thread(target=delayed_set_cause, daemon=True).start()
        elif cause_str:
            bpe.__cause__ = _RemoteTraceback(f"\n'''\n{cause_str}'''")
This basically forces the race condition: the cause is left as None and setting it is delayed. I'm not sure all of it was necessary, but this is how I got the test to fail deterministically.
I updated the test to account for this race condition by waiting for __cause__ to be populated, and the test now passes. PR incoming for review.
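To illustrate the idea of waiting for __cause__, here is a minimal standalone sketch. It is not the change in the PR: wait_for_cause is a hypothetical helper, and RuntimeError and ValueError are stand-ins for BrokenProcessPool and _RemoteTraceback so the snippet runs without a process pool.

```python
import threading
import time


def wait_for_cause(exc, timeout=5.0, interval=0.01):
    """Poll until exc.__cause__ is set or the timeout expires.

    Hypothetical helper for illustration only; returns the cause,
    or None if it was never populated within the timeout.
    """
    deadline = time.monotonic() + timeout
    while exc.__cause__ is None and time.monotonic() < deadline:
        time.sleep(interval)
    return exc.__cause__


# Simulate the race: another thread attaches the cause shortly after
# the exception object is already visible to the caller.
error = RuntimeError("stand-in for BrokenProcessPool")
late_cause = ValueError("stand-in for _RemoteTraceback")


def set_cause_later():
    time.sleep(0.1)
    error.__cause__ = late_cause


setter = threading.Thread(target=set_cause_later)
setter.start()
observed = wait_for_cause(error)
setter.join()
```

In a test, the assertIsInstance check would then run on the value returned by the helper rather than on __cause__ read immediately after the exception is caught. The timeout keeps the test from hanging if the cause is never set.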
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS