nonatomic.labs | pytest, pytest-xdist and Allure: friends or foes?

This post is accompanied by a GitHub repo which contains all code examples.

Introduction

Last week, I discovered a broken import in our codebase, which had been introduced the previous day. Mistakes happen and the fix was trivial, so no big deal.

What puzzled me, though, is that our test results on that day looked fine (we have thousands of tests running every day, so a missing test is hard to notice). I decided to investigate further and discovered that combining pytest, pytest-xdist and Allure has some consequences I hadn’t thought of.

No tests are run when there is a collection error…

When a pytest session starts, pytest first collects the tests in the test folder (or in the current directory, ., if no test folder is specified). If errors occur during collection, for instance an import that can’t be resolved, pytest stops there and no tests are run.

For instance, the tests directory in the example repo contains two test modules: one runs fine, but the other contains a broken import statement. If we run the test suite, we get the following:

> docker run --rm pytest-xdist-allure-error
============================= test session starts ==============================
platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
rootdir: /home
plugins: xdist-3.5.0, allure-pytest-2.13.2
collected 1 item / 1 error

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_breaking.py ____________________
ImportError while importing test module '/home/tests/test_breaking.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/Users/jean/Code/pytest-xdist-allure-error/tests/test_breaking.py:1: in <module>
    ???
E   ModuleNotFoundError: No module named 'idontexist'
=========================== short test summary info ============================
ERROR tests/test_breaking.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
=============================== 1 error in 0.03s ===============================
Return code: 2

We notice the line Interrupted: 1 error during collection, and the fact that no tests are run.
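Collection fails here because pytest has to import each test module before it can find the tests inside it. The following is a minimal sketch of that mechanism using only the standard library, not pytest's actual collection code. The module contents are assumptions based on the traceback above (an import of idontexist on line 1), and try_import is a hypothetical helper name.

```python
import importlib.util
import tempfile
from pathlib import Path
from typing import Optional


def try_import(source: str, name: str = "test_breaking") -> Optional[str]:
    """Write `source` to a temporary module and import it.

    Returns the import error message, or None if the import succeeded.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            # Executing the module body is what triggers the broken import,
            # before any test function in the file could ever be called.
            spec.loader.exec_module(module)
        except ImportError as exc:
            return str(exc)
    return None


# Assumed contents of the two test modules (hypothetical, for illustration).
broken = "import idontexist\n\ndef test_never_collected():\n    assert True\n"
working = "def test_ok():\n    assert True\n"

print(try_import(broken))
print(try_import(working, name="test_working"))
```

The broken module fails as a whole: none of its tests can be discovered, which is why the error is reported against the file rather than against an individual test.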

… unless you use pytest-xdist…

This behavior is actually nice because we probably don’t want to go further in the test suite if we have import errors.

However, this is not the default behavior when running tests in parallel with pytest-xdist. We can observe this by setting the PYTEST_XDIST_AUTO_NUM_WORKERS environment variable to a number greater than 0:

> docker run --rm -e PYTEST_XDIST_AUTO_NUM_WORKERS=2 pytest-xdist-allure-error
============================= test session starts ==============================
platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
rootdir: /home
plugins: xdist-3.5.0, allure-pytest-2.13.2
created: 2/2 workers
2 workers [1 item]

.                                                                        [100%]
==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_breaking.py ____________________
ImportError while importing test module '/home/tests/test_breaking.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/Users/jean/Code/pytest-xdist-allure-error/tests/test_breaking.py:1: in <module>
    ???
E   ModuleNotFoundError: No module named 'idontexist'
=========================== short test summary info ============================
ERROR tests/test_breaking.py
========================== 1 passed, 1 error in 0.21s ==========================
Return code: 1

Now, see how we still get the collection error but it doesn’t actually prevent the working test from running, as the last line of the report tells us (1 passed, 1 error in 0.21s).

This actually makes sense from pytest-xdist’s point of view: one of the xdist processes encounters an error, but that doesn’t prevent the other processes from running, so the other test runs and passes.

In that case, only the return code of the command tells us that something went wrong. If we don’t check it and we don’t look at the pytest output directly (more on that later), we might not see the issue.
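Since the return code is the only reliable signal here, it helps to know what the values mean. The sketch below mirrors pytest's documented exit codes (pytest.ExitCode) in a local enum so that it runs without pytest installed. describe and run_is_trustworthy are hypothetical helper names, not part of pytest.

```python
from enum import IntEnum


class PytestExitCode(IntEnum):
    """Local mirror of pytest's documented exit codes (pytest.ExitCode)."""

    OK = 0
    TESTS_FAILED = 1
    INTERRUPTED = 2
    INTERNAL_ERROR = 3
    USAGE_ERROR = 4
    NO_TESTS_COLLECTED = 5


def describe(return_code: int) -> str:
    """Return the symbolic name of a pytest return code."""
    try:
        return PytestExitCode(return_code).name
    except ValueError:
        return f"UNKNOWN({return_code})"


def run_is_trustworthy(return_code: int) -> bool:
    """Only a return code of 0 means every collected test ran and passed."""
    return return_code == PytestExitCode.OK


# 2 is what the plain pytest run returned, 1 is what the xdist run returned.
for code in (0, 1, 2):
    print(code, describe(code), run_is_trustworthy(code))
```

Note that the same collection error produces return code 2 without pytest-xdist and return code 1 with it, so a wrapper script should treat any non-zero value as a problem rather than look for one specific code.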

… unless you use the “-x” option

One exception to that is if you use the -x option (short for --exitfirst): this option stops the test session as soon as there is a failure, so it will stop the test session when the collection error is encountered:

> docker run --rm -e PYTEST_XDIST_AUTO_NUM_WORKERS=2 -e PYTEST_ADDITIONAL_ARGS="-x" pytest-xdist-allure-error
============================= test session starts ==============================
platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
rootdir: /home
plugins: xdist-3.5.0, allure-pytest-2.13.2
created: 2/2 workers

==================================== ERRORS ====================================
___________________ ERROR collecting tests/test_breaking.py ____________________
ImportError while importing test module '/home/tests/test_breaking.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/Users/jean/Code/pytest-xdist-allure-error/tests/test_breaking.py:1: in <module>
    ???
E   ModuleNotFoundError: No module named 'idontexist'
=========================== short test summary info ============================
ERROR tests/test_breaking.py
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
=============================== 1 error in 0.22s ===============================
Return code: 2

This is nice, but deciding to add -x to all your pytest calls shouldn’t be done lightly. Read one more time what the option does: it stops the test session as soon as there is a failure. If you have a realistic test suite with a few thousand tests, it is likely that a few of them fail every day. Do you really want to stop your test execution as soon as the first failure is encountered, potentially leaving hundreds of tests not running?

So far, the only real solution we have is reading the pytest logs to see if there was an error. But wait, I promised I would tell you more about not looking at pytest logs (who does that?!).
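For those who would rather not read the logs by hand, the check can be partly automated. The sketch below parses the final summary line of captured pytest output and counts the errors it reports. It assumes uncolored output in the format shown in the logs above, and summary_counts and error_count are hypothetical helper names, not part of pytest.

```python
import re
from typing import Dict

# Matches the last line of a run, e.g. "=== 1 passed, 1 error in 0.21s ===".
SUMMARY_RE = re.compile(r"^=+ (?P<counts>.+?) in [\d.]+s(?: \(.*\))? =+$")
# Matches each "<number> <label>" pair inside the summary.
COUNT_RE = re.compile(r"(\d+) ([a-z]+)")


def summary_counts(output: str) -> Dict[str, int]:
    """Return the outcome counts from the final pytest summary line."""
    for line in reversed(output.splitlines()):
        match = SUMMARY_RE.match(line.strip())
        if match:
            pairs = COUNT_RE.findall(match.group("counts"))
            return {label: int(number) for number, label in pairs}
    return {}  # no summary line found


def error_count(output: str) -> int:
    """Number of errors reported, whether printed as 'error' or 'errors'."""
    counts = summary_counts(output)
    return counts.get("error", 0) + counts.get("errors", 0)


# Last lines of the pytest-xdist run shown earlier in this post.
xdist_output = (
    "ERROR tests/test_breaking.py\n"
    "========================== 1 passed, 1 error in 0.21s "
    "==========================\n"
    "Return code: 1\n"
)
print(summary_counts(xdist_output))
print(error_count(xdist_output))
```

Parsing text output is fragile, so this is best seen as a safety net on top of checking the return code, not a replacement for it.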

Introducing Allure

When would you not look at the pytest logs, where the error is directly visible? Well, this can be the case if you use a tool like Allure to view your test results. In that case, you rely on it to tell you whether tests failed. Can you imagine what happens if you run tests with pytest-xdist and don’t pass the -x option? Let’s find out:

> docker run --rm -it -p 9090:9090 -e PYTEST_XDIST_AUTO_NUM_WORKERS=2 -e ALLURE_REPORT=1 pytest-xdist-allure-error
============================= test session starts =============================
platform linux -- Python 3.12.1, pytest-8.0.0, pluggy-1.4.0
rootdir: /home
plugins: xdist-3.5.0, allure-pytest-2.13.2
2 workers [1 item]     error
.                                                                       [100%]
=================================== ERRORS ====================================
___________________ ERROR collecting tests/test_breaking.py ___________________
ImportError while importing test module '/home/tests/test_breaking.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
/Users/jean/Code/pytest-xdist-allure-error/tests/test_breaking.py:1: in <module>
    ???
E   ModuleNotFoundError: No module named 'idontexist'
=========================== short test summary info ===========================
ERROR tests/test_breaking.py
========================= 1 passed, 1 error in 0.21s ==========================
Return code: 1
Generating report to temp directory...
Report successfully generated to /tmp/6845943324082651755/allure-report
Starting web server...
2024-01-31 16:03:14.550:INFO::main: Logging initialized @941ms to org.eclipse.jetty.util.log.StdErrLog
Can not open browser because this capability is not supported on your platform. You can use the link below to open the report manually.
Server started at <http://172.17.0.2:9090/>. Press <Ctrl+C> to exit

Opening localhost:9090, we see that the Allure report tells us that everything is fine, although it isn’t.

Conclusion

Ideally, I shouldn’t have encountered this issue because:

most of the time, when our test collection is broken, it’s because of static issues that could easily be caught by an IDE or a static analysis tool like mypy
alternatively, the issue should get caught during review
if it is not, we should regularly read the pytest logs to see if there are issues
we should have a test suite that is reliable enough that we can use the -x option and know that any failure is a “real failure”
etc.

Yet all the integration test suites I have worked on had thousands of tests, most of them relying on environments that are not entirely deterministic (the network, to name one), with log files that are tens of megabytes in size and hard to parse. Automated static analysis, being on the optional side of tooling, is easy to neglect. “By the book”, those are all bad practices, but in my experience this is what a real-life project looks like, so let’s try to keep our eyes open.