Flaky tests are automated tests that behave non-deterministically: they may pass or fail when executed against the same build artefact or deployed system.
If you've ever retried a failed test run in your CI/CD pipeline, without any code or config changes, just to get a failing test case to pass, that's an indicator that you have a flaky test case.
Flaky tests are more common with [[Integration testing|Integration tests]] and [[E2E tests]] than with [[Unit testing|Unit tests]].
## Causes of flaky tests
- Concurrently executing test suites (e.g. separate test/spec files in [[Jest]]) that act upon the same datastore can interleave in such a way that a test case's assumptions about the state of the database no longer hold.
- A test case waits a fixed period of time for an asynchronous task on the server to complete before performing a verification/assertion step (see [[How to wait for an async task to complete inside an E2E test]] and the first sketch after this list)
- [[Example of concurrent test execution causing test flakiness]]
- A test case that depends on the current clock time/date (see the second sketch after this list)
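A test that sleeps for a fixed period can instead poll for the expected state and fail only after a deadline. A minimal sketch in [[Jest]], where `getOrderStatus` is a hypothetical API call standing in for whatever your test observes:

```ts
// Hypothetical API call; replace with your own client code.
declare function getOrderStatus(orderId: string): Promise<string>;

// Poll until the condition holds or the deadline passes, instead of
// sleeping for a fixed period and hoping the async task has finished.
async function waitFor<T>(
  fn: () => Promise<T>,
  predicate: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await fn();
    if (predicate(value)) return value;
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

test('order is eventually marked as shipped', async () => {
  const status = await waitFor(
    () => getOrderStatus('order-123'),
    (s) => s === 'shipped'
  );
  expect(status).toBe('shipped');
});
```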
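For the clock-dependent cause, [[Jest]]'s fake timers can pin the system time so the result doesn't depend on when the suite runs. A minimal sketch, where `isWeekend` is a hypothetical function under test:

```ts
// Hypothetical function under test; reads the system clock by default.
function isWeekend(now: Date = new Date()): boolean {
  const day = now.getDay();
  return day === 0 || day === 6; // Sunday or Saturday
}

describe('isWeekend', () => {
  beforeEach(() => {
    jest.useFakeTimers();
    // Pin the clock to a known Saturday (local time) so the test
    // passes regardless of the real date it runs on.
    jest.setSystemTime(new Date(2021, 4, 15, 12, 0, 0)); // 15 May 2021
  });

  afterEach(() => {
    jest.useRealTimers();
  });

  test('returns true on a Saturday', () => {
    expect(isWeekend()).toBe(true);
  });
});
```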
## Solutions for flaky tests
- Delete them!
- Skip them (see the sketch below)
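In [[Jest]], a flaky test can be skipped with `test.skip` (or `it.skip`): it stays in the codebase and is reported as skipped, but never runs. A minimal sketch, where `generateReport` is a hypothetical helper:

```ts
// Hypothetical helper whose timing makes the test flaky.
declare function generateReport(): Promise<{ status: string }>;

// Flaky: depends on the timing of a background job.
// test.skip keeps the test visible in reports without running it.
test.skip('report is generated within 2 seconds', async () => {
  const report = await generateReport();
  expect(report.status).toBe('ready');
});
```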
## See also
- [[How to handle flaky tests - Daily Email Broadcast - 2021-05-12]]
---
## References
- [A randomly failing test is a failing test](https://jhall.io/archive/2021/02/22/a-randomly-failing-test-is-a-failing-test/) by [[@Jonathan Hall]]
- [The Unexpected Costs of Flaky Tests](https://thenewstack.io/the-unexpected-costs-of-flaky-tests/) by [[@Serkan Ozal]], which lists eleven common causes of flakiness:
1. **Concurrency:** In multithreaded software, threads rely on an implicit ordering of the data, but race conditions break that ordering.
2. **Async await:** When a system starts asynchronous tasks, but doesn’t wait for them to finish.
3. **Too restrictive range:** Tests define a range of valid outputs, but actual outputs go out of that range while still being valid results.
4. **Test order dependency:** The outcome of one test relies on the test running before it (see the first sketch after this list).
5. **Test case timeout:** The size of a test grew over time, but the timeout wasn’t increased.
6. **Resource leak:** Memory isn’t released properly and can overflow in some cases.
7. **Platform dependency:** A test relies on platform-specific behavior, such as a task that yields a deterministic result on one operating system and a non-deterministic one on another.
8. **Float precision:** Float overflows or underflows were not considered, but are a crucial part of the test result (see the second sketch after this list).
9. **Test suite timeout:** In contrast to the test case timeout, no single test is responsible for the flakiness; rather, the tests in aggregate cause the entire suite to time out.
10. **Time:** A test relies on the local system clock and becomes flaky, for example when two timestamps from different time zones are compared.
11. **Randomness:** Sometimes actual randomness is required for a test case, but the developer forgets to check for edge cases.
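For cause 4 (test order dependency), resetting shared state before each test removes the implicit ordering between tests. A minimal [[Jest]] sketch, using a hypothetical in-memory `cache` as the shared state:

```ts
// Hypothetical shared state; in real suites this is often a database,
// a module-level singleton, or a global cache.
const cache = new Map<string, number>();

beforeEach(() => {
  // Reset shared state so no test depends on what ran before it.
  cache.clear();
});

test('increments a counter from zero', () => {
  cache.set('hits', (cache.get('hits') ?? 0) + 1);
  expect(cache.get('hits')).toBe(1);
});

test('also starts from a clean slate, regardless of order', () => {
  expect(cache.has('hits')).toBe(false);
});
```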
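For cause 8 (float precision), asserting exact equality on floating-point results makes a test brittle when outputs vary slightly across platforms or library versions; [[Jest]]'s `toBeCloseTo` matcher compares within a tolerance instead. A minimal sketch:

```ts
test('sums of floats are compared with a tolerance', () => {
  const total = 0.1 + 0.2; // 0.30000000000000004 in IEEE 754 doubles

  // Exact equality would fail: expect(total).toBe(0.3)
  // toBeCloseTo asserts equality to a number of decimal places.
  expect(total).toBeCloseTo(0.3, 10);
});
```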
---
tags: [[Software testing MOC|Software testing]]