Flaky tests are automated tests that behave non-deterministically: they may pass or fail when executed against the same build artefact or deployed system.
If you've ever retried a failed test run in your CI/CD pipeline, without any code or config changes, just to get a failing test case to pass, that's an indicator that you have a flaky test case.
Flaky tests are more common with [[Integration testing|Integration tests]] and [[E2E tests]] than with [[Unit testing|Unit tests]].
## Causes of flaky tests
- Concurrently executing test suites (e.g. separate test/spec files in [[Jest]]) that act upon the same datastore can interleave in such a way that a test case's assumptions about the state of the database no longer hold.
- A test case waits a fixed period of time for an asynchronous task on the server to complete before performing a verification/assertion step (see [[How to wait for an async task to complete inside an E2E test]] and the first sketch after this list)
- [[Example of concurrent test execution causing test flakiness]]
- A test case that depends on the current clock time/date (see the second sketch after this list)
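A test that sleeps for a fixed period can instead poll for the expected state and fail only after a deadline. A minimal sketch in [[Jest]], where `getOrderStatus` is a hypothetical API call standing in for whatever your test observes:

```ts
// Hypothetical API call; replace with your own client code.
declare function getOrderStatus(orderId: string): Promise<string>;

// Poll until the condition holds or the deadline passes, instead of
// sleeping for a fixed period and hoping the async task has finished.
async function waitFor<T>(
  fn: () => Promise<T>,
  predicate: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await fn();
    if (predicate(value)) return value;
    if (Date.now() > deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

test('order is eventually marked as shipped', async () => {
  const status = await waitFor(
    () => getOrderStatus('order-123'),
    (s) => s === 'shipped'
  );
  expect(status).toBe('shipped');
});
```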
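For the clock-dependent cause, [[Jest]]'s fake timers can pin the system time so the result doesn't depend on when the suite runs. A minimal sketch, where `isWeekend` is a hypothetical function under test:

```ts
// Hypothetical function under test; reads the system clock by default.
function isWeekend(now: Date = new Date()): boolean {
  const day = now.getDay();
  return day === 0 || day === 6; // Sunday or Saturday
}

describe('isWeekend', () => {
  beforeEach(() => {
    jest.useFakeTimers();
    // Pin the clock to a known Saturday (local time) so the test
    // passes regardless of the real date it runs on.
    jest.setSystemTime(new Date(2021, 4, 15, 12, 0, 0)); // 15 May 2021
  });

  afterEach(() => {
    jest.useRealTimers();
  });

  test('returns true on a Saturday', () => {
    expect(isWeekend()).toBe(true);
  });
});
```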
## Solutions for flaky tests
- Delete them!
- Skip them (see the sketch below)
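In [[Jest]], a flaky test can be skipped with `test.skip` (or `it.skip`): it stays in the codebase and is reported as skipped, but never runs. A minimal sketch, where `generateReport` is a hypothetical helper:

```ts
// Hypothetical helper whose timing makes the test flaky.
declare function generateReport(): Promise<{ status: string }>;

// Flaky: depends on the timing of a background job.
// test.skip keeps the test visible in reports without running it.
test.skip('report is generated within 2 seconds', async () => {
  const report = await generateReport();
  expect(report.status).toBe('ready');
});
```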
## See also
- [[How to handle flaky tests - Daily Email Broadcast - 2021-05-12]]
---
## References
- [A randomly failing test is a failing test](https://jhall.io/archive/2021/02/22/a-randomly-failing-test-is-a-failing-test/) by [[@Jonathan Hall]]
- [The Unexpected Costs of Flaky Tests](https://thenewstack.io/the-unexpected-costs-of-flaky-tests/) by [[@Serkan Ozal]], which lists eleven common causes of flakiness:
1. **Concurrency:** In multithreaded software, threads rely on an implicit ordering of the data, but race conditions break that ordering.
2. **Async await:** When a system starts asynchronous tasks, but doesn’t wait for them to finish.
3. **Too restrictive range:** Tests define a range of valid outputs, but actual outputs go out of that range while still being valid results.
4. **Test order dependency:** The outcome of one test relies on the test running before it (see the first sketch after this list).
5. **Test case timeout:** The size of a test grew over time, but the timeout wasn’t increased.
6. **Resource leak:** Memory isn’t released properly and can overflow in some cases.
7. **Platform dependency:** A test relies on platform-specific behavior, such as a task that yields a deterministic result on one operating system and a non-deterministic one on another.
8. **Float precision:** Float overflows or underflows were not considered, but are a crucial part of the test result (see the second sketch after this list).
9. **Test suite timeout:** In contrast to the test case timeout, no single test is responsible for the flakiness; rather, the tests in aggregate cause the entire suite to time out.
10. **Time:** A test relies on the local system clock and becomes flaky, for example when two timestamps from different time zones are compared.
11. **Randomness:** Sometimes actual randomness is required for a test case, but the developer forgets to check for edge cases.
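For cause 4 (test order dependency), resetting shared state before each test removes the implicit ordering between tests. A minimal [[Jest]] sketch, using a hypothetical in-memory `cache` as the shared state:

```ts
// Hypothetical shared state; in real suites this is often a database,
// a module-level singleton, or a global cache.
const cache = new Map<string, number>();

beforeEach(() => {
  // Reset shared state so no test depends on what ran before it.
  cache.clear();
});

test('increments a counter from zero', () => {
  cache.set('hits', (cache.get('hits') ?? 0) + 1);
  expect(cache.get('hits')).toBe(1);
});

test('also starts from a clean slate, regardless of order', () => {
  expect(cache.has('hits')).toBe(false);
});
```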
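For cause 8 (float precision), asserting exact equality on floating-point results makes a test brittle when outputs vary slightly across platforms or library versions; [[Jest]]'s `toBeCloseTo` matcher compares within a tolerance instead. A minimal sketch:

```ts
test('sums of floats are compared with a tolerance', () => {
  const total = 0.1 + 0.2; // 0.30000000000000004 in IEEE 754 doubles

  // Exact equality would fail: expect(total).toBe(0.3)
  // toBeCloseTo asserts equality to a number of decimal places.
  expect(total).toBeCloseTo(0.3, 10);
});
```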
---
tags: [[Software testing MOC|Software testing]]