Managing test data for automated tests

Integration and E2E tests inevitably require some data to be added to the database or other storage systems that the [[System under test|SUT]] interacts with.

## Pain points

Issues to be wary of:

- Test data gets generated and not cleaned up. Over time this can slow down test runs and increase storage costs.
- Interference between the test data acted upon by different tests, with potential for race conditions when tests are run in parallel.
- Tests rely on base data being present in the database, which subsequently gets modified by other tests.

## Test case scopes

A *test case scope* is either a single test case or a group of test cases that have been grouped together within their test runner. Within [[Jest]], for example, test scopes include:

- `it` and `test` constructs
- `describe` constructs
- `.spec` or `.test` files on the filesystem

Other test runners include a global scope, which includes all test files that were found matching the glob pattern.

Test data can be set up and torn down at each scope level.

## Test parallelisation

The [[Jest]] test runner executes its tests as follows:

1. Find all test files matching the glob pattern.
2. For each test file, create an isolated environment for that test file to run in, with a separate `process.env`, etc. Then start processing the test files **in parallel**.
3. Within each test file, run through the constructs **in series**:
    1. `beforeAll`
    2. `beforeEach` (before every test)
    3. `it` and `test` in the order they appear in the file
    4. `afterEach` (after every test)
    5. `afterAll`

## General principles for setting up test data in your tests

- Each *test case scope* should create and destroy all the data it needs to perform its test.
- As far as possible, do not use dynamically generated primary keys when setting up test data in DynamoDB. Instead use hardcoded IDs. If these IDs don't need to be in a specific format (e.g. UUID), use the name of the test scope as a prefix of the ID.
  This ensures that records can be safely `put` and easily cleaned up, and also makes it easier to identify the tests which are "leaking" test data into the database.

## #OpenQuestions

- **How to handle test cases that need to perform queries against an empty database?** (e.g. the `GET /clubs` example in the Serverless Testing Workshop). Could use the `--runInBand` flag to force sequential execution of the test files, but that seems to be overkill. Could combine this with a `testNamePattern` tag to exclude these tests from the main test run and instead run them afterwards. But I need to define more clearly the use cases that would be affected by this.
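
A minimal sketch of the scope-prefixed ID principle from the general principles section. Everything here is hypothetical: `FakeTable` is an in-memory stand-in for a DynamoDB table so the example runs without AWS access, and `scopedId` and `cleanUpScope` are illustrative helper names, not part of Jest or the AWS SDK. The `#` separator is also just an assumption, chosen so that one scope's prefix cannot accidentally match another scope with a longer name.

```typescript
// Hypothetical item shape; a real table would have its own attributes.
type Item = { id: string; name: string };

// In-memory stand-in for a DynamoDB table (assumption: keyed on `id` only).
class FakeTable {
  private items: Record<string, Item> = {};

  // Like a DynamoDB put: writes the item, overwriting any item with the same key.
  put(item: Item): void {
    this.items[item.id] = item;
  }

  delete(id: string): void {
    delete this.items[id];
  }

  ids(): string[] {
    return Object.keys(this.items);
  }
}

// Builds a deterministic ID from the test scope name plus a short local suffix.
function scopedId(scope: string, suffix: string): string {
  const slug = scope
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces, slashes, etc. into "-"
    .replace(/^-|-$/g, ""); // trim a leading or trailing "-"
  return `${slug}#${suffix}`;
}

// Deletes every item whose ID carries the given scope's prefix.
// Returns the number of items removed, which is handy for spotting leaks.
function cleanUpScope(table: FakeTable, scope: string): number {
  const prefix = scopedId(scope, "");
  const owned = table
    .ids()
    .filter((id) => id.slice(0, prefix.length) === prefix);
  owned.forEach((id) => table.delete(id));
  return owned.length;
}
```

In a Jest test file, the `put` calls would probably sit in `beforeAll` (or `beforeEach`) and `cleanUpScope` in the matching `afterAll` (or `afterEach`). Because the IDs are deterministic, re-running a test that previously crashed before cleanup simply overwrites the same records instead of adding new ones, and any leftover record in the table points straight back to the scope that created it.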