One of the most frustrating obstacles in automated software testing is test flakiness: tests that produce unreliable, inconsistent or inconclusive results. Automated testing was designed to remove the variability of manual testing, yet flakiness still occurs, and when it does it can be confusing and time-consuming to resolve. It wastes resources, produces spurious failures, and makes genuine defects harder to detect. If flaky tests aren’t properly managed, they can erode trust in your entire test environment.
Flaky tests also waste significant resources as developers spend valuable time chasing false positives. They can destabilise continuous integration pipelines and lead to incorrect judgements about the quality of your software. To maintain a robust testing process, it’s crucial to address the causes of flakiness and implement strategies to minimise its effect on your testing.
Common causes of test flakiness
Test “flakiness” can stem from a number of causes, but most boil down to timing issues, dependencies, and variations in the test environment or process.
Asynchronous code issues
An extremely common cause of flaky tests is asynchronous code. Failures can occur due to minor variations in execution speed, especially when waiting for network responses or background tasks. For instance, if a test expects an element to load within a fixed threshold, it will fail on a slow network response even though the functionality is correct.
The solution is proper synchronisation. Use waits that adapt to the actual response time, so the asynchronous task has finished executing before the test proceeds. Hard-coded wait times are best avoided: they introduce flakiness whenever a dependency is slower than expected – as we’ve discussed, perhaps a network delay or a slow connection.
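For instance, with Selenium’s WebDriver bindings for Python, an explicit wait polls for the element instead of sleeping for a fixed period. A minimal sketch – the URL and element ID are placeholders for your own application under test:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/dashboard")  # placeholder URL

# Flaky: a hard-coded sleep fails whenever the page is slower than 2s.
# time.sleep(2)

# Stable: poll until the element is present, up to a 10-second ceiling.
element = WebDriverWait(driver, timeout=10).until(
    EC.presence_of_element_located((By.ID, "report-table"))  # placeholder ID
)
driver.quit()
```

The wait returns as soon as the element appears, so fast runs stay fast while slow runs still pass.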
Test data dependency
Flakiness can also arise when tests share mutable data or state. If multiple tests interact with the same database entries or files, changes in that state can produce unpredictable outcomes. For example, if one test alters a piece of data that another test relies on, the second test can fail simply because it wasn’t expecting the change.
To avoid this, always use isolated and immutable test data. Each test should run independently rather than depend on the state left behind by other tests. Techniques like dependency injection and mocking can help create isolated environments for tests, minimising data-related conflicts.
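As a minimal pytest-style sketch, a fixture can hand each test its own freshly created record and clean it up afterwards – create_user and delete_user are hypothetical helpers standing in for your own data layer:

```python
import uuid
import pytest

@pytest.fixture
def user():
    # Unique name per test, so no two tests ever touch the same record.
    record = create_user(name=f"test-{uuid.uuid4()}")  # hypothetical helper
    yield record
    delete_user(record)  # clean up so no state leaks into later tests

def test_rename(user):
    user.rename("alice")
    assert user.name == "alice"

def test_default_name_is_untouched(user):
    # Passes regardless of test order, because this is a fresh record.
    assert user.name.startswith("test-")
```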
External system dependencies
Tests that integrate with an external system – a third-party API, a database, or another service – are unfortunately a very common source of flakiness. The variable stability and responsiveness of such systems can cause tests to fail unpredictably. You also have no control over when external services are available: if a third-party service is unavailable because of a network partition, changes to upstream dependency limits, or just plain old maintenance, it can easily cause test failures that have nothing to do with the functionality of the application being tested.
Ideally, work around this with mock services or stubs that imitate the third-party systems. This isolates test behaviour from the unknowns of external services, giving you consistent testing outcomes regardless of external dependency availability or performance.
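Here is a small sketch using Python’s standard unittest.mock: the third-party call is replaced with a canned response, so the test exercises only your own logic. The convert function and PricingService class are hypothetical names:

```python
from unittest.mock import patch

class PricingService:
    def fetch_exchange_rate(self, src, dst):
        raise NotImplementedError  # the real version calls a remote API

def convert(amount, service):
    return amount * service.fetch_exchange_rate("USD", "EUR")

def test_convert_uses_stubbed_rate():
    service = PricingService()
    # Replace the network call with a canned value: the test now passes
    # or fails on our logic alone, not on the third party's uptime.
    with patch.object(service, "fetch_exchange_rate", return_value=0.9):
        assert convert(100, service) == 90.0
```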
Environment-related flakiness
Differences in testing environments – such as variations in browsers, operating systems, or hardware – can lead to flakiness. A test might pass in one browser but fail in another simply because of subtle differences in how each handles operations. Similarly, discrepancies in system configurations can result in environment-specific failures.
This can be addressed by using containers (such as Docker) or virtualisation to standardise the testing environment. By ensuring a consistent environment across all platforms, you eliminate the discrepancies that cause this kind of flaky behaviour.
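As one illustration, the open-source testcontainers library for Python can pin the exact database image every test run uses – a sketch, assuming you test against Postgres and have installed the library:

```python
import pytest
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def db_url():
    # Pin an exact image tag so every developer machine and CI agent
    # tests against an identical database environment.
    with PostgresContainer("postgres:16.3") as pg:
        yield pg.get_connection_url()
```

Every run, anywhere, gets the same Postgres 16.3, so environment drift stops masquerading as test failures.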
Identifying and diagnosing flaky tests
Identifying flaky tests is the first step towards mitigation. That means working out when and why particular tests fail, so your team can address the root causes efficiently rather than waste time chasing false positives.
Running test cases multiple times and across different environments can help differentiate flaky failures from actual bugs. This analysis allows testers to identify which failures are intermittent and which require further investigation.
Consistent logging and debugging practices
It’s impossible to diagnose flaky tests without proper error handling and logging. Detailed logs let you trace exactly where a test failed, and if tests are failing intermittently, logs can give important clues leading you to the problem. For example, detailed logging might show that a failure follows a certain sequence of events, an environmental factor, or an interaction with a specific external system.
Switching on extended logging in your test suite can also capture the required information during the different phases of test execution. Some tools (like T-Plan Robot) will even provide screenshot and video captures to show exactly what happened during the test. This kind of visual evidence is invaluable for identifying intermittent failures caused by unexpected UI changes or timing-related issues.
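If your framework doesn’t do this for you, even plain Python logging with millisecond timestamps and a phase tag makes intermittent failures far easier to trace – a minimal sketch, with an illustrative test body:

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
    datefmt="%H:%M:%S",
)
log = logging.getLogger("suite")

def test_checkout_flow():
    log.debug("phase=setup seeding basket")
    # ... arrange test data ...
    log.debug("phase=act submitting order")
    # ... perform the action under test ...
    log.debug("phase=assert verifying confirmation")
    # ... assertions; on an intermittent failure, the timestamps show
    # which phase slowed down or ran out of order ...
```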
Run tests multiple times
Run tests multiple times and across several different environments to distinguish flaky tests from genuine failures. A test that sometimes fails but passes on a rerun without any code change is a strong indicator of flakiness rather than an actual incident or bug that needs to be resolved. This helps you identify patterns showing exactly what is flaky and what genuinely needs to be looked at by your developers.
Test retry strategies come in useful here. Running the same test a number of times lets you figure out whether failures happen randomly or consistently. Furthermore, flakiness detectors such as the Jenkins Flaky Test Handler – which automatically detects and flags tests whose results are inconsistent – can simplify the process of identifying these tests.
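A rough sketch of rerun-based detection with pytest: execute the same test repeatedly and count the failures. The test ID is a placeholder; any pytest selector works:

```python
import subprocess

TEST_ID = "tests/test_checkout.py::test_checkout_flow"  # placeholder
RUNS = 20

failures = 0
for _ in range(RUNS):
    result = subprocess.run(["pytest", "-q", TEST_ID])
    if result.returncode != 0:
        failures += 1

print(f"{TEST_ID}: failed {failures}/{RUNS} runs")
# 0/20 or 20/20 points at a deterministic result; anything in
# between is a strong flakiness signal worth investigating.
```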
Strategies to reduce flakiness in testing
Once you have identified which tests are flaky and which reveal genuine issues, the next step is to implement strategies to reduce flakiness. This section will help you produce more stable tests with more consistent results, so you aren’t wasting time investigating issues that aren’t there.
Improve test isolation
Inter-test dependencies are a common source of flakiness, often caused by shared state or reliance on other tests. If one test changes something in shared state – such as the value of a database entry – a test that depends on the same resource may fail and produce inconsistent results. The best way to avoid this is to ensure that every test is completely independent and self-contained. Avoid shared state wherever possible, and use techniques like dependency injection and mocks to isolate tests from one another and from external systems. This reduces unexpected interactions, creating a more stable testing environment.
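A minimal sketch of dependency injection for isolation: the unit under test receives its dependency instead of reaching for a shared global, so every test can hand it a fresh one. Basket is a hypothetical class:

```python
class Basket:
    def __init__(self, store):
        self.store = store  # injected dependency, never a shared global

    def add(self, item):
        self.store.append(item)

def test_add_one_item():
    basket = Basket(store=[])  # fresh state, immune to test order
    basket.add("book")
    assert basket.store == ["book"]

def test_empty_by_default():
    basket = Basket(store=[])
    assert basket.store == []
```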
Optimise test execution
Parallel testing can introduce race conditions, causing flaky outcomes when tests access shared resources simultaneously. You can fix this by ensuring thread safety in your test suite: avoid shared state, and use proper synchronisation mechanisms where sharing is unavoidable. Implement containers or virtual environments to create isolated and consistent testing conditions. Containerisation ensures that your tests run their course in a controlled setting and reduces the influence of external factors, such as differing system configurations.
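For example, pytest’s built-in tmp_path fixture gives every test its own directory, removing a classic race where parallel tests write to one shared file – a minimal sketch:

```python
def test_export_report(tmp_path):
    # tmp_path is unique per test, so parallel workers never collide.
    out_file = tmp_path / "report.csv"
    out_file.write_text("id,total\n1,9.99\n")
    assert out_file.read_text().startswith("id,total")
```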
Automated flaky test management
To stop flaky tests from affecting the outcome of your entire test suite, treat them proactively. That means adding automated re-runs within your continuous integration pipeline, which helps separate real failures from intermittent ones: if a failure is intermittent, a rerun will show whether it persists. Quarantine known flaky tests by separating them from the rest of the test suite so they cannot affect overall stability. This keeps the main test suite reliable while the flaky tests are systematically traced back to their root causes, ensuring they do not hinder development cycles.
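One common pattern is quarantining via test markers, sketched here with pytest – the marker name is an arbitrary choice, and it needs registering in your pytest configuration:

```python
import pytest

@pytest.mark.flaky_quarantine
def test_known_unstable_websocket_reconnect():
    ...

# Register the marker in pytest.ini / pyproject.toml, then split CI:
#   gating job:      pytest -m "not flaky_quarantine"
#   quarantine job:  pytest -m flaky_quarantine   (non-blocking)
```

The gating job stays trustworthy, while the quarantine job keeps exercising the flaky tests until they are fixed and promoted back.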
Tools to mitigate test flakiness
Effectively managing test flakiness requires the right tooling to detect, manage, and address unstable tests. Key tools that can assist with this process include:
Test automation tools with retry mechanisms
Modern test automation tools often come equipped with retry mechanisms that can help mitigate flakiness. For example, T-Plan’s automation platform offers configurable retry options, allowing tests that fail due to temporary issues to be rerun automatically. This helps distinguish true failures from those caused by transient issues like network latency or temporary service unavailability. Retry mechanisms must be used judiciously, however, so they don’t mask underlying problems that require attention.
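T-Plan’s retry options are configured within its own platform; as a generic open-source analogue, the pytest-rerunfailures plugin provides the same idea – a sketch, assuming the plugin is installed:

```python
import pytest

@pytest.mark.flaky(reruns=2, reruns_delay=1)
def test_fetch_dashboard():
    ...  # retried up to twice, with a 1-second pause between attempts

# Or suite-wide from the command line:
#   pytest --reruns 2 --reruns-delay 1
```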
Monitoring and reporting tools
Detection and monitoring tools are an integral part of managing flaky tests. Tools like the T-Plan reporting suite show teams how often a flaky test crops up and how it behaves. These tools can automatically flag unstable tests and offer detailed logs and metrics for root-cause analysis, enabling teams to monitor test suite health proactively, prioritise which flaky tests to tackle first, and apply targeted fixes to improve overall stability.
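As a back-of-the-envelope illustration of the underlying idea, a few lines of Python can rank tests by historical failure rate – the input format here is hypothetical, but most CI systems can export something equivalent:

```python
from collections import defaultdict

# Hypothetical exported history: (test id, outcome) per run.
history = [
    ("test_login", "pass"), ("test_login", "fail"), ("test_login", "pass"),
    ("test_search", "pass"), ("test_search", "pass"),
]

runs, fails = defaultdict(int), defaultdict(int)
for test, outcome in history:
    runs[test] += 1
    fails[test] += outcome == "fail"

# Tests that both pass and fail are the flaky ones; sort worst-first.
flaky = {t: fails[t] / runs[t] for t in runs if 0 < fails[t] < runs[t]}
for test, rate in sorted(flaky.items(), key=lambda kv: -kv[1]):
    print(f"{test}: {rate:.0%} failure rate over {runs[test]} runs")
```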
Final thoughts on flaky testing
Flakiness in testing breeds chaos in software development due to the confusion and delays it causes. Left unresolved, it costs a great deal of time spent sifting real issues from false alarms. Identifying the common causes and implementing effective mitigation strategies will ultimately save you time and money and allow you to ship a higher-quality product.
T-Plan offers robust tools to identify and manage flaky tests, with features like configurable retry mechanisms and detailed logging. Integrating T-Plan into your testing strategy can enhance test stability and streamline your development process. Learn more about T-Plan’s automation features.
Get in touch and find out how T-Plan’s automation platform can help your business identify and eliminate flaky tests to deliver a more reliable testing process.