Why Automated Tests Fail (and How to Prevent It)

“Just re-run the pipeline.”

If this phrase is familiar in your team, you have a test automation problem. Specifically, you have tests that lie — tests that fail for reasons unrelated to whether your code works. They’re called flaky tests, and they are more dangerous than having no tests at all.

When a test suite becomes unreliable, engineers stop trusting it. When they stop trusting it, they stop acting on failures. When they stop acting on failures, real bugs slip through. The test suite becomes a bureaucratic checkbox: something you run because CI requires it, not because it catches real problems.

In this post I’ll go through the five most common root causes of automation failure and give you concrete, actionable fixes for each.

Why Flaky Tests Are Worse Than No Tests

A passing test suite that contains flaky tests gives you false confidence. You see green and think everything is fine — but you don’t know if green means “the code works” or “the flaky tests happened to pass this time.”

There’s a simple rule: a test that doesn’t reliably tell you when something is broken is worse than no test. It consumes maintenance time, slows CI, and erodes trust in the entire suite.

The goal of test automation is fast, reliable feedback. Every flaky test is an attack on that goal.

Root Cause #1 — Timing and Async Issues

This is the single most common cause of flaky E2E and integration tests.

The problem

Tests that use hardcoded waits:

// Bad: fragile and platform-dependent
await page.waitForTimeout(2000);
await expect(page.getByText('Order confirmed')).toBeVisible();

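If the page loads in 1.8 seconds on your machine and 2.3 seconds on a slow GitHub Actions runner, this test will pass locally and fail intermittently in CI. Raising the wait only makes every run slower without removing the race.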

The fix

Always wait for a specific condition, never for a fixed duration:

// Good: waits until the condition is true, up to a timeout
await expect(page.getByText('Order confirmed')).toBeVisible();

// Use with care: Playwright's docs discourage 'networkidle' as a readiness signal;
// prefer asserting on the content you expect to appear
await page.waitForLoadState('networkidle');

// Good: wait for a specific element state
// (locator.waitFor() only accepts 'attached', 'detached', 'visible' and 'hidden')
await expect(page.getByRole('button', { name: 'Submit' })).toBeEnabled();

Playwright’s built-in auto-waiting handles most cases, but only if you use the right assertions. expect(locator).toBeVisible() retries until it passes or times out. locator.isVisible() checks once and returns the current state immediately.
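To see why a retrying assertion is more robust than a fixed sleep, here is a minimal sketch of the polling idea in plain TypeScript. It is not Playwright's implementation; waitForCondition and its parameters are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper: polls `condition` until it returns true,
// or throws once `timeoutMs` has elapsed.
async function waitForCondition(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    // Return as soon as the condition holds: no time is wasted on fast runs.
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The helper returns as soon as the condition holds, so a fast page costs nothing extra, while a slow runner gets the full timeout before the test fails.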

Rule: If you have waitForTimeout in your tests, treat it as a bug to fix.

Root Cause #2 — Bad Selectors

The problem

Selectors that are tightly coupled to implementation details break whenever a developer refactors the UI — even when the functionality is unchanged.

// Bad: brittle selectors
await page.click('div > div:nth-child(3) > button');
await page.click('#app > main > form > div.btn-container > button[type="submit"]');
await page.click('.MuiButtonBase-root-5');  // generated class name

These selectors break on:

Any DOM restructuring
CSS class renames
Component library updates
CSS-in-JS generating new class names

The fix — Playwright’s recommended priority

Playwright’s documentation recommends user-facing locators over CSS and XPath. A practical priority order:

// 1. Role-based (best) — mirrors how users and screen readers see the page
await page.getByRole('button', { name: 'Place Order' }).click();
await page.getByRole('textbox', { name: 'Email address' }).fill('test@example.com');

// 2. Label-based — for form inputs
await page.getByLabel('Password').fill('s3cr3t');

// 3. Placeholder-based
await page.getByPlaceholder('Search articles...').fill('playwright');

// 4. Text-based — for links and content
await page.getByText('View all orders').click();

// 5. Test ID — explicit marker, stable across refactors
await page.getByTestId('submit-order-btn').click();

// 6. CSS/XPath — only as a last resort
await page.locator('[data-testid="submit-order-btn"]').click();

Add data-testid attributes to elements that need stable selectors. They’re invisible to users, ignored by styling, and survive refactors:

<button data-testid="submit-order-btn" type="submit">Place Order</button>

Rule: If a selector contains positional indexes, :nth-child, or generated class names, it’s a bug.
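The rule can be approximated with a few regular expressions, for example as a check in review tooling. This is a rough heuristic sketch, not a Playwright feature: brittleReasons and its patterns are assumptions, and they will produce false positives (a utility class such as .col-12 would be flagged as generated).

```typescript
// Hypothetical lint helper: lists the reasons a selector looks brittle.
const BRITTLE_PATTERNS: Array<{ reason: string; pattern: RegExp }> = [
  // :nth-child(3), :nth-of-type(2), ...
  { reason: 'positional pseudo-class', pattern: /:nth-(child|of-type)\(/ },
  // Class names ending in a numeric suffix, e.g. .MuiButtonBase-root-5
  { reason: 'generated class name', pattern: /\.[A-Za-z][\w-]*-\d+\b/ },
  // Three or more child combinators in a row, e.g. a > b > c > d
  { reason: 'deep structural chain', pattern: /(>[^>]+){3,}/ },
];

function brittleReasons(selector: string): string[] {
  return BRITTLE_PATTERNS
    .filter(({ pattern }) => pattern.test(selector))
    .map(({ reason }) => reason);
}
```

An empty result does not prove a selector is good. It only means none of these three warning signs were found.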

Root Cause #3 — No Test Data Isolation

The problem

Tests that depend on data that exists in the environment — rather than data they create themselves — are fragile in multiple ways:

// Bad: depends on a user existing in the database
await loginAs('testuser@example.com', 'password');
await expect(page.getByText('Welcome, Test User')).toBeVisible();

What happens when:

Someone deletes that user while debugging?
Tests run in parallel and two tests modify the same user?
The test database is wiped and re-seeded differently?

The fix

Each test creates and owns its data:

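import { test } from '@playwright/test';

let userId: string;

// Good: test creates the user it needs via API
test.beforeEach(async ({ request }) => {
  // A unique address per test keeps parallel runs from colliding
  const email = `test-user-${Date.now()}-${Math.random().toString(36).slice(2)}@example.com`;
  const response = await request.post('/api/users', {
    data: { email, password: 'Test1234!' }
  });
  userId = (await response.json()).id;
});

// Remove the user again so nothing is left behind
test.afterEach(async ({ request }) => {
  await request.delete(`/api/users/${userId}`);
});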

For database-heavy tests, use Testcontainers to give each test its own disposable database:

// .NET integration test — spins up a real PostgreSQL instance per test
await using var container = new PostgreSqlBuilder()
    .WithImage("postgres:16")
    .Build();
await container.StartAsync();

var connectionString = container.GetConnectionString();
// run your tests with full isolation

Rule: No test should depend on data it didn’t create. No test should leave data behind.
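When a test creates several resources, cleanup is easy to get wrong: one failed delete can skip all the others. A minimal sketch of one way to handle this is a small registry that records a teardown step as each resource is created. CleanupRegistry is a hypothetical helper, not part of Playwright or any library.

```typescript
type Cleanup = () => unknown;

// Hypothetical helper: collects teardown steps and runs them all,
// even if some of them fail.
class CleanupRegistry {
  private readonly tasks: Cleanup[] = [];

  // Register a teardown step right after the resource is created.
  add(task: Cleanup): void {
    this.tasks.push(task);
  }

  // Run steps in reverse creation order (dependents before their parents)
  // and return any errors instead of stopping at the first one.
  async runAll(): Promise<Error[]> {
    const errors: Error[] = [];
    while (this.tasks.length > 0) {
      const task = this.tasks.pop()!;
      try {
        await task();
      } catch (err) {
        errors.push(err instanceof Error ? err : new Error(String(err)));
      }
    }
    return errors;
  }
}
```

In a Playwright suite, the registry would typically be created in beforeEach and drained in afterEach, with the test failing if runAll() returns any errors.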

Root Cause #4 — Environment Dependency

The problem

“Works on my machine” is a classic symptom of environment-dependent tests.

Common causes:

Tests use a hardcoded path (C:\temp\upload\) that doesn’t exist in CI
Tests depend on a local timezone (DateTime.Now giving different results)
Tests use a local service running on localhost:5432 (not in CI)
Tests assume a specific locale for number/date formatting

The fix

Containerise everything:

# docker-compose.test.yml — CI gets the same environment as local
services:
  app:
    build: .
    environment:
      - TZ=UTC
      - LANG=en_GB.UTF-8
  db:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=test
  playwright:
    image: mcr.microsoft.com/playwright:v1.45.0-jammy
    depends_on: [app]

Lock time-sensitive behaviour:

// In tests, inject a fixed clock instead of using DateTime.Now.
// FixedClock is a small test double you write yourself, not a framework type.
var fixedClock = new FixedClock(new DateTime(2025, 1, 1, 12, 0, 0, DateTimeKind.Utc));
var service = new OrderService(fixedClock);
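The same idea carries over to TypeScript test code. The following is a minimal sketch under stated assumptions: Clock, SystemClock, FixedClock and shipsSameDay are hypothetical names, and the 14:00 UTC cut-off is an invented business rule used only to show the pattern.

```typescript
// The abstraction production code depends on instead of calling new Date().
interface Clock {
  now(): Date;
}

// Used in production: reads the real system time.
class SystemClock implements Clock {
  now(): Date {
    return new Date();
  }
}

// Used in tests: always reports the same instant.
class FixedClock implements Clock {
  private readonly instant: Date;

  constructor(instant: Date) {
    this.instant = instant;
  }

  now(): Date {
    // Return a copy so callers cannot mutate the fixed instant.
    return new Date(this.instant.getTime());
  }
}

// Invented rule: orders placed before 14:00 UTC ship the same day.
function shipsSameDay(clock: Clock): boolean {
  return clock.now().getUTCHours() < 14;
}
```

Because the rule reads UTC hours from an injected clock, the result no longer depends on when or where the test runs.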

Rule: Tests should pass in any environment where the container runs. If a test only passes on your machine, it’s broken.

Root Cause #5 — No Automation Strategy

The problem

The most insidious failure mode isn’t technical — it’s strategic. Teams write automated tests without a strategy, which typically means:

Automating everything: Every manual test case gets a Playwright script. You end up with 800 E2E tests that take 4 hours to run.
Checklist automation: Manual test cases are transcribed 1:1 into automation. The tests verify the steps, not the behaviour.
No risk prioritisation: Time is spent automating low-risk, rarely-changing paths while critical, risky features have no automation at all.

The fix

Define your automation strategy before writing the first test:

What is the purpose of this test suite? Fast feedback (unit), regression safety (integration), user journey confidence (E2E)?
What are the highest-risk areas? Checkout, authentication, payment, data migration?
What should NOT be automated? Visual design, subjective UX, one-time data migrations.

Apply the ROI check to every test:

Question	Good answer	Bad answer
What bug has this caught recently?	“The cart total calculation broke twice last quarter”	“None that I know of”
How long does maintenance take per sprint?	< 30 minutes	> 2 hours
Does it fail for the right reasons?	Only when code behaviour changes	Also on network hiccups, UI refactors

Delete tests that don’t earn their place. A test that hasn’t caught a real bug in six months and breaks regularly should be deleted, not fixed. It’s noise.

Automation Health Checklist

Use this to audit your current test suite:

☐ No hardcoded waits (waitForTimeout, Thread.Sleep, time.sleep)
☐ Selectors use role/label/test-id, not DOM structure
☐ Each test creates and cleans up its own data
☐ Tests pass consistently in CI/CD (Docker environment)
☐ Each automated test has a documented purpose
☐ Full suite runs in under 10 minutes on CI
☐ Flaky test rate is below 2% (tracked in CI dashboard)
☐ Tests are reviewed in code review, not just added
☐ There is a written automation strategy, not just a test folder
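The checklist does not say how to measure the flaky test rate. One working definition, assumed here, is that a test is flaky if it both passed and failed on the same commit. The sketch below computes the rate from a list of run records; TestRun, flakyTests and flakyRate are hypothetical names, and the record shape would need adapting to whatever your CI system exports.

```typescript
interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

// Names of tests that both passed and failed on at least one commit.
function flakyTests(runs: TestRun[]): Set<string> {
  const outcomes = new Map<string, { testName: string; passed: boolean; failed: boolean }>();
  for (const run of runs) {
    const key = `${run.commitSha}:${run.testName}`;
    const entry = outcomes.get(key) ?? { testName: run.testName, passed: false, failed: false };
    if (run.passed) entry.passed = true;
    else entry.failed = true;
    outcomes.set(key, entry);
  }
  const flaky = new Set<string>();
  outcomes.forEach((entry) => {
    if (entry.passed && entry.failed) flaky.add(entry.testName);
  });
  return flaky;
}

// Share of distinct tests that are flaky, between 0 and 1.
function flakyRate(runs: TestRun[]): number {
  const allTests = new Set(runs.map((run) => run.testName));
  return allTests.size === 0 ? 0 : flakyTests(runs).size / allTests.size;
}
```

With this definition, the 2% target in the checklist becomes a check such as flakyRate(runs) < 0.02. A test that fails on one commit and passes on the next is not counted, because the code changed in between.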

Conclusion

Automated tests that lie are not an asset — they are a liability. They consume engineering time, slow pipelines, and create false confidence.

The good news: most of these problems have well-known solutions. The challenge is prioritising them. Pick one root cause from this list, pick the most painful test in your suite that exhibits it, and fix it this week. Then pick the next one.

Clean test suites don’t happen by accident. They’re the result of treating test code with the same engineering discipline as production code.

This is the fourth post in Series 1: QA General.