Everything looks stable.

Dashboards are green. Test execution finishes without errors. Test suites run every day in CI. The testing process is in place, the test plan is defined, and QA teams are tracking test progress.

Then production breaks.

A loan application stops halfway through.
An insurance payout never gets triggered.
A medtech system records data correctly but shows the wrong values in the UI.

This article is for teams working on systems where things don’t fail in isolation. Banking, insurance, medtech. Environments where workflows cross multiple systems, multiple technologies, and sometimes even multiple companies.

The goal here is simple: explain why E2E testing keeps failing in those setups, even when everything else in software testing looks “correct”. And, more importantly, where the real problem sits.

Short summary

  • E2E testing breaks when it relies on implementation. Tests tied to selectors, APIs, or internal structure fail even when user flows still work. That creates noise and weakens trust in test results.
  • Complexity across multiple systems is the real challenge. Failures don’t happen inside isolated components. They happen between systems where data flows, timing, and dependencies become unpredictable.
  • Scaling E2E testing reduces reliability. Running multiple tests across multiple systems increases failure rates, false positives, and maintenance effort. Test suites grow, but confidence doesn’t.
  • Visual testing improves alignment but lacks understanding. It gets closer to the user’s perspective, but still struggles with context, structure, and dynamic interfaces across environments.
  • Behavior-based testing is the only way forward. Reliable E2E testing needs to validate real workflows, complete user journeys, and data integrity across systems, not how the system is built.

E2E testing assumes systems behave in isolation

On paper, E2E testing sounds straightforward.

You take a complete user journey, run it through the entire application, and verify that input and output data behave correctly. That’s end-to-end software testing. Validate the entire system, from start to finish, from the user’s perspective.

That logic works as long as you think in terms of one system.

It breaks the moment you look at how real systems are built.

Take a banking onboarding flow. One user action triggers a chain like this:

  • Frontend collects data
  • Backend processes it
  • Fraud system evaluates it
  • External credit scoring returns a result
  • Core banking system stores it

This is not one system. This is multiple systems, often owned by different teams, running on different stacks, sometimes outside your control.

Still, E2E testing treats this as one flow. That’s the first disconnect.
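The chain above can be sketched in a few lines. This is a toy model, not real banking code: every function name is invented, and each stub stands in for what would be a separate service owned by a different team.

```python
# Hypothetical sketch: one onboarding action fanning out across five
# independently owned systems. All names and values are invented.

def collect_frontend_data(applicant):
    return {"name": applicant["name"], "income": applicant["income"]}

def process_backend(payload):
    return {**payload, "application_id": "APP-001"}

def evaluate_fraud(record):
    return record["income"] > 0                       # stand-in for a fraud engine

def external_credit_score(record):
    return 640 if record["income"] >= 30000 else 520  # stand-in for a bureau call

def store_in_core_banking(record, fraud_ok, score):
    status = "approved" if fraud_ok and score >= 600 else "review"
    return {"application_id": record["application_id"], "status": status}

def onboard(applicant):
    payload = collect_frontend_data(applicant)        # Frontend
    record = process_backend(payload)                 # Backend
    fraud_ok = evaluate_fraud(record)                 # Fraud system
    score = external_credit_score(record)             # External credit scoring
    return store_in_core_banking(record, fraud_ok, score)  # Core banking
```

An E2E test sees only `onboard()`, one call path. In reality each line crosses a system boundary, and any of them can change independently.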

The testing pyramid explains cost, not reality

The testing pyramid is still widely used. More unit testing, less integration testing, fewer end-to-end tests.

The idea behind it is cost. Unit testing is cheap. E2E testing is expensive.

That part is true.

What it doesn’t explain is where systems actually fail.

In regulated environments, most failures are not inside functions. They happen:

  • When data moves between systems
  • When formats change slightly
  • When dependencies behave differently than expected

Unit testing protects internal logic. Integration testing checks whether systems can talk to each other. System testing validates components.

None of these guarantee that the full business process works.

And that’s exactly where E2E testing is supposed to help. But the way it is usually implemented, it doesn’t.

Most E2E testing targets the wrong layer

One of the core issues is where tests interact with the system.

Most testing tools operate on the implementation layer:

  • Selectors and IDs in the UI
  • API calls between services
  • Internal code structures

That works for integration testing. It works for system testing.

It does not work for E2E testing. Because users don’t interact with IDs. They don’t care about internal structure. They interact with what they see.

There are always two layers:

  • What is displayed
  • What actually runs behind it

Testing tools often connect to the second. Users interact with the first.

That gap creates instability.

A simple change breaks everything

A very typical situation.

A developer updates a UI component. Maybe renames an ID. Maybe restructures a container. Maybe adjusts the layout slightly.

From the user’s perspective, nothing changes. From the test perspective, everything breaks.

Test execution fails. Test cases cannot find elements. Test results show errors. QA teams investigate.

There is no real defect. This happens constantly.
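A toy illustration of why the rename is invisible to users but fatal to the test. The element data is invented; the point is only the contrast between keying on an internal id and keying on what the user sees.

```python
# Toy UI tree. A test that keys on the internal id breaks when the id
# is renamed; a check that keys on the visible text does not.
ui = [
    {"id": "btn-submit-v2", "text": "Submit", "role": "button"},  # id renamed from "btn-submit"
    {"id": "fld-amount", "text": "Amount", "role": "input"},
]

def by_id(tree, element_id):
    return next((e for e in tree if e["id"] == element_id), None)

def by_visible_text(tree, text):
    return next((e for e in tree if e["text"] == text), None)

assert by_id(ui, "btn-submit") is None             # test breaks: "element not found"
assert by_visible_text(ui, "Submit") is not None   # the user's view is unchanged
```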

Over time, teams spend more time maintaining tests than validating software quality. Test coverage drops because old test cases are not worth fixing anymore. Test suites become noisy.

That’s where E2E testing starts losing credibility.

The real problem appears at scale

This is where things get worse.

Even if a single system works well, E2E testing does not scale cleanly.

Imagine each system in a workflow works with high reliability. Now connect ten of them.

Each step adds a chance of failure. Not necessarily a real failure. Often a test failure.
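The arithmetic behind that compounding is simple. The pass rates below are illustrative, not measured:

```python
# Illustrative only: if each chained step passes its check 99% of the
# time, a ten-step flow completes cleanly far less often than 99%.
per_step = 0.99
steps = 10
full_run_pass = per_step ** steps
print(f"{full_run_pass:.1%}")   # about 90.4% -- roughly one run in ten fails

# Drop each step to 95% and the full flow passes only about 60% of the time.
print(f"{0.95 ** steps:.1%}")
```

None of those failed runs needs a real defect. The compounding alone guarantees red dashboards.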

In a large insurance setup, one claim can pass through:

  • Input validation
  • Policy engine
  • Fraud system
  • Payout calculation
  • Payment provider

Each step introduces variability.

Now run hundreds or thousands of test cases across those flows.

You end up with:

  • Multiple tests failing every run
  • Many false positives
  • Time lost analyzing failures
  • Test progress that does not reflect reality

This is where teams start ignoring test results, even though the testing process looks mature on paper.

Visual testing fixed one problem and created another

The next step many teams took was moving away from code-driven approaches toward visual layers. Instead of relying on selectors or internal structure, E2E testing started interacting with what is actually rendered on the screen. This shift aligns much better with the user’s perspective and supports functional testing across real workflows.

At first glance, this feels like the right direction. Visual testing reflects how users interact with a software application. It allows testing automation to validate user functions instead of internal implementation. It also makes it easier to run tests across different operating systems and environments without rewriting everything for each system.

But the improvement is only partial.

Visual testing struggles with interpretation.

Take a simple form. A label appears next to multiple fields. From a human perspective, the relationship is obvious. From a testing tool’s perspective, it is ambiguous. The tool needs to decide which element belongs to which label.

Most testing methods solve this using distance or position. The closest field gets selected. That works until the layout changes.

Move a field slightly. Adjust responsiveness. Switch from a desktop layout to another screen size. The relationship breaks. Test execution fails.
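A toy version of the “closest field wins” rule shows how fragile it is. The coordinates and field names are invented:

```python
# Toy nearest-element heuristic, as many visual tools apply it:
# the field with the smallest distance to the label gets paired with it.
def closest_field(label_xy, fields):
    lx, ly = label_xy
    return min(fields, key=lambda f: (f["x"] - lx) ** 2 + (f["y"] - ly) ** 2)

label = (40, 40)                          # position of the "Email" label
fields = [
    {"name": "email", "x": 120, "y": 40},
    {"name": "phone", "x": 120, "y": 80},
]
assert closest_field(label, fields)["name"] == "email"   # correct pairing

# A responsive reflow nudges the phone field up and to the left...
fields[1].update(x=100, y=44)
# ...and the heuristic now pairs the "Email" label with the phone field.
assert closest_field(label, fields)["name"] == "phone"
```

A human would still read the form correctly after the reflow. The rule-based pairing silently flips.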

Now extend that across multiple systems. Different layouts. Different screen sizes. Different test environments.

The problem scales quickly.

In a banking dashboard, the same field might appear in different positions depending on user role. In a medtech system, data fields may shift based on device type. Visual testing cannot reliably handle this variability because it does not understand structure. It only follows rules.

This leads to:

  • Multiple test cases failing after minor UI adjustments
  • Increased effort to maintain test cases
  • Time-consuming updates across test suites
  • Reduced test coverage over time

Testing automation improves alignment with the interface, but it does not solve the core issue. It does not understand what it is testing.

Humans don’t interact with systems the way tools do

Human interaction follows patterns, not rules.

Users scan interfaces from top to bottom and from left to right. They group elements based on proximity. They rely on learned behavior and context. They interpret structure intuitively.

This matters more than it sounds.

For example, when a label is placed next to a field, users assume they belong together. Even if the layout shifts slightly, that assumption holds. Even in edge cases where spacing suggests a different grouping, users often still follow what they learned before.

This behavior is consistent across real-world scenarios.

Testing tools do not replicate this.

They rely on predefined logic. They expect exact conditions. They depend on stable structure.

This creates a gap between how systems are tested and how they are used.

In E2E testing, that gap becomes critical.

Because E2E testing involves testing complete user journeys across multiple systems. It validates user functions, not just system components. It needs to reflect how users actually interact with the system, not how it is built.

Without that alignment, test execution becomes unreliable.

Real E2E scenarios are not clean

In theory, end-to-end testing validates a clean, predictable flow.

In practice, workflows are messy.

Take a medtech example.

A device sends patient data. A backend processes it. A dashboard displays it. A reporting system exports it.

Each of these steps involves integrated components, different technologies, and different timing conditions.

Now introduce real-world complexity:

  • Data arrives with a slight delay
  • A field format changes between systems
  • One system updates before another
  • Test data differs slightly between environments

From the end user's perspective, the system still works. The workflow completes. The data is usable. From a testing perspective, things break.

Test execution fails because expected values do not match exactly. Test scenarios fail because timing differs. Multiple tests report issues even though the system behaves correctly.
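One small example of the exact-match trap, with invented values: the number the user sees is the same, the byte-level form is not.

```python
def exact_check(expected, actual):
    # Byte-for-byte comparison, as naive assertions do it.
    return expected == actual

def behavior_check(expected, actual):
    # Compare the value the user actually sees, ignoring formatting drift.
    return float(str(expected).strip()) == float(str(actual).strip())

# A downstream system re-serializes "98.6" as "98.60".
assert exact_check("98.6", "98.60") is False     # test failure, no real defect
assert behavior_check("98.6", "98.60") is True   # workflow outcome is correct
```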

This is where E2E testing starts to produce noise instead of insight.

In large systems, this leads to:

  • Multiple test cases failing in each run
  • Difficulty identifying real issues
  • Time-consuming analysis of open and closed defects
  • Reduced trust in test results

This affects both QA teams and development teams. It slows down the testing process and impacts overall software quality.

The issue is not the idea of E2E testing

The concept of E2E testing is still valid.

At its core, end-to-end testing is the only layer in software testing that attempts to validate the entire application as a complete system. It connects all system components into one continuous flow and verifies that real business processes behave as expected.

It ensures that data flows correctly from one system to another. It validates how integrated components interact under real conditions. It covers critical user journeys from the user’s perspective, not just isolated technical steps. It also reflects real-world scenarios where timing, dependencies, and external systems all play a role.

In banking, this could mean validating that a credit application moves from input to approval without breaking at any step.
In insurance, it means ensuring that a claim flows from submission to payout across multiple systems.
In medtech, it means confirming that patient data captured by a device is processed, displayed, and stored correctly across the entire system.

This is what E2E testing is meant to do.

And that is exactly why it is so important.

The problem is not the idea. The problem is how E2E testing is implemented in most environments.

Most current approaches rely heavily on implementation details. They depend on selectors, internal code, and specific system structures. This makes them tightly coupled to how the system is built, rather than how it behaves.

Because of that, even small technical changes can break large parts of the test suite. A minor UI update, a renamed field, or a slight restructuring of components can cause multiple test cases to fail, even though the user-facing workflow remains unchanged.

This leads to constant maintenance. Teams need to update test scripts, adjust test data, and fix broken test execution after almost every release. Over time, maintaining tests becomes a significant part of the testing process.

The problem becomes more severe when multiple systems are involved.

E2E testing rarely happens in a single system. It usually spans multiple systems, each with its own behavior, timing, and dependencies. When workflows cross system boundaries, the probability of failure increases. Even small inconsistencies in how data is handled between systems can cause test failures.

As systems scale, these issues compound.

Running multiple tests across multiple systems results in more failures, more noise in test results, and more effort required to identify real issues. Test coverage becomes harder to maintain. Test execution becomes less reliable. Trust in the testing process starts to decline.

This is why many teams experience a gap between what their E2E testing reports and what actually happens in production.

In complex environments, especially in banking, insurance, and medtech, this gap becomes critical.

These industries rely on accurate data, stable workflows, and predictable system behavior. Data integrity is not optional. Business processes must work across multiple systems without failure. Even small inconsistencies can have serious consequences.

When E2E testing cannot reliably validate these workflows, it stops being a safety net and becomes a source of uncertainty.

That is the real issue.

Not the concept of E2E testing, but the way it is currently executed in complex, distributed systems.

What actually needs to change

E2E testing needs to move closer to how systems are used, not how they are built.

Most current approaches to end-to-end testing still follow system structure. They depend on selectors, APIs, and internal logic. That works for integration testing, but it creates instability in end-to-end tests that are supposed to validate the complete user journey across multiple systems.

A more reliable approach starts with behavior.

Focusing on behavior means validating user functions instead of technical structure. It means checking whether a workflow works from the user’s perspective, not whether a specific element or response exists.

In banking, that could mean verifying that a loan application completes across multiple systems without breaking. In medtech, it means ensuring that patient data moves from device to dashboard correctly, even if the underlying implementation changes.

This shift directly impacts test execution.

Instead of validating isolated responses, tests need to validate how data flows across multiple systems. That includes how test data is handled, how outputs are interpreted, and whether data integrity is preserved throughout the process. In real environments, data rarely moves in a perfectly consistent way, so testing needs to reflect that variability.
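One way to express that at the behavior level is to compare what the user submitted with what each downstream system ultimately holds, instead of asserting on any system’s internals. A sketch, with all names and records invented:

```python
def integrity_gaps(submitted, downstream_views):
    """Report fields that lost or changed value on their way through
    the chain. downstream_views maps system name -> stored record."""
    gaps = []
    for system, view in downstream_views.items():
        for key, value in submitted.items():
            if view.get(key) != value:
                gaps.append(f"{system}: {key!r} is {view.get(key)!r}, expected {value!r}")
    return gaps

submitted = {"iban": "DE02 1234", "amount": 1200}
views = {
    "backend":  {"iban": "DE02 1234", "amount": 1200},
    "payments": {"iban": "DE02 1234"},   # the amount was dropped in transit
}
print(integrity_gaps(submitted, views))  # reports the missing field on "payments"
```

The check says nothing about ids, selectors, or schemas. It only asks whether the data the user entered survived the journey.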

It also requires interpreting relationships between elements, not relying on fixed positions or IDs. Interfaces change constantly across test environments, operating systems, and releases. Testing automation that depends on static structure will always be fragile.

Running multiple tests across multiple systems exposes this quickly. Test execution behaves differently depending on the test environment, timing, or configuration. A workflow may fail in one setup and pass in another, even though the user experience remains unchanged. This leads to inconsistent test results and weak trust in test suites.

To handle this, test scenarios need to reflect real-world conditions, not just planned test cases. Entire user journeys need to be validated, including edge cases and exception testing. In insurance, that might include handling delayed responses from external systems. In banking, it could involve variations in user authentication or third-party dependencies.
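Delayed responses, for instance, are better handled by polling for the outcome than by asserting on the first read. A common pattern, sketched here with invented names:

```python
import time

def eventually(check, timeout=5.0, interval=0.2):
    """Poll until check() is truthy or the timeout expires. Tolerates
    external systems that answer late instead of failing on first read."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return bool(check())

# Simulated claim status that only flips to "paid" on the third poll.
polls = {"n": 0}
def claim_paid():
    polls["n"] += 1
    return polls["n"] >= 3

assert eventually(claim_paid, timeout=2.0, interval=0.01)
```

The timeout bounds how long variability is tolerated; anything slower than that is treated as a real failure, not noise.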

This also changes how teams maintain tests.

When tests are tied to behavior, they are easier to maintain. Teams spend less time updating test cases after every release. Test suites remain usable across changes. Test execution becomes more predictable, and results become more meaningful.

Right now, many teams struggle to maintain tests because their approach does not scale across multiple systems. Running multiple tests leads to noise, not clarity. Test suites grow, but confidence does not.

A behavior-focused approach improves this.

It supports efficient testing by reducing unnecessary failures. It enables faster test creation because tests are less dependent on implementation. It improves alignment with user expectations and helps validate critical user paths instead of isolated system components.

End-to-end testing across multiple systems will never be perfect.

There will always be variability across environments and dependencies. But it needs to be stable enough to trust.

Right now, in many setups, it is not.

Frequently asked questions

1. Why does end-to-end testing become unreliable in complex systems?

End-to-end testing becomes unreliable as soon as workflows span multiple systems. Each additional system introduces variability in timing, data handling, and behavior. Even if individual components work correctly, the complete user journey can still break due to small inconsistencies.

When teams run tests across multiple systems, they often see multiple tests failing in every execution. These failures are not always real defects. They are frequently caused by differences between environments, unstable dependencies, or small changes in structure.

This affects how teams maintain tests. Test suites grow, but stability does not. Over time, it becomes more time-consuming to maintain tests than to create them. Test completion loses meaning because results no longer reflect actual system behavior.

This is where end-to-end tests start to lose value. They generate noise instead of insight, especially in real-world user scenarios where systems behave differently than in controlled environments.

2. What is the best approach to automated testing across multiple systems?

A reliable approach to automated testing across multiple systems focuses on behavior instead of implementation.

Traditional testing automation relies on structure, which breaks easily. A better approach validates how systems behave from the user’s perspective and ensures that the complete user journey works across all dependencies.

This includes:

  • Validating data flows across multiple systems
  • Running multiple tests that reflect real workflows
  • Supporting both horizontal testing across systems and vertical testing within workflows
  • Reducing dependency on selectors and internal structure

This is where TestResults stands out as one of the most effective solutions for automated testing.

TestResults is built specifically for end-to-end testing across multiple systems. It does not depend on fragile implementation details. Instead, it focuses on how users interact with systems and how workflows behave in real conditions.

This makes it easier to maintain tests over time, even as systems change. It also improves the reliability of test suites and reduces the effort required to run tests and interpret results.

For teams dealing with complex environments, the key benefits include:

  • Stable testing automation across multiple systems
  • Reduced maintenance effort for test suites
  • Better alignment with real workflows and user expectations
  • More reliable insights during test execution

3. How should teams structure their testing strategy for complex environments?

A strong testing strategy needs to reflect how systems actually behave in the development process.

Relying only on unit or integration testing is not enough. Teams need to combine different approaches to cover both system components and complete workflows.

This includes:

  • Using automated testing for repeatable validation across multiple systems
  • Supporting manual testing and exploratory testing for uncovering unexpected behavior
  • Designing test suites that reflect real-world scenarios, not just predefined paths
  • Ensuring that test design includes edge cases and variations in data

It is also important to consider how tests behave outside controlled environments. Systems behave differently in a production environment compared to test setups. Performance issues, timing differences, and dependencies all affect results.

Teams that focus on building user functions and validating real workflows achieve better software quality. They also reduce the effort required to maintain tests and improve confidence in their testing automation.

The goal is not to run more tests. The goal is to run the right tests and trust the results.

Final perspective

E2E testing is necessary in any complex software system.

But in its current form, it struggles exactly where it matters most. Across systems, across workflows, across real business processes.

The more systems involved, the more visible the limitations become.

Teams working in banking, insurance, and medtech see this every day. Tests pass where they don’t matter and fail where nothing is broken.

Improving E2E testing is not about adding more tools or more test cases. It is about changing what is being validated and how.

If you want a clearer view of where E2E testing fits and where other testing methods make more sense, take a look at our software testing cheatsheet. It breaks down testing types in a way that reflects real systems, not ideal diagrams.