Seven shifts reshaping how software quality works — from AI agents writing your tests to the changing reality of what QA engineers do all day.
Something shifted in the last two years that didn’t get enough attention. Not the tools — though those changed too. The actual nature of the work changed.
QA engineers who spent most of their time writing scripts and running regression cycles are now being asked to govern AI-generated test output, validate non-deterministic systems, and translate quality risk into language that makes sense to a CFO. That’s a different job.
A lot of what’s driving this is the usual suspects: AI coding assistants, faster deployment cycles, microservices everywhere. But the less-discussed part is that the testing tools themselves are changing what’s possible. AI-powered agents can explore an application without being told what to test.
Defect prediction models flag high-risk code before a test is written. Self-healing scripts stop breaking every time a UI element moves three pixels.
None of this means QA is getting easier. If anything, it’s getting harder in the ways that matter — more judgment, more context, more communication with stakeholders who want answers about risk, not just defect counts.
The teams handling it well aren’t the ones with the most advanced tooling. They’re the ones who got clear on what they’re actually trying to accomplish with quality engineering, then picked tools accordingly.
The global software testing market sat at $55.8 billion in 2024. It’s projected to hit $112.5 billion by 2034 — a 7.2% CAGR. Automation testing specifically is growing from $28.1B to $55.2B between 2023 and 2028, nearly doubling in five years. (Thinksys QA Trends Report 2026)
Here are the seven trends worth paying attention to this year — not buzzwords, but things changing how QA teams work.
1. Agents Are Taking Over Test Execution. That’s Not Necessarily Bad.
For fifteen years, test automation meant one thing: you wrote scripts. Defined the path. Defined the expected output. Ran it. Checked the result. The frameworks got fancier, the tooling improved, but the model stayed the same — a human decided what to test and encoded that decision into code.
Agentic testing doesn’t work that way. Instead of running a predefined script, an AI testing agent reads the application through specs, API contracts, and prior user sessions, and starts exploring.
It generates its own test paths. Runs them. Watches what breaks. Adjusts. Runs more. It’s closer to what a good exploratory tester does manually, except it runs at machine speed and doesn’t need sleep.
What changes in practice: agents catch edge cases nobody thought to write a test for. When the UI shifts, they adapt rather than failing hard.
They can run continuously against production traffic, testing in conditions your staging environment can’t replicate. And they can chain API tests, UI checks, and performance assertions into single coordinated sequences that would take a human engineer half a day to wire up.
77.7% of QA teams have already moved to AI-first quality engineering approaches. 74.6% run two or more automation frameworks simultaneously — a sign of how complex the tooling landscape has become. (Thinksys QA Trends Report 2026)
The catch: agents still need oversight. They generate false positives. They optimise for the wrong things if you don’t define quality objectives clearly.
The work shifts from writing individual test cases to writing the goals that agents operate against — success criteria, risk thresholds, coverage priorities — and then reviewing what they produce. That’s a skill most QA teams are still developing.
Teams investing now in how to write testable quality objectives and how to govern AI-generated test output will be ahead of the curve. Teams still writing every test case by hand by the end of 2026 won’t be behind because they lack access to the tools. They’ll be behind on speed.
2. AI Writes a Lot of Your Code Now. Someone Still Has to Test It.
GitHub Copilot, Cursor, Claude — these are now standard developer tools in most organisations. A significant chunk of first-draft production code is AI-generated. The productivity argument is real: developers ship faster.
Here’s the problem nobody talks about enough. AI generates code that looks right before it is right. The model produces something syntactically clean, structurally plausible, and review-friendly. It gets committed.
Then, somewhere in integration, or in a production edge case, or in a security scan three sprints later, the cracks appear. The model didn’t know your codebase. Didn’t know your users. Didn’t know the business logic that lives in one engineer’s head and never got written into a doc.
Over 70% of developers say they routinely rewrite or refactor AI-generated code before it’s production-ready. More than half of AI-generated code samples studied contain logical or security flaws — even when the code looks correct on the surface. (Parasoft, 2025)
Some QA teams have started building specific review criteria for AI-assisted commits — things to check that standard code review misses. Hallucinated API calls referencing functions that don’t exist.
Security anti-patterns sourced from low-quality training data. Business logic is the model guessed at rather than inferred from context. Integration failures that emerge where AI-generated components meet human-written ones.
There’s also a volume problem. AI makes developers faster. The volume of code needing validation grows faster than testing capacity does — unless your automation pipeline is keeping up.
If AI-assisted commits aren’t triggering additional static analysis, security scanning, and targeted test runs, you’re accumulating quality debt quietly. It shows up in production incidents, not in sprint reviews.
Read Blog: Benefits of Digital Transformation for Small Businesses
3. Shift Left Is Half the Answer. Shift Right Is the Other Half.
The shift-left argument is still valid. Catch defects early, when fixing them is cheap. Don’t wait until the end of the sprint to find out the feature is broken. This hasn’t changed.
What has changed is the assumption that pre-release testing is sufficient. It isn’t. No amount of thorough test coverage in staging fully replicates what happens when real users hit the system. Traffic patterns are messier. Data is weirder. Users do things nobody scripted.
You find this out from your monitoring tools — or from your customers. Given the choice, most teams would prefer to find out from monitoring.
Shift right closes that gap. Testing in production, monitoring live behaviour, running canary releases with built-in quality gates, and feeding real user signals back into test planning.
The goal isn’t to replace pre-release testing. It’s to treat release as the beginning of a validation loop, not the end of one.
By 2026, 70% of DevOps-driven organisations are expected to run shift-left and shift-right together as a combined quality model. 40% of large enterprises will have AI assistants embedded directly in their CI/CD pipelines — selecting tests, analysing logs, triggering rollbacks. (IDC / Valido AI, 2025)
What this looks like operationally: quality gates at commit, integration, staging, canary, and production monitoring — each as automated as possible, with human review reserved for actual judgment calls.
The QA-DevOps boundary also blurs. In mature teams, testing doesn’t hand off to operations at deployment; it runs through the whole pipeline. The line between ‘running a test’ and ‘monitoring a service’ stops being meaningful.
4. How Do You Test a System That Gives Different Answers Every Time?
Traditional test automation rests on one assumption: same input, same output, every time. That’s the whole model. Define the expected result, run the test, and check whether it matches.
AI systems break that assumption completely. Ask a language model the same question twice, and you get two different answers. Both might be fine. One might be better. No string comparison tells you. The whole foundation of deterministic assertion testing doesn’t apply.
Teams are working around this in a few ways, none of them as clean as the old model.
Probabilistic assertion checks whether outputs fall within an acceptable statistical range rather than hitting an exact value. LLM-as-evaluator uses a separate model to judge whether a response is semantically correct, which works until you must explain a failed test to a compliance auditor who wants a binary pass/failure.
Prompt regression suites track how AI behaviour drifts over time as models get updated or fine-tuned.
Emerging standards like Model Context Protocol (MCP) and Agent2Agent (A2A) are formalising how AI components talk to each other across services — creating a new category of integration testing around tool permissions, agent coordination, and cross-service failure modes. (Parasoft, 2025)
Multi-agent pipelines make this harder again. When AI agents chain actions across multiple services — which is now common in enterprise automation — a bad output from one agent becomes the input to the next.
By the time a human sees the error, it’s three steps deep in a workflow. Testing these systems means understanding the interaction patterns between agents, not just how each one behaves in isolation.
Nobody has fully solved this yet. The teams doing it best are the ones treating non-determinism as a property to be managed, building prompt regression suites, confidence scoring, and model drift monitoring into their quality programs.
It’s a different discipline from functional QA. Most teams are still building the muscle.
5. Security Testing Can’t Sit in Its Own Lane Anymore
Security used to be a separate workstream. A specialist team ran it. A penetration test happened before a major release. A compliance checklist got ticked. QA ran the functional tests. The two teams worked in parallel and mostly didn’t talk.
That separation no longer holds, not when AI introduces attack surfaces that functional testing completely misses. Prompt injection. Model inversion. Data poisoning. Adversarial inputs designed to make a model ignore its instructions.
These aren’t theoretical scenarios. They’re being actively exploited in production systems. A test confirming your AI chatbot returns the right format for a customer query won’t catch whether that same chatbot can be jailbroken through a malicious input.
85%+ of organisations now cite AI-related vulnerabilities as their fastest-growing cyber risk. A ransomware breach now averages $4.91 million in damages — one of the clearest arguments for shifting security testing from a compliance activity to a business-risk priority. (TestGrid / Verizon, 2026)
Regulatory pressure is adding urgency. The EU AI Act puts testing and documentation obligations on organisations using high-risk AI systems. Financial services and healthcare regulators are piling on requirements around explainability and bias validation.
Treating this as a post-launch problem is expensive — missed windows, compliance rework under pressure, and documentation that must be reconstructed rather than built alongside the product.
For QA teams, the practical implication is that security validation must live in the pipeline, not bolt on before launch. SAST and DAST are embedded in CI/CD. Automated dependency scanning on every build.
AI-specific red-teaming is built into release cycles, not run ad-hoc once a year. QA engineers don’t need to become penetration testers. But they do need enough security literacy to catch AI-specific red flags as part of standard quality work, and to stop treating security as someone else’s problem.
6. The Role of Humans in Testing Is Shrinking in the Right Ways and Growing in the Important Ones
There’s a version of the AI testing story that ends with humans barely involved: AI writes the code, generates the tests, runs the tests, reviews the results, signs off on release. This is not where things are going, at least not in any timeframe that matters to decisions being made today.
AI testing tools are genuinely good at the things humans are bad at: running thousands of test variations simultaneously, maintaining consistency across sprawling test suites, catching regressions at a speed no human team matches. But they have blind spots.
They don’t know which tests matter. They can’t read between the lines of a requirements document. They don’t understand why something that technically passes is still wrong for the user.
Researchers looking at failed QA efforts found that 75% of testing problems trace back to ambiguous requirements — not the tools or frameworks being used. ‘AI output is non-deterministic and requires a human to apply critical thinking and repair differences between AI output and what a competent, expert human would provide.’ — Automation Guild 2026.
The organisations getting the best results from AI testing tools are the ones that decided explicitly where human judgment stays in the loop — not as an afterthought, but designed in.
They define the checkpoints where human review is required before automation proceeds. Not because they distrust the AI, but because they know accountability for quality can’t be fully handed off to a tool.
What changes is the shape of the human contribution. Less time running tests manually. More time defining what good looks like, reviewing AI-generated output for context blindness, interpreting ambiguous results, and explaining quality risk in terms a CTO or product director can act on.
That last skill — communication — turns out to be critical and chronically underrated. The teams that invest in building it alongside technical upskilling will outperform those that treat quality engineering as a purely technical function.
7. QAOps: When Quality Stops Being Downstream of Everything Else
A lot of engineering organisations have a structural problem that predates AI: QA is a gate. Code comes out of development, QA checks it, code goes to production. The testing team sits downstream of the real work.
This was already showing strain before deployment cadences accelerated. When teams ship multiple times a day, and the test suite takes four hours to run, you either wait on QA, or you skip it. Neither answer is good.
QAOps is the structural fix: quality engineering inside the same operational model as DevOps, not downstream of it. Testing is embedded in the pipeline at every stage rather than applied at the end.
Quality metrics shared across development, QA, and operations — not locked in a test management tool that only QA reads. Quality signals are treated as engineering intelligence that informs how developers write code, not just how QA validates it.
Gartner found that 81% of executives now connect software quality directly to customer satisfaction and revenue. And the composition of quality teams is already changing — the share of large QA groups grew from 17% in 2023 to 30% in 2025, as developers and DevOps engineers take on testing responsibility alongside dedicated QA roles. (TestGrid, 2025)
In teams where QAOps works, developers see the quality impact of their commits in real time. Defect patterns are visible to the whole engineering team, not hidden in QA backlogs. Coverage gaps get flagged before code review.
The testing function stops being a bottleneck and becomes shared infrastructure — something everyone benefits from and contributes to.
The ROI case is more straightforward than people expect. Teams that chronically underinvest in test infrastructure carry quality debt that compounds — slower releases, more production incidents, more time spent on rework that should have been caught earlier.
The organisations that treat quality as a delivery capability consistently see it pay back in incident rates and release confidence, not in some abstract quality score.
25% of companies that invested in test automation reported immediate ROI. 54% of enterprises now integrate test automation directly into Agile/DevOps workflows. Quality is continuous infrastructure, not a periodic event. (TestGrid, 2026)
The conversation worth having in most engineering organisations isn’t about which tools to buy. It’s about whether QA is a gate or a capability. That answer shapes everything: how the team is staffed, what gets prioritised, how quality gets measured, how fast the organisation can ship.
What the QA Engineer’s Job Actually Looks Like in 2026
Across these seven trends, the job has changed, not disappeared, but genuinely shifted. Less time scripting test cases. More time on work that needs judgment. Here’s what that looks like day to day:
-
- Working with agentic tools to set quality objectives rather than hand-coding every test scenario
-
- Reviewing AI-generated test output for context blindness — the cases the tool passed that shouldn’t have passed
-
- Building and maintaining quality gates in CI/CD, which means working more closely with platform engineers than with developers
-
- Applying enough security knowledge to catch AI-specific risk in standard QA workflows, not just flagging it for a separate team
-
- Translating defect trends and coverage data into language that makes sense to product directors and CTOs
-
- Making risk-based calls about where to focus testing effort — which requires domain knowledge, not just tool proficiency
Industry surveys consistently find QA teams feel underprepared for how fast this is moving. The teams that are investing now in AI tool fluency, pipeline engineering basics, and stakeholder communication skills will find themselves ahead of that curve.
The ones waiting for the landscape to stabilise may be waiting a while.
How Sthenos Approaches Quality Engineering
At Sthenos, quality engineering isn’t a workstream that reviews output at the end — it’s embedded in how we deliver. Our CMMI Level 5 model is built on the premise that quality must be present throughout the software lifecycle, not applied as a final check.
We work with engineering teams to build modern QA programs that match the delivery environment: AI-augmented automation in CI/CD pipelines, risk-based frameworks that allocate effort where it matters, and the organisational capability to work with the generation of testing tools coming online right now.
That spans functional and regression testing, security validation, performance engineering, and the newer discipline of AI system validation — including the non-deterministic testing challenges most existing QA playbooks don’t have answers for yet.
If you’re rebuilding your quality engineering function for 2026 — or just trying to figure out where to start — that’s the kind of conversation we have regularly.
Final Thoughts
The organisations that lead on quality in 2026 won’t be the ones with the most advanced testing tools. They’ll be the ones that got honest about what their QA function is built to do — and closed the gap between that and what the delivery environment now demands.
Agentic testing is real and worth acting on now. AI-generated code is a quality risk most teams are underestimating. Security and QA need to work together more than they do. And shift-left without shift-right is only watching half the game.
None of this is settled. The tooling space will look different in eighteen months. But the teams that build a clear model of what quality means to their organisation — and make decisions from that model — will adapt faster than those chasing whatever tool is trending. That’s always been true. It’s just more consequential now.


