
The AppSec testing market split itself in two. Both halves are wrong.

Aravind Venkataraman

You are being pushed toward expensive-and-shallow or cheap-and-shallow. Neither gets you the coverage you need. Here is what the third option looks like.

Two models. Neither working.

If you lead security at a Global 2000 company or a critical infrastructure operator, you are being asked to defend a growing surface of applications with a testing budget that has not grown at the same rate. Your AppSec team is backlogged. Your applications change weekly. Your board wants evidence that your defenses work, not just documentation that you ran a test.

The vendor market has offered you two answers. Both of them are inadequate.

The first answer is expensive, manual-only penetration testing.

The quality is real. Senior practitioners find what automated scanners cannot. But the economics do not scale. You cannot hire enough senior testers, you cannot retain the ones you hire, and a human tester in a one-week engagement can only cover so much ground. Coverage is shallow by necessity, not by choice.

The second answer is autonomous AI testing.

The pitch is seductive. Replace the expensive humans with software that runs continuously and costs less. The reality, once you run these tools on production applications, is different. They lack the context to distinguish a real finding from a theoretical one. They flood reports with noise. Most enterprise buyers who have put them through serious evaluation have concluded the same thing: fine for CI/CD scanning, unsuited to Tier 1 assessments.

You are being asked to choose between expensive-and-shallow and cheap-and-shallow. That is a false choice. It should never have been the market's answer.


Reframing the Question

The question we have been trained to ask is: do we hire humans or buy software? That framing misses what has actually changed.

The real question is: what is the right division of labor between the human expert and the machine?

There is work in a penetration test that machines do superbly. Reconnaissance. Attack surface mapping. Running a hundred variations of an injection test in parallel. Packaging evidence. Drafting report language. And there is work that machines are not equipped to do. Reasoning about business logic. Judging exploitability in a specific enterprise context. Constructing a multi-step authenticated attack. Deciding what matters to this customer versus what is noise.

The firms marketing autonomous AI are trying to use machines for the second category. The firms sticking with manual-only testing are still using humans for the first. Both are using expensive labor in the wrong place.

The future of application security testing is not human versus AI. It is human plus AI, where the human provides expertise, judgment, and accountability, and the AI provides the scale, consistency, and memory that no individual practitioner can sustain across thousands of engagements.


What "Materially Better" Looks Like

At UltraViolet Cyber, our application penetration testing is now powered by Solstice, a proprietary AI built by our own practitioners and trained on five years of real-world penetration test results. Our own runbooks. Our own historical findings. The patterns our team has learned to identify across specific frameworks, industry verticals, and application architectures.

It is not a replacement for our testers. It is what our testers use to be faster, more thorough, and more consistent. The technology that powers it (large language models, agentic frameworks) is available to every firm in this market. What is not available elsewhere is what we have put inside it: institutional knowledge that compounds across thousands of engagements and cannot be purchased off a shelf or replicated overnight.

This is a production capability on real customer engagements, not a pilot. Our senior pentesters are the same experts you would hire today. What has changed is what they are able to deliver in the same window of time.

Faster engagements.

Reconnaissance, attack surface mapping, threat modeling, and routine test execution now happen continuously in the background. Report drafting is automated from evidence collected during testing. The path from kickoff to delivery is shorter. Your team gets findings they can act on sooner.

Broader coverage.

AI agents run specialist tests in parallel while human practitioners focus on the hardest areas of the application. Tests that would previously have been deprioritized due to time constraints now get executed. The coverage you pay for is the coverage you get.

Stronger findings.

Every finding in the report is linked to the specific HTTP evidence that proves it. No finding exists without evidence. Your development teams get fewer vague tickets and more actionable security work.

Compounding value over time.

The system retains institutional knowledge across engagements: confirmed findings, dismissed false positives, application-specific behaviors. The AI that tests your application next year carries everything it learned this year. Recurring testing gets smarter year over year, in a way that point-in-time pentesting never could.
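To make that concrete, here is a minimal sketch, in Python, of the kind of record a cross-engagement memory layer might persist. The class and field names (`MemoryEntry`, `EngagementMemory`, `rationale`) are illustrative assumptions, not Solstice's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MemoryEntry:
    """One unit of institutional knowledge carried forward (illustrative)."""
    app_id: str        # stable identifier for the application under test
    kind: str          # "confirmed_finding" | "dismissed_false_positive" | "app_behavior"
    detail: str        # e.g. "JWTs are still accepted after logout on /api/v2/session"
    recorded_on: date
    rationale: str = ""  # why a finding was confirmed or dismissed

@dataclass
class EngagementMemory:
    """Hypothetical store that seeds the next engagement's test plan."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def relevant_to(self, app_id: str) -> list[MemoryEntry]:
        # The next engagement starts from everything learned in the last one.
        return [e for e in self.entries if e.app_id == app_id]
```

The `rationale` field is the load-bearing part: a dismissed false positive with its reasoning attached is what keeps the same noise from resurfacing in next year's report.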

In Practice

What This Looks Like On A Real Engagement

Complex financial services portal · Multi-role API · Custom session management

Traditional engagement

In a traditional engagement, the first day is largely consumed by manual reconnaissance — mapping the application, understanding the attack surface, building a test plan from scratch. By mid-week the team is executing tests but has to triage which areas to prioritize against the time constraint. Some attack surface inevitably gets left on the table.

With AI augmentation

The attack surface is mapped in the first hour. The test plan reflects the application's specific technology stack before the briefing call ends. AI agents run authorization tests across all user-role combinations concurrently, work that previously took days of careful manual effort. The practitioners spend their time on what AI struggles to do without significant human direction, context, and trial-and-error: the nuanced, judgment-heavy testing that separates a thorough assessment from a mechanical one.
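As one illustration of why that authorization work parallelizes so well, here is a minimal sketch in Python. The role names, endpoints, and `fetch_as` helper are hypothetical stand-ins, not part of any real harness: each ordered role pair is an independent check, so a pool of workers can cover the whole matrix concurrently.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs: one authenticated session per role, and the
# resources each role legitimately owns.
ROLES = ["admin", "teller", "customer", "auditor"]
OWNED_RESOURCES = {
    "admin":    ["/api/users", "/api/config"],
    "teller":   ["/api/transactions"],
    "customer": ["/api/accounts/self"],
    "auditor":  ["/api/reports"],
}

def fetch_as(role: str, resource: str) -> int:
    """Stub: a real harness would issue the request with `role`'s session
    and return the HTTP status. Stubbed here so the sketch runs standalone."""
    return 403

def check_pair(attacker: str, victim: str) -> list[tuple]:
    """Try to reach every resource owned by `victim` with `attacker`'s session."""
    hits = []
    for resource in OWNED_RESOURCES[victim]:
        if fetch_as(attacker, resource) == 200:  # access that should have been denied
            hits.append((attacker, victim, resource))
    return hits

# Every ordered role pair is an independent test, so they parallelize cleanly.
pairs = list(itertools.permutations(ROLES, 2))
with ThreadPoolExecutor(max_workers=8) as pool:
    violations = [hit for hits in pool.map(lambda p: check_pair(*p), pairs)
                  for hit in hits]
```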

By the end of the engagement, coverage is materially broader. Human and agent testing reinforce each other. The AI surfaces patterns and gaps the practitioner hadn't reached yet, while the practitioner's judgments and context continuously sharpen what the agents focus on next. The result is a level of coverage that neither could achieve independently.

Meet Solstice

Two lanes operate inside every engagement. The practitioner browses, probes, and tests the application using industry-standard tools. AI agents run specialist tests concurrently in the background. Both lanes feed into a central engagement brain that captures every action, is queryable in plain language at any time, and carries learning forward to the next engagement.

[Diagram: two-lane architecture. On the left, the human pentester browses and probes the app with industry-standard tools, reviews and refines the test plan, runs manual tests in Burp, and validates or dismisses findings with rationale. On the right, Solstice agents build app intelligence (site map, attack surface, threat model), generate and refine the structured test plan, run concurrent specialist tests (XSS, SQLi, AuthN, IDOR, and more), and draft an evidence-linked report. A central engagement brain continuously captures all human and agent activity into a connected knowledge graph, queryable in plain language, while cross-engagement memory, trained on prior reports, runbooks, and findings, gets smarter with every test.]

Two lanes. One engagement. Constant feedback between them.
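A minimal sketch of how two lanes can feed one shared record, assuming a simple append-only event log; the `EngagementBrain` and `EngagementEvent` names are illustrative, and the plain-language query is reduced here to a substring match:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EngagementEvent:
    """One captured action from either lane (illustrative shape)."""
    lane: str    # "human" or "agent"
    actor: str   # e.g. "pentester:lead" or "agent:idor-specialist"
    action: str  # e.g. "validated finding F-014" or "mapped /api/v2/*"
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class EngagementBrain:
    """Hypothetical shared log both lanes write to and anyone can query."""
    events: list[EngagementEvent] = field(default_factory=list)

    def record(self, event: EngagementEvent) -> None:
        self.events.append(event)

    def query(self, needle: str) -> list[EngagementEvent]:
        # Stand-in for the plain-language query interface described above.
        return [e for e in self.events if needle.lower() in e.action.lower()]

brain = EngagementBrain()
brain.record(EngagementEvent("agent", "agent:sqli", "probed /login with time-based payloads"))
brain.record(EngagementEvent("human", "pentester:lead", "dismissed F-009 as false positive"))
print(brain.query("false positive"))  # both lanes, one queryable history
```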

Two architectural commitments shape every part of how this works:

Human-Directed

Every high-stakes action requires practitioner approval before execution. The AI proposes. The expert decides. No findings are auto-published. No actions are taken without oversight.
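In code, that commitment is simply a gate in front of execution. A minimal sketch, assuming a `high_stakes` flag and a practitioner callback, both hypothetical names:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

class Verdict(Enum):
    APPROVED = auto()
    REJECTED = auto()

@dataclass
class ProposedAction:
    """Something an agent wants to do, e.g. run an exploit or publish a finding."""
    description: str
    high_stakes: bool

def execute(action: ProposedAction,
            practitioner_review: Callable[[ProposedAction], Verdict]) -> bool:
    """Run an action only if it is low-stakes or a human has approved it."""
    if action.high_stakes and practitioner_review(action) is not Verdict.APPROVED:
        return False  # the AI proposed; the expert declined
    # ... perform the action and log it to the engagement brain ...
    return True
```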

Evidence-Linked

Every finding references the specific HTTP request and response that proves it. No finding exists without evidence. No claim is asserted without proof.
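One way to enforce that invariant structurally is to make evidence a required part of the finding type itself. A minimal sketch with hypothetical names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HttpEvidence:
    """The raw exchange that proves a finding (illustrative shape)."""
    request: str   # the full HTTP request that was sent
    response: str  # the full response demonstrating the behavior

@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    evidence: HttpEvidence  # required: a Finding cannot be built without it
```

Because `evidence` has no default, constructing a `Finding` without it fails immediately, which is the structural version of "no finding exists without evidence."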

Learn more about the full architecture, including pre-test intelligence, parallel agentic testing, just-in-time guidance, and engagement-brain reporting.

Meet Solstice →


What To Ask Your Next AppSec Vendor

When you are evaluating any application security testing partner this year, there are four questions worth asking. They will clarify quickly whether you are looking at an integrated capability or a bolted-on demo.

  • Is your AI integrated into the practitioner's workflow, or is it a separate product your testers run alongside the real engagement?
  • Do your findings come out of a single integrated system, or are you concatenating output from a scanner and a manual tester?
  • Does your system retain institutional knowledge across engagements, or does every assessment start from zero?
  • Who is in control of every sensitive action? A system where AI can publish findings or take actions without human approval is a system that will eventually embarrass you in front of your board.

The answers will tell you whether the vendor built something for their operators and customers, or built something for their marketing team.


Where This Fits In The Broader UltraViolet Model

Application penetration testing does not exist in isolation. It is one part of a security operation that only works when its offensive and defensive sides speak the same language: when what the red team finds shapes what the blue team detects, and when what the blue team sees in live adversary behavior sharpens what the red team tests next. That closed loop is the Power of Purple, and it is what separates a security operations partner from a point-solution vendor.

When offensive testing gets faster and deeper, the whole loop benefits. Your SOC learns from what our practitioners find in your environment. Your defenses get validated against adversary behavior that actually reflects what attackers are doing to applications like yours. Confirmed findings inform detection engineering. Dismissed false positives sharpen the signal. Application-specific patterns become part of the intelligence that makes every subsequent test more targeted.

Faster, deeper application testing is not just a better pentest. It is better input for the entire security operation.


The Bottom Line For Security Leaders

You do not have to choose between expensive-and-shallow and cheap-and-shallow. You do not have to accept testing coverage that is capped by how many hours a senior human has in a week. You also do not have to hand your Tier 1 applications to an autonomous tool that does not understand your business.

There is a third option. Practitioners you trust, working with AI they built, on a system that gets smarter every time it runs. That is what application penetration testing should look like in 2026, and it is what we are now delivering on every AppSec engagement.

For existing UltraViolet clients: talk to your account team about Solstice AI-augmented testing on your next engagement.

For security leaders new to UltraViolet Cyber: request a capabilities briefing with our AppSec practice lead.