Inside UltraViolet Solstice: the architecture of human-led, AI-augmented application testing

UltraViolet Cyber

June 22, 2026

AI agents can do certain parts of application penetration testing fast. They map an attack surface in minutes, work that takes a practitioner hours. They execute well-bounded test classes in parallel where a human runs them sequentially. What they cannot do, on their own, is tell you whether a flagged vulnerability is a real risk in your environment. The judgment work doesn't compress.

Solstice is built around that distinction. AI handles the work that scales. Senior practitioners handle the work that requires judgment. The system runs on every UltraViolet Cyber application penetration test, built by our AppSec team on decades of operator experience from 30,000+ UltraViolet engagements.

For the AppSec programs we work with, what changes:

  • More attack surface gets covered in the same engagement window
  • Findings ship with the HTTP traffic that proves them, which cuts dev team back-and-forth
  • Operator-encoded context on your portfolio compounds across engagements
  • Practitioner approval over what gets executed and reported is captured in the engagement record

The design started from one question: where should AI sit in a penetration test, and where should a human stay in control?

The division of labor

After 30,000+ UltraViolet engagements, the division that produces the best outcome looks like this. AI handles the scaffolding work: attack surface mapping, parallel execution of well-understood test classes, evidence capture, draft reporting. Senior practitioners handle the judgment work: scope decisions, business logic, finding validation, the call on whether a technical finding is actually a risk in this environment.

[Figure: Two columns showing what AI handles versus what senior practitioners handle in a Solstice engagement. AI handles: attack surface mapping; parallel test execution (30+ classes); evidence capture and draft reporting. Practitioners handle: scope decisions and business logic; finding validation; risk judgment for this environment.]

The line between those two columns shifts depending on the customer, the application, and the engagement. Solstice is built to make that line easy to move, and to keep the human in control of where it falls on any given test.

Two lanes, one engagement

Inside every Solstice-augmented engagement, two streams of work run in parallel.

The practitioner lane. The senior UltraViolet pentester runs the engagement using industry-standard tools like Burp. They refine the test plan with scope, business logic, and application context. They run manual tests on the parts of the application that require judgment, and validate or dismiss every finding before it lands in a report.

The agent lane. Specialist agents build application intelligence in the background. They map the site, generate a structured threat model, draft the initial test plan, and execute parallel tests across more than 30 vulnerability classes (cross-site scripting, SQL injection, IDOR variants, authentication and authorization flaws, server-side request forgery, and more) while the practitioner handles the judgment-heavy work.

Between the two lanes sits the engagement brain. Every action either lane takes (a query the practitioner runs in Burp, a finding an agent reports, a decision either party makes) gets captured in a persistent knowledge graph. The brain is queryable in plain language. Both lanes can ask it for context and adjust based on what the other is doing.

[Figure: Structural diagram showing the practitioner lane and agent lane running in parallel, with the engagement brain sitting between them. Practitioner lane: Burp Suite and manual tests; scope and business logic; finding validation; test plan approval; report sign-off. Engagement brain: persistent knowledge graph; plain-language queryable; carries across engagements. Agent lane: site mapping and threat model; initial test plan draft; parallel vulnerability testing; evidence capture; draft report generation.]

The engagement brain

The brain is what most distinguishes Solstice from any other AI-augmented testing approach. It's a listener agent and a persistent knowledge graph, running across every engagement. UltraViolet practitioners can ask it questions in plain English: What endpoints have we tested for IDOR so far? What's our coverage on the admin role? Has anyone tried the X workflow with elevated privileges? The orchestrator agent uses the same graph to suggest next steps, show coverage gaps, and reprioritize tests as the engagement progresses.
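To make the coverage questions above concrete, here is a minimal sketch, assuming a simple in-memory record of actions rather than a real knowledge graph. The `RecordedAction` and `EngagementLog` names and their fields are hypothetical and are not Solstice's data model; the plain-language front end is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedAction:
    """One action captured from either lane (hypothetical schema)."""
    lane: str        # "practitioner" or "agent"
    endpoint: str    # e.g. "/api/orders/{id}"
    vuln_class: str  # e.g. "IDOR"
    role: str        # role the test ran as, e.g. "admin"


class EngagementLog:
    """Toy stand-in for the engagement brain: stores actions, answers coverage queries."""

    def __init__(self, known_endpoints):
        self.known_endpoints = set(known_endpoints)
        self.events = []

    def record(self, action):
        # Any endpoint that gets tested is also part of the known surface.
        self.known_endpoints.add(action.endpoint)
        self.events.append(action)

    def tested_for(self, vuln_class, role=None):
        """Endpoints tested for a vulnerability class, optionally as one role."""
        return sorted({
            e.endpoint for e in self.events
            if e.vuln_class == vuln_class and (role is None or e.role == role)
        })

    def coverage_gaps(self, vuln_class, role=None):
        """Known endpoints not yet tested for that class (and role, if given)."""
        return sorted(self.known_endpoints - set(self.tested_for(vuln_class, role)))
```

In this sketch, "What endpoints have we tested for IDOR so far?" maps to `tested_for("IDOR")`, and "What's our coverage on the admin role?" maps to `coverage_gaps("IDOR", role="admin")`.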

Three things matter about how the brain is built.

Persistence across engagements. The brain doesn't reset when an engagement ends. Testing methodology refined across many engagements informs how Solstice approaches similar workflows next time. Customer-specific data stays in its own engagement; what carries forward is operator-encoded knowledge about how to test.

Application-specific. Solstice builds application-specific understanding from live traffic on the application being tested. When it tests a payment flow on a financial services portal, it has the patterns of every financial services portal our operators have tested at hand, along with the specific traffic of the one in front of it.

Queryable by humans. The brain runs in plain language. A practitioner three days into an engagement can ask "where are our coverage gaps" and get a real answer, not a dashboard requiring interpretation.

Six commitments that shaped the design

[Figure: Six labeled tiles showing the core design principles. Human-directed: practitioner approves every action. Evidence-linked: HTTP proof ships with every finding. Learning from feedback: dismissal rationale informs future tests. Context-aware: live traffic, not generic databases. Persistent memory: operator knowledge compounds over time. Built to last: operating model stays; components evolve.]

Human-directed. Every high-stakes action requires practitioner approval before execution. The system drafts test plans, reports findings, and suggests next steps. The senior practitioner reviews and decides.
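An approval gate of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not Solstice's implementation: the `ProposedAction` type, the `high_stakes` flag, and the rule that low-stakes actions run without sign-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    """An action the system proposes (hypothetical schema)."""
    description: str
    high_stakes: bool
    approved_by: str = ""  # practitioner identifier; empty until someone approves


def approve(action, practitioner):
    """Record which practitioner signed off on the action."""
    action.approved_by = practitioner


def runnable(actions):
    """Actions an agent may execute now.

    Assumption: low-stakes actions run freely, while high-stakes actions
    wait until a practitioner has approved them.
    """
    return [a for a in actions if not a.high_stakes or a.approved_by]
```

Recording the approver's identity, rather than a bare boolean, is what would let an engagement record show who approved what.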

Evidence-linked. Every finding ships with the specific HTTP request and response that proves it. This makes validation fast for the practitioner and makes the resulting report defensible when a dev team asks "how do you know."
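One way to enforce that rule structurally is to make a finding impossible to construct without its traffic. The sketch below assumes hypothetical `HttpEvidence` and `Finding` types; the field names are illustrative and do not come from Solstice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpEvidence:
    request: str      # raw HTTP request as sent
    response: str     # raw HTTP response as received
    status_code: int


@dataclass(frozen=True)
class Finding:
    title: str
    vuln_class: str
    evidence: HttpEvidence

    def __post_init__(self):
        # Refuse to create a finding that has no captured traffic behind it.
        if not self.evidence.request.strip() or not self.evidence.response.strip():
            raise ValueError("a finding must carry the request and response that prove it")
```

With this shape, a report generator never has to check whether evidence exists: a `Finding` without it raises `ValueError` at creation time.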

Learning from feedback. When a practitioner dismisses a finding, they give a reason. That rationale becomes part of how Solstice interprets similar findings going forward. The operators who run UltraViolet engagements encode their judgment as they work.
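As a rough illustration of how a dismissal rationale might be kept and looked up later, here is a minimal sketch. The `DismissalMemory` class and its keying scheme (vulnerability class plus endpoint) are assumptions made for the example; how Solstice actually generalizes from a rationale to "similar findings" is not described here.

```python
class DismissalMemory:
    """Remembers why findings were dismissed (hypothetical structure)."""

    def __init__(self):
        self._reasons = {}

    @staticmethod
    def _key(vuln_class, endpoint):
        # Assumption: a finding pattern is identified by class and endpoint.
        return (vuln_class.lower(), endpoint)

    def dismiss(self, vuln_class, endpoint, rationale):
        """Store a dismissal; a rationale is mandatory."""
        if not rationale.strip():
            raise ValueError("a dismissal needs a rationale")
        self._reasons[self._key(vuln_class, endpoint)] = rationale

    def prior_rationale(self, vuln_class, endpoint):
        """Return the earlier rationale for this pattern, or None if there is none."""
        return self._reasons.get(self._key(vuln_class, endpoint))
```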

Context-aware. Solstice builds application-specific understanding from live traffic, not from generic vulnerability databases. A customer of ours put this best:

"Active authentication vulnerabilities might not be a risk to an organization that works with mainframes because they are not exposed out to the internet. An agent wouldn't know this."

That's the kind of context the brain holds, and the kind of judgment the practitioner brings.

Persistent memory. Nothing gets lost between sessions. The Solstice that tests your application next quarter has the operator-encoded judgment from this quarter to draw on. For customers running large portfolios with us, this compounds.

Built to last. Solstice is a mix of UltraViolet-built capability and industry-standard tooling. The operating model stays; the components evolve as better options emerge.

What this prevents: a real example

One of our senior pentesters walked us through a scenario from a recent engagement that captures why the design choices matter.

An autonomous pentest tool flagged a critical path to administrative takeover on a customer-facing SaaS application. The chain ran through an exposed internal API endpoint, weak scoping on an API key, and a privilege boundary the tool concluded it could cross. On paper, it looked legitimate enough that the pentester paused their manual workflow to validate.

Three hours later, the pentester had traced the finding to a single dependency: an API key for a deprecated billing integration. The integration had been removed from the application months earlier, but the key was still in the configuration table. Any request signed with that key routed to nothing in production.

The false positive itself was the smaller problem. The interruption cost was the bigger one: breaking focus during a live engagement, manually validating machine-generated assumptions, and explaining to the client why "critical" suddenly wasn't.

Here's where Solstice's design choices change the outcome.

Evidence-linked findings. Every Solstice finding has to ship with the HTTP request and response that proves it. A deprecated integration's API key doesn't authenticate against a live endpoint, so a real request returns nothing useful. No successful response, no finding. The artifact-based inference an autonomous tool made (key exists + endpoint exists + privilege boundary = critical path) never reaches the report.

Context-aware pre-test intelligence. Solstice ingests real application traffic and maps what's actually routing in production. A deprecated endpoint with no traffic, paired with a key that authenticates against nothing, shows up as inactive in the attack surface map. It wouldn't make the test plan the pentester approves.
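The idea of marking endpoints inactive when they carry no traffic can be sketched as a simple partition. This assumes traffic is available as `(method, path)` pairs and that "active" means "seen at least once"; both are simplifications for illustration, and the function names are hypothetical.

```python
from collections import Counter


def active_endpoints(traffic, min_requests=1):
    """Endpoints observed at least `min_requests` times in captured traffic."""
    hits = Counter(traffic)  # traffic is an iterable of (method, path) tuples
    return {endpoint for endpoint, count in hits.items() if count >= min_requests}


def split_by_activity(configured, traffic):
    """Partition configured endpoints into (active, inactive) by observed traffic."""
    live = active_endpoints(traffic)
    active = sorted(e for e in configured if e in live)
    inactive = sorted(e for e in configured if e not in live)
    return active, inactive
```

In the scenario above, the deprecated billing path would land in the `inactive` list because nothing in the captured traffic ever reaches it.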

Human-directed approval gate. Even if a suspicious pattern made it into the proposed plan, the pentester reviews and approves before any agent executes. They see "stale API key, deprecated integration path" and dismiss it or scope it down in 30 seconds, before three hours of live engagement time get burned.

Persistent memory. When the pentester dismisses the finding with rationale ("deprecated integration, key doesn't route"), Solstice's brain holds that learning. The Solstice that tests this customer's portfolio next year doesn't flag it again.

[Figure: Flowchart showing the three independent gates a candidate finding must pass after it is flagged and before it reaches a client report. Filter 1, evidence-linked: successful HTTP response required to proceed. Filter 2, context-aware: endpoint active in production traffic required. Filter 3, human approval: practitioner reviews and approves test plan.]

This is what we mean when we say AI should handle the scaffolding work and humans should handle the judgment work. Solstice requires three things before any pattern would reach a customer's report: behavioral evidence that the path actually works, an attack surface map built from live production traffic, and practitioner sign-off on the test plan. Any one of those filters catches the deprecated integration.
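The three filters can be expressed as independent checks on a candidate. The sketch below is illustrative only: the `Candidate` fields are hypothetical, and treating a 2xx status as the "successful HTTP response" is an assumption, since a real evidence check would inspect what the response actually proves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """A flagged pattern that has not yet reached a report (hypothetical schema)."""
    title: str
    endpoint: str
    response_status: Optional[int]  # None if no request was answered
    endpoint_seen_in_traffic: bool
    practitioner_approved: bool


def failed_gates(c):
    """Names of the gates this candidate fails; an empty list means it may be reported."""
    failures = []
    # Assumption: "successful" is approximated as any 2xx status code.
    if c.response_status is None or not (200 <= c.response_status < 300):
        failures.append("evidence-linked")
    if not c.endpoint_seen_in_traffic:
        failures.append("context-aware")
    if not c.practitioner_approved:
        failures.append("human approval")
    return failures
```

Because each check is evaluated on its own, a candidate like the stale billing key is stopped even if two of the three checks were somehow bypassed.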

What changes for your testing program

More coverage in the same testing window. Parallel agents handle more than 30 well-bounded vulnerability classes concurrently while the senior practitioner works the parts of the application that require judgment. More attack surface gets tested in the same engagement window, without adding hours or headcount.

Findings dev teams accept faster. Every finding ships with the HTTP request and response that proves it. The dev team gets the evidence with the report, which cuts the validation cycle. Reports become defensible by default.

A testing relationship that compounds. Operator-encoded knowledge about how to test specific application types carries forward across engagements. The Solstice that tests your portfolio in year three has the context from year one and year two to draw on. For customers running large portfolios with us on subscription, this compounds visibly.

Verifiable practitioner control. Every high-stakes action requires explicit practitioner approval. Every finding gets reviewed before it lands in a report. Every dismissal includes a rationale captured in the engagement record. If your compliance program asks how you know AI didn't generate that finding without human review, the engagement record has the answer.

Solstice is included in every UltraViolet Cyber application penetration test. To see what it finds in your application, the fastest path is a 30-minute scoping conversation with one of our AppSec leads. Visit the Solstice page for the system overview, or contact us to set up a conversation.