The Framework for Testing AI Attack Surfaces

APEX (Agentic Penetration Exercise) is pentest.qa's proprietary methodology for systematically testing AI agents, LLM applications, and autonomous systems against real-world attack vectors - designed to integrate with your existing engineering and QA cycles.

Duration: Applied across all engagements Team: Human senior researchers + AI agent automation

You might be experiencing...

No industry standard exists for AI agent security testing - your team has no methodology to follow.
Traditional penetration testing methodology was designed for static software, not autonomous agents.
Compliance frameworks reference AI security controls but don't specify testing methodology.
You need a documented framework to present to boards, auditors, and enterprise customers.

Traditional penetration testing methodology was designed for a world where software was static, APIs were synchronous, and nothing acted autonomously. That world is gone.

APEX - the Agentic Penetration Exercise methodology - was built from first principles for the AI era. It systematically tests the attack vectors that emerge when software can read instructions, call tools, maintain memory, and take autonomous actions.

The Five Phases

PLAN - Every APEX engagement begins with a structured threat model. We identify your AI agent architecture, map trust boundaries between components, define the rules of engagement, and run automated OSINT to understand your AI stack’s external exposure. AI agents gather public intelligence in parallel; human researchers synthesize it into a focused attack plan.

SURFACE - We enumerate your complete AI attack surface: every agent, every tool connection, every API endpoint, every MCP server, every privilege scope. Most engineering teams discover agents and tool integrations they didn’t know existed. This phase produces the attack surface map that drives the exploitation phase.

EXPLOIT - Human senior researchers drive creative attack chaining while AI agents run automated prompt injection sweeps (Garak, PyRIT, PromptBench) in parallel. This combination covers attack surface that manual testing alone cannot enumerate - at the speed that automated tools alone cannot contextualize. Critical findings are reported within 48 hours of discovery.

PERSIST - We simulate long-term adversarial presence: persistent agent compromise, privilege escalation through agent tool chains, exfiltration path mapping. This phase answers the question that standard pentests don’t ask: if an adversary got in, how long could they stay, and what would they do?

REPORT - Every finding is documented with business impact, CVSS score, reproduction steps, and remediation guidance. The report maps findings to OWASP LLM Top 10, ISO 27001, SOC 2, and NIST AI RMF - structured for use as compliance and audit evidence.

Human-Led, AI-Augmented

APEX is not an automated scanner. AI agents in APEX automate the high-volume, systematic work - enumeration, fuzzing, injection sweeps. Human senior researchers drive creative attack chaining, business logic exploitation, and findings narrative.

This combination eliminates the false-positive noise that purely automated tools produce, while covering attack surface that purely manual testing cannot enumerate in a reasonable timeframe.

APEX and Your QA Cycle

APEX was designed to integrate with the engineering workflows your team already uses. Each phase maps directly to your sprint and QA cycle:

PLAN aligns with sprint planning and threat modeling sessions. We arrive having already reviewed your architecture - your sprint planning slot becomes a focused threat model review, not a discovery call.

SURFACE can integrate with your existing asset inventory. We map APEX asset discovery against your internal service catalog, CMDB, or infrastructure-as-code definitions. Gaps between what APEX finds and what your inventory shows are findings in themselves.

EXPLOIT runs in parallel with your QA sprint. While your QA engineers run functional and integration tests, APEX researchers run adversarial tests against the same build. Findings land in your issue tracker alongside your QA tickets.

PERSIST simulates real adversary persistence across deployment cycles. We test whether a compromise achieved in one sprint survives your next deployment - validating that your CI/CD pipeline does not re-introduce cleared vulnerabilities.

REPORT produces compliance evidence mapped to ISO 27001 / SOC 2 / NIST AI RMF. The APEX attestation letter and structured findings report give your compliance team audit-ready documentation without requiring a separate evidence-gathering exercise.

Engagement Phases

Engagement Week 1

PLAN

Scope definition, threat model development, AI architecture review, trust boundary mapping, rules of engagement, and automated OSINT gathering on AI stack exposure.

Engagement Week 2

SURFACE

Asset discovery, tool connection mapping, privilege scope enumeration, MCP server inventory, API endpoint discovery, and agent interaction pattern analysis.

Engagement Weeks 3-5

EXPLOIT

Manual prompt injection chaining, tool poisoning simulation, memory manipulation attempts, and unauthorized lateral movement. AI agents run Garak and PyRIT fuzzing sweeps in parallel.

Engagement Week 6

PERSIST

Persistent agent compromise simulation, privilege escalation through agent tool chains, exfiltration path mapping, and long-term persistence mechanism identification.

Engagement Weeks 7-8

REPORT

Narrative findings report with business impact, CVSS scores, OWASP LLM Top 10 compliance mapping, remediation roadmap, and ISO 27001 / SOC 2 / NIST AI RMF alignment evidence.

Deliverables

APEX engagement report following the five-phase structure
OWASP LLM Top 10 compliance mapping
AI attack surface map and privilege scope diagram
APEX framework attestation letter for audit and compliance purposes
Methodology documentation for your information security management system

Before & After

MetricBeforeAfter
vs Traditional PentestOWASP Top 10 - no AI-specific methodologyOWASP Top 10 + OWASP LLM Top 10 + Agent-specific vectors
vs Automated ScannersHigh false positive rate, no creative chainingHuman-led with AI augmentation - real attack chains
Compliance EvidenceGeneric penetration test reportAPEX attestation letter + ISO 27001 / SOC 2 / NIST AI RMF mapping

Tools We Use

Garak PyRIT PromptBench Burp Suite Pro BloodHound Custom APEX Toolchain

Frequently Asked Questions

What makes APEX different from standard penetration testing methodology?

Standard penetration testing methodology (PTES, OWASP Testing Guide) was designed for web applications, APIs, and network infrastructure. APEX adds five AI-specific testing domains: prompt injection systematization, tool chain attack surface mapping, memory and context manipulation, agentic privilege escalation path analysis, and AI-augmented attack automation. APEX is the only documented methodology that covers all these domains for engineering and QA teams shipping AI systems.

How does AI automation work within APEX?

In the SURFACE phase, AI agents run automated asset enumeration in parallel with human reconnaissance - covering attack surface that would take a human team weeks to enumerate manually. In the EXPLOIT phase, AI agents run continuous fuzzing sweeps (Garak, PyRIT) while human researchers focus on creative attack chaining and business logic exploitation. Human researchers review and validate all AI-generated findings before inclusion in the report.

Is APEX recognized by compliance frameworks?

APEX maps explicitly to OWASP LLM Top 10, ISO 27001 Annex A controls, SOC 2 Trust Services Criteria, and NIST AI RMF. We provide an APEX attestation letter that your compliance team can present to auditors as evidence of AI-specific security testing. All engagements are conducted under a signed authorization agreement (ATT) ensuring testing is explicitly authorized in writing.

Can we get APEX methodology documentation for our records?

Yes. All APEX engagements include the methodology documentation as a deliverable - structured for inclusion in your information security management system (ISMS) as evidence of systematic AI security testing. The documentation covers the five phases, tools used, and testing coverage against OWASP LLM Top 10, ISO 27001, SOC 2, and NIST AI RMF.

Ship Secure. Test Everything.

Book a free 30-minute security discovery call with our AI Security experts. We map your AI attack surface and identify your highest-risk vectors - actionable findings within days, CI/CD integration recommendations included.

Talk to an Expert