The Framework for Testing AI Attack Surfaces
APEX (Agentic Penetration Exercise) is pentest.qa's proprietary methodology for systematically testing AI agents, LLM applications, and autonomous systems against real-world attack vectors - designed to integrate with your existing engineering and QA cycles.
You might be experiencing...
Traditional penetration testing methodology was designed for a world where software was static, APIs were synchronous, and nothing acted autonomously. That world is gone.
APEX - the Agentic Penetration Exercise methodology - was built from first principles for the AI era. It systematically tests the attack vectors that emerge when software can read instructions, call tools, maintain memory, and take autonomous actions.
The Five Phases
PLAN - Every APEX engagement begins with a structured threat model. We identify your AI agent architecture, map trust boundaries between components, define the rules of engagement, and run automated OSINT to understand your AI stack’s external exposure. AI agents gather public intelligence in parallel; human researchers synthesize it into a focused attack plan.
SURFACE - We enumerate your complete AI attack surface: every agent, every tool connection, every API endpoint, every MCP server, every privilege scope. Most engineering teams discover agents and tool integrations they didn’t know existed. This phase produces the attack surface map that drives the exploitation phase.
EXPLOIT - Human senior researchers drive creative attack chaining while AI agents run automated prompt injection sweeps (Garak, PyRIT, PromptBench) in parallel. This combination covers attack surface that manual testing alone cannot enumerate - at the speed that automated tools alone cannot contextualize. Critical findings are reported within 48 hours of discovery.
PERSIST - We simulate long-term adversarial presence: persistent agent compromise, privilege escalation through agent tool chains, exfiltration path mapping. This phase answers the question that standard pentests don’t ask: if an adversary got in, how long could they stay, and what would they do?
REPORT - Every finding is documented with business impact, CVSS score, reproduction steps, and remediation guidance. The report maps findings to OWASP LLM Top 10, ISO 27001, SOC 2, and NIST AI RMF - structured for use as compliance and audit evidence.
Human-Led, AI-Augmented
APEX is not an automated scanner. AI agents in APEX automate the high-volume, systematic work - enumeration, fuzzing, injection sweeps. Human senior researchers drive creative attack chaining, business logic exploitation, and findings narrative.
This combination eliminates the false-positive noise that purely automated tools produce, while covering attack surface that purely manual testing cannot enumerate in a reasonable timeframe.
APEX and Your QA Cycle
APEX was designed to integrate with the engineering workflows your team already uses. Each phase maps directly to your sprint and QA cycle:
PLAN aligns with sprint planning and threat modeling sessions. We arrive having already reviewed your architecture - your sprint planning slot becomes a focused threat model review, not a discovery call.
SURFACE can integrate with your existing asset inventory. We map APEX asset discovery against your internal service catalog, CMDB, or infrastructure-as-code definitions. Gaps between what APEX finds and what your inventory shows are findings in themselves.
EXPLOIT runs in parallel with your QA sprint. While your QA engineers run functional and integration tests, APEX researchers run adversarial tests against the same build. Findings land in your issue tracker alongside your QA tickets.
PERSIST simulates real adversary persistence across deployment cycles. We test whether a compromise achieved in one sprint survives your next deployment - validating that your CI/CD pipeline does not re-introduce cleared vulnerabilities.
REPORT produces compliance evidence mapped to ISO 27001 / SOC 2 / NIST AI RMF. The APEX attestation letter and structured findings report give your compliance team audit-ready documentation without requiring a separate evidence-gathering exercise.
Engagement Phases
PLAN
Scope definition, threat model development, AI architecture review, trust boundary mapping, rules of engagement, and automated OSINT gathering on AI stack exposure.
SURFACE
Asset discovery, tool connection mapping, privilege scope enumeration, MCP server inventory, API endpoint discovery, and agent interaction pattern analysis.
EXPLOIT
Manual prompt injection chaining, tool poisoning simulation, memory manipulation attempts, and unauthorized lateral movement. AI agents run Garak and PyRIT fuzzing sweeps in parallel.
PERSIST
Persistent agent compromise simulation, privilege escalation through agent tool chains, exfiltration path mapping, and long-term persistence mechanism identification.
REPORT
Narrative findings report with business impact, CVSS scores, OWASP LLM Top 10 compliance mapping, remediation roadmap, and ISO 27001 / SOC 2 / NIST AI RMF alignment evidence.
Deliverables
Before & After
| Metric | Before | After |
|---|---|---|
| vs Traditional Pentest | OWASP Top 10 - no AI-specific methodology | OWASP Top 10 + OWASP LLM Top 10 + Agent-specific vectors |
| vs Automated Scanners | High false positive rate, no creative chaining | Human-led with AI augmentation - real attack chains |
| Compliance Evidence | Generic penetration test report | APEX attestation letter + ISO 27001 / SOC 2 / NIST AI RMF mapping |
Tools We Use
Frequently Asked Questions
What makes APEX different from standard penetration testing methodology?
Standard penetration testing methodology (PTES, OWASP Testing Guide) was designed for web applications, APIs, and network infrastructure. APEX adds five AI-specific testing domains: prompt injection systematization, tool chain attack surface mapping, memory and context manipulation, agentic privilege escalation path analysis, and AI-augmented attack automation. APEX is the only documented methodology that covers all these domains for engineering and QA teams shipping AI systems.
How does AI automation work within APEX?
In the SURFACE phase, AI agents run automated asset enumeration in parallel with human reconnaissance - covering attack surface that would take a human team weeks to enumerate manually. In the EXPLOIT phase, AI agents run continuous fuzzing sweeps (Garak, PyRIT) while human researchers focus on creative attack chaining and business logic exploitation. Human researchers review and validate all AI-generated findings before inclusion in the report.
Is APEX recognized by compliance frameworks?
APEX maps explicitly to OWASP LLM Top 10, ISO 27001 Annex A controls, SOC 2 Trust Services Criteria, and NIST AI RMF. We provide an APEX attestation letter that your compliance team can present to auditors as evidence of AI-specific security testing. All engagements are conducted under a signed authorization agreement (ATT) ensuring testing is explicitly authorized in writing.
Can we get APEX methodology documentation for our records?
Yes. All APEX engagements include the methodology documentation as a deliverable - structured for inclusion in your information security management system (ISMS) as evidence of systematic AI security testing. The documentation covers the five phases, tools used, and testing coverage against OWASP LLM Top 10, ISO 27001, SOC 2, and NIST AI RMF.
Ship Secure. Test Everything.
Book a free 30-minute security discovery call with our AI Security experts. We map your AI attack surface and identify your highest-risk vectors - actionable findings within days, CI/CD integration recommendations included.
Talk to an Expert