StealthByte Labs · AI Red Teaming

#01

what it is

A reasoning adversary, not a scanner.

// the discipline An agent that plans, adapts, and improvises

Obsidian-3.2 is our autonomous offensive agent. It runs reconnaissance, hypothesizes attack paths, executes exploit primitives, and re-plans based on what it learns, with no human in the loop for routine decisions. Operators direct objectives; the agent decides how to reach them.

The result is a kind of pressure that no human team can sustain: 24/7 patient exploration of your attack surface at machine speed, with reasoning depth that grows the longer it runs.

// why you need it The threat is already here

Adversaries are already deploying AI to find your weaknesses. Researchers have demonstrated autonomous exploitation against CVEs within hours of disclosure. Threat actors are running LLM-driven phishing campaigns and ransomware affiliate workflows.

Your defenders need to know what an AI adversary will find before one finds you — and the only credible test is to run an actual AI adversary against your environment, under our control, with reportable findings.

// 001

Autonomous Exploit Chaining

The agent reasons across CVEs, misconfigurations, and access primitives to chain multi-step attacks toward defined objectives.

// 002

LLM Application Red Teaming

Prompt injection, jailbreaks, training data extraction, and tool-use abuse against your production LLM stack. Full OWASP LLM Top 10 coverage.

// 003

Adversarial ML Attacks

Model evasion, membership inference, model extraction, and data poisoning against your ML pipelines and deployed inference endpoints.

// 004

Agent Audit Trail

Every decision the agent makes is logged with reasoning, evidence, and remediation guidance. Reproducible, auditable, and admissible.

#02

how the agent works

Four-stage reasoning loop.

Perceive & Map

Agent ingests scope, performs reconnaissance, builds an internal graph of assets, identities, and trust relationships. Memory persists across sessions.

Avg duration 4–8 hours
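As a rough illustration of the internal graph described above, the sketch below stores trust relationships as an adjacency map and finds the shortest chain between two nodes with breadth-first search. The node names, the `trust_graph` layout, and `shortest_path` are hypothetical; the agent's actual data model is not documented here.

```python
from collections import deque

# Hypothetical graph: each node maps to the nodes it can reach
# through a trust relationship (login, read access, stored secret).
trust_graph = {
    "user:dev-001": ["host:gitlab"],
    "host:gitlab": ["repo:infra"],
    "host:jenkins": ["repo:infra"],
    "repo:infra": ["cloud:prod-account"],
    "cloud:prod-account": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest node list or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of trust relationships connects the two nodes


chain = shortest_path(trust_graph, "user:dev-001", "cloud:prod-account")
unreachable = shortest_path(trust_graph, "cloud:prod-account", "host:jenkins")
```

The same structure is useful to defenders: removing one edge and re-running the search shows whether a single fix breaks the chain.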

Hypothesize & Plan

Agent generates candidate attack paths ranked by feasibility and impact. Tree-search across exploit primitives. Chain-of-thought reasoning visible in audit log.

Plans evaluated ~280 / hour
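Ranking by feasibility and impact can be pictured with a minimal sketch like the one below. The `CandidatePath` record, the 0-to-1 scales, and the product scoring rule are all assumptions made for illustration, not the planner's real scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePath:
    """Hypothetical record for one candidate attack path."""
    name: str
    feasibility: float  # assumed scale: 0.0 (blocked) to 1.0 (trivial)
    impact: float       # assumed scale: 0.0 (no effect) to 1.0 (reaches objective)


def score(path: CandidatePath) -> float:
    # Assumed rule: multiply, so a path must be both feasible and impactful.
    return path.feasibility * path.impact


def rank(paths: list[CandidatePath]) -> list[CandidatePath]:
    """Return paths ordered from highest to lowest score."""
    return sorted(paths, key=score, reverse=True)


candidates = [
    CandidatePath("default-creds-on-build-server", feasibility=0.9, impact=0.4),
    CandidatePath("sso-creds-to-source-control", feasibility=0.7, impact=0.8),
    CandidatePath("legacy-vpn-endpoint", feasibility=0.2, impact=0.9),
]
ranked = rank(candidates)
```

With these made-up numbers the easy but low-value path drops to second place, which is the kind of trade-off a feasibility-and-impact ranking is meant to expose.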

Execute & Adapt

Agent runs primitives under tight RoE. Each result feeds back into the planner. Failed paths get re-weighted; successful paths get extended toward the objective.

Median time to domain admin (DA) 3h 47m
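The feedback step, where failed paths are re-weighted and successful ones are favored, might look roughly like this. The multiplicative `boost` and `decay` factors and the normalization step are assumptions chosen for clarity; the real planner's update rule is not described on this page.

```python
def reweight(weights, path, succeeded, boost=1.5, decay=0.5):
    """Return new weights, normalized to sum to 1, after one observed outcome.

    weights: mapping of path name -> current weight (hypothetical structure)
    path: the path whose primitive was just executed
    succeeded: whether the primitive worked
    """
    updated = dict(weights)  # copy so the caller's mapping is not mutated
    updated[path] *= boost if succeeded else decay
    total = sum(updated.values())
    return {name: weight / total for name, weight in updated.items()}


weights = {"path_a": 0.5, "path_b": 0.5}
after_failure = reweight(weights, "path_a", succeeded=False)
after_success = reweight(weights, "path_a", succeeded=True)
```

After one failure on `path_a`, its share falls from one half to one third and `path_b` picks up the difference, so the next planning round explores the alternative first.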

Explain & Hand-off

Operators review the agent's full reasoning trace. Every decision is documented. Findings translated into engineering-ready remediation with auto-generated detections.

Audit log Full trace

#03

inside the agent

Reasoning, made auditable.

// obsidian_agent.graph topology · sub-agent constellation

// agent.telemetry obsidian-3.2 · live

Plans Evaluated

1,847

+ 284 / hour

Active Chains

parallel reasoning

Memory Depth

28k

facts indexed

Time to DA

3h 47m

below baseline

Objective progress 73%

Reasoning depth 62%

Confidence score 91%

// agent_loop.py obsidian-3.2 · core

# reasoning loop: perceive → plan → act → reflect
from stealthbyte.obsidian import Agent, ToolBelt
# recon, scan, exploit, pivot: tool handles, assumed to be defined elsewhere

async def run(target, objective, max_steps=256):
    agent = Agent(model="obsidian-3.2", temp=0.3)
    tools = ToolBelt([recon, scan, exploit, pivot])

    for step in range(max_steps):
        # observe the target, choose the next action, run it, store the outcome
        observation = await agent.perceive(target)
        plan = await agent.plan(observation, objective)
        result = await tools.execute(plan.next_action)
        agent.remember(plan, result)

        if agent.objective_reached(objective):
            break

    # return the trace even if the step budget runs out before the objective
    return agent.audit_trace

// agent.reasoning_trace step 247 · live thinking

▸ thought

The exposed Jenkins instance at build.acme.corp uses default credentials. This is a viable entry point but likely heavily monitored.

▸ thought

Cross-referencing with the dev portal phishing result, I have valid SSO creds for dev-001. Less noisy than Jenkins. Higher confidence.

▸ action

Pivot: authenticate to internal gitlab via SSO, enumerate accessible repositories, search for hardcoded secrets in .git history.

▸ result

Found AWS access key in commit 8b2d11 of acme-prod-infra (2023-04). Key still valid. Permissions: AdministratorAccess.

▸ thought

Direct path to objective. Exfil staging is a consequential action under the RoE, so I am pausing for operator approval before proceeding.

#04

what you get

Three artifacts. One reasoning trace.

// 01 · executive 20–35 pages

The Adversary Brief

Plain-language account of what the agent did, what it found, and what it means. Written for executives who need to make AI-risk decisions.

Agent capabilities & limitations summary
Findings ranked by exploitability
AI-specific risk register
Investment recommendations

// 02 · technical full audit trace

The Reasoning Trace

Every thought, action, tool call, and observation the agent generated — fully reproducible. Engineering-ready and admissible.

Complete chain-of-thought log
Tool-call telemetry & outputs
Decision-tree visualization
Reproducible seed + agent config
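One way to make a trace like this tamper-evident is to chain each entry to the hash of the previous one. The sketch below assumes that approach; the field names and the `append_entry` and `verify` helpers are hypothetical and do not describe the delivered log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(trace, kind, content):
    """Append a thought/action/result entry linked to the previous entry's hash."""
    prev_hash = trace[-1]["hash"] if trace else GENESIS
    body = {"step": len(trace) + 1, "kind": kind, "content": content, "prev": prev_hash}
    trace.append({**body, "hash": _digest(body)})


def verify(trace):
    """Return True only if every entry is intact and correctly linked."""
    prev_hash = GENESIS
    for entry in trace:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


trace = []
append_entry(trace, "thought", "SSO path is less noisy than the build server.")
append_entry(trace, "action", "Enumerate accessible repositories.")
intact = verify(trace)
```

Editing any earlier entry changes its hash and breaks every later link, so a reviewer can detect after-the-fact changes without trusting the system that stored the log.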

// 03 · remediation auto-generated

Detection Playbook

For every finding, agent-generated detection rules tested against your SIEM. Sigma, Splunk SPL, KQL, and Elastic queries delivered ready-to-deploy.

Sigma rules for each finding
Vendor-specific detections (Splunk, Sentinel)
False-positive validation results
Coverage map vs. MITRE ATT&CK
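A coverage map against MITRE ATT&CK reduces to a lookup from each finding's technique ID to the rules that detect it. The sketch below assumes a simple dictionary layout; the finding names, rule names, and the mapping itself are illustrative, not output from a real engagement.

```python
# Hypothetical mapping of findings to ATT&CK technique IDs.
finding_techniques = {
    "valid-sso-credentials": "T1078",          # Valid Accounts
    "default-creds-build-server": "T1078.001", # Valid Accounts: Default Accounts
    "secret-in-git-history": "T1552.001",      # Unsecured Credentials: Credentials In Files
}

# Hypothetical detection rules already deployed, keyed by technique ID.
rules_by_technique = {
    "T1078": ["sso-login-from-new-device"],
    "T1552.001": [],
}


def coverage_gaps(findings, rules):
    """Return the technique IDs that have a finding but no detection rule."""
    techniques = set(findings.values())
    return sorted(t for t in techniques if not rules.get(t))


def coverage_ratio(findings, rules):
    """Fraction of observed techniques with at least one detection rule."""
    techniques = set(findings.values())
    if not techniques:
        return 1.0
    covered = sum(1 for t in techniques if rules.get(t))
    return covered / len(techniques)


gaps = coverage_gaps(finding_techniques, rules_by_technique)
ratio = coverage_ratio(finding_techniques, rules_by_technique)
```

Here an empty rule list counts the same as a missing technique, so a placeholder entry with no working detection still shows up as a gap.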

#05

engagement models

Three ways to deploy the agent.

// scoped

LLM audit

Targeted adversarial test of one production LLM application. Prompt injection, jailbreaks, tool abuse, training data leakage. Full OWASP LLM Top 10.

1-week agent engagement
Single LLM application scope
OWASP LLM Top 10 coverage
Three-artifact deliverable
30-day retest included

brief operators →

// flagship

Autonomous campaign

Full-spectrum agent deployment against your environment. Network, identity, cloud, and AI surface — chained together by the agent over 4–8 weeks.

4–8 week agent campaign
Multi-domain attack surface
Custom objective definition
Operator-supervised execution
90-day retest + detection tuning

deploy agent →

// continuous

Always-on adversary

Persistent agent deployment. New attack paths surfaced weekly. Detection content updated continuously. Monthly executive briefings.

Continuous agent operation
Weekly findings digest
Monthly board briefing
Dedicated agent operator
Emergency response SLA

discuss retainer →

#06

common questions

About the agent.

01 Is the agent fully autonomous? Will it act without human approval?

The agent operates under explicit rules of engagement that you co-author. Routine decisions (which port to scan, which credential to test) are autonomous; consequential actions (lateral movement to a new subnet, exfiltration staging) require operator approval. You see every decision before and after, in the audit trail.
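The split between autonomous and operator-approved actions can be sketched as a small gate in front of every action. The action names, the `ROE` table, and the `is_permitted` helper are hypothetical; in practice the rules of engagement are co-authored per engagement.

```python
from enum import Enum
from typing import Callable


class ActionClass(Enum):
    ROUTINE = "routine"              # agent may proceed on its own
    CONSEQUENTIAL = "consequential"  # requires operator approval


# Hypothetical rules-of-engagement table, following the examples in the answer above.
ROE = {
    "port_scan": ActionClass.ROUTINE,
    "credential_test": ActionClass.ROUTINE,
    "lateral_movement": ActionClass.CONSEQUENTIAL,
    "exfil_staging": ActionClass.CONSEQUENTIAL,
}


def is_permitted(action: str, operator_approves: Callable[[str], bool]) -> bool:
    """Allow routine actions; ask the operator for everything else."""
    # Fail closed: an action missing from the table is treated as consequential.
    action_class = ROE.get(action, ActionClass.CONSEQUENTIAL)
    if action_class is ActionClass.ROUTINE:
        return True
    return operator_approves(action)


def deny_all(action: str) -> bool:
    return False


routine_ok = is_permitted("port_scan", deny_all)
staging_blocked = is_permitted("exfil_staging", deny_all)
```

Treating unlisted actions as consequential is a deliberate choice in this sketch: a gap in the table results in a question to the operator instead of an unreviewed action.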

02 What underlying model powers Obsidian-3.2?

Obsidian-3.2 is built on top of a frontier base model — fine-tuned and aligned for offensive reasoning, with proprietary tool-use scaffolding and a security-specific memory subsystem. We disclose the base model family under NDA during scoping. We do not send your environment data to any third-party API; inference runs in our own infrastructure.

03 Can the agent hallucinate findings or generate false positives?

Every finding is validated against ground truth before reporting — exploits are confirmed by execution, credentials by use, vulnerabilities by reproduction. The agent has a separate "critic" sub-agent that adversarially reviews each claim. Findings that can't be reproduced are dropped from the final report. Our verified false positive rate is under 2%.
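The drop-if-not-reproduced rule and the false-positive rate can be expressed in a few lines. The `Finding` record and the rate definition used here (reported findings later disproved, divided by all reported findings) are assumptions; the page does not state how the figure is computed.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical finding record."""
    title: str
    reproduced: bool  # True only if the claim was confirmed by re-execution


def reportable(findings):
    """Keep only findings that were reproduced; everything else is dropped."""
    return [finding for finding in findings if finding.reproduced]


def false_positive_rate(reported: int, later_disproved: int) -> float:
    """Assumed definition: share of reported findings that proved wrong."""
    if reported == 0:
        return 0.0
    return later_disproved / reported


findings = [
    Finding("Default credentials on build server", reproduced=True),
    Finding("Suspected SSRF in image proxy", reproduced=False),
    Finding("Valid cloud key in commit history", reproduced=True),
]
final_report = reportable(findings)
example_rate = false_positive_rate(reported=200, later_disproved=3)
```

Under this definition, 3 disproved findings out of 200 reported gives 1.5%, which would sit inside an "under 2%" claim.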

04 How is this different from running automated scanners?

Scanners enumerate findings; the agent reasons about them. A scanner reports 200 medium-severity issues; the agent identifies the three that chain together into domain admin. The reasoning, prioritization, and chain construction are what make the output usable; raw scanner output is what your team already has and ignores.

05 Will the agent damage production systems?

Destructive techniques are excluded by RoE. The agent is constrained to read-only and persistence-free actions by default. We maintain a 24/7 abort channel, and the agent itself has built-in safety constraints that refuse irreversible actions without explicit operator approval. In 180+ AI engagements we've caused zero production outages.

06 Can we audit the agent's training data and alignment?

Under NDA, yes. We provide alignment evaluations, refusal-rate benchmarks across CBRN/cyber categories, and full red-team test results from internal alignment work. The agent is constrained to defensive disclosure norms — it will refuse to generate weaponizable malware or assist in attacks against unauthorized targets.

An AI adversary is already looking at you.
The question is whether you've met ours first.
