Documentation

Everything you need to know about Shield LLM — attack methodology, scoring, scan presets, and more.

Getting Started

Shield LLM grades your chatbot's defenses against real-world attacks, right from your terminal. You'll need an account and an API key (the CLI is a Pro feature); grab the key from your dashboard under Settings. Prefer to stay hands-off? Ask the coding agent you already use to run these steps for you.

1
Install the CLI
Install Shield LLM globally with npm: `npm install -g shield-llm`.
2
Authenticate
Run `shield-llm login --key sk_shield_...` with the API key from your dashboard. You'll see your plan and scans remaining, so you know you're connected.
3
Initialize the config
Run `shield-llm init` to generate a shield.config.json describing your chatbot endpoint.
4
Run your first scan
Run `shield-llm scan` to launch a security audit against your chatbot. Results stream to your dashboard.

CLI Tool

Scan chatbot APIs directly from the command line: REST endpoints, SSE streaming, custom HTTP shapes. The CLI is a thin client. Authentication, the attack catalogue, judging and scoring all live on the backend (SaaS or self-hosted License). New attacks reach existing CI installations on the next scan, with no upgrade.

Installation

npm install -g shield-llm

# Or use without installing:
npx shield-llm scan --help

Quick Start

# 1. Authenticate (SaaS or License instance)
shield-llm login --key sk_shield_xxx

# 2. Generate a config for your chatbot
shield-llm init

# 3. Run a scan
shield-llm scan -c shield.config.json

# 4. (CI mode) JSON to stdout, status to stderr, exit-code policy
shield-llm scan -c shield.config.json --ci --min-score 75

Commands

Command	Description
shield-llm login --key '<'sk_shield_...'>' [--api-url '<'url'>']	Authenticate against the SaaS (default) or a self-hosted License instance via --api-url.
shield-llm init	Interactive wizard. Generates a shield.config.json for an HTTP chatbot endpoint (URL, method, auth, body template, response field, preset).
shield-llm scan [options]	Run a security scan. Pulls the plan-filtered attack catalogue from the backend and routes responses through the server-side judge.
shield-llm retest '<'scanId'>' [--finding '<'id'>'] [--ci]	Replay the exact exploits of a past scan's findings after you ship a fix. Each finding is stamped Fixed & verified or Still vulnerable on the dashboard. Exit 0 = all fixed, 1 = still vulnerable.
shield-llm report '<'scanId'>' [--output json\|markdown]	Fetch a stored scan report from the dashboard for audit trails, Slack post-mortems, or CI artifacts.
shield-llm validate [-c '<'path'>']	Validate a shield.config.json without sending any traffic.
shield-llm tests [--remote]	List custom tests from the local config and/or your dashboard.
shield-llm attacks [--severity '<'level'>']	Show the attack catalogue summary for your plan (counts by severity; the live catalogue is fetched from the backend).
shield-llm presets	List the available scan presets (fetched from the backend).
shield-llm logout	Remove stored credentials from ~/.shield-llm/credentials.json.

Scan Flags

Flag	Description
-c, --config <path>	Path to shield.config.json (default: ./shield.config.json)
-e, --endpoint <url>	Target chatbot HTTP endpoint URL (overrides config)
--auth <type>	Auth type: none, bearer, api-key, oauth2
--token <value>	Bearer token or API key value (use $ENV_VAR for secrets)
--auth-header <name>	Custom auth header name (default: X-API-Key)
--response-field <path>	Dot-notation path to response text (e.g. response, choices[0].message.content)
--request-body <json>	JSON body template with {{prompt}} placeholder
--preset <name>	Scan preset: dev, quick, owasp, standard, full
--system-prompt <text>	System prompt to prepend before each attack
--system-prompt-file <path>	Path to a file containing the system prompt
--crescendo	Include multi-turn escalation (Crescendo) attacks
--load-custom-tests	Fetch active custom tests from the dashboard
--eu-ai-act	Show EU AI Act compliance assessment after the scan
--compliance-threshold <n>	Fail if compliance score is below n (0–100)
--min-score <n>	Minimum score threshold (0–100)
--fail-on-critical	Fail if any CRITICAL vulnerability is found
--ci	CI mode: JSON to stdout, status logs to stderr, no spinner
-o, --output <format>	Output format: json, markdown, sarif, pdf
--output-file <path>	Write the report to a file instead of stdout
--no-progress	Suppress the progress spinner
--api-url <url>	Backend API URL override (defaults to the URL stored at login)

Configuration File

Create a shield.config.json in your project root. The CLI auto-discovers it, or pass -c path explicitly.

Your chatbot has a POST endpoint that takes a message and returns a response.

{
  "endpoint": {
    "url": "https://api.mycompany.com/chatbot",
    "request": {
      "body": { "message": "{{prompt}}" }
    },
    "response": {
      "field": "response"
    }
  }
}

{{prompt}} — replaced with the attack payload. {{history}} — replaced with conversation turns (for stateless multi-turn).

$ENV_VAR syntax resolves values from environment variables (e.g., tokens, secrets).

Architecture: thin client

Each scan walks through four steps. The CLI never holds an LLM key, never decides what is vulnerable, and never bundles the attack catalogue: the backend is the single source of truth.

1
Authenticate
shield-llm login stores an API key (SaaS) or license key + URL (self-hosted) in ~/.shield-llm/credentials.json. Every subsequent call carries the Bearer token.
2
Fetch the attack catalogue
GET /api/cli/attacks?preset=… returns the attacks the user's plan is allowed to run, plus an optional set of custom tests. The CLI ships no attacks of its own, so a backend connection is required.
3
Send attacks to the target
For each attack, the CLI POSTs the prompt to the configured endpoint (HTTP, SSE, multi-turn). The target only sees a request shaped like its production traffic.
4
Backend judges & persists
Captured responses go to /api/cli/evaluate. The backend runs regex + LLM judge using its own credentials, scores the run, persists it to the dashboard, and returns the verdict. The CLI prints the result and exits.

Output Formats & CI/CD

Format	Description
json	Full scan report with thresholds. Default format.
markdown	Human-readable table report for documentation.
sarif	GitHub Security tab integration. Maps vulnerabilities to SARIF results.
pdf	Printable report with OWASP breakdown and remediation advice.

Exit codes: 0 passed · 1 threshold violation · 2 config or auth error · 3 runtime or network error. Use --ci to send JSON to stdout and status logs to stderr so CI logs stay parseable.

GitHub Action example:

- uses: ./.github/actions/shield-llm-scan
  with:
    api-key: ${{ secrets.SHIELD_LLM_API_KEY }}
    config: shield.config.json
    preset: owasp
    min-score: 75
    fail-on-critical: true
    output-format: sarif

MCP Server

Shield exposes its red-teaming to AI coding agents (Claude Code, Cursor, and any MCP client) through a Model Context Protocol server. An agent can scan the chatbot it is building, read the fixes, and re-prove them without leaving the editor.

Available on npm as the shield-llm-mcp package. Install it, then register it with your MCP client.

Tools

Tool	What it does
`shield_init_config`	Generate and validate a scan config by reading your app's chat route (JSON, form, or multipart bodies).
`shield_scan`	Red-team the endpoint and return score, grade, and findings, optionally with an EU AI Act compliance readout.
`shield_get_findings`	Read each finding with a layered, defense-in-depth remediation recipe.
`shield_verify`	Replay the exact exploits and prove the fix held.
`shield_scan_diff`	Compare two scans to catch any new vulnerability a fix may have introduced.

Setup

npm install -g shield-llm-mcp

# or run without installing (your MCP client can use npx):
npx shield-llm-mcp

MCP client configuration

{
  "mcpServers": {
    "shield-llm": { "command": "shield-llm-mcp" }
  }
}

How It Works

Shield LLM runs from your environment, testing chatbots through their HTTP endpoint — no SDK or code changes required.

1. Install

Install the Shield LLM CLI and point it at your chatbot endpoint.

2. Scan

Shield LLM sends attack prompts to the chatbot endpoint and captures responses in real-time.

3. Report

Get a detailed security report with severity ratings, OWASP mappings, and remediation advice.

Attack Categories

Shield LLM's test suite is mapped to the OWASP LLM Top 10, the industry standard for AI/LLM security risks.

LLM01

Prompt Injection

Manipulation of LLMs through crafted prompts, causing unintended actions. This includes direct injections that override system prompts and indirect injections embedded in external content.

37 tests

LLM02

Sensitive Information Disclosure

LLMs may reveal confidential data in responses, including PII, system internals, proprietary algorithms, training data, or other sensitive information.

19 tests

LLM03

Supply Chain

Risks from compromised third-party models, training data, or components such as those from model registries and external data sources.

8 tests

LLM04

Data and Model Poisoning

Tampered training data can impair LLM models, leading to responses that compromise security, accuracy, or ethical behavior.

5 tests

LLM05

Improper Output Handling

Occurs when LLM output is accepted without scrutiny, exposing backend systems. Can lead to XSS, CSRF, SSRF, privilege escalation, or remote code execution.

5 tests

LLM06

Excessive Agency

LLM-based systems may undertake actions leading to unintended consequences. The issue arises from granting too much functionality, permissions, or autonomy.

24 tests

LLM07

System Prompt Leakage

System prompts containing sensitive instructions, credentials, or operational logic can be exposed through crafted inputs, error messages, or agent-to-agent communication.

3 tests

LLM08

Vector and Embedding Weaknesses

Vulnerabilities in RAG systems and embedding-based methods that can be exploited to inject harmful content, manipulate outputs, or access sensitive information.

15 tests

LLM09

Misinformation

LLMs can produce false or misleading information that appears credible, including hallucinations and biased outputs. Overreliance without verification amplifies this risk.

8 tests

LLM10

Unbounded Consumption

Overloading LLMs with resource-heavy operations can cause service disruptions, increased costs, and denial of service.

8 tests

Scoring Methodology

Your chatbot gets a score from 0 to 100 and a letter grade from A to F, like a report card. Higher is safer. Each vulnerability subtracts from a perfect score, weighted by how serious it is, so one critical hole costs far more than a minor finding.

Score = 100 × (1 − ObservedPenalty / TotalWeight)

Severity Weights

Severity	Weight	Examples
CRITICAL	10	System prompt leaks, indirect injection, sensitive data extraction
CRESCENDO	7	Multi-turn escalation attacks that build trust before exploiting
COMBO	6	Multi-technique attacks combining multiple bypass methods
HIGH	5	Jailbreaks, encoding bypasses, social engineering recon
MEDIUM	2	False authority, legal/financial advice, capability probing
LOW	1	Minor information disclosure, low-impact findings

Grade Thresholds

90 – 100

Excellent — Minimal vulnerabilities detected

75 – 89

Good — Some minor issues found

60 – 74

Moderate — Several vulnerabilities need attention

40 – 59

Poor — Significant security gaps

0 – 39

Failing — Critical vulnerabilities present

Scan Presets

Choose a scan preset based on your needs. Each preset runs a curated set of attacks optimized for different use cases.

Dev Mode

Fast feedback loop for development. Tests jailbreak, system prompt leak, and roleplay jailbreak.

Quick Scan

Rapid assessment covering top critical and high-severity attacks. Good for routine checks.

OWASP Coverage

One representative test per OWASP LLM Top 10 category. Ideal for compliance checks.

Standard Scan

Comprehensive OWASP coverage. Recommended for production chatbots.

Full Scan

The full attack catalogue plus multi-turn (crescendo / memory) attacks. Maximum coverage for critical assessments.

Custom Tests

Define your own security test scenarios tailored to your chatbot's specific use case and compliance requirements.

Create a Custom Scan

Navigate to Custom Tests in the sidebar. Create a new scan and give it a descriptive name.

Add Test Cases

Define individual test cases with a prompt, expected behavior, severity level, and detection patterns.

Run from the CLI

Your custom tests run alongside the preset attacks. Add --load-custom-tests to shield-llm scan to pull your active tests from the dashboard, or --custom-only to run just your tests.

Frequently Asked Questions

Which chatbots does Shield LLM support?

Shield LLM works with any AI chatbot exposed over HTTP — ChatGPT, Claude, custom chatbots built with frameworks like Langchain, and enterprise chatbot deployments. The CLI talks to your chatbot endpoint directly via shield.config.json.

Does Shield LLM send data to external servers?

The CLI runs locally and sends attack prompts straight to your chatbot. To grade the results, your chatbot's replies are sent to the Shield LLM backend, which runs the regex + LLM judge and returns a score; the scan summary is then saved to your dashboard. Replies are used only to score the scan, not to train models.

How does the scoring work?

Shield LLM uses Weighted Impact Scoring. Each test has a severity weight (CRITICAL=10, HIGH=5, etc.). The score is calculated as: Score = 100 × (1 - ObservedPenalty / TotalWeight). This means critical vulnerabilities have a much larger impact on the score than medium or low findings.

What is the difference between regex and LLM evaluation?

Shield LLM uses a hybrid evaluation approach. First, regex pattern matching quickly detects obvious vulnerability indicators. Then, an LLM judge analyzes ambiguous responses for nuanced detection (e.g., partial compliance, subtle information leaks). This two-stage approach maximizes both speed and accuracy.

Can I add my own test scenarios?

Yes! Custom Tests let you define your own attack prompts, detection patterns, and evaluation criteria. This is useful for testing domain-specific vulnerabilities or compliance requirements unique to your organization.

How much does Shield LLM cost?

Shield LLM is self-hosted — you run it on your own infrastructure with every feature unlocked (LLM evaluation, PDF reports, custom tests, compliance). The only cost is your own LLM provider API usage for the AI judge.

How do I scan my own API endpoint?

Log in, then point the CLI at your endpoint: shield-llm login --key sk_shield_..., then shield-llm scan -e https://your-api.com/chat. For full control (authentication, custom headers, request/response mapping), run shield-llm init to generate a shield.config.json. The CLI auto-detects SSE streaming responses.

Can I use Shield LLM in CI/CD pipelines?

Yes! The CLI outputs SARIF format for GitHub Security tab integration. Use shield-llm scan -o sarif --min-score 75 --fail-on-critical to fail builds when vulnerabilities are found. A ready-made GitHub Action is also available.

Ready to secure your AI chatbot?

Get Started