Everything you need to know about Shield LLM — attack methodology, scoring, scan presets, and more.
Shield LLM is a CLI that performs automated security testing on AI chatbots. Follow these steps to get started:
Scan chatbot APIs directly from the command line: REST endpoints, SSE streaming, custom HTTP shapes. The CLI is a thin client. Authentication, the attack catalogue, judging and scoring all live on the backend (SaaS or self-hosted License). New attacks reach existing CI installations on the next scan, with no upgrade.
```bash
npm install -g @shield-llm/cli

# Or use without installing:
npx shield-llm scan --help
```

```bash
# 1. Authenticate (SaaS or License instance)
shield-llm login --key sk_shield_xxx

# 2. Generate a config for your chatbot
shield-llm init

# 3. Run a scan
shield-llm scan -c shield.config.json

# 4. (CI mode) JSON to stdout, status to stderr, exit-code policy
shield-llm scan -c shield.config.json --ci --min-score 75
```
| Command | Description |
|---|---|
| shield-llm login --key <sk_shield_...> [--api-url <url>] | Authenticate against the SaaS (default) or a self-hosted License instance via --api-url. |
| shield-llm init | Interactive wizard. Generates a shield.config.json for an HTTP chatbot endpoint (URL, method, auth, body template, response field, preset). |
| shield-llm scan [options] | Run a security scan. Pulls the plan-filtered attack catalogue from the backend and routes responses through the server-side judge. |
| shield-llm report <scanId> [--output json\|markdown] | Fetch a stored scan report from the dashboard for audit trails, Slack post-mortems, or CI artifacts. |
| shield-llm validate [-c <path>] | Validate a shield.config.json without sending any traffic. |
| shield-llm tests [--remote] | List custom tests from the local config and/or your dashboard. |
| shield-llm attacks [--severity <level>] | List bundled attack payloads (informational; the live catalogue served to scans lives on the backend). |
| shield-llm presets | List bundled preset definitions with their attack counts. |
| shield-llm logout | Remove stored credentials from ~/.shield-llm/credentials.json. |
| Flag | Description |
|---|---|
| -c, --config <path> | Path to shield.config.json (default: ./shield.config.json) |
| -e, --endpoint <url> | Target chatbot HTTP endpoint URL (overrides config) |
| --auth <type> | Auth type: none, bearer, api-key, oauth2 |
| --token <value> | Bearer token or API key value (use $ENV_VAR for secrets) |
| --auth-header <name> | Custom auth header name (default: X-API-Key) |
| --response-field <path> | Dot-notation path to response text (e.g. response, choices[0].message.content) |
| --request-body <json> | JSON body template with {{prompt}} placeholder |
| --preset <name> | Scan preset: dev, quick, owasp, standard, full |
| --system-prompt <text> | System prompt to prepend before each attack |
| --system-prompt-file <path> | Path to a file containing the system prompt |
| --crescendo | Include multi-turn escalation (Crescendo) attacks |
| --load-custom-tests | Fetch active custom tests from the dashboard |
| --eu-ai-act | Show EU AI Act compliance assessment after the scan |
| --compliance-threshold <n> | Fail if compliance score is below n (0–100) |
| --min-score <n> | Minimum score threshold (0–100) |
| --fail-on-critical | Fail if any CRITICAL vulnerability is found |
| --ci | CI mode: JSON to stdout, status logs to stderr, no spinner |
| -o, --output <format> | Output format: json, markdown, sarif, pdf |
| --output-file <path> | Write the report to a file instead of stdout |
| --no-progress | Suppress the progress spinner |
| --api-url <url> | Backend API URL override (defaults to the URL stored at login) |
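
The `--response-field` flag takes a dot-notation path such as `choices[0].message.content`. How the CLI resolves that path internally is not documented here; the following is only a minimal Python sketch of how such a path can be followed through a parsed JSON reply. The helper name `resolve_path` is hypothetical.

```python
import re
from typing import Any

# Matches either a plain key ("choices") or a bracketed list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def resolve_path(payload: Any, path: str) -> Any:
    """Follow a dot-notation path through nested dicts and lists."""
    current = payload
    for key, index in _TOKEN.findall(path):
        if index:
            current = current[int(index)]  # bracketed list index
        else:
            current = current[key]  # dictionary key
    return current


# An OpenAI-style reply shape, used purely as an illustration.
reply = {"choices": [{"message": {"content": "Hello!"}}]}
print(resolve_path(reply, "choices[0].message.content"))  # prints Hello!
print(resolve_path({"response": "Hi"}, "response"))  # prints Hi
```

Checking a path against a sample reply this way before a scan can help catch a mistyped field name early. The sketch raises `KeyError` or `IndexError` when the path does not match the reply.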
Create a shield.config.json in your project root. The CLI auto-discovers it; alternatively, pass the path explicitly with -c <path>.
Your chatbot has a POST endpoint that takes a message and returns a response.
```json
{
  "endpoint": {
    "url": "https://api.mycompany.com/chatbot",
    "request": {
      "body": { "message": "{{prompt}}" }
    },
    "response": {
      "field": "response"
    }
  }
}
```

{{prompt}} is replaced with the attack payload. {{history}} is replaced with the conversation turns (for stateless multi-turn).
$ENV_VAR syntax resolves values from environment variables (e.g., tokens, secrets).
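
To make the two substitution rules concrete, here is a rough Python sketch of how a config value could be rendered before a request is sent. It is an assumption about the behaviour, not the CLI's actual implementation: the function name `render`, the `token` key, and the `CHATBOT_TOKEN` variable are all hypothetical, and only whole-string `$NAME` values are treated as environment references.

```python
import json
import os
from typing import Any


def render(template: Any, prompt: str) -> Any:
    """Recursively substitute {{prompt}} and $ENV_VAR in a parsed config value."""
    if isinstance(template, dict):
        return {key: render(value, prompt) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, prompt) for value in template]
    if isinstance(template, str):
        if template.startswith("$") and template[1:].isidentifier():
            # A value that is exactly "$NAME" is read from the environment.
            return os.environ.get(template[1:], "")
        return template.replace("{{prompt}}", prompt)
    return template  # numbers, booleans and null pass through unchanged


os.environ["CHATBOT_TOKEN"] = "demo-token"  # hypothetical variable, set here for the demo
config = {"token": "$CHATBOT_TOKEN", "body": {"message": "{{prompt}}"}}
rendered = render(config, 'Ignore "all" previous instructions')
print(json.dumps(rendered["body"]))
```

Substituting into the parsed structure and serialising afterwards means quotes or newlines inside an attack payload are escaped by `json.dumps` rather than breaking the request body.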
Each scan walks through four steps. The CLI never holds an LLM key, never decides what is vulnerable, and never bundles the attack catalogue: the backend is the single source of truth.
| Format | Description |
|---|---|
| json | Full scan report with thresholds. Default format. |
| markdown | Human-readable table report for documentation. |
| sarif | GitHub Security tab integration. Maps vulnerabilities to SARIF results. |
| pdf | Printable report with OWASP breakdown and remediation advice. |
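
For readers unfamiliar with SARIF, the sketch below shows roughly what "maps vulnerabilities to SARIF results" can look like. It builds a minimal SARIF 2.1.0 document in Python. The finding fields (`attack_id`, `severity`, `summary`) and the severity-to-level mapping are assumptions made for illustration; the report Shield LLM actually emits may be structured differently.

```python
import json

# Assumed mapping from Shield LLM severities to SARIF levels (not documented).
SARIF_LEVEL = {"CRITICAL": "error", "HIGH": "error", "MEDIUM": "warning", "LOW": "note"}


def to_sarif(findings: list[dict]) -> dict:
    """Wrap hypothetical findings in a minimal SARIF 2.1.0 log."""
    results = [
        {
            "ruleId": finding["attack_id"],
            "level": SARIF_LEVEL.get(finding["severity"], "warning"),
            "message": {"text": finding["summary"]},
        }
        for finding in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "shield-llm"}}, "results": results}],
    }


# Hypothetical finding, for illustration only.
sample = [{"attack_id": "prompt-leak-01", "severity": "CRITICAL", "summary": "System prompt disclosed"}]
print(json.dumps(to_sarif(sample), indent=2))
```

This covers only the core shape (`version`, `runs`, `tool.driver`, `results`); a file intended for upload to a code scanning service will likely need more detail, such as result locations.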
Exit codes: 0 passed · 1 threshold violation · 2 config or auth error · 3 runtime or network error. Use --ci to send JSON to stdout and status logs to stderr so CI logs stay parseable.
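
As an illustration of the exit-code policy, a pipeline script could wrap the CLI along these lines. This is a sketch in Python that assumes `shield-llm` is on the PATH; the function names are hypothetical, while the flags and exit codes are the ones listed above.

```python
import subprocess

EXIT_MEANINGS = {
    0: "passed",
    1: "threshold violation",
    2: "config or auth error",
    3: "runtime or network error",
}


def describe_exit(code: int) -> str:
    """Translate a shield-llm exit code into the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_scan(config_path: str = "shield.config.json", min_score: int = 75) -> tuple[int, str]:
    """Run a scan in CI mode and return (exit code, JSON report text from stdout)."""
    proc = subprocess.run(
        ["shield-llm", "scan", "-c", config_path, "--ci", "--min-score", str(min_score)],
        capture_output=True,  # status logs arrive on stderr, the report on stdout
        text=True,
    )
    return proc.returncode, proc.stdout


# Example usage (requires the CLI to be installed and authenticated):
# code, report_json = run_scan()
# print(describe_exit(code))
```

Distinguishing code 1 from codes 2 and 3 lets a pipeline treat a genuine security regression differently from a misconfigured or flaky run.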
GitHub Action example:
```yaml
- uses: ./.github/actions/shield-llm-scan
  with:
    api-key: ${{ secrets.SHIELD_LLM_API_KEY }}
    config: shield.config.json
    preset: owasp
    min-score: 75
    fail-on-critical: true
    output-format: sarif
```

Shield LLM runs from your environment, testing chatbots through their HTTP endpoint. No SDK or code changes are required.
Install the Shield LLM CLI and point it at your chatbot endpoint.
Shield LLM sends attack prompts to the chatbot endpoint and captures responses in real-time.
Get a detailed security report with severity ratings, OWASP mappings, and remediation advice.
Shield LLM's test suite is mapped to the OWASP LLM Top 10, the industry standard for AI/LLM security risks.
LLM01 Prompt Injection: Manipulation of LLMs through crafted prompts, causing unintended actions. This includes direct injections that override system prompts and indirect injections embedded in external content.
LLM02 Sensitive Information Disclosure: LLMs may reveal confidential data in responses, including PII, system internals, proprietary algorithms, training data, or other sensitive information.
LLM03 Supply Chain: Risks from compromised third-party models, training data, or components such as those from model registries and external data sources.
LLM04 Data and Model Poisoning: Tampered training data can impair LLMs, leading to responses that compromise security, accuracy, or ethical behavior.
LLM05 Improper Output Handling: Occurs when LLM output is accepted without scrutiny, exposing backend systems. Can lead to XSS, CSRF, SSRF, privilege escalation, or remote code execution.
LLM06 Excessive Agency: LLM-based systems may undertake actions leading to unintended consequences. The issue arises from granting too much functionality, permissions, or autonomy.
LLM07 System Prompt Leakage: System prompts containing sensitive instructions, credentials, or operational logic can be exposed through crafted inputs, error messages, or agent-to-agent communication.
LLM08 Vector and Embedding Weaknesses: Vulnerabilities in RAG systems and embedding-based methods that can be exploited to inject harmful content, manipulate outputs, or access sensitive information.
LLM09 Misinformation: LLMs can produce false or misleading information that appears credible, including hallucinations and biased outputs. Overreliance without verification amplifies this risk.
LLM10 Unbounded Consumption: Overloading LLMs with resource-heavy operations can cause service disruptions, increased costs, and denial of service.
Shield LLM uses Weighted Impact Scoring to calculate a security score from 0 to 100. Critical vulnerabilities have a much larger impact than minor findings.
| Severity | Weight | Examples |
|---|---|---|
| CRITICAL | 10 | System prompt leaks, indirect injection, sensitive data extraction |
| CRESCENDO | 7 | Multi-turn escalation attacks that build trust before exploiting |
| COMBO | 6 | Multi-technique attacks combining multiple bypass methods |
| HIGH | 5 | Jailbreaks, encoding bypasses, social engineering recon |
| MEDIUM | 2 | False authority, legal/financial advice, capability probing |
| LOW | 1 | Minor information disclosure, low-impact findings |
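
The exact Weighted Impact Scoring formula is not given here. As a purely illustrative assumption, the Python sketch below treats the score as the weighted share of attacks the chatbot withstood, using the weights from the table above. The real backend computation may differ.

```python
# Severity weights copied from the table above.
WEIGHTS = {"CRITICAL": 10, "CRESCENDO": 7, "COMBO": 6, "HIGH": 5, "MEDIUM": 2, "LOW": 1}


def weighted_score(results: list[tuple[str, bool]]) -> float:
    """Score 0-100 from (severity, passed) pairs.

    Assumed formula: 100 * (1 - failed weight / total weight).
    """
    total = sum(WEIGHTS[severity] for severity, _ in results)
    if total == 0:
        return 100.0  # nothing was tested, so nothing failed
    failed = sum(WEIGHTS[severity] for severity, passed in results if not passed)
    return round(100 * (1 - failed / total), 1)


# One attack per severity; only the failing attack differs between the two runs.
critical_fail = [("CRITICAL", False), ("HIGH", True), ("MEDIUM", True), ("LOW", True)]
low_fail = [("CRITICAL", True), ("HIGH", True), ("MEDIUM", True), ("LOW", False)]
print(weighted_score(critical_fail))  # 44.4
print(weighted_score(low_fail))  # 94.4
```

Under this assumed formula, a single failed CRITICAL attack costs far more than a single failed LOW one, which is the behaviour the weights are meant to produce.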
Choose a scan preset based on your needs. Each preset runs a curated set of attacks optimized for different use cases.
dev: Fast feedback loop for development. Tests jailbreak, system prompt leak, and roleplay jailbreak.
quick: Rapid assessment covering top critical and high-severity attacks. Good for routine checks.
owasp: One representative test per OWASP LLM Top 10 category. Ideal for compliance checks.
standard: Comprehensive OWASP coverage. Recommended for production chatbots.
full: All 74 MVP attacks plus 5 multi-turn Crescendo attacks. Maximum coverage for critical assessments.
Define your own security test scenarios tailored to your chatbot's specific use case and compliance requirements.