mirror of
https://github.com/github/codeql.git
synced 2026-06-19 03:41:07 +02:00
Mirror the JavaScript layout from PR #21953: - Move SystemPromptInjection.ql / UserPromptInjection.ql to src/Security/CWE-1427 - Move customizations, query and framework libs to python/ql/lib - Move the AIPrompt concept to the production Concepts.qll - Drop the experimental tag; py/system-prompt-injection (high precision) now joins the code-scanning, security-extended and security-and-quality suites, while py/user-prompt-injection (low precision) stays out of the default suites - Move query tests to python/ql/test/query-tests/Security Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
48 lines
2.7 KiB
XML
48 lines
2.7 KiB
XML
<!DOCTYPE qhelp PUBLIC
|
|
"-//Semmle//qhelp//EN"
|
|
"qhelp.dtd">
|
|
<qhelp>
|
|
|
|
<overview>
|
|
<p>If untrusted input is included in a user-role prompt sent to an AI model, an attacker can inject
|
|
instructions that manipulate the model's behavior. This is known as <i>indirect prompt injection</i>
|
|
when the malicious content arrives through data the model processes, or <i>direct prompt injection</i>
|
|
when the attacker controls the prompt directly.</p>
|
|
|
|
<p>Unlike system prompt injection, user prompt injection targets the user-role messages. Although
|
|
user messages are expected to carry user input, passing unsanitized data directly into structured
|
|
prompt templates can still allow an attacker to override intended instructions, extract sensitive
|
|
context, or trigger unintended tool calls.</p>
|
|
</overview>
|
|
|
|
<recommendation>
|
|
<p>To mitigate user prompt injection:</p>
|
|
<ul>
|
|
<li>Ensure that all data flowing into user input is intended and necessary for the purpose of the AI system.</li>
|
|
<li>Ensure the system prompt clearly describes the purpose, scope and boundaries of the AI system. Instruct the system to deny input that falls outside these boundaries.</li>
|
|
<li>If creating a prompt out of multiple user-controlled values, assume that each of them can be malicious. Ensure the range of possible values is restricted and validated.
|
|
For example, if a prompt includes a question and the intended language to respond in, validate that the language is one of the supported options.</li>
|
|
<li>Consider using guardrails on the input like the OpenAI guardrails library to enforce constraints and prevent malicious content from being processed.</li>
|
|
<li>Apply output filtering to detect and block responses that indicate prompt injection attempts.</li>
|
|
</ul>
|
|
</recommendation>
|
|
|
|
<example>
|
|
<p>In the following example, user-controlled data is inserted directly into a user-role prompt
|
|
without any validation, allowing an attacker to inject arbitrary instructions.</p>
|
|
<sample src="examples/user-prompt-injection.py" />
|
|
|
|
<p>The following example applies multiple mitigations together, and only includes data that is
|
|
necessary for the task in the prompt: the value that selects behavior (the response language) is
|
|
validated against a fixed allowlist before it is used, and the system prompt clearly describes the
|
|
assistant's scope and instructs it to ignore embedded instructions.</p>
|
|
<sample src="examples/user-prompt-injection_fixed.py" />
|
|
</example>
|
|
|
|
<references>
|
|
<li>OWASP: <a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/">LLM01: Prompt Injection</a>.</li>
|
|
<li>MITRE CWE: <a href="https://cwe.mitre.org/data/definitions/1427.html">CWE-1427: Improper Neutralization of Input Used for LLM Prompting</a>.</li>
|
|
</references>
|
|
|
|
</qhelp>
|