Input validation for LLMs
As I was building checklist.devops.security, I needed to validate that the user input actually corresponds to a valid technology, in order to mitigate Prompt Injection attacks. The issue is that the field is an input="text", so the user can fill in anything. How can this input be validated?
The good old input validation measures are still helpful: enforce a maximum length and allow only certain characters. But that's not enough.
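As a rough sketch of that first layer, here is what those classic checks could look like. The length limit and the allowed character set below are arbitrary choices for illustration, not the values used by checklist.devops.security:

// First layer: cheap, deterministic checks before anything reaches the LLM.
const MAX_LENGTH = 50; // arbitrary limit, long enough for names like "Ruby on Rails"
const ALLOWED_PATTERN = /^[A-Za-z0-9 .+#_-]+$/; // letters, digits, and a few symbols found in technology names (C++, C#, .NET)

function passesBasicValidation(userInput: string): boolean {
  const trimmed = userInput.trim();
  return (
    trimmed.length > 0 &&
    trimmed.length <= MAX_LENGTH &&
    ALLOWED_PATTERN.test(trimmed)
  );
}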
The next step I thought of was to use the LLM itself to validate the input. Here's a prompt similar to the one I use:
{
  "user_input": "Java", // escape the user input
  "question": "Is the \"user_input\" a valid technology?",
  "instructions": "Don't give explanations",
  "output_values": [true, false]
}
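Here's a minimal sketch of how this check could be wired up. The callLLM parameter stands in for whatever client you use to send a prompt and get back the model's raw text answer (it's a placeholder, not a specific API), and JSON.stringify is one way to handle the "escape the user input" comment above:

async function isValidTechnology(
  userInput: string,
  callLLM: (prompt: string) => Promise<string> // inject your LLM client here
): Promise<boolean> {
  // JSON.stringify escapes quotes, backslashes and newlines in the user input.
  const prompt = JSON.stringify({
    user_input: userInput,
    question: 'Is the "user_input" a valid technology?',
    instructions: "Don't give explanations",
    output_values: [true, false],
  });

  const answer = await callLLM(prompt);

  // Fail closed: only an exact "true" counts as a positive answer.
  return answer.trim().toLowerCase() === "true";
}

Failing closed means a confused or verbose model answer is treated as invalid, which is the safer default here.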
Relying on an LLM is not a perfect control, because slight modifications to the input can still confuse the model.
The ideal scenario is to have an allow-list of user inputs, but this isn't applicable to many applications.
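When an allow-list is possible, the check needs no LLM at all. A minimal sketch, assuming you maintain the list of supported technologies yourself (the entries below are just examples):

// Allow-list of supported technologies, normalized to lowercase.
const ALLOWED_TECHNOLOGIES = new Set([
  "java",
  "python",
  "typescript",
  "docker",
  "kubernetes",
]);

function isAllowedTechnology(userInput: string): boolean {
  return ALLOWED_TECHNOLOGIES.has(userInput.trim().toLowerCase());
}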
For more resources on the topic, check out the OWASP Top 10 for Large Language Model Applications. It's still a draft, but it provides food for thought in any case.