Input validation for LLMs

While building checklist.devops.security, I needed to check whether the user input actually names a valid technology, as a way to mitigate prompt injection attacks. The catch is that the field is a free-form text input (type="text"), so the user can type anything. How can this input be validated?

The good old input validation measures still help: enforce a maximum length and allow only certain characters. But that's not enough on its own, because a short string made of ordinary characters can still carry instructions.
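
As a rough sketch of those classic checks, something like the following could run before the input gets anywhere near the model. The 50-character limit, the character set, and the function name `is_plausible_input` are my own assumptions for illustration, not values taken from the actual site.

```python
import re

# Assumed limit: technology names are short, so 50 characters is generous.
MAX_LENGTH = 50

# Assumed character set: letters, digits, spaces, and the few symbols that
# show up in names such as "C++", "C#", "Node.js" or "scikit-learn".
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9 .+#\-]+")


def is_plausible_input(value: str) -> bool:
    """Return True if the value passes the length and character checks."""
    value = value.strip()
    if not value or len(value) > MAX_LENGTH:
        return False
    # fullmatch requires every character to be in the allowed set.
    return ALLOWED_CHARS.fullmatch(value) is not None
```

A check like this rejects newlines, quotes, and overly long payloads, but a string such as "ignore previous instructions" would still pass, which is why it can only be a first layer.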

The next step I thought of was to use the LLM itself to validate the input. Here's a prompt similar to the one I used:

    "user_input": "Java", // escape the user input
    "question": "Is the \"user_input\" a valid technology?",
    "instructions": "Don't give explanations",
    "output_values": [true, false]
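
To make the "escape the user input" part concrete, here is a minimal sketch that builds the prompt above with `json.dumps`, so quotes and newlines in the input cannot break out of the `user_input` field. The `ask_llm` parameter is a hypothetical stand-in for whatever client call sends a prompt and returns the model's text reply; it is not a real library function.

```python
import json
from typing import Callable


def build_validation_prompt(user_input: str) -> str:
    """Serialize the validation prompt; json.dumps escapes the user input."""
    payload = {
        "user_input": user_input,
        "question": 'Is the "user_input" a valid technology?',
        "instructions": "Don't give explanations",
        "output_values": [True, False],
    }
    return json.dumps(payload)


def is_valid_technology(user_input: str, ask_llm: Callable[[str], str]) -> bool:
    """Ask the model and accept only an exact "true" reply.

    ask_llm is a hypothetical callable: it takes the prompt string and
    returns the model's raw text answer.
    """
    answer = ask_llm(build_validation_prompt(user_input)).strip().lower()
    # Fail closed: anything other than a bare "true" counts as invalid.
    return answer == "true"
```

Treating every reply other than a bare "true" as a rejection means that a model talked into adding explanations or extra text fails the check instead of passing it.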

Relying on an LLM is not a perfect control, because slight modifications to the input can confuse the model into giving the wrong answer.

The ideal scenario is to have an allow-list of accepted user inputs, but that isn't feasible for many applications.
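
Where an allow-list is feasible, the check itself is tiny. The sketch below assumes a hand-maintained set of technology names; the entries and the names `ALLOWED_TECHNOLOGIES` and `is_allowed` are purely illustrative.

```python
# Illustrative entries only; a real list would be maintained separately.
ALLOWED_TECHNOLOGIES = frozenset(
    {"java", "python", "docker", "kubernetes", "terraform"}
)


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case so "  Java " matches "java"."""
    return " ".join(value.split()).casefold()


def is_allowed(value: str) -> bool:
    """Return True only if the normalized value is on the allow-list."""
    return normalize(value) in ALLOWED_TECHNOLOGIES
```

With this approach, only strings already on the list ever reach the prompt, so there is nothing left for an attacker to inject. The cost is that every new technology has to be added by hand.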

For more resources on the topic, check out the OWASP Top 10 for Large Language Model Applications. It's still a draft, but it provides food for thought in any case.

