November 18, 2025

LLM Refusal: A Product Design Perspective

How to strategically design refusal mechanisms when using LLMs in your products


What is becoming increasingly clear in product design is that a refusal from an LLM is not simply a moral or technical quirk of the model: it is a conscious, first-class design choice about when the model should refuse to help. The system is built around a boundary of operation, however hazy, and pushing against that boundary should result in a resolute refusal.

Therefore, a model saying “No” is a valid interaction pattern, and I think it is a fascinating peek into how models reason and deduce.

I break down model refusals into three categories:

  1. Hard Block: This one is easy to understand. The model has decided, based on its policy or constitution, that it will not respond. This is non-negotiable and a complete refusal of the instruction.
  2. Deflection: A redirect away from the original request toward an adjacent problem space that may still be relevant to the asker.
  3. Constrained Reply: A correct answer delivered within a very tight scope. Observability would count this as a successful resolution; there is no refusal here, but a clear unwillingness to elaborate.
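
To make the distinction concrete, here is a minimal Python sketch (the names are illustrative, not from any real library) of how these three outcomes might be modelled as explicit result types rather than a single “refused” flag:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class RefusalPattern(Enum):
    """The three ways a system can decline or limit help."""
    HARD_BLOCK = auto()         # non-negotiable refusal of the instruction
    DEFLECTION = auto()         # redirect to an adjacent, still-relevant problem
    CONSTRAINED_REPLY = auto()  # correct answer, deliberately narrow in scope


@dataclass
class ModelResponse:
    text: str                                 # what the user actually sees
    pattern: Optional[RefusalPattern] = None  # None means an ordinary, full answer
    flag_for_review: bool = False             # e.g. hard blocks in high-risk domains
```

Making the pattern part of the response type keeps it visible in logs and analytics, so a constrained reply is not mistaken for a failure.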

The question now becomes: what is the risk appetite of your product? Finding the right balance of the three cases above is how you successfully deploy an LLM.

For a high-risk domain question (health and critical care, financial advice, violent acts, illegal activity, and so on), the default move is a hard block. The system must refuse completely under all circumstances. The response is the refusal, a brief justification, and no loopholes via rephrasing or follow-ups. Additionally, flag the conversation for later review.
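
As a deliberately naive sketch of what that could look like (the keyword check stands in for whatever moderation classifier your stack actually runs, and the review queue is just a print statement here):

```python
# Stand-in for a real moderation/classification step.
HIGH_RISK_TERMS = ("overdose", "insider trading", "build a weapon")

HIGH_RISK_REFUSAL = (
    "I can't help with that. This product does not give guidance on "
    "medical emergencies, illegal activity, or violence. Please contact "
    "a qualified professional or the relevant authority."
)


def is_high_risk(message: str) -> bool:
    return any(term in message.lower() for term in HIGH_RISK_TERMS)


def respond(message: str, conversation_id: str) -> dict:
    """Hard block: fixed refusal, brief justification, flagged for review."""
    if is_high_risk(message):
        # Flag the conversation so a human can review it later.
        print(f"[review-queue] {conversation_id}")
        return {"pattern": "hard_block", "text": HIGH_RISK_REFUSAL}
    # Non-risky messages fall through to the normal model call (omitted).
    return {"pattern": None, "text": "...normal model reply..."}
```

The key property is that the refusal text is fixed: rephrasing the question changes nothing, because the decision is made before the model is even asked.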

When we talk about agents that can take action within a specific domain (finance trading assistants, coding assistants, HR assistants, enterprise copilots), the right combination is hard blocks on prohibited actions, with deflection and constrained replies for everything else. This needs careful calibration, because the model must distinguish between direct but harmful instructions (“Execute this trade now” -> No) and safe related work (“Draft a risky trade action for review”). There can also be clearly defined deflection routes (“Request permission from the compliance authority”). Constrained replies keep the model inside pre-determined action and conversation flows, or policies.
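
One rough way to express that calibration (the action names and escalation route here are made up) is an explicit routing layer sitting in front of the agent's tools:

```python
PROHIBITED_ACTIONS = {"execute_trade", "transfer_funds"}  # hard block, always
REVIEW_ONLY_ACTIONS = {"draft_trade"}                      # allowed, but constrained


def route_agent_request(action: str, params: dict) -> dict:
    """Separate harmful direct instructions from safe adjacent work."""
    if action in PROHIBITED_ACTIONS:
        # Hard block: never executed, regardless of phrasing, with a
        # deflection toward the defined escalation route.
        return {
            "pattern": "hard_block",
            "text": "I can't execute trades directly. I can draft one for "
                    "review, or you can request sign-off from compliance.",
        }
    if action in REVIEW_ONLY_ACTIONS:
        # Constrained reply: do the related work, but only inside the
        # pre-approved flow (draft -> human review), never beyond it.
        return {
            "pattern": "constrained_reply",
            "text": f"Draft prepared for review: {params}",
        }
    # Anything unrecognised is deflected rather than improvised.
    return {
        "pattern": "deflection",
        "text": "That's outside my current scope; routing you to compliance.",
    }
```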

For consumer and creative tools (think chatbots, writing assistants, educational apps), the risk appetite shifts. Aggressive hard blocks actually erode utility and trust. This is where constrained replies shine: answers stay within guidelines but remain useful and actionable, as much as possible. In edge cases, or when the user keeps pressing, the way out is deflection. By redirecting a request to official sources or offline channels, your product still gets to maintain trust and usability.

Finally, a new category is appearing on the open web. Social, gaming, and entertainment products tend to have a looser policy around risk, but still need guardrails around hate, harassment, and sensitive topics. A good approach here is deflection first: your model should refuse to mirror a toxic line of questioning and instead channel the user into neutral or informative responses. For example, a targeted hate request can be turned into a learning opportunity for the user, and a more neutral tone will go a long way.
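
Pulling these surfaces together, one way to capture the shifting risk appetite is a small lookup table; the surface names are mine, and the orderings simply restate the preferences discussed above:

```python
# Which refusal pattern each product surface should reach for first.
REFUSAL_PREFERENCE = {
    "high_risk_qa": ["hard_block"],
    "domain_agent": ["hard_block", "deflection", "constrained_reply"],
    "consumer_creative": ["constrained_reply", "deflection"],
    "social_entertainment": ["deflection", "constrained_reply", "hard_block"],
}


def preferred_patterns(surface: str) -> list:
    """Look up the refusal mix a given product surface should favour."""
    return REFUSAL_PREFERENCE.get(surface, ["hard_block"])  # fail safe by default
```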

Of course, the right move is the one that works for you. Use these patterns in your product specification and incorporate them into every micro decision. For every new feature, decide which refusal patterns should be allowed, which should be prioritised, and which should be avoided. Make it part of your spec.
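
A hypothetical feature-spec entry might look like the snippet below; none of these field names are a standard, the point is only that the refusal behaviour is decided up front rather than discovered in production:

```python
EXPENSE_SUMMARY_FEATURE = {
    "feature": "expense_report_summary",
    "refusal": {
        "prioritise": "constrained_reply",  # summarise only the data provided
        "allow": ["deflection"],            # route policy questions to HR docs
        "avoid": ["hard_block"],            # nothing here should be high risk
    },
}
```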

A model’s refusal is just another system primitive. Design it.