Sanitizing input for parameterized queries

No, it's not necessary. But please, read on.

Input sanitization is a horrible term that pretends you can wave a magic wand at data and make it "safe data". The problem is that the definition of "safe" changes when the data is interpreted by different pieces of software.

Data that may be safe to be embedded in an SQL query may not be safe for embedding in HTML. Or JSON. Or shell commands. Or CSV. And stripping (or outright rejecting) values so that they are safe for embedding in all those contexts (and many others) is too restrictive.

So what should we do? Make sure the data is never in a position to do harm.

The best way to achieve this is to avoid interpretation of the data in the first place. Parametrized SQL queries is an excellent example of this; the parameters are never interpreted as SQL, they're simply put in the database as, well, data.

For many other situations, the data still needs to be embedded in other formats, say, HTML. In that case, the data should be escaped for that particular language at the moment it's embedded. So, to prevent XSS, data is HTML-escaped at view time. Not at input time. The same applies to other embedding situations.

So, should we just pass anything we get straight to the database?

Maybe. It depends.

There are definitely things you can check about user input, but this is highly context-dependent. Because sanitization is ill-defined and mis-used, I prefer to call this validation.

  • For example, if some field is an supposed to be an integer, you can certainly validate this field to ensure it contains an integer (or maybe NULL).
  • You can certainly do some validation on email fields (although some people argue there's not much you can do besides checking for the presence of a @, and they have a good point).
  • You can require comments to have a minimum and maximum length.
  • You should probably verify that any string contains only valid characters for its encoding (e.g., no invalid UTF-8 sequences).
  • You could restrict a username to certain characters, if that makes sense for your userbase.
  • A minimum length for passwords is, of course, incredibly common.

As you can see, these checks are very context-dependent. And all of them are to help increase the odds you end up with data that makes sense. They are not to protect your application from malicious input (SQL injection, XSS, command injection, etc), because this is not the place to do that.

Users should be free to type '; DROP TABLE users; -- without their post being rejected or mangled to \'; DROP TABLE users; --. Note that I'm able to include such "malicious" content on sec.SE!

So, to answer your original question:

... is it still necessary and/or security-relevant to somehow sanitize input?

No, it is not. But please do properly escape the data where needed before outputting it. And consider validations, where applicable.

I think of benign, but not-so-well-engineered 3rd party tools (maybe self-written scripts by the admin, or some fancy CrystalReports made by a non-technician) trying to consume unsanitized data from our database.

Then escape or filter the data before outputting to those tools, but don't mangle data in your database.

But really, those scripts should be fixed, or rewritten with security in mind.

(MySQL seems to have problems with Emojis)

Little off-topic, but have a look at the utfmb4 charset for MySQL ;)


If I use fully parameterized queries everywhere, is it still necessary and/or security-relevant to somehow sanitize input?

Yes. It's always a good idea to sanitize the input before sending it to the database.

Parameterized queries might save you from SQL injection attacks, but might not prove beneficial in case of stored XSS attacks. If a user sends a malicious javascript code into your form, and you store it successfully in your database, and you display the same field elsewhere, the malicious script might run on the victim's browser.

It's always a good idea to sanitize the input before sending it ahead, and sanitizing the output before sending it to client's browser.

EDIT: As pointed out in comments, sanitization is different from encoding/escaping values. Sanitization, here, meant that the inputs can be validated for some basic checks before passing it on to the database(or any further layer). For example, a field which expects an e-mail address can be validated for a valid e-mail address, a field for age can be validated for integer only, etc. Regular expressions prove to be of immense help here. It should be noted that these checks should be placed on front-end(client-side) as well as back-end.

In case of fields which expect rich text, you can implement Content Security Policy on pages wherever the user input will be displayed. You can also use an HTML sanitizer to sanitize the user input.


Yes, you should always sanitize input data. Sanitation isn't just about protecting you from injection, but also to validate types, restricted value (enums), ranges, etc.. While an attacker might not be able to manipulate your sql, they can still cause undesired behavior in the rest of your application.

For example, if an attacker changed an enum value, could they manipulate the system? Did they just change a role, user type, etc... ? Alternatively they could input a value larger than can be accepted in your schema or the data type (byte v int32, int64), could this cause the application to crash that exposes information or leaves orphaned data changed (not transacted).