WordPress - Why escape if the_content isn't?

If I were a hacker with access to the database, wouldn't I just add my code to a post's content?

If you've got access to the database, chances are that you've got enough access that escaping isn't going to stop you. Escaping is not going to help you if you've been hacked. It's not supposed to. There are other reasons to escape. The two main ones that I can think of are:

To deal with unsanitized input

WordPress post content is sanitized when it's saved, but not everything else is. Content passed via a query string in the URL isn't sanitized, for example. Neither, necessarily, is content in translation files. Both of those are sources of content that have nothing to do with the site being compromised. So translatable text and content pulled from the URL need to be escaped.
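
As a rough illustration of the query string case, here is a minimal sketch. The helper name myplugin_search_heading and the 'term' key are hypothetical, and the function_exists() guard only supplies a simplified stand-in for esc_html() so the snippet also runs outside WordPress; inside WordPress the real function is used.

```php
<?php
// Simplified stand-in so the sketch runs outside WordPress.
// Inside WordPress, esc_html() already exists and this is skipped.
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Hypothetical helper: builds a heading from a value taken from the URL.
function myplugin_search_heading( array $query ) {
    $term = isset( $query['term'] ) ? (string) $query['term'] : '';
    // Escape at the point of output, whatever the visitor sent.
    return '<h2>Results for ' . esc_html( $term ) . '</h2>';
}

// Simulates ?term=<script>alert(1)</script> in the URL.
echo myplugin_search_heading( array( 'term' => '<script>alert(1)</script>' ) );
// With the stand-in above this prints:
// <h2>Results for &lt;script&gt;alert(1)&lt;/script&gt;</h2>
```

In a real plugin the array would typically come from $_GET, and nothing about the site needs to be compromised for that value to contain markup.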

To prevent users accidentally breaking markup

Escaping isn't just for security. You also need it to prevent users accidentally breaking their site's markup. For example, if a user placing quotes or > symbols in some content in your plugin would break the markup, then you should escape that output. You don't want to be over-aggressive in sanitising on input, because there are perfectly valid reasons a user might want to use those characters.
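
A small sketch of that situation, assuming a hypothetical plugin helper myplugin_title_input that prints a user-supplied title into an attribute. The guarded esc_attr() below is a simplified stand-in for running the snippet outside WordPress, not the core implementation.

```php
<?php
// Simplified stand-in so the sketch runs outside WordPress.
if ( ! function_exists( 'esc_attr' ) ) {
    function esc_attr( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Hypothetical helper: renders a text input pre-filled with a saved title.
function myplugin_title_input( $title ) {
    // Without esc_attr(), the first double quote in $title would end the
    // value attribute early and the rest would spill into the markup.
    return '<input type="text" name="title" value="' . esc_attr( $title ) . '">';
}

echo myplugin_title_input( 'The "Big" Sale > everything' );
// With the stand-in above this prints:
// <input type="text" name="title" value="The &quot;Big&quot; Sale &gt; everything">
```

The user's quotes and > survive intact in the stored value and display correctly in the field; only their representation in the HTML changes.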


“Escaping isn’t only about protecting from bad guys. It’s just making our software durable. Against random bad input, against malicious input, or against bad weather.”

That's from the WordPress VIP guidelines on escaping. It has a lot more to say on this matter, and you should give it a read.


I'm actually an engineer at VIP who does a lot of code review :) I flag a lot of missing escaping.

but does not escape output

Not quite, it doesn't escape on output, which is surprising to most people. This is because if you're a super admin you have the unfiltered_html capability, so it can't escape on output. Instead it runs it through wp_kses_post on input. Ideally you would remove that capability though.

Here is the implementation at the current time:

function the_content( $more_link_text = null, $strip_teaser = false ) {
    $content = get_the_content( $more_link_text, $strip_teaser );

    /**
     * Filters the post content.
     *
     * @since 0.71
     *
     * @param string $content Content of the current post.
     */
    $content = apply_filters( 'the_content', $content );
    $content = str_replace( ']]>', ']]&gt;', $content );
    echo $content;
}

The ideal mechanism for escaping anything that goes through the_content filter on the other hand is:

echo apply_filters( 'the_content', wp_kses_post( $content ) );

This way we make the content safe, then run it through the filter, avoiding the embeds etc being stripped out.

So Why Escape

The point of escaping is to generate valid HTML, the added security it provides is just a nice side effect.

To prevent users accidentally breaking markup

There are many reasons to escape, but fundamentally, you're enforcing expectations. Take the following code:

<a href="<?=$url?>">

We expect $url to contain a URL suitable for a href attribute, but what if it doesn't? Well, why leave it to chance? Let's enforce it:

<a href="<?=esc_url( $url )?>">

It is now always going to be a URL. It doesn't matter if a hacker puts an image in $url, or if a user types in the wrong field, or there's a malicious script. It will always be a valid URL because we said it's going to be a URL. Sure, it might be a very strange URL, but it will always meet the expectation that a URL will be there. This is very handy, be it for markup validation, for security, etc.

Having said that, escaping is not validation, escaping is not sanitisation. Those are separate steps that happen at different points in the life cycle. Escaping forces things to meet expectations, even if it mangles them to do so.

Sometimes I like to think of escaping as one of those Japanese gameshows with the giant foam wall with the cut out. Contestants have to fit in the dog shape or they get discarded, only for our purposes there are lasers and knives around the hole. Whatever is left at the end will be dog shaped, and it will be unforgiving and strict if you're not already dog shaped.

Remember:

  • sanitise early
  • validate early
  • escape late
  • escape often
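
To make the ordering concrete, here is a minimal sketch under stated assumptions: the function names, the 'qty' and 'label' fields, and the 1 to 100 range are all hypothetical, and the guarded functions are crude stand-ins for WordPress's sanitize_text_field() and esc_html() so the snippet runs on its own.

```php
<?php
// Crude stand-ins so the sketch runs outside WordPress.
if ( ! function_exists( 'sanitize_text_field' ) ) {
    function sanitize_text_field( $text ) {
        return trim( strip_tags( (string) $text ) );
    }
}
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}

// Input side: validate and sanitise as soon as the data arrives.
function myplugin_handle_input( array $raw ) {
    // Validate early: the quantity must be an integer from 1 to 100.
    $qty = filter_var(
        $raw['qty'] ?? null,
        FILTER_VALIDATE_INT,
        array( 'options' => array( 'min_range' => 1, 'max_range' => 100 ) )
    );
    if ( false === $qty ) {
        return null; // Reject instead of guessing what was meant.
    }
    // Sanitise early: clean the label before it is stored.
    $label = sanitize_text_field( $raw['label'] ?? '' );
    return array( 'qty' => $qty, 'label' => $label );
}

// Output side: escape late, right where the HTML is produced.
function myplugin_render( array $clean ) {
    return '<p>' . esc_html( $clean['label'] ) . ' x ' . (int) $clean['qty'] . '</p>';
}

$clean = myplugin_handle_input( array( 'qty' => '3', 'label' => ' Tea & <b>biscuits</b> ' ) );
echo myplugin_render( $clean );
// With the stand-ins above this prints: <p>Tea &amp; biscuits x 3</p>
```

Note that the render step still escapes even though the input was already cleaned; the two steps do different jobs and neither replaces the other.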

Security is a multi-step, multi-layer onion of defences, and escaping is one of the outer layers of defence on output. It can mangle attack code on a compromised site, rendering it useless, thwart open exploits, and make sure your client doesn't break a site by putting tags in a field they shouldn't. It's not a substitute for the other things, and it's by far the most underused security tool in a developer's handbook.

As for why to escape if the_content doesn't? If you have a flood coming, and 5 holes in a wall, but only time to fix 3, do you shrug and fix none? Or do you mitigate the risk and reduce the attack area?

Perhaps I can help fix those final 2 holes with this snippet:

add_filter( 'the_content', function( $content ) {
    return wp_kses_post( $content );
}, PHP_INT_MIN );

Here we set the priority to PHP_INT_MIN, the lowest integer PHP can represent. Filters with lower priority numbers run first, so all calls to the_content will pass the value through wp_kses_post prior to any other filters. This way embeds etc. still work, but users can't sneak in dangerous HTML via the database. Additionally, look into removing the unfiltered_html capability from all roles.
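
One way to act on that last suggestion, offered as a sketch rather than the only approach: WordPress core checks a DISALLOW_UNFILTERED_HTML constant when it maps the unfiltered_html capability, so defining it in wp-config.php should deny that capability to every user, admins and super admins included. The defined() guard is only there so the line is safe to paste where the constant may already be set.

```php
<?php
// In wp-config.php, above the "That's all, stop editing!" line.
// Denies the unfiltered_html capability to all users.
if ( ! defined( 'DISALLOW_UNFILTERED_HTML' ) ) {
    define( 'DISALLOW_UNFILTERED_HTML', true );
}
```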


The point of escaping is to generate valid HTML, the added security it provides is just a nice side effect.

The filters applied to the content generate valid HTML from something that is a mix of HTML and other text with its own syntax, such as shortcodes. The fact that some of the content is already valid HTML prevents applying escaping to all of it.

As for kses-related functions, you cannot apply them, mainly because you do not have enough context to know which one to use. For example, there might be some process which uses the the_content filter to add JS to the post content, so core cannot guess from things like the post author whether the JS is legit or not.

So...why is it helping anything to escape elsewhere? If I were a hacker with access to the database, wouldn't I just add my code to a post's content?

Again, escaping is for generating valid HTML. From a security POV, it is not that escaping provides security, but that code which lacks escaping should be treated as suspicious, as it might be easier to exploit. For example, the way core uses _e and __ for translations means that anyone who can convince you to install a non-official translation might be able to add hard-to-detect JS to the translation file and hack your site. This is a good example of "do what I say and not what I do".
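
A sketch of the translation risk, with loud assumptions: the __() stand-in below fakes a tampered translation file for a hypothetical 'my-plugin' text domain, and the guarded esc_html() and esc_html__() are simplified stand-ins that mirror how the real WordPress functions are meant to be used. Inside WordPress the real functions would be used instead.

```php
<?php
// Stand-ins so the sketch runs outside WordPress.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        // Simulates a translation file that someone has tampered with.
        $tampered = array( 'Read more' => 'Read more<script>alert(1)</script>' );
        return $tampered[ $text ] ?? $text;
    }
}
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        return htmlspecialchars( (string) $text, ENT_QUOTES, 'UTF-8' );
    }
}
if ( ! function_exists( 'esc_html__' ) ) {
    function esc_html__( $text, $domain = 'default' ) {
        // Translate first, then escape the translated string.
        return esc_html( __( $text, $domain ) );
    }
}

// Unescaped: whatever the translation file contains reaches the page as-is.
$unsafe = __( 'Read more', 'my-plugin' );

// Escaped: the injected tag is rendered as harmless text.
$safe = esc_html__( 'Read more', 'my-plugin' );

echo $safe;
// With the stand-ins above this prints:
// Read more&lt;script&gt;alert(1)&lt;/script&gt;
```

In plugin and theme code, esc_html__() and esc_html_e() are the escaped counterparts of __() and _e().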
