Simple HTML sanitizer in JavaScript

Another hint: as of May 2021, there is an upcoming Sanitizer API in Firefox.

const inputString = 'Some text <b><i>with</i></b> <blink>tags</blink>, including a rogue script <script>alert(1)</script> def.';
const result = new Sanitizer().sanitizeToString(inputString);
console.log(result);
// Logs "Some text <b><i>with</i></b>, including a rogue script def."

(MDN example)

See: https://developer.mozilla.org/en-US/docs/Web/API/HTML_Sanitizer_API

If other browser vendors adopt this feature as well, it might let us get rid of hand-written JavaScript sanitizer implementations.
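Until the API is widely available, any code relying on it needs a feature check. Here is a minimal sketch, assuming the draft `sanitizeToString` method from the MDN example above; `sanitizeWithFallback` and its `fallback` parameter are hypothetical names used for illustration, not part of any API:

```javascript
// Hypothetical helper: use the browser's Sanitizer API when it is present,
// otherwise hand the input to a caller-supplied sanitizer function.
function sanitizeWithFallback(input, fallback) {
  const hasSanitizer =
    typeof globalThis.Sanitizer === 'function' &&
    typeof globalThis.Sanitizer.prototype?.sanitizeToString === 'function';
  if (hasSanitizer) {
    // Assumption: the draft method shown in the MDN example is available.
    return new globalThis.Sanitizer().sanitizeToString(input);
  }
  return fallback(input);
}
```

The fallback can be any sanitizer you already trust, so the calling code does not change when browser support arrives.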


Here is a 2 kB Vue component that renders escaped markdown. It depends on Snarkdown, a 1 kB markdown renderer (replace it with whatever you need). Optionally, it can also translate B & I tags into markdown, for content that already uses those tags for formatting...

<template>
  <div v-html="html">
  </div>
</template>

<script>
import Snarkdown from 'snarkdown'
export default {
  props: ['code', 'bandi'],
  computed: {
    html () {
      // Convert b & i tags if flagged...
      const unsafe = this.bandi ? this.code
        .replace(/<b>/g, '**')
        .replace(/<\/b>/g, '**')
        .replace(/<i>/g, '*')
        .replace(/<\/i>/g, '*') : this.code

      // Process the markdown after we escape the html tags...
      return Snarkdown(unsafe
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#039;')
      )
    }
  }
}
</script>

For comparison, vue-markdown is over 100 kB. This component won't render math formulas and such, but most people never need those, so I'm not sure why the most popular markdown components are so bloated :(

This is safe against XSS through raw HTML, because every special character is escaped before the markdown is rendered, and it is super fast.

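Why did I use &#039; and not &apos;? Because &apos; is an XML entity that HTML 4 does not define, so older browsers may not decode it. See the question "Why shouldn't `&apos;` be used to escape single quotes?"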
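If you are not using Vue, the same two steps from the component can be pulled out into plain functions. This is only a sketch of that idea; `escapeHtml` and `tagsToMarkdown` are names I made up for illustration:

```javascript
// Escape the five HTML special characters. '&' must be handled first so that
// the entities produced by the later replacements are not escaped again.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Optional step: turn <b>/<i> tags into markdown emphasis before escaping.
function tagsToMarkdown(text) {
  return text
    .replace(/<\/?b>/g, '**')
    .replace(/<\/?i>/g, '*');
}

const input = `<b>Hi</b> <script>alert('x')</script>`;
const escaped = escapeHtml(tagsToMarkdown(input));
console.log(escaped);
// Logs "**Hi** &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt;"
```

The escaped string can then be passed to Snarkdown or any other markdown renderer, as in the component.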

And now for something completely different, but related...

Not sure why this hasn't been mentioned yet... but your browser can sanitize for you.

Here is the 3-line HTML sanitizer that can sanitize 30x faster than any JavaScript variant, because it uses the native HTML parser that comes with your browser... This pattern is used in Vue/React/Angular and many other UI frameworks. Note that this does NOT escape HTML; it removes it.

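const decoder = document.createElement('div')
// Caution: <script> will not run here, but handlers like <img src=x onerror=...> can still fire.
decoder.innerHTML = YourXSSAttackHere // placeholder for the untrusted input string
// textContent returns only the text, with every tag removed
const sanitized = decoder.textContent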

As proof this method is accepted and fast, here is a live link to the decoder used in Vue.js which uses the same pattern: https://github.com/vuejs/vue/blob/dev/src/compiler/parser/entity-decoder.js
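One variation on the same idea, not taken from the Vue source, is to parse the input with `DOMParser` instead of a live `div`. A document created this way is inert, so it does not load images or run event handlers while parsing. `stripTags` is a hypothetical name, and the sketch assumes a browser environment:

```javascript
// Hypothetical helper: strip all markup by parsing into an inert document.
function stripTags(untrusted) {
  if (typeof DOMParser === 'undefined') {
    throw new Error('stripTags needs a browser environment with DOMParser');
  }
  // parseFromString builds a separate document that is not attached to the page.
  const doc = new DOMParser().parseFromString(untrusted, 'text/html');
  // Return only the text of the parsed body, without any tags.
  return doc.body.textContent || '';
}
```

As with the `div` version, this removes HTML rather than escaping it.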


We've developed a simple HtmlSanitizer and open-sourced it here: https://github.com/jitbit/HtmlSanitizer

Usage

var result = HtmlSanitizer.SanitizeHtml(input);

[Disclaimer! I'm one of the authors!]