Find and replace specific text characters across a document with JS

My own suggestion is as follows:

function nativeSelector() {
    var elements = document.querySelectorAll("body, body *");
    var results = [];
    var child;
    for(var i = 0; i < elements.length; i++) {
        child = elements[i].childNodes[0];
        if(elements[i].hasChildNodes() && child.nodeType == 3) {
            results.push(child);
        }
    }
    return results;
}

var textnodes = nativeSelector(),
    _nv;
for (var i = 0, len = textnodes.length; i<len; i++){
    _nv = textnodes[i].nodeValue;
    textnodes[i].nodeValue = _nv.replace(/£/g,'€');
}

JS Fiddle demo.

The nativeSelector() function comes from an answer (posted by Anurag) to this question: getElementsByTagName() equivalent for textNodes.


ECMAScript 2015+ approach

Pitfalls when solving this task

This seems like an easy task, but you have to take care of several things:

  • Simply replacing the entire HTML kills all DOM functionality, like event listeners
  • Replacing the HTML may also replace <script> or <style> contents, or HTML tags or attributes, which is not always desired
  • Changing the HTML may result in an xss attack
  • You may want to replace attributes like title and alt (in a controlled manner) as well

Guarding against xss attacks generally can’t be solved by using the approaches below. E.g. if a fetch call reads a URL from somewhere on the page, then sends a request to that URL, the functions below won’t stop that, since this scenario is inherently unsafe.

Replacing the text contents of all elements

This basically selects all elements that contain normal text, goes through their child nodes — among those are also text nodes —, seeks those text nodes out and replaces their contents.

You can optionally specify a different root target, e.g. replaceOnDocument(/€/g, "$", { target: someElement });; by default, the <body> is chosen.

const replaceOnDocument = (pattern, string, {target = document.body} = {}) => {
  // Handle `string` — see the last section
  [
    target,
    ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
  ].forEach(({childNodes: [...nodes]}) => nodes
    .filter(({nodeType}) => nodeType === document.TEXT_NODE)
    .forEach((textNode) => textNode.textContent = textNode.textContent.replace(pattern, string)));
};

replaceOnDocument(/€/g, "$");

Replacing text nodes, element attributes and properties

Now, this is a little more complex: you need to check three cases: whether a node is a text node, whether it’s an element and its attribute should be replaced, or whether it’s an element and its property should be replaced. A replacer object provides methods for text nodes and for elements.

Before replacing attributes and properties, the replacer needs to check whether the element has a matching attribute; otherwise new attributes get created, undesirably. It also needs to check whether the targeted property is a string, since only strings can be replaced, or whether the matching property to the targeted attribute is not a function, since this may lead to an xss attack.

In the example below, you can see how to use the extended features: in the optional third argument, you may add an attrs property and a props property, which is an iterable (e.g. an array) each, for the attributes to be replaced and the properties to be replaced, respectively.

You’ll also notice that this snippet uses flatMap. If that’s not supported, use a polyfill or replace it by the reduceconcat, or mapreduceconcat construct, as seen in the linked documentation.

const replaceOnDocument = (() => {
    const replacer = {
      [document.TEXT_NODE](node, pattern, string){
        node.textContent = node.textContent.replace(pattern, string);
      },
      [document.ELEMENT_NODE](node, pattern, string, {attrs, props} = {}){
        attrs.forEach((attr) => {
          if(typeof node[attr] !== "function" && node.hasAttribute(attr)){
            node.setAttribute(attr, node.getAttribute(attr).replace(pattern, string));
          }
        });
        props.forEach((prop) => {
          if(typeof node[prop] === "string" && node.hasAttribute(prop)){
            node[prop] = node[prop].replace(pattern, string);
          }
        });
      }
    };

    return (pattern, string, {target = document.body, attrs: [...attrs] = [], props: [...props] = []} = {}) => {
      // Handle `string` — see the last section
      [
        target,
        ...[
          target,
          ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
        ].flatMap(({childNodes: [...nodes]}) => nodes)
      ].filter(({nodeType}) => replacer.hasOwnProperty(nodeType))
        .forEach((node) => replacer[node.nodeType](node, pattern, string, {
          attrs,
          props
        }));
    };
})();

replaceOnDocument(/€/g, "$", {
  attrs: [
    "title",
    "alt",
    "onerror" // This will be ignored
  ],
  props: [
    "value" // Changing an `<input>`’s `value` attribute won’t change its current value, so the property needs to be accessed here
  ]
});

Replacing with HTML entities

If you need to make it work with HTML entities like &shy;, the above approaches will just literally produce the string &shy;, since that’s an HTML entity and will only work when assigning .innerHTML or using related methods.

So let’s solve it by passing the input string to something that accepts an HTML string: a new, temporary HTMLDocument. This is created by the DOMParser’s parseFromString method; in the end we read its documentElement’s textContent:

string = new DOMParser().parseFromString(string, "text/html").documentElement.textContent;

If you want to use this, choose one of the approaches above, depending on whether or not you want to replace HTML attributes and DOM properties in addition to text; then simply replace the comment // Handle `string` — see the last section by the above line.

Now you can use replaceOnDocument(/Güterzug/g, "G&uuml;ter&shy;zug");.

NB: If you don’t use the string handling code, you may also remove the { } around the arrow function body.

Note that this parses HTML entities but still disallows inserting actual HTML tags, since we’re reading only the textContent. This is also safe against most cases of xss: since we’re using parseFromString and the page’s document isn’t affected, no <script> gets downloaded and no onerror handler gets executed.

You should also consider using \xAD instead of &shy; directly in your JavaScript string, if it turns out to be simpler.


How about this, replacing @ with $:

$("body").children().each(function () {
    $(this).html( $(this).html().replace(/@/g,"$") );
});

http://jsfiddle.net/maximua/jp96C/1/