How to parse a small subset of Markdown into React components?

It looks like you want a small, very basic solution, not a "super-monster" like react-markdown-it :)

I would recommend https://github.com/developit/snarkdown, which looks lightweight and nice. It is just 1kb and extremely simple, so you can use it and extend it if you need any other syntax features.

List of supported tags: https://github.com/developit/snarkdown/blob/master/src/index.js#L1

Update

I just noticed the part about React components; I missed it at first. In that case, I believe the best route is to take the library as an example and implement your own components on top of the same idea, so you can get it done without setting HTML dangerously. The library is small and clear. Have fun with it! :)


How does it work?

It works by reading a string chunk by chunk, which might not be the best solution for really long strings.

Whenever the parser reads a critical chunk, i.e. '*' or any other markdown tag, it starts collecting the chunks of that element until it finds the closing tag.

It works on multi-line strings; see the code for an example.
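To make the idea concrete, here is a minimal sketch of the chunk-by-chunk loop on its own, without React. The names TAGS and tokenize are made up for this illustration, and it is simpler than the full code further down (no escaping, and a different symbol inside an open tag is kept as text).

```javascript
// Hypothetical tag table; the symbols and names are illustrative only.
const TAGS = new Map([["*", "strong"], ["_", "em"]]);

// Walk the string one character ("chunk") at a time and collect
// { tag, text } tokens, where tag is null for plain text.
function tokenize(source) {
  const tokens = [];
  let openTag = null; // name of the tag currently being parsed, if any
  let text = "";

  for (const chunk of source) {
    const tag = TAGS.get(chunk);
    if (tag !== undefined && openTag === null) {
      // Opening symbol: flush the plain text collected so far.
      if (text !== "") tokens.push({ tag: null, text });
      openTag = tag;
      text = "";
    } else if (tag !== undefined && tag === openTag) {
      // Matching closing symbol: emit the tagged token.
      tokens.push({ tag: openTag, text });
      openTag = null;
      text = "";
    } else {
      // Ordinary character, or a different symbol inside an open tag.
      text += chunk;
    }
  }

  // Whatever is left (including an unclosed tag) becomes plain text.
  if (text !== "") tokens.push({ tag: null, text });
  return tokens;
}
```

Each token could then be turned into an element, e.g. a span for plain text and the tag name otherwise.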

Caveats

You haven't specified (or I may have misunderstood your needs) whether tags that are both bold and italic must be parsed. My current solution might not work in that case.

If you do need to handle that case, just comment here and I'll tweak the code.

First update: tweaks how markdown tags are treated

Tags are no longer hardcoded, instead they are a map where you can easily extend to fit your needs.
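As a small illustration of extending the map, the snippet below adds one more symbol. The "~" symbol and the del element are only an assumed example, not something the code below ships with.

```javascript
// Same shape as the tags Map used in the code below.
const tags = new Map(Object.entries({
  "*": "strong",
  "_": "em",
}));

// Hypothetical extra tag: text wrapped in "~" would render as <del>.
tags.set("~", "del");
```

Because the parser only asks the map whether a chunk is a symbol, no other logic should need to change for single-character tags.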

Fixed the bugs you mentioned in the comments, thanks for pointing out these issues =p

Second update: multi-length markdown tags

Easiest way of achieving this: replacing multi-length tags with a rarely used Unicode character

Though the method parseMarkdown does not yet support multi-length tags, we can easily replace those multi-length tags with a simple string.replace when sending our rawMarkdown prop.

To see an example of this in practice, look at the ReactDOM.render, located at the end of the code.

Even if your application supports multiple languages, there are Unicode noncharacters that JavaScript still handles as ordinary string data. For example, "\uFFFF" (U+FFFF) is reserved as a noncharacter, so it should not appear in normal text, but JS can still compare it ("\uFFFF" === "\uFFFF" is true).

It might seem hack-y at first but, depending on your use case, I don't see any major issues with this route.
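The replacement step can be sketched as a tiny helper. FENCE, SENTINEL and encodeFences are names invented for this example; the guard against input that already contains the sentinel is an extra precaution I am assuming you might want, not something the code below does.

```javascript
const FENCE = "`".repeat(3); // the three-backtick code fence
const SENTINEL = "\uFFFF";   // U+FFFF noncharacter, a single UTF-16 code unit

// Swap every multi-character fence for the one-character sentinel so a
// single-character parser can treat it like any other tag symbol.
function encodeFences(markdown) {
  if (markdown.includes(SENTINEL)) {
    throw new Error("input already contains the sentinel character");
  }
  return markdown.split(FENCE).join(SENTINEL);
}
```

The result can be passed as the rawMarkdown prop in place of the raw string.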

Another way of achieving this

Well, we could easily track the last N (where N corresponds to the length of the longest multi-length tag) chunks.

This would require some tweaks to the loop inside parseMarkdown: check whether the current chunk is part of a multi-length tag and, if it is, use it as a tag; otherwise, in cases like ``k, mark it as notMultiLength or something similar and push that chunk as content.
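A rough sketch of this alternative is shown below. Instead of remembering the last N chunks, it looks ahead from the current position, which amounts to the same check. The names FENCE, TAGS, matchSymbolAt and tokenize are assumptions for this example, and escaping is left out.

```javascript
const FENCE = "`".repeat(3); // three backticks

// Hypothetical tag table that includes a multi-character symbol.
const TAGS = [[FENCE, "pre"], ["*", "strong"], ["_", "em"]];
// Try the longest symbols first so the fence wins over shorter prefixes.
const SYMBOLS = [...TAGS].sort((a, b) => b[0].length - a[0].length);

// Return [symbol, tagName] if a symbol starts at position i, else null.
function matchSymbolAt(source, i) {
  for (const entry of SYMBOLS) {
    if (source.startsWith(entry[0], i)) return entry;
  }
  return null;
}

function tokenize(source) {
  const tokens = [];
  let openTag = null;
  let text = "";
  let i = 0;
  while (i < source.length) {
    const hit = matchSymbolAt(source, i);
    if (hit !== null && (openTag === null || hit[1] === openTag)) {
      if (openTag === null) {
        if (text !== "") tokens.push({ tag: null, text });
        openTag = hit[1];
      } else {
        tokens.push({ tag: openTag, text });
        openTag = null;
      }
      text = "";
      i += hit[0].length; // skip the whole symbol, however long it is
    } else {
      text += source[i]; // e.g. the backticks in ``k are plain content
      i += 1;
    }
  }
  if (text !== "") tokens.push({ tag: null, text });
  return tokens;
}
```

With this approach no sentinel character is needed, at the cost of a slightly more involved loop.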

Code

// Instead of creating hardcoded variables, we can make the code more extendable
// by storing all the possible tags we'll work with in a Map. Thus, creating
// more tags will not require additional logic in our code.
const tags = new Map(Object.entries({
  "*": "strong", // bold
  "!": "button", // action
  "_": "em", // emphasis
  "\uFFFF": "pre", // Just use a very unlikely to happen unicode character,
                   // We'll replace our multi-length symbols with that one.
}));
// Might be useful if we need to discover the symbol of a tag
const tagSymbols = new Map();
tags.forEach((v, k) => { tagSymbols.set(v, k ); })

const rawMarkdown = `
  This must be *bold*,

  This also must be *bo_ld*,

  this _entire block must be
  emphasized even if it's comprised of multiple lines_,

  This is an !action! it should be a button,

  \`\`\`
beep, boop, this is code
  \`\`\`

  This is an asterisk\\*
`;

class App extends React.Component {
  parseMarkdown(source) {
    let currentTag = "";
    let currentContent = "";

    const parsedMarkdown = [];

    // We create this variable to track possible escape characters, eg. "\"
    let before = "";

    const pushContent = (
      content,
      tagValue,
      props,
    ) => {
      let children = undefined;

      // There's the need to parse for empty lines
      if (content.indexOf("\n\n") >= 0) {
        let before = "";
        const contentJSX = [];

        let chunk = "";
        for (let i = 0; i < content.length; i++) {
          if (i !== 0) before = content[i - 1];

          chunk += content[i];

          if (before === "\n" && content[i] === "\n") {
            contentJSX.push(chunk);
            contentJSX.push(<br />);
            chunk = "";
          }

          if (chunk !== "" && i === content.length - 1) {
            contentJSX.push(chunk);
          }
        }

        children = contentJSX;
      } else {
        children = [content];
      }
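      // Give each element a key and spread the children so React does not
      // warn about missing keys in lists
      parsedMarkdown.push(
        React.createElement(tagValue, { key: parsedMarkdown.length, ...props }, ...children),
      );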
    };

    for (let i = 0; i < source.length; i++) {
      const chunk = source[i];
      if (i !== 0) {
        before = source[i - 1];
      }

      // Does the current chunk need to be treated as an escaped char?
      const escaped = before === "\\";

      // Detect if we need to start/finish parsing our tags

      // We are not parsing anything, however, that could change at current
      // chunk
      if (currentTag === "" && escaped === false) {
        // If our tags map has the chunk, this means a markdown tag has
        // just been found. We'll change our current state to reflect this.
        if (tags.has(chunk)) {
          currentTag = tags.get(chunk);

          // We have simple content to push
          if (currentContent !== "") {
            pushContent(currentContent, "span");
          }

          currentContent = "";
        }
      } else if (currentTag !== "" && escaped === false) {
        // We'll look if we can finish parsing our tag
        if (tags.has(chunk)) {
          const symbolValue = tags.get(chunk);

          // Just because the current chunk is a symbol it doesn't mean we
          // can already finish our currentTag.
          //
          // We'll need to see if the symbol's value corresponds to the
          // value of our currentTag. In case it does, we'll finish parsing it.
          if (symbolValue === currentTag) {
            pushContent(
              currentContent,
              currentTag,
              undefined, // you could pass props here
            );

            currentTag = "";
            currentContent = "";
          }
        }
      }

      // Increment our currentContent
      //
      // Ideally, we don't want our rendered markdown to contain any '\'
      // or undesired '*' or '_' or '!'.
      //
      // Users can still escape '*', '_', '!' by prefixing them with '\'
      if (tags.has(chunk) === false || escaped) {
        if (chunk !== "\\" || escaped) {
          currentContent += chunk;
        }
      }

      // In case an erroneous (i.e. unfinished) tag is present and we've
      // reached the end of our source (rawMarkdown), we want to make sure
      // all our currentContent is pushed as a simple string
      if (currentContent !== "" && i === source.length - 1) {
        pushContent(
          currentContent,
          "span",
          undefined,
        );
      }
    }

    return parsedMarkdown;
  }

  render() {
    return (
      <div className="App">
        <div>{this.parseMarkdown(this.props.rawMarkdown)}</div>
      </div>
    );
  }
}

ReactDOM.render(<App rawMarkdown={rawMarkdown.replace(/```/g, "\uFFFF")} />, document.getElementById('app'));

Link to the code (TypeScript) https://codepen.io/ludanin/pen/GRgNWPv

Link to the code (vanilla/babel) https://codepen.io/ludanin/pen/eYmBvXw


var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

var myMarkdown = "hello *asdf* *how* _are_ you !doing! today";
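// Note: inside a character class "|" is a literal character, not a separator,
// so the tag characters are listed directly.
var tagFinder = /(?<item>(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/gm;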

//Use case 1: direct string replacement
var replaced = myMarkdown.replace(tagFinder, replacer);
function replacer(match, whole, tag_begin, content, tag_end, offset, string) {
  return table[tag_begin]["begin"] + content + table[tag_begin]["end"];
}
alert(replaced);

//Use case 2: React components
var pieces = [];
var lastMatchedPosition = 0;
myMarkdown.replace(tagFinder, breaker);
function breaker(match, whole, tag_begin, content, tag_end, offset, string) {
  var piece;
  if (lastMatchedPosition < offset)
  {
    piece = string.substring(lastMatchedPosition, offset);
    pieces.push("\"" + piece + "\"");
  }
  piece = table[tag_begin]["begin"] + content + table[tag_begin]["end"];
  pieces.push(piece);
  lastMatchedPosition = offset + match.length;

}
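// Keep any plain text that follows the last match
if (lastMatchedPosition < myMarkdown.length) {
  pieces.push("\"" + myMarkdown.substring(lastMatchedPosition) + "\"");
}
alert(pieces);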


Explanation:

/(?<item>(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>))/
  • You can define your tags in this section: [*!_]. Inside a character class no | separator is needed (a | there would be matched literally). Once one of the characters is matched, it is captured as a group named "tag_begin".

  • Then (?<content>\w+) captures the content wrapped by the tag.

  • The ending tag must be the same as the one matched earlier, hence \k<tag_begin>. If that test passes, it is captured as a group named "tag_end"; that is what (?<tag_end>\k<tag_begin>) says.
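To see the named groups in action, here is a small sketch that lists every match with its tag, content and position. The function name findTags is made up for this example, and the pattern drops the outer "item" group for brevity. Note that a | inside a character class is a literal character, so it is left out here.

```javascript
// Named groups: tag_begin is the opening symbol, content the wrapped text,
// and \k<tag_begin> requires the closing symbol to equal the opening one.
const tagFinder = /(?<tag_begin>[*!_])(?<content>\w+)(?<tag_end>\k<tag_begin>)/g;

// Collect { tag, content, offset } for each match in the text.
function findTags(text) {
  return [...text.matchAll(tagFinder)].map((m) => ({
    tag: m.groups.tag_begin,
    content: m.groups.content,
    offset: m.index,
  }));
}
```

One thing to keep in mind is that \w also matches the underscore, so the regex engine has to backtrack to find the closing _ of an emphasized word.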

In the JS you've set up a table like this:

var table = {
  "*":{
    "begin":"<strong>",
    "end":"</strong>"
    },
  "_":{
    "begin":"<em>",
    "end":"</em>"
    },
  "!":{
    "begin":"<MyComponent onClick={this.action}>",
    "end":"</MyComponent>"
    },

  };

Use this table to replace the matched tags.

String.replace has an overload, String.replace(regexp, function), whose function receives the captured groups as its parameters. We use these captured items to look up the table and generate the replacement string.
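As a compact variation on the same idea, the sketch below reads the named groups from the last argument of the callback instead of relying on positional parameters. The table here is a simplified, hypothetical one that maps each symbol to a tag name, and render is a name invented for this example.

```javascript
// Hypothetical lookup table mapping each symbol to an HTML tag name.
const table = { "*": "strong", "_": "em" };
const tagFinder = /(?<tag>[*_])(?<content>\w+)\k<tag>/g;

function render(text) {
  // When the pattern has named groups, the callback's last argument is an
  // object holding them, so the positional arguments can be ignored.
  return text.replace(tagFinder, (...args) => {
    const { tag, content } = args[args.length - 1];
    return `<${table[tag]}>${content}</${table[tag]}>`;
  });
}
```

This keeps the callback independent of how many capture groups the pattern has.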

[Update]
I have updated the code. I kept the first version in case someone doesn't need React components; as you can see, there is little difference between the two.
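If you want actual elements rather than strings that look like JSX, one possible sketch is below. The names elementTypes and toElements are assumptions for this example, and createElement is passed in as a parameter so the snippet runs without React; in an app you would pass React.createElement.

```javascript
// Hypothetical mapping from symbol to element type; the values could also be
// your own component functions or classes.
const elementTypes = { "*": "strong", "_": "em", "!": "button" };
const tagFinder = /([*!_])(\w+)\1/g;

// Split the markdown into plain strings and elements, in document order.
function toElements(markdown, createElement) {
  const pieces = [];
  let last = 0;
  for (const m of markdown.matchAll(tagFinder)) {
    // Plain text between the previous match and this one.
    if (m.index > last) pieces.push(markdown.slice(last, m.index));
    pieces.push(createElement(elementTypes[m[1]], { key: m.index }, m[2]));
    last = m.index + m[0].length;
  }
  // Plain text after the last match.
  if (last < markdown.length) pieces.push(markdown.slice(last));
  return pieces;
}
```

The returned array can be rendered directly as children, e.g. inside a div, since React accepts arrays of strings and keyed elements.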