Split string into array without deleting delimiter?

Try clean-split:

const cleanSplit = require("clean-split");

cleanSplit("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]

cleanSplit("a-b-c", "-", { anchor: "before" });
//=> ["a-", "b-", "c"]

cleanSplit("a-b-c", "-", { anchor: "after" });
//=> ["a", "-b", "-c"]

Under the hood, it uses logic adapted from:

  • Kai's non-anchored splitting regex
  • Amadan's positive lookahead regex
  • The positive lookbehind regex which was added in ES2018
  • escape-string-regexp, to escape any regex special characters in the delimiter.
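To make the list above concrete, here is a minimal sketch of how the three modes could be built from those pieces. It is not clean-split's actual source. The function names splitKeep and escapeForRegExp, the inline escaping regex (standing in for escape-string-regexp), and the choice to drop empty strings in the default mode are all assumptions. It also assumes a non-empty delimiter and an engine with ES2018 lookbehind support.

```javascript
// Hypothetical helpers, NOT clean-split's real implementation.

// Escape regex metacharacters so the delimiter is matched literally
// (a stand-in for the escape-string-regexp package).
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// anchor: undefined -> each delimiter becomes its own element (capturing group)
//         "before"  -> delimiter stays at the end of the preceding piece (lookbehind)
//         "after"   -> delimiter stays at the start of the following piece (lookahead)
function splitKeep(str, delim, anchor) {
  const d = escapeForRegExp(delim);
  if (anchor === "before") return str.split(new RegExp(`(?<=${d})`));
  if (anchor === "after") return str.split(new RegExp(`(?=${d})`));
  // A capturing group makes split() include each delimiter in the output;
  // drop the empty strings left between adjacent delimiters.
  return str.split(new RegExp(`(${d})`)).filter((s) => s !== "");
}

splitKeep("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]
splitKeep("a-b-c", "-", "before");
//=> ["a-", "b-", "c"]
splitKeep("a-b-c", "-", "after");
//=> ["a", "-b", "-c"]
```

Escaping matters for delimiters such as "." or "+": without it, a lookahead on an unescaped "." would split before every character.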

In your case, you can do something like this:

const cleanSplit = require("clean-split");

cleanSplit("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Instead of splitting, it might be easier to think of this as extracting strings comprising either the delimiter or consecutive characters that are not the delimiter:

'asdf a  b c2 '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
'asdf a  b. . c2% * '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b.", " ", ".", " ", "c2%", " ", "*", " "]
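The same extraction idea works for an arbitrary literal delimiter, not just whitespace: match either the delimiter itself or a run of characters at which the delimiter does not begin. A minimal sketch, assuming a non-empty delimiter; the name tokenize is hypothetical:

```javascript
// Hypothetical helper: returns the delimiters and the runs between them, in order.
function tokenize(str, delim) {
  // Escape regex metacharacters so the delimiter is matched literally.
  const d = delim.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Either the delimiter, or one or more characters where the delimiter does not start.
  const re = new RegExp(`${d}|(?:(?!${d})[\\s\\S])+`, "g");
  // match() returns null when nothing matches (e.g. an empty input string).
  return str.match(re) || [];
}

tokenize("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
tokenize("a, b, c", ", ");
//=> ["a", ", ", "b", ", ", "c"]
```

The negative lookahead (?!...) is what lets a multi-character delimiter work; for a single character it reduces to the [^ ]+ form.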

A more Shakespearean definition of the matches would be:

'asdf a  b c2 '.match(/ |[^ ]+/g)

To or (not to )+.


I'm surprised no one has mentioned this yet, but I'll post this here for the sake of completeness. If you have capturing groups in your expression, then .split will include the captured substring as a separate entry in the result array:

"asdf a  b c2 ".split(/( )/)  // or /(\s)/
// ["asdf", " ", "a", " ", "", " ", "b", " ", "c2", " ", ""]

Note that this is not exactly the output you asked for: it includes an empty string between the two contiguous spaces and another one after the last space.

If necessary, you can filter out all empty strings from the result array like this:

"asdf a  b c2 ".split(/( )/).filter(String)
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

However, if this is what you're looking for, I'd probably recommend you go with @Jack's solution.
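Because split splices every captured substring into the result, the capturing-group trick is not limited to a single space. The inputs below are only illustrations: capturing \s+ keeps each run of spaces together as one entry, and a character class keeps several different delimiters.

```javascript
// Each run of whitespace is kept as a single entry; filter(String) drops the trailing "".
const words = "asdf a  b c2 ".split(/(\s+)/).filter(String);
//=> ["asdf", " ", "a", "  ", "b", " ", "c2", " "]

// Any of several delimiters, each kept as its own element.
const tokens = "3+4-5".split(/([+-])/);
//=> ["3", "+", "4", "-", "5"]
```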


Use positive lookahead:

"asdf a  b c2 ".split(/(?= )/)
// => ["asdf", " a", " ", " b", " c2", " "]

Post-edit EDIT: As I said in comments, the lack of lookbehind (JavaScript did not support it before ES2018) makes this a bit trickier. If all the words consist only of letters, you can fake lookbehind using the \b word-boundary matcher:

"asdf a  b c2 ".split(/(?= )|\b/)
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

but as soon as punctuation appears, it breaks down, because \b matches at every boundary between a word character and a non-word character, not only next to spaces:

"asdf-eif.b".split(/(?= )|\b/)
// => ["asdf", "-", "eif", ".", "b"]
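Since ES2018, JavaScript does support lookbehind, so in engines that implement it (current Node.js and modern browsers, for example) you can split both before and after each space without the \b workaround. A minimal sketch:

```javascript
// Split at every position directly before or directly after a space.
// Requires ES2018 lookbehind support.
const parts = "asdf a  b c2 ".split(/(?<= )|(?= )/);
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

// Punctuation is left alone, because only spaces create split points.
const untouched = "asdf-eif.b".split(/(?<= )|(?= )/);
//=> ["asdf-eif.b"]
```

Both assertions are zero-width, so no characters are consumed and no empty strings appear between adjacent spaces.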

If you do have non-letters you don't want to break on, I would suggest a post-processing approach instead.

Post-think EDIT: This is based on JamesA's original idea, but refined to avoid jQuery and to split correctly:

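function chop(str) {
  var result = [];
  var pastFirst = false;
  // Split on single spaces; empty pieces come from adjacent, leading or trailing spaces.
  str.split(' ').forEach(function(x) {
    // Every piece after the first was preceded by exactly one space.
    if (pastFirst) result.push(' ');
    // Keep only non-empty pieces.
    if (x.length) result.push(x);
    pastFirst = true;
  });
  return result;
}
chop("asdf a  b c2 ")
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]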
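If you need the same post-processing for a delimiter other than a single space, chop generalizes directly. A minimal sketch; the name chopBy is hypothetical, and it relies on Array.prototype.flatMap (ES2019):

```javascript
// Hypothetical generalization of chop(): works for any literal delimiter string.
function chopBy(str, delim) {
  return str.split(delim).flatMap(function (piece, i) {
    // Every piece after the first was preceded by exactly one delimiter.
    const out = i > 0 ? [delim] : [];
    // Skip empty pieces (from adjacent, leading or trailing delimiters).
    if (piece.length) out.push(piece);
    return out;
  });
}

chopBy("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
chopBy("x--y", "-");
//=> ["x", "-", "-", "y"]
```

Because split is called with a plain string here, no regex escaping is needed.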