Remove duplicate words from a sentence

JavaScript (ES6), 98

Note: while I found it myself, it's annoyingly similar to @Neil's, just with the additional logic to split the whole input string into sentences.

s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))
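For readability, the one-liner above can be unrolled roughly as follows. This is only a sketch: the names `removeDuplicateWords`, `sentence`, `seen` and `key` are made up for illustration, and a `Set` stands in for the array `k` that the golfed version uses as a lookup table.

```javascript
// Hypothetical, readable unrolling of the golfed answer; names are illustrative.
function removeDuplicateWords(text) {
  // A "sentence" is any run of characters that contains no terminator or newline.
  return text.replace(/[^\n.!?]+/g, sentence => {
    // Upper-cased words already kept in this sentence (reset for every sentence).
    const seen = new Set();
    // Match each word together with the spaces directly in front of it.
    return sentence.replace(/ *([a-z]+)/gi, (match, word) => {
      const key = word.toUpperCase();
      if (seen.has(key)) return ''; // duplicate: drop the word and its leading spaces
      seen.add(key);
      return match;                 // first occurrence: keep it unchanged
    });
  });
}
```

Because the spaces before a word belong to the same match, dropping a duplicate also drops the gap in front of it, while punctuation after it is left alone.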

Test

f=s=>s.replace(/[^\n.!?]+/g,s=>s.replace(/ *([a-z]+)/ig,(r,w)=>k[w=w.toUpperCase()]?'':k[w]=r,k=[]))

console.log=x=>O.textContent+=x+'\n'

;[['Hello Hello, World!','Hello, World!']
,['Code Code! Golf Code','Code! Golf Code']
,['Hello  hello   World','Hello   World']
,['Programming Golf Programming!','Programming Golf!']]
.forEach(t=>{
  var i=t[0],k=t[1],r=f(i)
  console.log((r==k?'OK ':'KO ')+i+' -> '+r)
})

<pre id=O></pre>

Retina, 66 46 bytes

Byte count assumes ISO 8859-1 encoding.

i`[a-z]+
·$0·
i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·

Explanation

Since only letters should be considered word characters (but regex treats digits and underscores as word characters, too), we need to make our own word boundaries. Since the input is guaranteed to contain only ASCII characters, I'm inserting · (outside of ASCII, but inside ISO 8859-1) around all words and removing them again along with the duplicates. That saves 20 bytes over using lookarounds to implement generic word boundaries.

i`[a-z]+
·$0·

This matches every word and surrounds it in ·.

i` *(·[a-z]+·)(?<=\1[^.!?¶]+)|·

This is two steps compressed into one. <sp>*(·[a-z]+·)(?<=\1[^.!?¶]+) matches a full word (ensured by including the · in the match), along with any spaces preceding it, provided that (as ensured by the lookbehind) we can find the same word somewhere earlier in the sentence. (The ¶ matches a linefeed.)

The other part is simply the ·, which matches all artificial word boundaries that weren't matched as part of the first half. In either case, the match is simply removed from the string.
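The same two stages can be sketched in JavaScript for anyone who wants to step through them outside Retina. This is not the Retina program itself but a port under assumptions: it needs an engine with lookbehind support (ES2018 or later), `\n` stands in for ¶, and the names `MARK` and `dedupeRetinaStyle` are invented for the example.

```javascript
// Sketch of the two Retina stages in JavaScript; names are illustrative.
const MARK = '\u00b7'; // the middle dot, assumed never to occur in the ASCII input

function dedupeRetinaStyle(text) {
  // Stage 1: surround every run of letters with the marker character.
  const marked = text.replace(/[a-z]+/gi, MARK + '$&' + MARK);
  // Stage 2, first alternative: a marked word plus the spaces before it, but only if
  // the lookbehind finds the same marked word earlier in the same sentence.
  // Stage 2, second alternative: any marker left over from words that are kept.
  // Both kinds of match are replaced with the empty string.
  return marked.replace(/ *(\u00b7[a-z]+\u00b7)(?<=\1[^.!?\n]+)|\u00b7/gi, '');
}
```

The `i` flag makes the backreference `\1` case-insensitive as well, which is what lets `hello` count as a repeat of `Hello`.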

C, 326 bytes

Who needs regular expressions?

#include <ctype.h>
#define a isalpha
#define c(x)*x&&!strchr(".?!\n",*x)
#define f(x)for(n=e;*x&&!a(*x);++x);
main(p,v,n,e,o,t)char**v,*p,*n,*e,*o,*t;{for(p=v[1];*p;p=e){f(p)for(e=p;c(e);){for(;a(*++e););f(n)if(c(n)){for(o=p,t=n;a(*o)&&(*o-65)%32==(*t-65)%32;o++,t++);if(a(*t))e=n;else memmove(e,t,strlen(t)+1);}}}puts(v[1]);}
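For comparison, here is a readable regex-free sketch of the same task in JavaScript. It is not a port of the C program above, whose in-place pointer logic is different; it only illustrates that a plain character scan is enough. The names `dedupeWithoutRegex`, `isLetter` and `endsSentence` are hypothetical.

```javascript
// Hypothetical regex-free sketch; not a line-by-line translation of the C answer.
function dedupeWithoutRegex(text) {
  const isLetter = ch => (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
  const endsSentence = ch => ch === '.' || ch === '!' || ch === '?' || ch === '\n';
  let out = '';
  let seen = new Set(); // upper-cased words kept so far in the current sentence
  let i = 0;
  while (i < text.length) {
    const ch = text[i];
    if (endsSentence(ch)) {
      seen = new Set(); // a new sentence starts: forget the old words
      out += ch;
      i++;
    } else if (ch === ' ' || isLetter(ch)) {
      let j = i;
      while (j < text.length && text[j] === ' ') j++;      // leading spaces
      let k = j;
      while (k < text.length && isLetter(text[k])) k++;    // the word itself
      if (k === j) {
        out += text.slice(i, j); // spaces not followed by a word: copy as-is
        i = j;
      } else {
        const key = text.slice(j, k).toUpperCase();
        if (!seen.has(key)) {
          seen.add(key);
          out += text.slice(i, k); // first occurrence: keep spaces and word
        }
        i = k; // a duplicate is skipped together with its leading spaces
      }
    } else {
      out += ch; // digits, commas and other characters pass through
      i++;
    }
  }
  return out;
}
```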
