Best HashTag Regex

After looking at the previous answers here and making some test tweets to see what Twitter liked, I think I've come up with a solid regular expression that should do the trick. It requires lookaround functionality in the regular expression engine, so it might not work with all engines out there. It should still work fine for .NET and PCRE.

(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)

According to RegexBuddy, this does the following: [image: RegexBuddy Create View]

And again, according to RegexBuddy, here is what it matches: [image: RegexBuddy Test View]

Anything highlighted is part of the match. The darker highlighted part indicates what is returned from the capture.
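
If you don't have RegexBuddy handy, here is a minimal sketch in Python for trying the pattern yourself. The function name and the sample text are my own illustration, not part of the original tests, and Python's \w is Unicode-aware by default, so results on non-ASCII text may differ from .NET or PCRE.

```python
import re

# Pattern from this answer: '#' must sit at the start of the text or right
# after whitespace, and the tag must contain at least one letter or underscore.
HASHTAG = re.compile(r"(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)")


def find_hashtags(text):
    """Return the captured tag names, without the leading '#'."""
    return HASHTAG.findall(text)


# Made-up sample text for illustration.
sample = "#first We're #1 and Some#Word but #2nd_place #ok"
print(find_hashtags(sample))  # ['first', '2nd_place', 'ok']
```

"#1" is skipped because it has no letter, and "Some#Word" is skipped because the # is not preceded by whitespace.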

Edit Dec 2014:
Here's a slightly simplified version from zero323 that should be functionally equivalent:

(?<=\s|^)#(\w*[A-Za-z_]+\w*)
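
One caveat on engine support: as far as I know, Python's built-in re module only accepts fixed-width lookbehinds, so it rejects the simplified version (\s is one character wide, ^ is zero) while accepting the original. A quick way to check which form your engine takes, sketched in Python with a helper name of my own:

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


original = r"(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)"
simplified = r"(?<=\s|^)#(\w*[A-Za-z_]+\w*)"

print(compiles(original))    # accepted by re
print(compiles(simplified))  # rejected by re: look-behind requires fixed-width pattern
```

If the second pattern fails in your engine, stick with the original form.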

It depends on whether you want to match hashtags inside other strings ("Some#Word") or things that probably aren't hashtags ("We're #1"). The regex you gave, #\w+, will match in both of these cases. If you change it to \B#\w\w+, you eliminate both: \B only matches where the # is not preceded by a word character, and \w\w+ requires at least two characters after the #. Note that a numeric tag of two or more digits, such as "#10", still matches.
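
To see the difference between the two patterns, here is a small Python sketch. The sample sentence and the names LOOSE and STRICT are made up for illustration.

```python
import re

LOOSE = re.compile(r"#\w+")        # the regex from the question
STRICT = re.compile(r"\B#\w\w+")   # the modified version

# Made-up sample text covering both problem cases.
text = "Some#Word We're #1 but #regex rocks"

print(LOOSE.findall(text))   # ['#Word', '#1', '#regex']
print(STRICT.findall(text))  # ['#regex']
```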


If you are pulling statuses containing hashtags from Twitter, you no longer need to find them yourself. You can now specify the include_entities parameter to have Twitter automatically call out mentions, links, and hashtags.

For example, take the following call to statuses/show:

http://api.twitter.com/1/statuses/show/60183527282577408.json?include_entities=true

In the resultant JSON, notice the entities object.

"entities":{"urls":[{"expanded_url":null,"indices":[68,88],"url":"http:\/\/bit.ly\/gWZmaJ"}],"user_mentions":[],"hashtags":[{"text":"wordpress","indices":[89,99]}]}

You can use the above to locate the specific entities in the tweet (which occur between the string positions denoted by the indices property) and transform them appropriately.
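
As a rough sketch of that transformation in Python: the tweet text, its indices, the link_hashtags helper, and the example.com link format below are all hypothetical, made up for illustration. Only the shape of the entities object follows the JSON above. It also assumes the indices can be used directly as offsets into the Python string.

```python
def link_hashtags(text, entities):
    """Wrap each hashtag span in a link, using the indices from the entities object.

    Replacements run from right to left so that the earlier indices stay valid
    while the string grows.
    """
    hashtags = sorted(entities.get("hashtags", []),
                      key=lambda h: h["indices"][0],
                      reverse=True)
    for tag in hashtags:
        start, end = tag["indices"]
        # Hypothetical link format; substitute whatever markup you need.
        link = '<a href="https://example.com/tags/{0}">#{0}</a>'.format(tag["text"])
        text = text[:start] + link + text[end:]
    return text


# Made-up tweet text and matching entities, in the same shape as the JSON above.
tweet_text = "Trying out #wordpress and #regex today"
tweet_entities = {
    "urls": [],
    "user_mentions": [],
    "hashtags": [
        {"text": "wordpress", "indices": [11, 21]},
        {"text": "regex", "indices": [26, 32]},
    ],
}

print(link_hashtags(tweet_text, tweet_entities))
```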

If you just need the regular expression to locate the hashtags, Twitter provides these in an open source library.

Hashtag Match Pattern

(^|[^&\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7])(#|\uFF03)(?!\uFE0F|\u20E3)([\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*[\p{L}\p{M}][\p{L}\p{M}\p{Nd}_\u200c\u200d\ua67e\u05be\u05f3\u05f4\u309b\u309c\u30a0\u30fb\u3003\u0f0b\u0f0c\u00b7]*)

The above pattern can be pieced together from this Java file (retrieved 2015-11-23). Validation tests for this pattern are located in this file around line 128.


I tweeted a string with randomly placed hashtags, saw what Twitter did with it, and then tried to match it with a regular expression. Here's what I got:

\B#\w*[a-zA-Z]+\w*

#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa 9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a *#1j3rj3
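
For reference, here is a quick Python sketch that runs the pattern against the test string above. The variable names are mine, and this shows what the regex matches in Python's re module, not what Twitter itself linked.

```python
import re

HASHTAG = re.compile(r"\B#\w*[a-zA-Z]+\w*")

test_tweet = ("#face #Fa!ce something #iam#1 #1 #919 #jifdosaj somethin#idfsjoa "
              "9#9#98 9#9f9j#9jlasdjl #jklfdsajl34 #34239 #jkf #a *#1j3rj3")

matches = HASHTAG.findall(test_tweet)
print(matches)
# ['#face', '#Fa', '#iam', '#jifdosaj', '#jklfdsajl34', '#jkf', '#a', '#1j3rj3']
```

Purely numeric tags (#1, #919, #34239) are skipped because the pattern needs at least one letter, and any # that directly follows a letter or digit is skipped because of the \B.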