Retrieve all hashtags from a tweet in a PHP function

Don't forget about hashtags that contain Unicode letters, digits and underscores:

$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";

preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);

print_r( $matches );

\p{Pc} - connector punctuation, which includes the underscore

\p{N} - any numeric character in any script

\p{L} - any letter from any language

\p{Mn} - any non-spacing mark (combining accents, umlauts, etc.)
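To see what each class contributes, here is a minimal sketch that wraps the pattern above in a function. The name extractHashtags is only an illustration, not something from the answer, and the code assumes PHP 7 or later (for the type declarations and the "\u{...}" escape). It returns capture group 1, i.e. the tag text without the leading #.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function extractHashtags(string $text): array
{
    preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $text, $matches);
    // Group 1 holds the tag text without '#'; fall back to [] if matching failed.
    return $matches[1] ?? [];
}

$tags = extractHashtags("Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!");
print_r($tags); // hashtag, NYC2016, NYC_2016, gøypålandet

// "e" followed by U+0301 (combining acute accent) is a letter plus a
// non-spacing mark, so \p{Mn} is what keeps this tag in one piece.
$decomposed = extractHashtags("#cafe\u{0301}s");

// The same pattern without \p{Mn} stops right before the combining mark.
preg_match('/#([\p{Pc}\p{N}\p{L}]+)/u', "#cafe\u{0301}s", $withoutMn);
```

Here $decomposed contains the full tag, while $withoutMn[1] is cut off at "cafe". The trailing "!" in the sample tweet is ordinary punctuation, so it is never part of a tag.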


I created my own solution. It:

  • Finds all hashtags in a string
  • Removes duplicates
  • Sorts hashtags by how often they occur in the text
  • Supports Unicode characters

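    function getHashtags($string) {
        $hashtags = FALSE;
        preg_match_all("/(#\w+)/u", $string, $matches);
        // $matches is always an array, so test the list of full matches instead
        if (!empty($matches[0])) {
            // Count occurrences; the keys are the unique hashtags
            $hashtagsArray = array_count_values($matches[0]);
            // Most frequent first
            arsort($hashtagsArray);
            $hashtags = array_keys($hashtagsArray);
        }
        return $hashtags;
    }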
    

The output looks like this:

(
    [0] => #_ƒOllOw_
    [1] => #FF
    [2] => #neslitükendi
    [3] => #F_0_L_L_O_W_
    [4] => #takipedeğerdost
    [5] => #GönüldenTakipleşiyorum
)
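If the counts themselves are useful, a variant along the following lines keeps them and orders the tags by frequency. This is only a sketch: countHashtags and the sample string are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): returns an array of
// hashtag => count, most frequent first.
function countHashtags(string $text): array
{
    // preg_match_all returns 0 when nothing matches and false on error.
    if (!preg_match_all('/#\w+/u', $text, $matches)) {
        return [];
    }
    $counts = array_count_values($matches[0]); // unique tag => occurrences
    arsort($counts);                           // sort by count, descending, keeping keys
    return $counts;
}

$counts = countHashtags('#php rocks, #regex too, but #php and #php again, #regex');
print_r($counts); // #php => 3, #regex => 2
```

Calling array_keys($counts) on the result gives the de-duplicated list of tags in the same order.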

$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

preg_match_all("/(#\w+)/", $tweet, $matches);

var_dump( $matches );

Note: dashes are not allowed in hashtags, but underscores are, so #badhash-tag is captured only as #badhash.
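To make the dash behaviour concrete, the sketch below runs the pattern on the same tweet and then tries a stricter variant. The stricter pattern is an assumption of mine, not part of the answer: it drops a tag entirely when a dash follows it, rather than keeping the part before the dash.

```php
<?php
$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

// Pattern from the answer: \w+ stops at the dash, so "#badhash-tag" yields "#badhash".
preg_match_all('/#\w+/', $tweet, $loose);
print_r($loose[0]); // #hashtag, #badhash, #goodhash_tag

// Assumed variant: the possessive \w++ cannot backtrack, so the negative
// lookahead (?!-) rejects the whole tag when a dash follows it.
preg_match_all('/#\w++(?!-)/', $tweet, $strict);
print_r($strict[0]); // #hashtag, #goodhash_tag
```

The possessive quantifier matters here: with a plain \w+ the engine would backtrack one character and still report "#badhas".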
