Does Hive have a String split function?

There does exist a split function based on regular expressions. It's not listed in the tutorial, but it is listed on the language manual on the wiki:

split(string str, string pat)
   Split str around pat (pat is a regular expression) 

In your case, the delimiter "|" has a special meaning as a regular expression, so it should be referred to as "\\|".


Just a clarification on the answer given by Bkkbrad.

I tried this suggestion and it did not work for me.

For example,

split('aa|bb','\\|')

produced:

["","a","a","|","b","b",""]

But,

split('aa|bb','[|]')

produced the desired result:

["aa","bb"]

Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.

For elaboration of this behaviour of regexp, see: http://www.regular-expressions.info/charclass.html


Another interesting usecase for split in Hive is when, for example, a column ipname in the table has a value "abc11.def.ghft.com" and you want to pull "abc11" out:

SELECT split(ipname,'[\.]')[0] FROM tablename;

Tags:

Hadoop

Hive