Word Count program in Hive

If you want a simple one see the following:

SELECT word, COUNT(*) FROM input LATERAL VIEW explode(split(text, ' ')) lTable as word GROUP BY word;

I use a lateral view to enable the use of a table valued function (explode) which takes the list that comes out of split function and outputs a new row for every value. In practice I use a UDF that wraps IBM's ICU4J word breaker. I generally don't use transform scripts and use UDFs for everything. You don't need a temporary words table.


CREATE TABLE docs (line STRING);
LOAD DATA INPATH 'text' OVERWRITE INTO TABLE docs;
CREATE TABLE word_counts AS
SELECT word, count(1) AS count FROM
(SELECT explode(split(line, '\s')) AS word FROM docs) w
GROUP BY word
ORDER BY word;

Tags:

Mapreduce

Hive