How to set a range for the LIMIT clause in Hive

The LIMIT clause is used to set a ceiling on the number of rows in the result set. You are getting a syntax error because of an incorrect usage of this HQL clause.

The query could be written as the following to return no more than 2000 rows:

SELECT * FROM table LIMIT 2000;

You could also write it like so to return no more than 1000 rows:

SELECT * FROM table LIMIT 1000;

However, you cannot combine both into the same argument for LIMIT. The LIMIT argument must evaluate to a constant value.

I will expand on this a bit to help solve your problem. If you are attempting to "paginate" your results, the following may be of use.

First, I would recommend against leaning on HQL for pagination. In most situations it is more efficient to implement pagination in the application logic (query a large result set, cache what you need, and paginate in the application). If you have no choice but to pull out ranges of rows, you can get the desired effect through a combination of the LIMIT, ORDER BY, and OFFSET clauses.

LIMIT: This will limit your result set to a maximum number of rows

ORDER BY: This will sort/order your result set based on one or more columns

OFFSET: This will start your result set at a certain row after the logical first entry in the table.

You may combine these three clauses to effectively query "pages" of your table. For example, the following three queries show how to get the first 3 blocks of data from a table, where each block contains 1000 rows and the target table's 'column1' determines the logical order.

SELECT title as "Page 1", column1, column2, ... FROM table
  ORDER BY column1 LIMIT 1000 OFFSET 0;
SELECT title as "Page 2", column1, column2, ... FROM table
  ORDER BY column1 LIMIT 1000 OFFSET 1000;
SELECT title as "Page 3", column1, column2, ... FROM table
  ORDER BY column1 LIMIT 1000 OFFSET 2000;

Each query declares 'column1' as the sorting value with ORDER BY. The queries will return no more than 1000 rows due to the LIMIT clause. Each result set will start at a different row due to the OFFSET being incremented by the "page size" for each query.
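The offset arithmetic described above can be sketched on the application side. This is a minimal illustration, not a definitive implementation: the helper name `page_query` and the table name `my_table` are hypothetical, pages are assumed to be numbered from 1, and the generated text follows the LIMIT ... OFFSET form used in the queries above.

```python
def page_query(table: str, order_col: str, page: int, page_size: int) -> str:
    """Build the query text for one page (pages are numbered from 1).

    Hypothetical helper for illustration. The table and column names are
    interpolated directly, so they must come from trusted code, never
    from user input.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # Page 1 starts at offset 0, page 2 at page_size, and so on.
    offset = (page - 1) * page_size
    return (
        f"SELECT * FROM {table} "
        f"ORDER BY {order_col} LIMIT {page_size} OFFSET {offset}"
    )


if __name__ == "__main__":
    for page in (1, 2, 3):
        print(page_query("my_table", "column1", page, 1000))
```

Keeping the ORDER BY column fixed across calls matters here: without a stable sort order, consecutive pages may overlap or skip rows.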


You can use the ROW_NUMBER window function to set the range.

The query below returns only the first 20 records from the table:

    hive> select * from
          (
            SELECT *, ROW_NUMBER() over (Order by id) as rowid FROM <tab_name>
          ) t
          where rowid > 0 and rowid <= 20;

Using the BETWEEN operator to specify the range (row numbers start at 1, so this returns the same 20 rows):

    hive> select * from
          (
            SELECT *, ROW_NUMBER() over (Order by id) as rowid FROM <tab_name>
          ) t
          where rowid between 0 and 20;

To fetch rows 21 to 40, increase the lower and upper bound values:

    hive> select * from
          (
            SELECT *, ROW_NUMBER() over (Order by id) as rowid FROM <tab_name>
          ) t
          where rowid > 20 and rowid <= 40;
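The lower and upper bounds for any page follow a simple pattern, which can be sketched as below. This is only an illustration under stated assumptions: the helper names `row_number_bounds` and `row_number_page_query` and the table name `my_table` are hypothetical, and pages are assumed to be numbered from 1.

```python
from typing import Tuple


def row_number_bounds(page: int, page_size: int) -> Tuple[int, int]:
    """Return (lower, upper) so that lower < rowid <= upper selects the page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page * page_size


def row_number_page_query(table: str, order_col: str,
                          page: int, page_size: int) -> str:
    """Build the ROW_NUMBER-based query text for one page.

    Hypothetical helper: identifiers are interpolated directly and must
    come from trusted code, not from user input.
    """
    lower, upper = row_number_bounds(page, page_size)
    return (
        f"SELECT * FROM ("
        f"SELECT *, ROW_NUMBER() OVER (ORDER BY {order_col}) AS rowid "
        f"FROM {table}) t "
        f"WHERE rowid > {lower} AND rowid <= {upper}"
    )


if __name__ == "__main__":
    # Page 1 covers rowid 1..20, page 2 covers rowid 21..40.
    print(row_number_page_query("my_table", "id", 1, 20))
    print(row_number_page_query("my_table", "id", 2, 20))
```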

I am not sure what you are trying to achieve, but ...

That will return the 1001st and the 2001st record in the query result set only if you are using Hive 2.0.0 or later. You can check your version with:

hive --version

(https://issues.apache.org/jira/browse/HIVE-11531)
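If the check needs to happen in a script, the version text can be parsed and compared. The sketch below makes two assumptions: that the output of the command contains a dotted version such as `Hive 2.3.9` (the sample strings are hypothetical, not captured from a real run), and that 2.0.0 is the first release with the feature tracked in the linked issue. The function names are hypothetical as well.

```python
import re
from typing import Optional, Tuple


def parse_hive_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first x.y.z version number found in the text, if any."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def supports_limit_offset(version_text: str) -> bool:
    """True if the parsed version is at least 2.0.0 (assumed minimum)."""
    version = parse_hive_version(version_text)
    return version is not None and version >= (2, 0, 0)


if __name__ == "__main__":
    # Hypothetical sample strings standing in for real command output.
    for sample in ("Hive 2.3.9", "Hive 1.2.1"):
        print(sample, supports_limit_offset(sample))
```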

Tags:

Hadoop

Hive