Hive query to quickly find table size (number of rows)

Here is the quick command

ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];

For Example,If table is partitioned

 hive> ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;

output is

Partition logdata.ops_bc_log{day=20140523} stats: [numFiles=37, numRows=26095186, totalSize=654249957, rawDataSize=58080809507]

Partition logdata.ops_bc_log{day=20140521} stats: [numFiles=30, numRows=21363807, totalSize=564014889, rawDataSize=47556570705]

Partition logdata.ops_bc_log{day=20140524} stats: [numFiles=35, numRows=25210367, totalSize=631424507, rawDataSize=56083164109]

Partition logdata.ops_bc_log{day=20140522} stats: [numFiles=37, numRows=26295075, totalSize=657113440, rawDataSize=58496087068]


tblproperties will give the size of the table and can be used to grab just that value if needed.

-- gives all properties
show tblproperties yourTableName

-- show just the raw data size
show tblproperties yourTableName("rawDataSize")

Tags:

Hadoop

Hive