Analyze queries in Hive

Before running a query, the Hive Cost-Based Optimizer (CBO) creates an execution plan: a sequence of tasks that have to be completed in order to fetch the requested data in the most efficient manner. Using the Hive EXPLAIN and ANALYZE commands, you can view execution plans and get detailed statistics about Hive execution to optimize your queries.

Test database

Throughout this article, two Hive tables, transactions and accounts, are used to demonstrate usage examples.

SELECT * FROM transactions;
+----------------------+----------------------+--------------------------+------------------------+
| transactions.txn_id  | transactions.acc_id  | transactions.txn_amount  | transactions.txn_date  |
+----------------------+----------------------+--------------------------+------------------------+
| 1                    | 1002                 | 10.00                    | 2023-01-01             |
| 8                    | 1003                 | 50.00                    | 2023-01-01             |
| 3                    | 1002                 | 30.00                    | 2023-01-02             |
| 4                    | 1001                 | 100.50                   | 2023-01-02             |
| 2                    | 1002                 | 20.00                    | 2023-01-03             |
| 6                    | 1001                 | 200.50                   | 2023-01-03             |
| 7                    | 1003                 | 50.00                    | 2023-01-03             |
| 5                    | 1001                 | 150.50                   | 2023-01-04             |
| 9                    | 1003                 | 75.00                    | 2023-01-04             |
+----------------------+----------------------+--------------------------+------------------------+

SELECT * FROM accounts;
+--------------+---------------------+
| accounts.id  | accounts.full_name  |
+--------------+---------------------+
| 1001         | John Smith          |
| 1002         | Sarah Connor        |
| 1003         | Rick Sanchez        |
+--------------+---------------------+
SQL for creating the test tables

The following SQL can be used to create and populate the test Hive tables using /bin/beeline.

CREATE DATABASE IF NOT EXISTS explain_demo;
USE explain_demo;

DROP TABLE IF EXISTS transactions;
CREATE TABLE transactions(txn_id int, acc_id int, txn_amount decimal(10,2)) PARTITIONED BY (txn_date date);
INSERT INTO transactions VALUES
(1, 1002, 10.00, '2023-01-01'),
(2, 1002, 20.00, '2023-01-03'),
(3, 1002, 30.00, '2023-01-02'),
(4, 1001, 100.50, '2023-01-02'),
(5, 1001, 150.50, '2023-01-04'),
(6, 1001, 200.50, '2023-01-03'),
(7, 1003, 50.00, '2023-01-03'),
(8, 1003, 50.00, '2023-01-01'),
(9, 1003, 75.00, '2023-01-04');

DROP TABLE IF EXISTS accounts;
CREATE TABLE accounts(id int, full_name string);
INSERT INTO accounts VALUES
(1001, 'John Smith'),
(1002, 'Sarah Connor'),
(1003, 'Rick Sanchez');
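
Note that the INSERT into the partitioned transactions table relies on dynamic partitioning, since the txn_date partition values come from the inserted rows. If your Hive installation runs in the default strict mode, the following settings may need to be applied before the INSERT (a minimal sketch; the defaults vary between Hive versions):

-- Enable dynamic partitioning and allow all partition columns to be dynamic
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;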

EXPLAIN

The EXPLAIN command shows the execution plan for a given Hive query. The result includes a detailed breakdown of how Hive plans to execute the query.

The general syntax is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION|VECTORIZATION|FORMATTED] <query>

Where:

  • EXTENDED — returns extended information about the execution plan.

  • DEPENDENCY — provides details about tables and partitions that the query depends on.

  • AUTHORIZATION — returns information about all entities that have to be authorized, including permission issues, if any.

  • VECTORIZATION — returns information about non-vectorized Map/Reduce stages.

  • FORMATTED — formats the output as JSON (see the example after this list).

  • <query> — the query to analyze.
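
For example, combining FORMATTED with a query returns the plan as a single JSON document, which is convenient for programmatic processing (a minimal sketch; the exact JSON layout depends on the Hive version):

EXPLAIN FORMATTED
    SELECT * FROM transactions
    WHERE txn_amount > 100;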

EXPLAIN

In its basic form, EXPLAIN returns general information about the query execution plan.

CREATE TABLE IF NOT EXISTS top_txns_per_acc(acc_id int, max_txn_value decimal(10,2));

EXPLAIN
    FROM transactions INSERT OVERWRITE TABLE top_txns_per_acc
    SELECT acc_id, MAX(txn_amount) GROUP BY acc_id;

The result:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (SIMPLE_EDGE)                   |
| Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE)        |
|                                                    |
| Stage-3                                            |
|   Stats Work{}                                     |
|     Stage-0                                        |
|       Move Operator                                |
|         table:{"name:":"explain_demo.top_txns_per_acc"} |
|         Stage-2                                    |
|           Dependency Collection{}                  |
|             Stage-1                                |
|               Reducer 3                            |
|               File Output Operator [FS_13]         |
|                 Group By Operator [GBY_11] (rows=1 width=552) |
|                   Output:["_col0","_col1"],aggregations:["compute_stats(VALUE._col0)","compute_stats(VALUE._col1)"] |
|                 <-Reducer 2 [CUSTOM_SIMPLE_EDGE]   |
|                   File Output Operator [FS_6]      |
|                     table:{"name:":"explain_demo.top_txns_per_acc"} |
|                     Group By Operator [GBY_4] (rows=4 width=12) |
|                       Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 |
|                     <-Map 1 [SIMPLE_EDGE] vectorized |
|                       SHUFFLE [RS_16]              |
|                         PartitionCols:_col0        |
|                         Group By Operator [GBY_15] (rows=9 width=12) |
|                           Output:["_col0","_col1"],aggregations:["max(txn_amount)"],keys:acc_id |
|                           Select Operator [SEL_14] (rows=9 width=12) |
|                             Output:["acc_id","txn_amount"] |
|                             TableScan [TS_0] (rows=9 width=12) |
|                               explain_demo@transactions,transactions,Tbl:COMPLETE,Col:NONE,Output:["acc_id","txn_amount"] |
|                   PARTITION_ONLY_SHUFFLE [RS_10]   |
|                     Group By Operator [GBY_9] (rows=1 width=536) |
|                       Output:["_col0","_col1"],aggregations:["compute_stats(acc_id, 'hll')","compute_stats(max_txn_value, 'hll')"] |
|                       Select Operator [SEL_8] (rows=4 width=12) |
|                         Output:["acc_id","max_txn_value"] |
|                          Please refer to the previous Group By Operator [GBY_4] |
|                                                    |
+----------------------------------------------------+

The EXPLAIN output may vary significantly depending on the actual query; however, the major information blocks are the following:

  • Stage dependencies. Includes a list of dependent stages that make up the entire execution plan. Each stage can be a Map/Reduce job, an HDFS I/O operation, a Metastore access action, etc.

  • Stage plans. Each block describes the details of an individual stage, including involved operators, sort orders, etc.

EXPLAIN EXTENDED

The EXPLAIN EXTENDED command outputs more details about Hive execution plans. The output includes additional information about HDFS I/O operations, data formats, bucketing, advanced statistics, and so on.

EXPLAIN EXTENDED
    SELECT * FROM transactions
    WHERE txn_amount > 100;
EXPLAIN EXTENDED sample output
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| STAGE DEPENDENCIES:                                |
|   Stage-0 is a root stage                          |
|                                                    |
| STAGE PLANS:                                       |
|   Stage: Stage-0                                   |
|     Fetch Operator                                 |
|       limit: -1                                    |
|       Partition Description:                       |
|           Partition                                |
|             input format: org.apache.hadoop.mapred.TextInputFormat |
|             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|             partition values:                      |
|               txn_date 2023-01-01                  |
|             properties:                            |
|               COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"acc_id":"true","txn_amount":"true","txn_id":"true"}} |
|               bucket_count -1                      |
|               column.name.delimiter ,              |
|               columns txn_id,acc_id,txn_amount     |
|               columns.comments                     |
|               columns.types int:int:decimal(10,2)  |
|               file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|               file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions/txn_date=2023-01-01 |
|               name explain_demo.transactions       |
|               numFiles 1                           |
|               numRows 2                            |
|               partition_columns txn_date           |
|               partition_columns.types date         |
|               rawDataSize 24                       |
|               serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|               serialization.format 1               |
|               serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               totalSize 26                         |
|               transient_lastDdlTime 1702847838     |
|             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|               input format: org.apache.hadoop.mapred.TextInputFormat |
|               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               properties:                          |
|                 bucket_count -1                    |
|                 bucketing_version 2                |
|                 column.name.delimiter ,            |
|                 columns txn_id,acc_id,txn_amount   |
|                 columns.comments                   |
|                 columns.types int:int:decimal(10,2) |
|                 file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|                 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                 location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions |
|                 name explain_demo.transactions     |
|                 partition_columns txn_date         |
|                 partition_columns.types date       |
|                 serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|                 serialization.format 1             |
|                 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                 transient_lastDdlTime 1702846232   |
|               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               name: explain_demo.transactions      |
|             name: explain_demo.transactions        |
|           Partition                                |
|             input format: org.apache.hadoop.mapred.TextInputFormat |
|             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|             partition values:                      |
|               txn_date 2023-01-02                  |
|             properties:                            |
|               COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"acc_id":"true","txn_amount":"true","txn_id":"true"}} |
|               bucket_count -1                      |
|               column.name.delimiter ,              |
|               columns txn_id,acc_id,txn_amount     |
|               columns.comments                     |
|               columns.types int:int:decimal(10,2)  |
|               file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|               file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions/txn_date=2023-01-02 |
|               name explain_demo.transactions       |
|               numFiles 1                           |
|               numRows 2                            |
|               partition_columns txn_date           |
|               partition_columns.types date         |
|               rawDataSize 25                       |
|               serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|               serialization.format 1               |
|               serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               totalSize 27                         |
|               transient_lastDdlTime 1702847838     |
|             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|               input format: org.apache.hadoop.mapred.TextInputFormat |
|               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               properties:                          |
|                 bucket_count -1                    |
|                 bucketing_version 2                |
|                 column.name.delimiter ,            |
|                 columns txn_id,acc_id,txn_amount   |
|                 columns.comments                   |
|                 columns.types int:int:decimal(10,2) |
|                 file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|                 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                 location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions |
|                 name explain_demo.transactions     |
|                 partition_columns txn_date         |
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
|                 partition_columns.types date       |
|                 serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|                 serialization.format 1             |
|                 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                 transient_lastDdlTime 1702846232   |
|               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               name: explain_demo.transactions      |
|             name: explain_demo.transactions        |
|           Partition                                |
|             input format: org.apache.hadoop.mapred.TextInputFormat |
|             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|             partition values:                      |
|               txn_date 2023-01-03                  |
|             properties:                            |
|               COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"acc_id":"true","txn_amount":"true","txn_id":"true"}} |
|               bucket_count -1                      |
|               column.name.delimiter ,              |
|               columns txn_id,acc_id,txn_amount     |
|               columns.comments                     |
|               columns.types int:int:decimal(10,2)  |
|               file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|               file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions/txn_date=2023-01-03 |
|               name explain_demo.transactions       |
|               numFiles 1                           |
|               numRows 3                            |
|               partition_columns txn_date           |
|               partition_columns.types date         |
|               rawDataSize 37                       |
|               serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|               serialization.format 1               |
|               serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               totalSize 40                         |
|               transient_lastDdlTime 1702847838     |
|             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|               input format: org.apache.hadoop.mapred.TextInputFormat |
|               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               properties:                          |
|                 bucket_count -1                    |
|                 bucketing_version 2                |
|                 column.name.delimiter ,            |
|                 columns txn_id,acc_id,txn_amount   |
|                 columns.comments                   |
|                 columns.types int:int:decimal(10,2) |
|                 file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|                 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                 location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions |
|                 name explain_demo.transactions     |
|                 partition_columns txn_date         |
|                 partition_columns.types date       |
|                 serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|                 serialization.format 1             |
|                 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                 transient_lastDdlTime 1702846232   |
|               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               name: explain_demo.transactions      |
|             name: explain_demo.transactions        |
|           Partition                                |
|             input format: org.apache.hadoop.mapred.TextInputFormat |
|             output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|             partition values:                      |
|               txn_date 2023-01-04                  |
|             properties:                            |
|               COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"acc_id":"true","txn_amount":"true","txn_id":"true"}} |
|               bucket_count -1                      |
|               column.name.delimiter ,              |
|               columns txn_id,acc_id,txn_amount     |
|               columns.comments                     |
|               columns.types int:int:decimal(10,2)  |
|               file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|               file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions/txn_date=2023-01-04 |
|               name explain_demo.transactions       |
|               numFiles 1                           |
|               numRows 2                            |
|               partition_columns txn_date           |
|               partition_columns.types date         |
|               rawDataSize 25                       |
|               serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|               serialization.format 1               |
|               serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               totalSize 27                         |
|               transient_lastDdlTime 1702847838     |
|             serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                    |
|               input format: org.apache.hadoop.mapred.TextInputFormat |
|               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|               properties:                          |
|                 bucket_count -1                    |
|                 bucketing_version 2                |
|                 column.name.delimiter ,            |
|                 columns txn_id,acc_id,txn_amount   |
|                 columns.comments                   |
|                 columns.types int:int:decimal(10,2) |
|                 file.inputformat org.apache.hadoop.mapred.TextInputFormat |
|                 file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                 location hdfs://adh/apps/hive/warehouse/explain_demo.db/transactions |
|                 name explain_demo.transactions     |
|                 partition_columns txn_date         |
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
|                 partition_columns.types date       |
|                 serialization.ddl struct transactions { i32 txn_id, i32 acc_id, decimal(10,2) txn_amount} |
|                 serialization.format 1             |
|                 serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                 transient_lastDdlTime 1702846232   |
|               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|               name: explain_demo.transactions      |
|             name: explain_demo.transactions        |
|       Processor Tree:                              |
|         TableScan                                  |
|           alias: transactions                      |
|           GatherStats: false                       |
|           Filter Operator                          |
|             isSamplingPred: false                  |
|             predicate: (txn_amount > 100) (type: boolean) |
|             Select Operator                        |
|               expressions: txn_id (type: int), acc_id (type: int), txn_amount (type: decimal(10,2)), txn_date (type: date) |
|               outputColumnNames: _col0, _col1, _col2, _col3 |
|               ListSink                             |
|                                                    |
+----------------------------------------------------+

EXPLAIN DEPENDENCY

The EXPLAIN DEPENDENCY command returns information about all the input sources (Hive tables, partitions, HDFS files) that will be accessed to fetch the requested data.

EXPLAIN DEPENDENCY
    SELECT t.txn_id, a.full_name
    FROM transactions t
    JOIN accounts a ON (t.acc_id=a.id);

The output:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| {"input_tables":[{"tablename":"explain_demo@transactions","tabletype":"MANAGED_TABLE"},{"tablename":"explain_demo@accounts","tabletype":"MANAGED_TABLE"}],"input_partitions":[{"partitionName":"explain_demo@transactions@txn_date=2023-01-01"},{"partitionName":"explain_demo@transactions@txn_date=2023-01-02"},{"partitionName":"explain_demo@transactions@txn_date=2023-01-03"},{"partitionName":"explain_demo@transactions@txn_date=2023-01-04"}]} |
+----------------------------------------------------+
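
EXPLAIN DEPENDENCY also resolves dependencies through views: when a query selects from a view, the underlying base tables and partitions appear among the inputs. A hedged sketch (the txns_view name is illustrative, not part of the test database):

-- Create a view over the test table and inspect the dependencies of a query on it
CREATE VIEW IF NOT EXISTS txns_view AS
    SELECT txn_id, acc_id FROM transactions;
EXPLAIN DEPENDENCY
    SELECT * FROM txns_view;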

EXPLAIN AUTHORIZATION

Using EXPLAIN AUTHORIZATION, you can get authorization information for a query, including permission issues (if any).

For example:

EXPLAIN AUTHORIZATION
    SELECT a.full_name, t.txn_date, t.txn_id from accounts a
    JOIN transactions t ON (a.id=t.acc_id);

The result:

+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| INPUTS:                                            |
|   explain_demo@accounts                            |
|   explain_demo@transactions                        |
|   explain_demo@transactions@txn_date=2023-01-01    |
|   explain_demo@transactions@txn_date=2023-01-02    |
|   explain_demo@transactions@txn_date=2023-01-03    |
|   explain_demo@transactions@txn_date=2023-01-04    |
| OUTPUTS:                                           |
|   hdfs://adh/tmp/hive/hive/de6b68f3-0c62-40e9-84ac-8e34916558c9/hive_2023-12-18_09-26-13_342_1170854594803480484-104/-mr-10001 |
| CURRENT_USER:                                      |
|   hive                                             |
| OPERATION:                                         |
|   QUERY                                            |
+----------------------------------------------------+
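
The same information can be emitted as JSON by adding the FORMATTED keyword (assuming your Hive version supports this combination; recent versions do):

EXPLAIN FORMATTED AUTHORIZATION
    SELECT a.full_name, t.txn_id FROM accounts a
    JOIN transactions t ON (a.id=t.acc_id);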

EXPLAIN VECTORIZATION

The VECTORIZATION keyword adds details to the EXPLAIN output, indicating which Map/Reduce stages are not vectorized and why.

The syntax is as follows:

EXPLAIN VECTORIZATION [ONLY] [SUMMARY|OPERATOR|EXPRESSION|DETAIL] <query>

Where:

  • ONLY — suppresses most non-vectorization elements.

  • SUMMARY — shows whether vectorization is enabled and adds a summary of Map and Reduce work (see the example after this list).

  • OPERATOR — shows vectorization information for operators.

  • EXPRESSION — shows vectorization information for expressions.

  • DETAIL — the most detailed information level. Includes all of the above.
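
For example, the following shows a vectorization summary for an aggregation query (output omitted, as it varies with the execution engine and data format; the SET statement ensures vectorized execution is enabled):

SET hive.vectorized.execution.enabled=true;
EXPLAIN VECTORIZATION SUMMARY
    SELECT acc_id, SUM(txn_amount)
    FROM transactions GROUP BY acc_id;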

Collect statistics with ANALYZE

When processing a query, the Hive CBO makes extensive use of statistics to choose the query plan that is optimal in terms of system resources. These statistics (number of rows, number of files, raw data size) are stored in the Hive Metastore and, by default, are regenerated automatically when new data is inserted into a table (hive.stats.autogather=true).

Compute statistics

To collect statistics for a given table or partition at any time, use the ANALYZE command. This command computes the statistics on demand and writes them to the Hive Metastore. Hive supports statistics at the table, partition, and column level.

The syntax is as follows:

ANALYZE TABLE <tbl_name> [PARTITION(<partition>)] COMPUTE STATISTICS
[FOR COLUMNS c1, c2, ...]
[NOSCAN];

Where:

  • FOR COLUMNS — computes column statistics either for all table columns (if no columns are specified) or only for specific columns of the table.

  • NOSCAN — if specified, the command does not scan the files; the collected statistics are limited to the number of files and their physical size in bytes (see the example below).
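
For example, a quick NOSCAN run against a single partition (a minimal sketch; since no file contents are read, only file counts and sizes are collected):

ANALYZE TABLE transactions PARTITION(txn_date='2023-01-01') COMPUTE STATISTICS NOSCAN;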

NOTE
By default, statistics gathering is enabled for newly created tables/partitions, so there is no need to compute the statistics manually unless you have changed the default hive.stats.autogather=true property.
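
To check or override this behavior for the current session, a minimal sketch:

-- Show the current value of the property
SET hive.stats.autogather;
-- Disable automatic statistics collection for subsequent DML in this session
SET hive.stats.autogather=false;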

Once the statistics are written to the Hive Metastore, you can view them using the DESCRIBE command. The syntax is as follows:

DESCRIBE [EXTENDED|FORMATTED] <tbl_name> [PARTITION(<partition>)]

The following example computes and shows Hive statistics for the entire table:

ANALYZE TABLE transactions COMPUTE STATISTICS;
DESCRIBE EXTENDED transactions;

The output:

+-----------------------------+----------------------------------------------------+----------+
|          col_name           |                     data_type                      | comment  |
+-----------------------------+----------------------------------------------------+----------+
| txn_id                      | int                                                |          |
| acc_id                      | int                                                |          |
| txn_amount                  | decimal(10,2)                                      |          |
| txn_date                    | date                                               |          |
|                             | NULL                                               | NULL     |
| # Partition Information     | NULL                                               | NULL     |
| # col_name                  | data_type                                          | comment  |
| txn_date                    | date                                               |          |
|                             | NULL                                               | NULL     |
| Detailed Table Information  | Table(tableName:transactions, dbName:explain_demo, owner:hive, createTime:1702839122, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:txn_id, type:int, comment:null), FieldSchema(name:acc_id, type:int, comment:null), FieldSchema(name:txn_amount, type:decimal(10,2), comment:null), FieldSchema(name:txn_date, type:date, comment:null)], location:hdfs://adh/apps/hive/warehouse/transactions, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:txn_date, type:date, comment:null)], parameters:{totalSize=120, numRows=9, rawDataSize=111, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\"}, numFiles=4, numPartitions=4, transient_lastDdlTime=1702839122, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER) |          |
+-----------------------------+----------------------------------------------------+----------+

To gather statistics for a specific partition, consider the example below:

ANALYZE TABLE transactions PARTITION(txn_date='2023-01-03') COMPUTE STATISTICS;
DESCRIBE EXTENDED transactions PARTITION(txn_date='2023-01-03');

The partition statistics are present in the Detailed Partition Information block.

+---------------------------------+----------------------------------------------------+----------+
|            col_name             |                     data_type                      | comment  |
+---------------------------------+----------------------------------------------------+----------+
| txn_id                          | int                                                |          |
| acc_id                          | int                                                |          |
| txn_amount                      | decimal(10,2)                                      |          |
| txn_date                        | date                                               |          |
|                                 | NULL                                               | NULL     |
| # Partition Information         | NULL                                               | NULL     |
| # col_name                      | data_type                                          | comment  |
| txn_date                        | date                                               |          |
|                                 | NULL                                               | NULL     |
| Detailed Partition Information  | Partition(values:[2023-01-03], dbName:explain_demo, tableName:transactions, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:txn_id, type:int, comment:null), FieldSchema(name:acc_id, type:int, comment:null), FieldSchema(name:txn_amount, type:decimal(10,2), comment:null), FieldSchema(name:txn_date, type:date, comment:null)], location:hdfs://adh/apps/hive/warehouse/transactions/txn_date=2023-01-03, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:{totalSize=40, numRows=3, rawDataSize=37, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true","COLUMN_STATS":{"acc_id":"true","txn_amount":"true","txn_id":"true"}}, numFiles=1, transient_lastDdlTime=1702844180}, catName:hive) |          |
+---------------------------------+----------------------------------------------------+----------+
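
Column-level statistics are computed with the FOR COLUMNS clause. The sketch below uses the non-partitioned accounts table and assumes the DESCRIBE FORMATTED <tbl_name> <col_name> form for viewing stored column statistics is available in your Hive version:

-- Compute statistics (min/max, NDV, number of nulls, etc.) for all columns
ANALYZE TABLE accounts COMPUTE STATISTICS FOR COLUMNS;
-- View the stored statistics for a single column
DESCRIBE FORMATTED accounts id;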