Use impala-shell to connect to Impala

You can connect to Impala through the shell tool — impala-shell. Impala shell is available on hosts where you install the Impala Client component. The tool can connect remotely to any instance of the Impala Daemon.

Impala shell allows you to create and manage databases and tables, insert data, and issue queries. You can also submit SQL statements in an interactive session. It is possible to specify command-line options to process a single statement or a script file that contains a large number of statements.

In addition, you can store a set of default settings for impala-shell in a configuration file.

Run impala-shell

To run the impala-shell, set the IMPALA_SHELL_HOME environment variable to path to impala-shell:

$ export IMPALA_SHELL_HOME=/usr/lib/impala/impala-shell

$ impala-shell

To execute queries, you need to connect to an impalad daemon. The impala-shell command connects to an impalad running on the same machine. For information on how to connect to an impalad daemon on another host, see Connect to impalad on a specific host.

NOTE
If the following error occurs: ImportError: no module named pkg_resources, install the python-setuptools package. To do this, use the command: yum install python-setuptools.

Configuration options

You can use configuration options to tune impala-shell for your tasks. The table below lists options that can be passed as a command line parameter or defined in a configuration file.

Impala shell configuration options
Command-line option Configuration file setting Description

-B or --delimited

write_delimited=true

Prints query results in plain format as a delimited text file. It is useful for creating data files to be used with other Hadoop components. It also helps to avoid the performance overhead of the pretty-printing output, when running benchmark tests that use queries returning large result sets. To specify the delimiter character, use the --output_delimiter option. To define a file name, utilize the -o or --output_file option

-b or --kerberos_host_fqdn

kerberos_host_fqdn=load-balancer-hostname

Overrides the hostname of the Impala daemon’s Kerberos service principal. impala-shell checks whether the server’s principal matches this hostname. It can be used when impalad is accessed via a load-balancer, but impala-shell should connect to a specific impalad directly

--print_header

print_header=true

Determines whether the header row should be printed

-o filename or --output_file filename

output_file=filename

Specifies an output file name. It is typically used to store the results of a single query executed from the command line with the -q option. It also works for interactive sessions — the messages such as number of rows fetched are displayed but not the actual result set. To suppress these messages (when the -q and -o options are combined), redirect stderr to /dev/null

--output_delimiter=character

output_delimiter=character

Specifies the character to use as a delimiter between fields when query results are printed in plain format (the -B option). The default delimiter is tab (\t). If an output value contains the delimiter character, that field can be quoted, or escaped by doubling quotation marks, or both

-E or --vertical

vertical=true

Prints the output of a query (rows) vertically. This option has no effects if -B is used

-p or --show_profiles

show_profiles=true

Displays the query execution plan (the same output as the EXPLAIN statement) and more detailed execution steps

-h or --help

 — 

Displays help information

 — 

history_max=1000

Sets the maximum number of queries executed in impala-shell to be stored to the history file

-i hostname or

--impalad=hostname[:port_num]

impalad=hostname[:port_num]

Connects to the impalad daemon installed on the specified host. If the port_number parameter is not defined, the default value 21050 is used. This option allows you to connect to any host in the cluster. If you access an impalad instance that is running on a different port (the fe_port parameter is changed), add this port to the option parameter

-q query or --query=query

query=query

Allows you to issue a single query from the command line, without starting the interactive interpreter. You can use this option to run impala-shell from a shell script or when the command is invoked from a Python, Perl, or other scripts. The query should contain a single statement, for example, SELECT, CREATE TABLE, SHOW TABLES, or any other statement recognized by impala-shell. Since you cannot pass the USE statement and another query, fully qualify the names for any table that does not belong to the default database. Alternatively, use the -f option to pass a file with the USE statement followed by other queries

-f query_file or

--query_file=query_file

query_file=path_to_query_file

Allows you to process a file containing multiple SQL statements. For example, a set of DDL statements to create a group of tables and views. Multiple statements must be delimited by the semicolon (;). You can specify a filename as - to represent the standard input. This feature allows you to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools

--query_option="option=value" or -Q "option=value"

The header line — [impala.query_options] followed by subsequent lines option=value, one option per line

Sets default query options for impala-shell commands. To set multiple query options at once, use multiple instances of this command-line option. The query option names are not case-sensitive

-k or --kerberos

use_kerberos=true

Determines whether the Kerberos authentication should be used when the shell connects to impalad. If Kerberos is not enabled on the instance of impalad to which impala-shell is connecting, an error is displayed. See Enabling Kerberos Authentication for Impala

-s kerberos_service_name or

--kerberos_service_name=name

kerberos_service_name=name

Specifies the impala-shell name to authenticate to a particular impalad service principal. If kerberos_service_name is not specified, impala is used by default. If this option is used for a connection in which Kerberos is not supported, an error is returned

-V or --verbose

verbose=true

Enables the verbose output

--quiet

verbose=false

Disables the verbose output

-v or --version

version=true

Displays the version information of impala-shell

-c

ignore_query_failure=true

Determines whether to ignore the query failure

-d default_db or

--database=default_db

default_db=default_db

Specifies the initial database. The same as running the USE statement after the connection. If not specified, the DEFAULT database is used

-ssl

ssl=true

Enables TLS/SSL for impala-shell

--ca_cert=path_to_certificate

ca_cert=path_to_certificate

Specifies the local path to the third-party CA certificate, or to a copy of the server certificate for self-signed server certificates. If --ca_cert is not set, impala-shell enables TLS/SSL, but does not validate the server certificate. This is useful for connecting to a trusted Impala that is only running over TLS/SSL, when a copy of the certificate is not available, for example, when debugging customer installations

-l

use_ldap=true

Enables LDAP authentication

-u

user=user_name

Specifies the username, when LDAP authentication is enabled by the -l option. Utilize the short username, not the full LDAP distinguished name. After using this option, the shell prompts for a password

--ldap_password_cmd=command

 — 

Specifies a command to retrieve the LDAP password, when the LDAP authentication is enabled by the -l option. If the command includes space-separated arguments, enclose the command and its arguments in quotation marks

--config_file=path_to_config_file

 — 

Specifies the path of the file containing impala-shell configuration settings. The default is /etc/impalarc. This setting can only be specified on the command line

--live_progress

live_progress=true

Displays a progress bar showing the approximate percentage of completion for each query. The information is updated interactively as the query is processed. See LIVE_PROGRESS Query Option

--disable_live_progress

live_progress=false

Allows you to disable live_progress in the interactive mode

--live_summary

 — 

Shows a detailed report, similar to the SUMMARY command, that shows progress details for each phase of the query execution. The information is updated interactively as the query is processed. See LIVE_PROGRESS Query Option

-var=variable_name=value

 — 

Defines a substitution variable that can be used within the impala-shell session. The variable can be substituted into statements processed by the -q or -f options, or in an interactive shell session. Within a SQL statement, use the following syntax: ${var:variable_name}

--auth_creds_ok_in_clear

 — 

Allows you to use LDAP authentication for an insecure connection to the shell. Note that authentication credentials are sent unencrypted, and may be vulnerable to attack

--protocol=protocol

 — 

The protocol to use for the connection to Impala. Valid protocol values are:

  • hs2 — impala-shell connects to impalad via HiveServer2 over TCP binary. This is the default setting.

  • hs2-http — impala-shell connects to impalad via HiveServer2 over HTTP.

  • beeswax — impala-shell connects to impalad via Beeswax over TCP binary.

--hs2_fp_format=HS2_FP_FORMAT

hs2_fp_format=HS2_FP_FORMAT

Sets the printing format specification for floating point values when using HS2 protocol. The default behavior makes the values handled by Python’s str() built-in method. Use 16G to match Beeswax protocol’s floating-point output format

The impala-shell configuration file

As mentioned above, you can store a set of impala-shell default settings in a configuration file.

The default path to the global impala-shell configuration file is /etc/impalarc. The default path to the user-level configuration file is ~/.impalarc. Note that the global-level file name does not include a dot (.). To change the default path to the global configuration file, set the $IMPALA_SHELL_GLOBAL_CONFIG_FILE environment variable.

Since ADH Impala configuration files are located in the /etc/impala/conf directory, you can create an impala-shell configuration file in this directory and set the $IMPALA_SHELL_GLOBAL_CONFIG_FILE variable:

$ export IMPALA_SHELL_GLOBAL_CONFIG_FILE=/etc/impala/conf

It is also possible to start impala-shell with the --config_file option set to the custom path of the configuration file.

The options specified in the user configuration file have higher priority than options from the global configuration file. In turn, any options you specify on the impala-shell command line override any corresponding options within the configuration file.

An impala-shell configuration file (global or user) must contain the header — [impala], followed by the impala-shell options.

The names of the options in the configuration file are similar to the long-form command-line arguments to the impala-shell command. They are listed in the table above.

You can specify key/value pair options using keyval, similar to the --var command-line option. For example, keyval=variable1=value1.

The configuration file can also contain query options in the [impala.query_options] section. Options specified in the [impala] section override the options from the [impala.query_options] section.

The example below shows a configuration file where the verbose mode is enabled, so each SQL query output is followed by timing information. impala-shell operates with the test database by default. The query output is formatted as delimited text and stored in a file. The configuration file also enables the show_profiles option. This option prints detailed performance information after each query.

[impala]
verbose=true
default_db=test
write_delimited=true
output_delimiter=,
output_file=/home/test_result.csv
show_profiles=true
keyval=var1=value1,keyval=var2=value2

The following example shows a configuration file that connects to a specific Impala node, runs a single query within the test database, then exits. Any query options predefined under the [impala.query_options] section are applied during the session.

[impala]
impalad=ees-adh1.ru-central1.internal
default_db=test
# Run the query and exit
query=select count(*) from customers

[impala.query_options]
mem_limit=32g

You can use this type of configuration file with the command-line option --config_file, to easily select between many predefined queries that could be run against different databases, hosts, or even different clusters. To run a sequence of statements instead of a single query, create a file with statements and specify the configuration option query_file=path_to_query_file of the query text.

Connect to impalad on a specific host

To connect to an impalad daemon on another host, you need to specify the connection parameters in one of the ways described below.

Use command-line options

You can use command-line options when you run the impala-shell command:

$ impala-shell -i ees-1adh3.ru-central1.internal:28000 --protocol='hs2-http'

This example connects to the impalad on the host ees-1adh3.ru-central1.internal, on the port 28000, via HiveServer2 over HTTP

Utilize a configuration file

It is possible to specify connection settings in a file. Create a file with connection settings as described in The impala-shell configuration file section. For example:

$ cat > ~/custom_conf1
[impala]
impalad=ees-1adh2.ru-central1.internal
query=select version()

When using the cat command, press CTRL+D to exit.

Run impala-shell:

$ impala-shell --config_file=~/custom_conf1

The result:

Starting Impala Shell with no authentication using Python 2.7.5
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to ees-1adh2.ru-central1.internal:21050
Connected to ees-1adh2.ru-central1.internal:21050
Server version: impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6)
Query: select version()
Query submitted at: 2023-07-25 14:36:57 (Coordinator: http://ees-1adh2.ru-central1.internal:25000)
Query progress can be monitored at: http://ees-1adh2.ru-central1.internal:25000/query_plan?query_id=084b59535cac1baf:5f5e61f500000000
+----------------------------------------------------------------------------------------+
| version()                                                                              |
+----------------------------------------------------------------------------------------+
| impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6) |
| Built on Mon Jun  5 16:33:03 UTC 2023                                                  |
+----------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.44s

Use the CONNECT command

Execute the CONNECT command during an impala-shell session:

$ impala-shell
connect ees-1adh3.ru-central1.internal;

The result:

Opened TCP connection to ees-1adh3.ru-central1.internal:21050
Connected to ees-1adh3.ru-central1.internal:21050
Server version: impalad version 4.2.0-RELEASE RELEASE (build 978afcfae9aa626de182b2872c4469646f42e0f6)
[ees-1adh3.ru-central1.internal:21050] default>

Connect to a specific database

You can use all the same connection options as in previous examples, and add the -d option with the database name:

$ impala-shell -i localhost -d my_db

localhost refers to the same host where impala-shell is running.

Execute SQL statements in non-interactive mode

Invoke impala-shell with the -q option to run a single statement, or the -f option to run a sequence of statements from a file. The impala-shell command returns a result immediately, without going into the interactive mode:

$ impala-shell -i localhost -d my_db -q 'show tables'

$ impala-shell -i localhost -d my_db -f recreate_tables.sql

Run commands and SQL statements

The impala-shell interpreter accepts the SQL statements described in the Impala SQL Statements article and shell commands that you can use for tuning performance and diagnosing problems.

In non-interactive mode, query output is printed to stdout or to the file specified by the -o option. Errors are printed to stderr. So, you can process the query output as a part of a Unix pipeline.

In interactive mode, impala-shell uses the Readline library to recall and edit previous commands.

You can use the following keyboard shortcuts and syntax rules for running commands and SQL statements in impala-shell:

  • To see the full list of available commands, press TAB twice.

  • To display previous commands, click the up-arrow and down-arrow keys.

  • Use the standard set of keyboard shortcuts in GNU Readline library for editing and cursor movement, such as Ctrl+A for the beginning of line and Ctrl+E for the end of line.

  • Commands and SQL statements can span multiple lines and must be terminated by a semicolon (;).

  • Use -- to comment out a single line comment and /* <content> */ to comment multi lines. A comment is considered as a part of the statement it precedes. When you type a comment, you get a continuation prompt until you enter a semicolon. For example:

    default> -- Show tables starting with t
    > SHOW TABLES LIKE 't*';

    If a comment contains the ${variable_name} and it is not for a variable substitution, the $ character must be escaped, for example — \${hello}.

The example below uses impala-shell commands and the test database and the table1 table:

create database test;
use test;
create table table1 (field1 int, field2 boolean, field3 string);
insert into table1 values ((1,true,"row1"), (2,true,"row2"), (3,false,"row3"));
select * from table1;

The result:

+--------+--------+--------+
| field1 | field2 | field3 |
+--------+--------+--------+
| 1      | true   | row1   |
| 2      | true   | row2   |
| 3      | false  | row3   |
+--------+--------+--------+

Substitute variables

You can define variables to be used within SQL statements processed by impala-shell. To do this on the command line, specify the --var=variable_name=value option. Within an interactive session or a script file processed by the -f option, use the SET VAR:variable_name=value command. To pass a variable in an SQL statement, use the notation: ${VAR:variable_name}.

The examples below contain multiple commands that define variables and use them in SQL statements executed through the -q option. The -q argument strings containing variable references (${var:value}) are single-quoted.

Example 1:

$ impala-shell --var=tname=table1 --var=colname=col1 --var=coltype=string -q 'CREATE TABLE ${var:tname} (${var:colname} ${var:coltype}) STORED AS PARQUET'

The result:

Query: CREATE TABLE table1 (col1 string) STORED AS PARQUET
+-------------------------+
| summary                 |
+-------------------------+
| Table has been created. |
+-------------------------+

Example 2:

$ impala-shell --quiet --var=tname=table1
use test;
select count(*) from ${var:tname};

The result:

+----------+
| count(*) |
+----------+
| 3        |
+----------+

Example 3:

$ impala-shell
set var:tname=table1;
select count(*) from ${var:tname};

The result:

+----------+
| count(*) |
+----------+
| 3        |
+----------+
Found a mistake? Seleсt text and press Ctrl+Enter to report it