Hadoop CLI

Overview

This article provides reference documentation for the Hadoop shell command-line tool.

NOTE
The hadoop dfs command is currently deprecated; use hdfs dfs instead

All Hadoop commands and subprojects follow the same basic structure. The usage is as follows:

$ shellcommand [SHELL_OPTIONS] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop shell basic structure

shellcommand

The command of the project being invoked. For example, Hadoop Common uses hadoop, HDFS uses hdfs, and YARN uses yarn

SHELL_OPTIONS

Options that the shell processes before executing Java

COMMAND

Action to perform

GENERIC_OPTIONS

The common set of options supported by multiple commands

COMMAND_OPTIONS

Various command options for the Hadoop common subprojects
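To make the structure concrete, here is a hypothetical invocation annotated part by part (the configuration directory and property value are illustrative, not defaults):

```shell
# shellcommand  SHELL_OPTIONS       COMMAND  GENERIC_OPTIONS                  COMMAND_OPTIONS
$ hadoop        --config /opt/conf  fs       -D fs.defaultFS=hdfs://nn:8020   -ls /user
```

Note that the shell options come before the command, while generic options come after it.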

All the shell commands accept a common set of options. For some commands, these options are ignored. For example, --hostnames passed to a command that executes on only a single host is ignored.

Shell options

--buildpaths

Enables developer versions of JARs

--config confdir

Overrides the default configuration directory. The default directory is $HADOOP_HOME/etc/hadoop

--daemon mode

If the command supports daemonization (e.g., hdfs namenode), executes in the appropriate mode.

Supported modes are start to start the process in a daemon mode, stop to stop the process, and status to determine the active status of the process.

The status will return an LSB-compliant result code.

If no option is provided, commands that support daemonization will run in the foreground.

For commands that don’t support daemonization, this option is ignored
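As a sketch, a daemon lifecycle for the NameNode might look like this (assuming an HDFS installation where hdfs namenode supports daemonization):

```shell
# Start the NameNode as a daemon, query its status, then stop it
$ hdfs --daemon start namenode
$ hdfs --daemon status namenode
$ echo $?        # LSB-style result code, e.g. 0 when the process is running
$ hdfs --daemon stop namenode
```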

--debug

Enables shell-level configuration debugging information

--help

Displays shell script usage information

--hostnames

When --workers is used, overrides the workers file with a whitespace-delimited list of hostnames on which to execute a multi-host subcommand.

If --workers isn’t used, this option is ignored

--hosts

When --workers is used, overrides the workers file with another file that contains a list of hostnames on which to execute a multi-host subcommand.

If --workers isn’t used, this option is ignored

--loglevel loglevel

Overrides the log level.

Valid log levels are FATAL, ERROR, WARN, INFO, DEBUG, and TRACE. Default is INFO

--workers

If possible, executes this command on all hosts in the workers file
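Combining the multi-host options above, a sketch of managing DataNodes across a cluster (hostnames and the alternative hosts file path are illustrative):

```shell
# Run on every host listed in the workers file
$ hdfs --workers --daemon start datanode

# Restrict the run to two hosts, overriding the workers file
$ hdfs --workers --hostnames "host1 host2" --daemon stop datanode

# Use an alternative hosts file instead of the default workers file
$ hdfs --workers --hosts /tmp/my-workers --daemon start datanode
```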

Many subcommands share a common set of configuration options to alter their behavior.

Generic options

-archives <comma separated list of archives>

Specifies comma-separated archives to be extracted onto the compute machines. Applies only to a job

-conf <configuration file>

Specifies an application configuration file

-D <property>=<value>

Sets a value for a given property

-files <comma separated list of files>

Specifies comma-separated files to be copied to the MapReduce cluster. Applies only to a job

-fs <file:///> or <hdfs://namenode:port>

Specifies the default file system URL to use.

Overrides fs.defaultFS property from configurations

-jt <local> or <resourcemanager:port>

Specifies a ResourceManager. Applies only to a job

-libjars <comma separated list of jars>

Specifies comma-separated JAR files to include in the classpath. Applies only to a job
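The generic options can be sketched in use as follows (file names, paths, and the job class are illustrative; for a job, the generic options follow the main class):

```shell
# Override a single property for one run
$ hadoop fs -D dfs.replication=2 -put data.txt /user/alice/

# Point a command at a different default file system
$ hadoop fs -fs hdfs://namenode:8020 -ls /

# Ship extra files, JARs, and archives with a job
$ hadoop jar app.jar MyJob -files cache.txt -libjars lib1.jar,lib2.jar -archives dict.zip input output
```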

User commands

These commands are helpful for Hadoop cluster users.

Command Description

archive

Creates a Hadoop archive

checknative

Checks the availability of the Hadoop native code

CLASSNAME

Runs an arbitrary Java class

classpath

Prints the classpath

credential

Manages credentials, passwords, and secrets

distch

Changes the ownership and permissions on files

distcp

Copies files or directories recursively

dtutil

Utility to fetch and manage Hadoop tokens

envvars

Displays computed Hadoop environment variables

fs

A synonym for hdfs dfs when HDFS is in use

gridmix

Benchmark tool for a Hadoop cluster

jar

Runs a JAR file

jnipath

Prints the computed java.library.path

kerbname

Converts the named principal via the auth_to_local rules to the Hadoop username

kdiag

Diagnoses Kerberos problems

key

Manages keys via the KeyProvider

kms

Runs the Key Management Server

trace

Views and modifies Hadoop tracing settings

version

Prints the version
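A few of these user commands in a typical session (the distcp source and destination URIs are illustrative):

```shell
# Print the Hadoop version and the computed classpath
$ hadoop version
$ hadoop classpath

# Check the availability of the Hadoop native code
$ hadoop checknative

# Copy a directory between clusters
$ hadoop distcp hdfs://nn1:8020/src hdfs://nn2:8020/dest
```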

Administration commands

These commands are useful for administrators of a Hadoop cluster.

Command Description

daemonlog

Gets/sets the log level
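A sketch of daemonlog in use, querying and then raising a daemon's log level (the host, port, and class name are illustrative):

```shell
# Query the current log level of a class on a running daemon
$ hadoop daemonlog -getlevel nn.example.com:9870 org.apache.hadoop.hdfs.server.namenode.NameNode

# Raise it to DEBUG for live troubleshooting
$ hadoop daemonlog -setlevel nn.example.com:9870 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG
```

The change applies only to the running process; it is not persisted across daemon restarts.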
