Spark and Hive

By default, Spark is already configured to work with Hive. Hive settings for Spark are located in the Spark configuration directory /etc/spark/conf. If your Spark application interacts with Hadoop, Hive, or both, you need to put the Hadoop configuration files in Spark’s classpath.

Multiple running applications might require different Hadoop/Hive client-side configurations. You can copy and modify hdfs-site.xml, core-site.xml, and yarn-site.xml in Spark’s classpath for each application.
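
For example, a minimal sketch of this per-application approach could look as follows. The source path /etc/hadoop/conf and the directory name my-app-conf are illustrative and depend on your environment:

Keeping a separate copy of the Hadoop client configuration
mkdir -p /etc/spark/conf/my-app-conf
cp /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml \
   /etc/hadoop/conf/yarn-site.xml /etc/spark/conf/my-app-conf/
# Spark picks up client-side Hadoop settings from HADOOP_CONF_DIR
export HADOOP_CONF_DIR=/etc/spark/conf/my-app-conf
./bin/spark-submit --master local[4] myApp.jar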

NOTE
In a Spark cluster running on YARN, these configuration files are set cluster-wide and cannot safely be changed by the application.

The best choice is to use Spark Hadoop properties in the form spark.hadoop.*, and Spark Hive properties in the form spark.hive.*. For example, adding spark.hadoop.abc.def=xyz is the same as adding the Hadoop property abc.def=xyz, and adding spark.hive.abc=xyz is equivalent to the Hive property hive.abc=xyz. These can be treated the same as normal Spark properties and set in $SPARK_HOME/conf/spark-defaults.conf. The default Spark configuration directory is /etc/spark/conf; this is where you can keep all Spark configurations.
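
For instance, a spark-defaults.conf fragment that sets the placeholder properties from the example above could look like this:

Setting Hadoop and Hive properties in spark-defaults.conf
spark.hadoop.abc.def    xyz
spark.hive.abc          xyz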

In some cases, you may want to avoid hard-coding certain configurations in a SparkConf. For instance, Spark allows you to modify or add configurations when the application is submitted:

Passing parameters to spark-submit
./bin/spark-submit \
  --name "My app" \
  --master local[4] \
  --conf spark.eventLog.enabled=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  --conf spark.hadoop.abc.def=xyz \
  --conf spark.hive.abc=xyz \
  myApp.jar

You can find descriptions of all Spark parameters in the Spark documentation.

Custom configuration

If you need to make custom configuration changes for Hive in your Spark applications, there are two ways:

  • One way is to add custom properties to the spark-defaults.conf file and add this file to the Hive classpath.

  • The other way is to set configuration properties in the hive-site.xml Hive configuration file, stored in the Spark configuration directory: /etc/spark/conf/hive-site.xml (as shown below).
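
As an illustration, a hive-site.xml fragment that sets the placeholder Hive property from the earlier example could look like this:

Setting a Hive property in hive-site.xml
<property>
  <name>hive.abc</name>
  <value>xyz</value>
</property>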

For more information about Hive parameters for Spark, refer to Hive on Spark parameters.
