What Are Apache Pig Execution Modes?

When Big Data developers write and execute Apache Pig scripts, the Pig Latin compiler compiles the code, which then passes through several stages before producing results. We can run Pig Latin statements in several modes, and this blog post goes through each Apache Pig execution mode in detail.

Execution Mode      Interactive     Batch
Local Mode          Yes             Yes
Tez Local Mode      Experimental    Experimental
Spark Local Mode    Yes             Yes
MapReduce Mode      Yes             Yes
Tez Mode            No              Yes
Apache Pig Execution Modes

The latest version of Apache Pig supports six execution modes, or exectypes. As some of these modes are experimental, they might not be available in all versions.

Local Mode

To run Pig in local mode, we only need access to a single machine. All files are installed and run using the local host and file system.

Use the following command to run Pig in local mode.

pig -x local

Local mode is useful for debugging a Pig script and checking it for syntax errors against a small subset of data.
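As a rough illustration of debugging against a small data subset, the sketch below writes a tiny sample file and a short Pig Latin script into a temporary directory, then runs the script in local mode only if the pig launcher is on the PATH. The file names (students.txt, filter.pig) and the sample data are made up for this example.

```shell
# Sketch: try a small Pig Latin script in local mode.
# students.txt and filter.pig are hypothetical names used only here.
workdir="$(mktemp -d)"

# A tiny tab-separated sample: name and score.
printf 'alice\t90\nbob\t75\ncarol\t82\n' > "$workdir/students.txt"

# A Pig Latin script that loads the sample and dumps the high scorers.
cat > "$workdir/filter.pig" <<'EOF'
students = LOAD 'students.txt' AS (name:chararray, score:int);
high = FILTER students BY score > 80;
DUMP high;
EOF

# Run the script only when the pig launcher is installed.
if command -v pig >/dev/null 2>&1; then
    (cd "$workdir" && pig -x local filter.pig)
else
    echo "pig not found; script written to $workdir/filter.pig"
fi
```

Because the script runs from inside the temporary directory, the relative path in the LOAD statement resolves against the local file system.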

Tez Local Mode

This mode runs Pig in local mode with Tez as the runtime engine. Use the following command to start it.

pig -x tez_local

Note: Tez local mode is experimental, so some queries may simply error out on larger data.

Spark Local Mode

This mode runs Pig in local mode with Apache Spark as the runtime engine.

pig -x spark_local

Note: Spark local mode is experimental. Some queries simply error out on larger data in local mode.

MapReduce Mode

We use this mode to run Apache Pig on MapReduce. It is Pig's default mode and requires access to a Hadoop cluster and an HDFS installation.

# Two ways to invoke Pig in MapReduce mode
pig
pig -x mapreduce

Tez Mode

This mode runs Pig with Tez as the runtime engine. Apache Hadoop needs to be installed on the cluster and HDFS needs to be configured. Use the following command to run Pig in this mode.

pig -x tez

Spark Mode

This mode runs Pig with Apache Spark as the runtime engine. We need access to a Spark, YARN, or Mesos cluster and an HDFS installation. We also need to enable the YARN auxiliary service to use this mode.

pig -x spark
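The six mode names above can be gathered into one small helper. The sketch below is only an illustration: pig_mode_cmd is a hypothetical function, not part of Apache Pig, and it prints the command line for a given mode rather than running it.

```shell
# Sketch: map an execution mode name to its pig command line.
# pig_mode_cmd is a hypothetical helper, not part of Apache Pig.
pig_mode_cmd() {
    case "$1" in
        local|tez_local|spark_local|mapreduce|tez|spark)
            # Each of these names is passed to pig through the -x flag.
            echo "pig -x $1"
            ;;
        "")
            # Without -x, pig uses its default mode, MapReduce.
            echo "pig"
            ;;
        *)
            echo "unknown execution mode: $1" >&2
            return 1
            ;;
    esac
}

pig_mode_cmd tez_local   # prints: pig -x tez_local
pig_mode_cmd ""          # prints: pig
```

Rejecting unknown names up front is a design choice for this sketch, so that a typo in the mode is reported before pig is ever launched.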

Ways to run Apache Pig Commands

We can run the Pig commands in three ways, as given below.

  • Interactive Shell (Grunt Shell)

We can run Apache Pig interactively using the Grunt shell. The output of a Pig Latin statement is shown in the shell itself when we use the DUMP operator.

  • Batch Mode using Pig Script

In batch mode, we write Pig Latin statements in a file ending with the .pig extension and run that file as a script.

  • Embedded Mode

Apache Pig lets us define our own functions, known as User Defined Functions (UDFs), in Java or another language and use them in our scripts.
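The difference between the interactive shell and batch mode comes down to whether a script file is passed on the command line. The sketch below illustrates that choice with a hypothetical helper, pig_launch_cmd, which only prints the command. Its check for the .pig extension follows the naming convention mentioned above and is an assumption of this example, not something pig itself enforces.

```shell
# Sketch: choose between the Grunt shell and batch mode.
# pig_launch_cmd is a hypothetical helper; it prints the command only.
pig_launch_cmd() {
    mode="${1:-mapreduce}"
    script="${2:-}"
    if [ -z "$script" ]; then
        # No script given: pig opens the interactive Grunt shell.
        echo "pig -x $mode"
        return 0
    fi
    case "$script" in
        *.pig)
            # Script given: pig runs it in batch mode.
            echo "pig -x $mode $script"
            ;;
        *)
            echo "expected a .pig script, got: $script" >&2
            return 1
            ;;
    esac
}

pig_launch_cmd local              # prints: pig -x local
pig_launch_cmd local report.pig   # prints: pig -x local report.pig
```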

Conclusion

In this blog post, we learned about Apache Pig and its different execution modes.

Please share this blog post on social media and leave a comment with any questions or suggestions.