Hadoop YARN and Its Commands

YARN stands for Yet Another Resource Negotiator. It is a centralized platform for cluster resource management and job scheduling, designed to deliver scalable operations across the cluster. It was introduced in Hadoop 2 to take resource management and scheduling over from MapReduce, and it is Hadoop's next-generation computation and resource management framework. Furthermore, it allows multiple data processing engines, such as SQL (Structured Query Language), real-time streaming, data science, graph processing, and batch processing, to handle data stored in a single platform.

By using YARN, we can use the available resources efficiently and run multiple applications side by side. All applications running within YARN share a common pool of resources, which keeps the cluster efficient. YARN can also run jobs that do not follow the MapReduce model. It does not care about the type of application being executed, and it does not keep historical information about executions on the cluster.

In the Hadoop stack, Apache YARN sits on top of HDFS (Hadoop Distributed File System) and acts as a mediator between HDFS and processing engines such as Tez and Spark.

Application and System Logs in HDFS

Application, system, and container logs in Hadoop are important for debugging applications that fail. When log aggregation is enabled, these logs are collected from the individual nodes and stored in the default file system (typically HDFS) once the job is finished. Even though the application can run on one or many machines, the logs for all the YARN containers that ran on a node are aggregated into a single file for that node. YARN provides different commands through the Command Line Interface for accessing the aggregated logs by using the application ID, which is generated when the job starts on the cluster.

We can use the YARN CLI (Command Line Interface) to view log files for running applications.

We can also access container log files using the YARN Resource Manager web UI, but more options are available when we use the yarn logs CLI command.

Check Logs for Running Applications

When we run an application in Hadoop, YARN assigns a unique application ID to that job. We can use this application ID to view all logs for a running application.

yarn logs -applicationId <Application ID>
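
If the application ID is not at hand, it can usually be recovered from the list of running applications. The following is a minimal sketch, assuming IDs have the usual application_<timestamp>_<sequence> form; the function names running_app_ids, is_app_id, and app_logs are hypothetical helpers, not part of YARN.

```shell
# List the IDs of applications that YARN currently reports as RUNNING.
# The grep pattern assumes the usual application_<timestamp>_<sequence> form.
running_app_ids() {
    yarn application -list -appStates RUNNING 2>/dev/null |
        grep -oE 'application_[0-9]+_[0-9]+' |
        sort -u
}

# Succeed only if the argument looks like a YARN application ID.
is_app_id() {
    printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'
}

# Hypothetical wrapper: view all logs, but reject a malformed ID early.
app_logs() {
    if ! is_app_id "$1"; then
        echo "not an application ID: $1" >&2
        return 1
    fi
    yarn logs -applicationId "$1"
}
```

Running running_app_ids prints one ID per line, and any of those IDs can then be passed to app_logs.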

View specific Log Types for a Running Application

yarn logs -applicationId <Application ID> -log_files <log_file_type>

View only the Standard Error logs in YARN

We can pass stderr as the log file type to get only the Standard Error logs in YARN.

yarn logs -applicationId <Application ID> -log_files stderr
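
Standard Error output can still be long, so it often helps to filter it. The sketch below is one possible filter; scan_errors is a hypothetical helper, and the keywords it searches for (ERROR, FATAL, Exception) are an assumption about what the application writes to its logs.

```shell
# Print lines that mention an error or a Java exception, with line numbers.
# Reads standard input, so it can sit at the end of any pipeline, e.g.:
#   yarn logs -applicationId <Application ID> -log_files stderr | scan_errors
scan_errors() {
    grep -n -E 'ERROR|FATAL|Exception'
}
```

The function returns a non-zero status when nothing matches, which makes it usable in a shell condition.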

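The -log_files option also supports Java regular expressions, so the following format returns all types of log files. Quote the pattern so that the shell passes it to yarn unchanged instead of expanding it as a file glob. In newer Hadoop releases, regular expressions are passed through the separate -log_files_pattern option, so check yarn logs -help for the version in use.

yarn logs -applicationId <Application ID> -log_files '.*'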
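
A regular expression such as .* is also a valid shell glob, so the shell may replace it with the names of hidden files in the current directory before yarn ever sees it. The small sketch below makes that visible; show_args is a hypothetical helper that only prints the arguments it receives.

```shell
# Print each argument on its own line, wrapped in brackets, to show exactly
# what the shell hands over to a command after quoting and glob expansion.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}
```

Calling show_args -log_files '.*' prints the pattern untouched. Calling it with an unquoted .* in a directory that contains hidden files prints those file names instead, which is why quoting the pattern is the safer habit.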

View only the Standard Output logs in YARN

yarn logs -applicationId <Application ID> -log_files stdout

View Application Master Log Files

Use the following command format to view all Application Master container log files for a running application:

yarn logs -applicationId <Application ID> -am ALL

Use the following command format to view the log files of only the first Application Master container:

yarn logs -applicationId <Application ID> -am 1

List Container IDs

Use the following command format to list all container IDs for a running application:

yarn logs -applicationId <Application ID> -show_application_log_info
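
To reuse the container IDs in later commands, they can be pulled out of the command output. This is a minimal sketch that does not rely on the exact output layout; it only assumes container IDs follow the usual container_..._<attempt>_<sequence> naming, with an optional epoch part such as e17. The name container_ids is hypothetical.

```shell
# Extract unique container IDs from whatever text arrives on standard input.
# Usage:
#   yarn logs -applicationId <Application ID> -show_application_log_info | container_ids
container_ids() {
    grep -oE 'container_(e[0-9]+_)?[0-9]+_[0-9]+_[0-9]+_[0-9]+' | sort -u
}
```

The result is one container ID per line, which is convenient input for a shell loop.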

View Log Files for One Container

Once you have the container IDs, you can use the following command format to view the log files for a particular container:

yarn logs -applicationId <Application ID> -containerId <Container ID>
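
When several containers need to be inspected, the same command can be run in a loop. The sketch below assumes the container IDs are supplied one per line on standard input; save_container_logs and the output file layout are hypothetical choices made for illustration.

```shell
# Read container IDs from standard input and save each container's logs
# to <outdir>/<container id>.log. Runs in a subshell so that its variables
# do not leak into the calling shell.
save_container_logs() (
    app_id=$1
    outdir=$2
    mkdir -p "$outdir" || exit 1
    while IFS= read -r cid; do
        # Redirect stdin so yarn cannot consume the list of remaining IDs.
        yarn logs -applicationId "$app_id" -containerId "$cid" \
            > "$outdir/$cid.log" < /dev/null
    done
)
```

Any source of container IDs can feed the function, for example a text file with one ID per line redirected into it.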

Show Container Log File Information

Use the following command format to list all the container log file names (types) for a running application:

yarn logs -applicationId <Application ID> -show_container_log_info

You can then use the -log_files option to view a particular log type.

View a Portion of the Log Files for One Container

When we run an application in a distributed environment, it produces a lot of logs. We can use the command below to view only a portion of the log files for a particular container. We first need to find the container ID for the application.

yarn logs -applicationId <Application ID> -containerId <Container ID> -size <bytes>

A positive size selects bytes from the start of each log file. For example, the following command displays the first 10000 bytes:

yarn logs -applicationId <Application ID> -containerId <Container ID> -size 10000

If we want to view the last 1000 bytes of the logs instead, we can prefix the size with a minus sign (-).

yarn logs -applicationId <Application ID> -containerId <Container ID> -size -1000
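
The sign convention is easy to get wrong, so it can be wrapped in a small helper. This is only a sketch; log_excerpt and its first/last keywords are hypothetical, and the helper simply builds the -size argument described above.

```shell
# Show the first or last N bytes of a container's logs.
# Arguments: <application id> <container id> first|last <byte count>
log_excerpt() {
    case $3 in
        first) size=$4 ;;      # positive size: bytes from the start
        last)  size=-$4 ;;     # negative size: bytes from the end
        *)
            echo "usage: log_excerpt <app id> <container id> first|last <bytes>" >&2
            return 2
            ;;
    esac
    yarn logs -applicationId "$1" -containerId "$2" -size "$size"
}
```

For example, log_excerpt <Application ID> <Container ID> last 1000 runs the same command as the one shown above.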

Download Logs for a Running Application

There are times when we need to download logs to the local file system. We can use the following command format to download logs to a local folder.

yarn logs -applicationId <Application ID> -out <path_to_local_folder>

The container log files are organized in parent folders labeled with the applicable node ID.
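
After a download it is useful to see what was actually written. The following sketch creates the target folder, downloads the logs, and lists the resulting files; download_logs is a hypothetical helper, and the exact folder and file names depend on the cluster.

```shell
# Download all logs of an application into a local folder and list the
# files that were written, sorted so files under the same node folder
# appear together. Runs in a subshell to keep its variables local.
download_logs() (
    app_id=$1
    dest=$2
    mkdir -p "$dest" || exit 1
    yarn logs -applicationId "$app_id" -out "$dest" || exit 1
    find "$dest" -type f | sort
)
```

The listing can be piped into grep or less to locate the log file of a specific node or container.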

Display Help for YARN Logs

If you come across any issues or are unsure about any of the commands, you can use the -help option to display usage information for all yarn logs options.

yarn logs -help