What is Spark Shared Variables?
Shared variables are an abstraction in Apache Spark which is used in parallel operations in different nodes. When Spark runs a function in parallel as a set of tasks on…
Most Useful Apache Hadoop HDFS Commands
This post describes some of the most useful Apache Hadoop HDFS commands one would need when working in a Hadoop Cluster
Introduction to Hadoop Mapreduce framework
Hadoop Mapreduce framework is a Big data processing framework which consists of MapReduce programming model and Hadoop Distributed File System.
What are Hadoop Execution Modes?
Apache Hadoop can be used in multiple modes to achieve a different set of tasks. There are three modes in which a Hadoop Mapreduce application can be executed.
Introduction to Hadoop Distributed File System(HDFS)
HDFS is a distributed file system that is designed for storing very large files with streaming data access patterns running on clusters of commodity hardware.
Understanding the World of Linux Operating Systems
Linux is open-source and one of the most popular operating systems. It is one of the most important technological advancements of the last century. It has made a huge impact…
What are the Sources of Big Data and How it gets generated?
Digital data is now everywhere—in every sector, in every economy, in every organization, and user of digital technology. While this topic might once have concerned only a few data geeks,…
What is Apache Hadoop? An In-depth Look at This Big Data Tool
What exactly is Apache Hadoop? Apache Hadoop is an open-source distributed processing framework that is used to store and process large datasets whose size ranges from gigabytes to petabytes of…