Introduction to Hadoop MapReduce Framework
The Hadoop MapReduce framework is a big data processing framework that consists of the MapReduce programming model and the Hadoop Distributed File System (HDFS).
What are Hadoop Execution Modes?
Apache Hadoop can be run in multiple modes to accomplish different sets of tasks. There are three modes in which a Hadoop MapReduce application can be executed.
Introduction to Hadoop Distributed File System (HDFS)
HDFS is a distributed file system designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
Understanding the World of Linux Operating Systems
Linux is open-source and one of the most popular operating systems. It is one of the most important technological advancements of the last century. It has made a huge impact…
What are the Sources of Big Data and How Is It Generated?
Digital data is now everywhere—in every sector, in every economy, in every organization, and user of digital technology. While this topic might once have concerned only a few data geeks,…
What is Apache Hadoop? An In-depth Look at This Big Data Tool
What exactly is Apache Hadoop? Apache Hadoop is an open-source distributed processing framework that is used to store and process large datasets whose size ranges from gigabytes to petabytes of…
Important Git Command Cheat Sheet
Git is a version control system for tracking changes in computer files and coordinating work on those files among multiple people. This post is a cheat sheet of the important Git commands that I use on a day-to-day basis.
Installing Apache Spark on Linux
Apache Spark is an open-source cluster-computing framework. This post explains the steps for installing a prebuilt version of Apache Spark 2.1.1 as a standalone cluster on a Linux system. I have used Ubuntu, a Debian-based OS, for this post.
What is Big Data and Why Is It Important to Understand? Introduction and Properties
The amount of data in our world has been exploding. Companies capture trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones and automobiles, sensing, creating, and communicating data.
