Introduction to Java Programming Language
Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.
Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.
Relational database is a computer-software application that interacts with end-users, other applications, and the database itself to capture and analyze data. A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases.
Apache Spark Streaming execution model is advantageous over traditional streaming systems for its fast recovery from failures, dynamic load balancing, streaming and interactive analytics, and native integration.
What is Data? Before we learn about databases, we need to first understand what data is. Data is information or facts related to an object that is under consideration. In…
Resilient Distributed Dataset (RDD) is the fault-tolerant and immutable primary data structure/abstraction in Apache Spark. It is a distributed collection of objects. The term ‘resilient’ in ‘Resilient Distributed Dataset’ refers…
There are 3 types of joins, Reduce-Side joins, Map-Side joins, and memory-backed Joins that can be used to join Tables in MapReduce. Map Side Join Joining at the map side…
Shared variables are an abstraction in Apache Spark which is used in parallel operations in different nodes. When Spark runs a function in parallel as a set of tasks on…
This post describes some of the most useful Apache Hadoop HDFS commands one would need when working in a Hadoop Cluster