Apache Spark is an open-source distributed computing framework, optimized for both in-memory and disk-based processing, that supports large-scale and near-real-time analytics through Resilient Distributed Datasets (RDDs). It also includes a streaming library and a rich set of programming interfaces that make data processing and transformation easier.
This page will guide you through the topics you need to learn Apache Spark and its related technologies.
Table of Contents
Basic Concepts
Advanced Topics
- Spark Resilient Distributed Dataset (RDD)
- Installing Apache Spark on Linux
- Data Locality in Spark
- Caching and Persisting Mechanism in Spark
- Apache Spark Shared Variables
- Accessing Hive in HDP3 using Apache Spark
- Submit Apache Spark Job with REST API
- SparkSession in Apache Spark
- User-Defined Aggregate Functions (UDAF) Using Apache Spark