Apache Spark is an open-source distributed computing framework that processes data in memory where possible and falls back to disk when needed. It is used for large-scale analytics, including near-real-time workloads, and its core abstraction is the Resilient Distributed Dataset (RDD). It includes a streaming library and a rich set of programming interfaces that make data processing and transformation easier.
This page guides you through the topics you need to learn Spark and its related technologies.
- Spark Resilient Distributed Dataset (RDD)
- Installing Apache Spark on Linux
- Data Locality in Spark
- Caching and Persisting Mechanism in Spark
- Apache Spark Shared Variables
- Accessing Hive in HDP3 using Apache Spark
- Submit Apache Spark Job with REST API
- SparkSession in Apache Spark
- User-Defined Aggregate Functions (UDAF) Using Apache Spark
