Spark Tutorials

Apache Spark is an open-source distributed computing framework, optimized for both in-memory and disk-based processing, that performs real-time analytics using Resilient Distributed Datasets (RDDs). It includes a streaming library, and a rich set of…


User-Defined Aggregate Functions (UDAF) Using Apache Spark

UDAF stands for user-defined aggregate function. An aggregate function performs a calculation over a set of values and returns a single value. Writing a UDAF is harder than writing a user-defined function (UDF): a UDF computes its result from one row at a time, whereas a UDAF must accumulate state across multiple rows, and possibly multiple columns. An Apache Spark UDAF therefore operates on more than one row or column and returns a single result.
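To make the "many rows in, one value out" idea concrete, here is a minimal sketch in plain Python. It does not call Spark. It only mirrors the four-step shape that Spark's typed `Aggregator` exposes (`zero`, `reduce`, `merge`, `finish`). The names `AvgBuffer`, `AverageAggregator`, and `aggregate`, as well as the sample partitions, are hypothetical and exist only for illustration.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgBuffer:
    """Intermediate state: running total and number of rows seen."""
    total: float = 0.0
    count: int = 0


class AverageAggregator:
    """Hypothetical aggregator with zero/reduce/merge/finish steps."""

    def zero(self):
        # Empty buffer used as the starting state for each partition.
        return AvgBuffer()

    def reduce(self, buf, value):
        # Fold one input row into the buffer.
        return AvgBuffer(buf.total + value, buf.count + 1)

    def merge(self, a, b):
        # Combine the partial buffers of two partitions.
        return AvgBuffer(a.total + b.total, a.count + b.count)

    def finish(self, buf):
        # Turn the final buffer into the single output value.
        return buf.total / buf.count if buf.count else None


def aggregate(agg, partitions):
    """Simulate a distributed run: reduce per partition, then merge."""
    partials = [functools.reduce(agg.reduce, rows, agg.zero())
                for rows in partitions]
    merged = functools.reduce(agg.merge, partials, agg.zero())
    return agg.finish(merged)


# Assumed sample data: three partitions, one of them empty.
partitions = [[10.0, 20.0], [30.0], []]
result = aggregate(AverageAggregator(), partitions)
print(result)  # 20.0
```

The separate `merge` step is what makes an aggregate harder to write than a UDF: the intermediate state has to combine correctly no matter how the rows are split across partitions.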
