Technology and Trends

Introduction to Apache Kafka

Apache Kafka is an open-source project for a distributed publish-subscribe messaging system rethought as a distributed commit log. It stores messages in topics that are partitioned and replicated across multiple brokers in a cluster. Producers send messages to topics from which consumers read.

Kafka uses Zookeeper to share and save states between brokers. Each broker maintains a set of partitions: primary and/ or secondary, for each topic. A set of Kafka brokers working together will maintain a set of topics. Each topic has its partitions distributed over the participating Kafka brokers and the replication factor determines, intuitively, the number of times a partition is duplicated for fault tolerance.

While many brokered message queue systems have the broker maintain the state of its consumers, Kafka does not. This frees up resources for the broker to ingest data faster. For more information about Kafka’s performance, see the Kafka’s official documentation

Kafka Architecture

Figure: Apache Kafka

Kafka vs Other Message Broker System

In traditional message processing, you apply simple computations on the messages – in most cases individually per message.

In-stream processing, you apply complex operations on multiple input streams and multiple records (i.e., messages) at the same time (like aggregations and joins).

Furthermore, the traditional messaging systems cannot go “back in time” – ie, they automatically delete messages after they are delivered to all subscribed consumers. In contrast, Kafka keeps the messages as it uses a pull-based model (ie, consumer pull data out of Kafka) for a configurable amount of time. This allows consumers to “rewind” and consume messages multiple times – or if you add a new consumer, it can read the complete history. This makes stream processing possible because it allows for more complex applications. Furthermore, stream processing is not necessarily about real-time processing – it’s about processing an infinite input stream (in contrast to batch processing, which is applied to finite inputs).

And Kafka offers Kafka Connect and Streams API – so it is a stream processing platform and not just a messaging/pub-sub system (even if it uses this in its core).

A communication protocol in Kafka

In Kafka, communication between the clients and the servers is done with a simple, high-performance, language-agnostic TCP protocol. This protocol is versioned and maintains backward compatibility with the older version. Kafka natively supports Java, although it supports clients in other languages too.

Kafka API

Apache Kafka provides four core APIs using which applications can be developed using Kafka.

References:

Kafka Official documentation

Choosing number of topic/Partitions

Exit mobile version