Kafka Introduction, Kafka Architecture Overview, Use-Cases & Basic Concepts Explanation

By Ajmal Abbasi | October 26, 2020

Apache Kafka is a widely used and popular open-source event-streaming and messaging system, capable of handling huge volumes of data thanks to its distributed, fault-tolerant architecture. In this Kafka beginners tutorial, we will explain the basic concepts of Kafka, how Kafka works, Kafka architecture and Kafka use-cases.

Kafka Introduction: An Overview of Kafka

Kafka is an open-source messaging and event-streaming platform used by organizations to transfer high-velocity, high-volume event data between different systems & applications in real time.

Kafka originated at LinkedIn for its own use-cases and was made public in 2011 as an open-source platform under the Apache license. Since then, it has attracted great attention around the world for small, medium and large-scale integration projects due to its distributed, fault-tolerant, scalable and highly efficient event-streaming capabilities.

Kafka works on a pub-sub mechanism for message communication: message producers publish data to the Kafka cluster, and interested consumers subscribe to the data of their interest.

If you prefer to go through the Kafka beginners tutorial in video format, watch the video on the TutorialsPedia YouTube channel, where Kafka basics, Kafka architecture, Kafka key concepts and Kafka use-cases are explained in detail.

If you prefer to go through the Kafka tutorial in text format, continue reading below.

Why Kafka Is Important and What are Benefits of Using Kafka?

Although there are many key players in the market for event processing and pub-sub messaging, Kafka holds a prominent place and is widely used for a number of reasons and benefits, explained below:

  • Kafka is highly efficient at handling high-velocity, high-volume data due to its smart architectural design. It can handle millions of events in real time with low latency and high throughput.
  • Kafka is highly scalable due to its distributed and clustered architecture. To meet growing business needs, clusters can be scaled horizontally by adding more brokers to the cluster. Topics inside brokers can also be fine-tuned and scaled with smart partitioning and replication.
  • Kafka is considered highly durable due to its default persistence feature, where messages are persisted to avoid data loss. This makes Kafka a great choice for data-critical and mission-critical real-time integrations.
  • Kafka architecture supports fault tolerance, making message communication and event processing robust and reliable without data loss.
  • In terms of cost, Kafka is a solid choice as it is open-source and doesn’t involve heavy licensing costs, contrary to other proprietary options.
  • Kafka, with its rich features and optimal performance, fits a number of different use-cases in any organization: as a messaging platform, and as an event-streaming platform for activity tracking, application monitoring, real-time event aggregation etc.

Kafka Architecture: Explanation of Kafka Architecture and Its Basic Concepts

The Kafka messaging platform comprises multiple components and actors for message publishing, handling, replication, consumption, and platform management and governance.

The diagram below provides a high-level picture of the Kafka architecture:


Based on the above Kafka architecture diagram, let’s explain the core concepts in detail.

Kafka Concepts Explained: Kafka Producer

A producer is the source that publishes event data to Kafka topics. A Kafka producer can publish messages to one or more topics as per the business requirements of a particular scenario.
A Kafka message produced by a producer contains information about the topic and the associated partition to which the message should be written. Each message is written to its partition at the next available offset, which ensures ordered sequencing of message delivery within that partition. In this manner, older messages in a partition have smaller offsets while newer messages have higher offset values.

As an example, in the picture below we can see that new messages in each partition are appended at the end of the partition:

A Kafka producer can be configured not to wait for acknowledgements from the broker (acks=0) to achieve faster delivery, or to wait for the partition leader (acks=1) or all in-sync replicas (acks=all) for stronger delivery guarantees.
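
As an illustration, here is a minimal sketch of a Java producer publishing a message to a topic. The broker address (localhost:9092), topic name (orders), and the key/value strings are placeholder assumptions, not values from this article:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; replace with your cluster's bootstrap servers.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all waits for all in-sync replicas; "0" or "1" trades safety for lower latency.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Messages with the same key are routed to the same partition, preserving their order.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-123", "order created");
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    System.out.printf("Written to partition %d at offset %d%n",
                            metadata.partition(), metadata.offset());
                }
            });
        } // Closing the producer flushes any pending sends.
    }
}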

Kafka Concepts Explained: Kafka Consumer

Consumers/subscribers read messages from Kafka topics based on their subscriptions. Multiple consumers can be grouped as a “consumer group”. Within a group, a message from a topic partition is consumed by only one consumer. Consumers belonging to one group share a group ID, and consumers in a group divide the partitions among themselves for efficient consumption of messages.

In the picture below, we have a scenario with a total of 5 consumers split into two consumer groups. In this scenario, a message from a particular topic partition is consumed by only one consumer within a group, but a consumer from another group can also consume messages from the same topic partition.
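
Below is a minimal sketch of a Java consumer that joins a consumer group and polls a topic. The broker address, group ID (order-processors) and topic name (orders) are placeholder assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Consumers sharing this group ID divide the topic's partitions among themselves.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Start from the earliest available offset when the group has no committed offset yet.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}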


Kafka Concepts Explained: Kafka Topic

A topic is a stream of messages belonging to a specific category. Topics are divided into partitions, and each topic has at least one partition. Messages in a partition are immutable and are stored in an ordered sequence, each identified by an offset.

In Kafka, replication is done at the partition level. Each partition can have one or more replicas, meaning its messages are replicated across several Kafka brokers in the cluster. Replication works on the leader-follower principle: one replica is the leader and the others are followers. All read-write operations are handled by the leader, and the followers replicate the leader’s data.
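
As an illustration of partitioning and replication settings, here is a minimal sketch that creates a topic with 3 partitions and a replication factor of 2 using the Kafka AdminClient. The broker address and topic name are placeholder assumptions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions for parallel consumption, replication factor 2 for fault tolerance
            // (the cluster must have at least 2 brokers for this request to succeed).
            NewTopic topic = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}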

Kafka Concepts Explained: Kafka Broker & Cluster

A broker is a server in a Kafka environment responsible for serving producers & consumers by storing and retrieving data. In a stand-alone setup, the Kafka platform contains at least one broker.

A collection of brokers forms a Kafka cluster to achieve distributed and fault-tolerant event processing. The broker cluster is managed by ZooKeeper so that all brokers in the cluster function together in a coordinated manner.
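
To see which brokers make up a cluster, a minimal sketch using the AdminClient could look like the following; the broker address is again a placeholder assumption:

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class DescribeCluster {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("Cluster ID: " + cluster.clusterId().get());
            // List every broker (node) currently part of the cluster.
            for (Node node : cluster.nodes().get()) {
                System.out.printf("Broker %d at %s:%d%n", node.id(), node.host(), node.port());
            }
        }
    }
}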

Kafka Concepts Explained: Kafka Zookeeper

Management and coordination of Kafka brokers in a cluster is achieved through ZooKeeper. ZooKeeper notifies all nodes when the topology of the Kafka cluster changes, e.g. when brokers or topics are added or removed, or when a broker goes down or comes back up.

ZooKeeper also supports leader/follower election among brokers. When a leader fails, one of the followers takes over as the leader.

Kafka Interview Questions Answers

In another article, I have discussed Kafka Interview Questions Answers for Developers in detail. That article should greatly help you get a feel for the questions typically asked in Kafka job interviews.

Kafka Use Cases

Kafka, with its publish-subscribe messaging features and event-streaming capabilities, can be used for a variety of use-cases. Here are a few example use-cases for Kafka:

  • Kafka can be used for stream processing to stream, aggregate, enrich and transform data from multiple sources. E.g. real-time data from sensors or vehicles can be streamed, gathered, enriched, aggregated and made available to downstream systems & applications (consumers) to take the necessary actions based on the event data.
  • Kafka can be used for website activity tracking. E.g. in an e-commerce website, events can be generated and published based on user actions, and such event data can be used on the consumer application side for analytics, business intelligence, marketing and user-experience improvements.
  • Kafka can be used for application health monitoring in any organization with a large network of systems & applications. Events can be generated from applications based on usage patterns, resource utilization, abnormal activities etc., and the necessary pre-emptive and corrective actions can be taken based on the event data.
  • Kafka can be used as a messaging platform for real-time messaging, as a replacement for traditional messaging systems.

To learn how to install Kafka on Windows OS, refer to my other tutorial where I explain in detail how to install Kafka on Windows as a Windows service.

Feel free to comment below if you have any queries or any further thoughts on this topic.

