Apache Kafka

Definition

Distributed event streaming platform for building real-time data pipelines and streaming applications. Producers append events to durable, partitioned logs (topics) that many consumers can read independently at high throughput.

Use Cases

Provider Equivalents

Frequently Asked Questions

What's the difference between Apache Kafka and RabbitMQ?
Kafka is a distributed event streaming platform built for very high throughput and durable event storage: events are kept in a log for a configurable retention period and can be replayed. RabbitMQ is a traditional message broker focused on flexible routing and per-message delivery semantics, and a message is typically removed from its queue once a consumer has acknowledged it. Use Kafka when you need event streams, replay, and many consumers reading the same data; use RabbitMQ when you need complex routing patterns and classic queue-based messaging.
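The difference between a retained log and a consume-once queue can be sketched with a toy in-memory model. This is only an illustration of the two semantics, not the Kafka or RabbitMQ client API: the class names `EventLog` and `WorkQueue`, their methods, and the consumer group names are all hypothetical.

```python
from collections import deque


class EventLog:
    """Toy append-only log: events stay in place, each consumer group tracks its own offset."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset assigned to the new event

    def poll(self, group):
        """Return the events this group has not read yet and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:]
        self._offsets[group] = len(self._events)
        return batch

    def seek(self, group, offset):
        """Move the group's offset, e.g. back to 0 to replay history."""
        self._offsets[group] = offset


class WorkQueue:
    """Toy queue: a message is gone once a consumer has taken it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        return self._messages.popleft() if self._messages else None


log = EventLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

analytics_first = log.poll("analytics")   # one group reads everything
billing_first = log.poll("billing")       # a second group reads the same events
log.seek("analytics", 0)                  # rewind to replay from the start
analytics_replay = log.poll("analytics")

queue = WorkQueue()
queue.publish("signup")
first_take = queue.consume()    # the message is handed out once
second_take = queue.consume()   # nothing left for a second reader
```

In this sketch both groups see all three events and the rewind returns them again, while the queue hands its single message to only one reader. Real Kafka adds partitions, retention limits, and committed offsets on top of the same basic idea.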
When should I use Apache Kafka?
Use Kafka when you need to ingest and distribute large volumes of events in real time, decouple producers from multiple consumers, and optionally replay historical events. Common scenarios include clickstream and telemetry ingestion, microservice event buses, CDC (change data capture) pipelines, log aggregation, real-time analytics, and feeding data lakes/warehouses.
How much does Apache Kafka cost?
Kafka is open source, so the software license costs $0, but you pay for infrastructure and operations: compute for brokers, storage for logs, network egress, and the engineering time to run and monitor the cluster. Older deployments also need a separate ZooKeeper ensemble for cluster metadata; newer Kafka versions use KRaft, which removes that dependency. Managed options (e.g., Amazon MSK or Confluent Cloud) charge based on broker capacity or throughput, storage, and data transfer, and can reduce operational overhead.
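For the log storage part of the bill, a rough back-of-the-envelope estimate is ingest rate times retention time times the number of replicas. The sketch below assumes a steady ingest rate, no compression, and a replication factor of 3; the function name and all input figures are hypothetical, and it deliberately leaves out prices, which vary by provider.

```python
def estimate_log_storage_gb(ingest_mb_per_sec, retention_hours, replication_factor=3):
    """Rough cluster-wide log storage in GB (decimal), ignoring compression and headroom."""
    seconds_retained = retention_hours * 3600
    total_mb = ingest_mb_per_sec * seconds_retained * replication_factor
    return total_mb / 1000


# Assumed workload: 5 MB/s of events kept for 7 days (168 hours), replicated 3 times.
weekly_storage_gb = estimate_log_storage_gb(5, 168)
```

With these assumed inputs the estimate comes to 9072 GB across the cluster. Treat it as a starting point for sizing: actual usage depends on compression, per-topic retention settings, and how much free-disk headroom you keep.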

Category: data

Difficulty: advanced

Related Terms

See Also