Apache Kafka in Five Minutes
What is Apache Kafka?
- Framework for building data pipelines and stream-based applications
- Fault-tolerant, resilient
- Very high throughput
- Horizontally scalable
- Integrates well with Big Data frameworks like Apache Flink or Apache Spark
- Apache project ⇒ Apache License 2.0 (i.e. open-source software)
Common use cases
- Messaging systems (e.g. communication between loosely coupled microservices)
- Gathering metrics from different locations (e.g. IoT)
- Collecting application logs
- Stream processing / transformation
Components
Inside the cluster
Logs
- Each partition / replica = append-only commit log
- Data in log is immutable
- Each message in log gets unique id (offset)
- Offsets are per partition
- Message order guarantee within partition
- Data is kept for a configurable retention period (until it expires, messages are replayable)
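
The log properties above can be illustrated with a small in-memory sketch. This is a toy model, not Kafka code: the names `Record`, `PartitionLog`, `append`, `read` and `expire` are hypothetical and chosen only for illustration, and time-based expiry is assumed as the retention rule.

```python
from dataclasses import dataclass
import time


@dataclass(frozen=True)  # frozen: a record cannot be modified once written
class Record:
    offset: int
    timestamp: float
    value: bytes


class PartitionLog:
    """Toy append-only log for a single partition (illustration only)."""

    def __init__(self):
        self._records = []
        self._next_offset = 0  # offsets are counted per partition

    def append(self, value, timestamp=None):
        """Append a value at the end of the log and return its offset."""
        ts = time.time() if timestamp is None else timestamp
        record = Record(self._next_offset, ts, value)
        self._records.append(record)
        self._next_offset += 1
        return record.offset

    def read(self, from_offset=0):
        """Return records in offset order, starting at from_offset.

        Reading does not remove anything, so a consumer can replay
        by reading again from an earlier offset.
        """
        return [r for r in self._records if r.offset >= from_offset]

    def expire(self, older_than):
        """Drop records with a timestamp before older_than (assumed retention rule)."""
        self._records = [r for r in self._records if r.timestamp >= older_than]


if __name__ == "__main__":
    p0, p1 = PartitionLog(), PartitionLog()
    p0.append(b"a", timestamp=1.0)
    p0.append(b"b", timestamp=2.0)
    p1.append(b"x", timestamp=1.5)  # offsets in p1 start again at 0
    print([(r.offset, r.value) for r in p0.read(0)])
    print([(r.offset, r.value) for r in p1.read(0)])
```

In this sketch each partition counts its own offsets, so ordering is only meaningful within one `PartitionLog`. Expired records disappear, but the offsets of the remaining records never change.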