A new version of Apache Kafka has been released, with a new Kafka Streams API for session windows and improved compatibility for Java clients. Apache Kafka is a distributed streaming platform that can be used for building real-time streaming data pipelines between systems or applications.
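Session windows group records by periods of activity. A session for a key stays open while records keep arriving and closes once no record has arrived for longer than an inactivity gap. The following is a minimal plain-Java sketch of that grouping logic, not the Kafka Streams implementation. It assumes the timestamps for one key arrive already sorted; Kafka Streams also copes with out-of-order records by merging sessions. The class and method names and the 5-minute gap are illustrative assumptions. In the Kafka Streams DSL this behaviour is exposed through the `SessionWindows` class, configured with an inactivity gap.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionWindowSketch {

    /** A session: first and last event timestamps (ms) and the number of events. */
    record Session(long start, long end, int count) {}

    /**
     * Groups the event timestamps of a single key into sessions.
     * A new session starts whenever the gap since the previous event
     * exceeds inactivityGapMs. Assumes timestamps are sorted ascending.
     */
    static List<Session> sessionize(long[] sortedTimestamps, long inactivityGapMs) {
        List<Session> sessions = new ArrayList<>();
        if (sortedTimestamps.length == 0) {
            return sessions;
        }
        long start = sortedTimestamps[0];
        long end = start;
        int count = 1;
        for (int i = 1; i < sortedTimestamps.length; i++) {
            long ts = sortedTimestamps[i];
            if (ts - end > inactivityGapMs) {
                // Gap too large: close the current session and open a new one.
                sessions.add(new Session(start, end, count));
                start = ts;
                count = 0;
            }
            end = ts;
            count++;
        }
        sessions.add(new Session(start, end, count));
        return sessions;
    }

    public static void main(String[] args) {
        long gap = 5 * 60 * 1000L; // hypothetical 5-minute inactivity gap
        long[] clicks = {0L, 60_000L, 120_000L, 1_000_000L, 1_100_000L};
        // The first three clicks form one session; the last two form another.
        for (Session s : sessionize(clicks, gap)) {
            System.out.println(s);
        }
    }
}
```

Unlike fixed or hopping windows, the session boundaries here come from the data itself, which is why session windows suit activity-based analysis such as user visits.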
Alongside the session windows API and the client compatibility work, other improvements in this release include improved semantics for Kafka Streams joins and single message transforms in Kafka Connect.
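Single message transforms (SMTs) let a Kafka Connect pipeline modify each record individually as it flows between a connector and Kafka, for example by adding or masking a field. In Kafka Connect, transforms implement the `org.apache.kafka.connect.transforms.Transformation` interface and are enabled through connector configuration. The sketch below is only a plain-Java illustration of the idea: a chain of per-record functions applied in order. The `Rec` type and both transforms are hypothetical stand-ins, not the Connect API, although they are loosely modelled on the built-in InsertField and MaskField transforms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class SingleMessageTransformSketch {

    /** Hypothetical record: a topic name plus a map-shaped value. */
    record Rec(String topic, Map<String, Object> value) {}

    /** Adds a constant field to every record value (similar in spirit to InsertField). */
    static UnaryOperator<Rec> insertField(String field, Object constant) {
        return r -> {
            Map<String, Object> v = new HashMap<>(r.value());
            v.put(field, constant);
            return new Rec(r.topic(), v);
        };
    }

    /** Hides a sensitive field's value (similar in spirit to MaskField). */
    static UnaryOperator<Rec> maskField(String field) {
        return r -> {
            if (!r.value().containsKey(field)) {
                return r; // nothing to mask
            }
            Map<String, Object> v = new HashMap<>(r.value());
            v.put(field, "****");
            return new Rec(r.topic(), v);
        };
    }

    /** Applies the transforms in order to a single record. */
    static Rec applyChain(Rec record, List<UnaryOperator<Rec>> chain) {
        Rec current = record;
        for (UnaryOperator<Rec> t : chain) {
            current = t.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<Rec>> chain =
            List.of(insertField("data_source", "orders-db"), maskField("card"));
        Rec in = new Rec("orders", Map.of("id", 42, "card", "4111111111111111"));
        System.out.println(applyChain(in, chain).value());
    }
}
```

Because each transform sees one record at a time and returns a new one, transforms stay stateless and can be chained freely; anything needing state across records is better handled in Kafka Streams.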
Kafka was originally developed at LinkedIn and later became an Apache project. It is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system that can be used in place of traditional message brokers.
Work at Apache has concentrated on making Kafka work with other Apache projects, including Storm, HBase and Spark, for real-time analysis and rendering of streaming data.
Kafka was designed from the outset as a distributed system, so it is easy to scale out. It supports multiple subscribers and automatically rebalances consumers when one of them fails.
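Consumers that share a group id divide a topic's partitions between them. When a member fails, its partitions are reassigned to the surviving members. The following is a simplified round-robin model of that rebalance in plain Java. Kafka ships pluggable assignment strategies (such as range and round-robin assignors) and coordinates rebalances through a group coordinator on the brokers; none of that machinery is modelled here, and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RebalanceSketch {

    /** Deals partitions 0..numPartitions-1 out to the live consumers in turn. */
    static Map<String, List<Integer>> assign(List<String> liveConsumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : liveConsumers) {
            assignment.put(c, new ArrayList<>());
        }
        if (liveConsumers.isEmpty()) {
            return assignment; // no one left to own the partitions
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = liveConsumers.get(p % liveConsumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(List.of("consumer-a", "consumer-b", "consumer-c"));
        // before: {consumer-a=[0, 3], consumer-b=[1, 4], consumer-c=[2, 5]}
        System.out.println("before: " + assign(group, 6));
        group.remove("consumer-b"); // simulate a failed consumer
        // after:  {consumer-a=[0, 2, 4], consumer-c=[1, 3, 5]}
        System.out.println("after:  " + assign(group, 6));
    }
}
```

Every partition keeps exactly one owner before and after the failure, which is what lets the group keep processing the whole topic without duplicate consumption within the group.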
The new version of Kafka has four core APIs. The Producer API allows an application to publish a stream of records to one or more Kafka topics.
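A producer publishes each record to a topic, and each topic is split into partitions. Records with the same key are routed to the same partition, so they stay in order relative to one another. Below is a toy in-memory model of that routing in plain Java, assuming a fixed partition count. It is not the `KafkaProducer` client: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses `String.hashCode()` for brevity, and the topic and keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProducerSketch {

    record Message(String key, String value) {}

    /** In-memory stand-in for a topic: one append-only list per partition. */
    static final class Topic {
        final List<List<Message>> partitions = new ArrayList<>();

        Topic(int numPartitions) {
            for (int i = 0; i < numPartitions; i++) {
                partitions.add(new ArrayList<>());
            }
        }

        /** Picks a partition from the key's hash; floorMod keeps it non-negative. */
        int partitionFor(String key) {
            return Math.floorMod(key.hashCode(), partitions.size());
        }

        /** Appends the record and returns its offset within its partition. */
        long send(Message m) {
            List<Message> log = partitions.get(partitionFor(m.key()));
            log.add(m);
            return log.size() - 1;
        }
    }

    public static void main(String[] args) {
        Topic orders = new Topic(3);
        orders.send(new Message("customer-1", "order placed"));
        orders.send(new Message("customer-2", "order placed"));
        orders.send(new Message("customer-1", "order shipped"));
        // Both customer-1 events land in the same partition, in send order.
        System.out.println("customer-1 -> partition " + orders.partitionFor("customer-1"));
    }
}
```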
The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
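A consumer reads a partition sequentially and tracks its position as an offset, which lets it resume where it left off after a restart. The sketch below models a partition as an in-memory list and a consumer that polls batches and advances its offset. It is a plain-Java illustration under those assumptions, not the `KafkaConsumer` client, which fetches records from brokers and commits offsets back to Kafka; the class names are hypothetical.

```java
import java.util.List;

public class ConsumerSketch {

    /** Toy consumer over one partition; position is the next offset to read. */
    static final class PartitionConsumer {
        private final List<String> partitionLog;
        private long position;

        PartitionConsumer(List<String> partitionLog, long startOffset) {
            this.partitionLog = partitionLog;
            this.position = startOffset;
        }

        /** Returns up to maxRecords records starting at the current position. */
        List<String> poll(int maxRecords) {
            int from = (int) Math.min(position, partitionLog.size());
            int to = Math.min(from + maxRecords, partitionLog.size());
            List<String> batch = List.copyOf(partitionLog.subList(from, to));
            position = to; // advance past the records just returned
            return batch;
        }

        long position() {
            return position;
        }
    }

    public static void main(String[] args) {
        List<String> log = List.of("r0", "r1", "r2", "r3", "r4");
        PartitionConsumer c = new PartitionConsumer(log, 0);
        System.out.println(c.poll(2)); // [r0, r1]
        System.out.println(c.poll(2)); // [r2, r3]
        // Restarting from the saved offset resumes without re-reading earlier records.
        PartitionConsumer resumed = new PartitionConsumer(log, c.position());
        System.out.println(resumed.poll(10)); // [r4]
    }
}
```

Because the log itself is never modified by reading, any number of consumers can read the same records independently, each with its own offset; this is what distinguishes Kafka from a queue that deletes messages on delivery.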
The Connector API can be used for building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.
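A source connector pulls data from an external system into a topic, and a sink connector pushes records from a topic out to one. Real connectors are built on Kafka Connect's `SourceConnector`/`SourceTask` and `SinkConnector`/`SinkTask` classes. The sketch below only mimics the source side in plain Java: it copies rows from an in-memory stand-in for an external system and remembers how far it has read, so repeated polls pick up only new data. `ExternalTable`, `ToySourceTask` and the offset handling are hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

public class SourceConnectorSketch {

    /** Hypothetical stand-in for an external system, e.g. a table of rows. */
    record ExternalTable(List<String> rows) {}

    /** Toy source task: copies new rows to a topic and tracks its source offset. */
    static final class ToySourceTask {
        private final ExternalTable source;
        private final List<String> topic;
        private int sourceOffset = 0; // index of the next row to copy

        ToySourceTask(ExternalTable source, List<String> topic) {
            this.source = source;
            this.topic = topic;
        }

        /** Copies any rows not yet seen and returns how many were produced. */
        int poll() {
            int produced = 0;
            while (sourceOffset < source.rows().size()) {
                topic.add(source.rows().get(sourceOffset++));
                produced++;
            }
            return produced;
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>(List.of("row-1", "row-2"));
        List<String> topic = new ArrayList<>();
        ToySourceTask task = new ToySourceTask(new ExternalTable(rows), topic);
        System.out.println(task.poll()); // 2
        rows.add("row-3");               // new data appears in the external system
        System.out.println(task.poll()); // 1
        System.out.println(topic);       // [row-1, row-2, row-3]
    }
}
```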
The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming input streams into output streams.
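The usual introductory Kafka Streams example is word count: read lines of text from an input topic, split them into words, and keep a running count per word that is written to an output topic. Below is a plain-Java sketch of that transformation over in-memory lists, using `java.util.stream` rather than the Kafka Streams DSL. It shows the shape of the processing (flat-map, group by key, count) but none of the partitioning, state stores or fault tolerance that Kafka Streams provides; the class name and inputs are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {

    /** Transforms an input "topic" of text lines into word -> count. */
    static Map<String, Long> wordCounts(List<String> inputLines) {
        return inputLines.stream()
            // flat-map step: one input line becomes many words
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
            .filter(word -> !word.isEmpty())
            // group-by-key and count: tally occurrences of each word (sorted by word)
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> input = List.of("Kafka streams", "streams of records", "Kafka");
        // Each entry would correspond to one record on the output topic.
        wordCounts(input).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real stream processor the input never ends, so the counts would be updated and re-emitted as each new line arrives rather than computed once over a finished list.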