Apache Kafka is a scalable, high-performance, low-latency platform for handling real-time data feeds. Written in Scala and Java, it lets applications read and write streams of data much like a messaging system. The Kafka version used here requires Apache ZooKeeper to run. Kafka v2.5.0 (Scala 2.12 build) and ZooKeeper v3.4.13 were installed using Docker.
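
To make the "messaging system" comparison concrete, here is a minimal in-memory sketch of the idea behind a Kafka topic: records are appended to a log, and each consumer tracks its own read position (offset). This is a toy model, not the Kafka client API. The names `TopicLog`, `Consumer`, `append`, `read` and `poll`, as well as the sample readings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Toy stand-in for a single-partition topic: an append-only list of records."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Write a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset; the log is unchanged."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Hypothetical consumer that remembers how far it has read."""
    log: TopicLog
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        """Fetch the next batch and advance this consumer's own offset."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


# A producer writes a small stream of (made-up) sensor readings.
log = TopicLog()
for reading in ["t=1 temp=20.5", "t=2 temp=20.7", "t=3 temp=21.0"]:
    log.append(reading)

# Two consumers read the same log independently of each other.
a = Consumer(log)
b = Consumer(log)
first = a.poll(max_records=2)  # consumer a reads the first two records
rest = a.poll()                # then continues from where it left off
all_b = b.poll()               # consumer b still sees every record
```

Reading does not remove records from the log in this sketch, so the second consumer receives the full stream even after the first has finished. That is the main way this model differs from a simple queue, where a message is gone once it has been taken.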