Building a real-time big data pipeline 10: Spark Streaming, Kafka, Java

Spark Streaming is an extension of the core Apache Spark platform that enables scalable, high-throughput, fault-tolerant processing of data streams. It is written in Scala but offers Scala, Java, R, and Python APIs. It ingests data from sources such as Kafka, Flume, Kinesis, HDFS, S3, or Twitter. This data can be further processed using
read more
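
For a flavor of what the full post walks through, below is a minimal sketch of a Java Spark Streaming job that reads records from Kafka via the spark-streaming-kafka-0-10 integration. The broker address (localhost:9092), topic name (events), and consumer group are placeholder assumptions, not values taken from the post.

    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaSparkStream {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaSparkStream").setMaster("local[2]");
            // Micro-batch interval of 5 seconds
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "localhost:9092");   // assumed broker address
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "spark-streaming-demo");      // assumed consumer group
            kafkaParams.put("auto.offset.reset", "latest");

            Collection<String> topics = Collections.singletonList("events"); // assumed topic name

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

            // Print the value of each record in every micro-batch
            stream.map(ConsumerRecord::value).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }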

Building a real-time big data pipeline 4: Spark Streaming, Kafka, Scala

Apache Kafka is a scalable, high-performance, low-latency platform for handling real-time data feeds. Kafka allows reading and writing streams of data like a messaging system; it is written in Scala and Java. Kafka requires Apache ZooKeeper to run. Kafka v2.5.0 (Scala v2.12 build) and ZooKeeper v3.4.13 were installed using Docker.
read more
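
As a rough illustration of reading streams like a messaging system (the post itself uses Scala; a plain Java consumer is sketched here for consistency with the other examples), the snippet below polls a topic from a Kafka broker. The broker address, topic name, and group id are assumed placeholders, not details from the post.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumed broker exposed by the Docker container
            props.put("group.id", "demo-group");                // assumed consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("demo-topic")); // assumed topic
                while (true) {
                    // Poll the broker for new records every 500 ms and print them
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }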

Building a real-time big data pipeline 1: Kafka, RESTful, Java

Apache Kafka is used for building real-time data pipelines and streaming apps. Kafka acts as a message broker, transmitting messages from one system to another. ZooKeeper is required to run a Kafka cluster: Apache ZooKeeper is primarily used to track the status of nodes in the Kafka cluster and maintain a list of Kafka topics and
read more
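
To illustrate the message-broker role described above, here is a minimal Java producer sketch that publishes a single record to a topic. The broker address and topic name are placeholder assumptions, not details from the post.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Send one record to an assumed topic named "demo-topic"
                producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello from the REST layer"));
                producer.flush();
            }
        }
    }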