{"id":2118,"date":"2021-01-19T16:43:22","date_gmt":"2021-01-19T20:43:22","guid":{"rendered":"http:\/\/sys4seq.com\/?p=2118"},"modified":"2022-06-22T16:44:50","modified_gmt":"2022-06-22T20:44:50","slug":"building-a-real-time-big-data-pipeline-10-spark-streaming-kafka-java","status":"publish","type":"post","link":"https:\/\/sys4seq.com\/index.php\/2021\/01\/19\/building-a-real-time-big-data-pipeline-10-spark-streaming-kafka-java\/","title":{"rendered":"Building a real-time big data pipeline 10: Spark Streaming, Kafka, Java"},"content":{"rendered":"<p>Spark Streaming is an extension of the core Apache Spark platform that enables scalable, high-throughput, fault-tolerant processing of data streams. Spark is written in Scala and offers Scala, Java, R, and Python APIs. Spark Streaming ingests data from sources such as Kafka, Flume, Kinesis, HDFS, S3, or Twitter. This data can be further processed using complex algorithms. The processed output can be pushed to destinations such as HDFS filesystems, databases, and live dashboards. Spark Streaming also lets you apply <em>machine learning<\/em> algorithms to the data streams for advanced processing. Spark uses Hadoop\u2019s client libraries for distributed storage (HDFS) and resource management (YARN).<\/p>\n<p><a href=\"https:\/\/adinasarapu.github.io\/posts\/2021\/01\/blog-post-kafka-spark-streaming\/\">&gt;&gt;&gt;<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Spark Streaming is an extension of the core Apache Spark platform that enables scalable, high-throughput, fault-tolerant processing of data streams. Spark is written in Scala and offers Scala, Java, R, and Python APIs. Spark Streaming ingests data from sources such as Kafka, Flume, Kinesis, HDFS, S3, or Twitter. 
This data can be further processed using","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[44,43],"tags":[46,45,52],"_links":{"self":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2118"}],"collection":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/comments?post=2118"}],"version-history":[{"count":1,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2118\/revisions"}],"predecessor-version":[{"id":2119,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2118\/revisions\/2119"}],"wp:attachment":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/media?parent=2118"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/categories?post=2118"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/tags?post=2118"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}