{"id":2116,"date":"2021-12-21T16:36:25","date_gmt":"2021-12-21T20:36:25","guid":{"rendered":"http:\/\/sys4seq.com\/?p=2116"},"modified":"2022-06-22T16:39:16","modified_gmt":"2022-06-22T20:39:16","slug":"building-a-real-time-big-data-pipeline-9-spark-mllib-regression-python","status":"publish","type":"post","link":"https:\/\/sys4seq.com\/index.php\/2021\/12\/21\/building-a-real-time-big-data-pipeline-9-spark-mllib-regression-python\/","title":{"rendered":"Building a real-time big data pipeline 9: Spark MLlib, Regression, Python"},"content":{"rendered":"<p>Apache Spark expresses parallelism through three sets of APIs &#8211; DataFrames, Datasets and RDDs (Resilient Distributed Datasets). Originally, Spark was designed to read data from and write data to the Hadoop Distributed File System (HDFS). A Hadoop cluster is composed of a network of master, worker and client nodes that orchestrate and execute the various jobs across the HDFS.<\/p>\n<p><a href=\"https:\/\/adinasarapu.github.io\/posts\/2020\/12\/blog-post-pyspark-mllib\/\">&gt;&gt;&gt;<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"Apache Spark expresses parallelism through three sets of APIs &#8211; DataFrames, Datasets and RDDs (Resilient Distributed Datasets). Originally, Spark was designed to read data from and write data to the Hadoop Distributed File System (HDFS). 
A Hadoop cluster is composed of a network of master, worker and client nodes that orchestrate and execute the various jobs across","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[44,43],"tags":[81,82,56],"_links":{"self":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2116"}],"collection":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/comments?post=2116"}],"version-history":[{"count":1,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2116\/revisions"}],"predecessor-version":[{"id":2117,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/posts\/2116\/revisions\/2117"}],"wp:attachment":[{"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/media?parent=2116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/categories?post=2116"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sys4seq.com\/index.php\/wp-json\/wp\/v2\/tags?post=2116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}