Building a real-time big data pipeline 8: Spark MLlib, Regression, R - Genomics

Apache Spark MLlib is a distributed framework that provides many utilities useful for machine learning tasks, such as: Classification, Regression, Clustering, Dimentionality reduction and, Linear algebra, statistics and data handling. R is a popular statistical programming language with a number of packages that support data processing and machine learning tasks. To address R’s scalability issue, the Spark community developed SparkR package which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users.

>>>

Related Posts

Building a real-time big data pipeline 9: Spark MLlib, Regression, Python

Building a real-time big data pipeline 10: Spark Streaming, Kafka, Java

Building a real-time big data pipeline 7 : Spark MLlib, Regression, Java