Spatial gene expression data analysis on a cluster: 10x Genomics, Space Ranger

Here, spaceranger runs in cluster mode with Sun Grid Engine (SGE) as the job queue. Analyzing spatial RNA-seq data takes two steps. Step 1: spaceranger mkfastq demultiplexes the raw base call (BCL) files generated by Illumina sequencers into FASTQ files. Step 2: spaceranger count takes the FASTQ files from spaceranger mkfastq and performs alignment, filtering, barcode counting, …
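The two steps can be sketched as a small Python wrapper that assembles and optionally launches the two spaceranger commands. This is a minimal sketch, assuming Python 3.8+ and spaceranger on the PATH. The helper names (mkfastq_cmd, count_cmd, run), every path, the sample name, and the slide serial and capture area are hypothetical placeholders. The flags themselves (--id, --run, --csv, --transcriptome, --fastqs, --sample, --image, --slide, --area, --jobmode) are standard Space Ranger options, but required inputs have changed between releases, so check them against spaceranger mkfastq --help and spaceranger count --help for your installed version.

```python
import shlex
import subprocess


def mkfastq_cmd(run_id, bcl_dir, sample_sheet_csv, jobmode="sge"):
    """Step 1: demultiplex Illumina BCL files into FASTQ files."""
    return [
        "spaceranger", "mkfastq",
        f"--id={run_id}",             # name of the output directory
        f"--run={bcl_dir}",           # Illumina run folder containing the BCLs
        f"--csv={sample_sheet_csv}",  # simple sample sheet (Lane, Sample, Index)
        f"--jobmode={jobmode}",       # "sge" submits pipeline stages to SGE
    ]


def count_cmd(sample_id, transcriptome, fastq_dir, sample, image, slide, area,
              jobmode="sge"):
    """Step 2: align, filter and count barcodes for one tissue section."""
    return [
        "spaceranger", "count",
        f"--id={sample_id}",                 # name of the output directory
        f"--transcriptome={transcriptome}",  # reference package directory
        f"--fastqs={fastq_dir}",             # FASTQ folder produced by mkfastq
        f"--sample={sample}",                # sample name prefix of the FASTQs
        f"--image={image}",                  # brightfield tissue image
        f"--slide={slide}",                  # Visium slide serial number
        f"--area={area}",                    # capture area on the slide, e.g. A1
        f"--jobmode={jobmode}",
    ]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths and IDs; replace with your own before setting dry_run=False.
    run(mkfastq_cmd("run1_fastq", "/data/bcl/RUN1", "/data/samplesheet.csv"))
    run(count_cmd("section1", "/refs/refdata-gex-GRCh38-2020-A",
                  "/work/run1_fastq/outs/fastq_path", "section1",
                  "/data/images/section1.tif", "V19J01-123", "A1"))
```

Keeping dry_run=True by default lets you inspect both command lines before anything is queued. With --jobmode=sge, Space Ranger submits each pipeline stage as its own SGE job through the sge.template file shipped with the installation, which typically has to be adapted to the local queue first; passing jobmode="local" runs the same commands on a single machine instead.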

Building a real-time big data pipeline 9: Spark MLlib, Regression, Python

Apache Spark expresses parallelism through three sets of APIs: DataFrames, Datasets and RDDs (Resilient Distributed Datasets). Originally, Spark was designed to read data from and write data to the Hadoop Distributed File System (HDFS). A Hadoop cluster is composed of a network of master, worker and client nodes that orchestrate and execute the various jobs across …