Home - Genomics | Data Science

Computing

Cluster and AWS.

Bioinformatics

RNA-seq and proteomics

Genomics

WGS and WES

a real-time big data pipeline

Spatial gene expression data analysis on Cluster : 10X Genomics, Space Ranger

Posted on June 8, 2022 by Ashok Dinasarapu Analysis Omics Transcriptomics

This post covers running spaceranger in cluster mode, with Sun Grid Engine (SGE) as the queuing system. Analyzing spatial RNA-seq data takes two steps. Step 1: spaceranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. Step 2: spaceranger count takes FASTQ files from spaceranger mkfastq and performs alignment, filtering, barcode counting,
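The two steps above can be sketched as a pair of command lines. This is a minimal sketch, not the post's actual script: it only builds the argument lists and prints them, and every id, path, slide serial and capture area shown is a hypothetical placeholder. The flags used (`--id`, `--run`, `--csv`, `--transcriptome`, `--fastqs`, `--sample`, `--image`, `--slide`, `--area`, `--jobmode`) are assumed to match the Space Ranger release in use.

```python
import shlex


def mkfastq_cmd(run_id, bcl_dir, csv_path, jobmode="sge"):
    """Step 1: demultiplex an Illumina BCL run folder into FASTQ files."""
    return [
        "spaceranger", "mkfastq",
        f"--id={run_id}",
        f"--run={bcl_dir}",
        f"--csv={csv_path}",
        f"--jobmode={jobmode}",  # submit pipeline stages through SGE
    ]


def count_cmd(run_id, transcriptome, fastq_dir, sample, image, slide, area,
              jobmode="sge"):
    """Step 2: align, filter and count barcodes for one capture area."""
    return [
        "spaceranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastq_dir}",
        f"--sample={sample}",
        f"--image={image}",
        f"--slide={slide}",
        f"--area={area}",
        f"--jobmode={jobmode}",
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real data.
    step1 = mkfastq_cmd("demux_run", "/path/to/bcl_run", "/path/to/samples.csv")
    step2 = count_cmd(
        "sample1_count", "/path/to/reference", "/path/to/fastqs",
        "sample1", "/path/to/tissue_image.tif", "SLIDE_SERIAL", "A1",
    )
    for cmd in (step1, step2):
        print(shlex.join(cmd))
```

To actually launch a step, the list could be passed to `subprocess.run(cmd, check=True)` on a submit host where spaceranger is installed and its SGE job template has been configured.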

Taxonomic and functional profiling of the microbiome – whole genome shotgun metagenomics

Posted on June 8, 2022 by Ashok Dinasarapu Analysis Microbiome Omics

This workflow consists of taxonomic and functional profiling of shotgun metagenomics sequencing (MGS) reads using MetaPhlAn2 and HUMAnN2, respectively. To perform taxonomic (phylum, genus or species level) profiling of the MGS data, the MetaPhlAn2 pipeline was run on a high-performance multicore cluster computing environment.
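To illustrate what profiling at a chosen taxonomic level means, here is a minimal sketch that filters a MetaPhlAn-style profile down to one rank. It assumes the usual two-column, tab-separated layout (pipe-delimited clade name, then relative abundance) with rank prefixes such as `p__`, `g__` and `s__`. The function name `filter_rank` and the toy rows are illustrative only, not output from the workflow described above.

```python
# Rank prefixes as they appear in MetaPhlAn-style clade names.
RANK_PREFIX = {"phylum": "p__", "genus": "g__", "species": "s__"}


def filter_rank(lines, rank):
    """Return {taxon: abundance} for clades whose deepest level is `rank`."""
    prefix = RANK_PREFIX[rank]
    result = {}
    for line in lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        clade, abundance = fields[0], float(fields[1])
        # The last pipe-delimited element is the deepest assigned level.
        leaf = clade.split("|")[-1]
        if leaf.startswith(prefix):
            result[leaf[len(prefix):]] = abundance
    return result


if __name__ == "__main__":
    # Toy profile with made-up abundances, for illustration only.
    toy_profile = [
        "#SampleID\tMetaphlan2_Analysis",
        "k__Bacteria\t100.0",
        "k__Bacteria|p__Firmicutes\t60.0",
        "k__Bacteria|p__Bacteroidetes\t40.0",
    ]
    print(filter_rank(toy_profile, "phylum"))
```

Matching on the last element of the clade string, rather than anywhere in it, keeps a phylum-level query from also picking up deeper rows that merely pass through that phylum.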

Building a real-time big data pipeline 9: Spark MLlib, Regression, Python

Posted on December 21, 2021 by Ashok Dinasarapu Bigdata Software

Apache Spark expresses parallelism through three sets of APIs: DataFrames, Datasets and RDDs (Resilient Distributed Datasets). Originally, Spark was designed to read and write data from and to the Hadoop Distributed File System (HDFS). A Hadoop cluster is composed of a network of master, worker and client nodes that orchestrate and execute the various jobs across

Building a real-time big data pipeline 10: Spark Streaming, Kafka, Java

Posted on January 19, 2021 by Ashok Dinasarapu Bigdata Software

Spark Streaming is an extension of the core Apache Spark platform that enables scalable, high-throughput, fault-tolerant processing of data streams. It is written in Scala but offers Scala, Java, R and Python APIs. It takes data from sources such as Kafka, Flume, Kinesis, HDFS, S3 or Twitter. This data can be further processed using

Cloud Computing

Genomic data into actionable clinical results


1. Regenerative medicine
2. Rare diseases
3. Bioinformatics
4. Proteomics