Copy number variation discovery workflows using NGS data
Copy number variations (CNVs) are gains or losses of genomic regions. CNVs are either inherited from parents or arise de novo, and they play an important role in neuropsychiatric disorders and cancers.
Quality control for GWAS studies
An important step in the analysis of genome-wide association studies (GWAS) is to identify problematic subjects and markers. Quality control (QC) in GWAS removes low-quality markers and individuals, which greatly increases the accuracy of the findings.
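As a rough illustration of marker-level QC, the sketch below drops markers with a low call rate or a low minor allele frequency (MAF). The genotype encoding (0/1/2 alternate-allele counts, None for a missing call), the function names, and the thresholds of 0.95 and 0.01 are assumptions made for this example, not values taken from the workflow. Subject-level QC would follow the same pattern, applied per individual instead of per marker.

```python
def marker_call_rate(genotypes):
    """Fraction of non-missing genotype calls for one marker."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 alternate-allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_markers(matrix, min_call_rate=0.95, min_maf=0.01):
    """Return names of markers that pass both (assumed) thresholds."""
    kept = []
    for name, genotypes in matrix.items():
        if (marker_call_rate(genotypes) >= min_call_rate
                and minor_allele_frequency(genotypes) >= min_maf):
            kept.append(name)
    return kept


# Hypothetical toy data: three markers typed in four individuals.
toy = {
    "rs1": [0, 1, 2, 1],        # fully called, common variant
    "rs2": [0, 0, 0, 0],        # monomorphic, fails the MAF filter
    "rs3": [0, None, None, 1],  # half the calls missing, fails call rate
}
passing = filter_markers(toy)
```

With the toy data, only rs1 passes both filters.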
eQTL analysis of RNA-seq data
A genetic locus that affects gene expression is often referred to as an expression quantitative trait locus (eQTL). eQTL mapping studies assess the association of SNPs with genome-wide expression levels.
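The association between one SNP and one gene can be sketched as a simple linear regression of expression on genotype dosage. This is a minimal example under assumed inputs (dosages coded 0/1/2, one expression value per sample, a hypothetical function name); real eQTL mapping tools also adjust for covariates and correct for multiple testing, which is not shown here.

```python
def eqtl_fit(dosages, expression):
    """Least-squares fit of expression ~ intercept + slope * dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dosages, expression))
    if sxx == 0:
        # All samples share one genotype, so no effect can be estimated.
        raise ValueError("SNP is monomorphic in these samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: expression rises by one unit per alternate allele.
dosages = [0, 1, 2, 0, 1, 2]
expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
slope, intercept = eqtl_fit(dosages, expression)
```

The slope is the estimated change in expression per copy of the alternate allele.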
Genomic variants from RNA-seq data
RNA-seq allows the detection and quantification of known and rare RNA transcripts within a sample. In addition to differential expression analysis and the detection of novel transcripts, RNA-seq also supports the detection of genomic variation in expressed regions.
RNA-Seq : raw reads to differential expression
A simple RNA-seq differential expression analysis run on a high-performance computing (HPC) cluster.
A multi-tiered RNA-seq analyses approach to clinical diagnosis of a genetic disease
A novel three-tiered approach to targeted RNA-seq analysis for the molecular diagnosis of a genetic disease (e.g., neuromuscular disease, NMD) was proposed. Analysis is stopped if a molecular diagnosis is achieved in any of the tiers, and the results are clinically correlated to reclassify variants of uncertain significance (VUSs), identify pathogenic events at the mRNA
ATAC-seq peak calling with MACS2
ATAC-seq (Assay for Transposase-Accessible Chromatin with high-throughput sequencing) is a next-generation sequencing approach for analyzing open chromatin regions to assess genome-wide chromatin accessibility.
Quantitative proteomics: label-free quantitation of proteins
Liquid chromatography (LC) coupled with mass spectrometry (MS) has been widely used for protein expression quantification. Protein quantification by tandem-MS (MS/MS) uses integrated peak intensity from the parent-ion mass (MS1) or features from fragment-ions (MS2). Label-free quantification (LFQ) may be based on precursor ion intensity (peak areas or peak heights) or on spectral counting. Here, the
Quantitative proteomics : TMT-based quantitation of proteins
Quantification of proteins using isobaric labeling (tandem mass tag, TMT) starts with the reduction of disulfide bonds in proteins with dithiothreitol (DTT). Alkylation with iodoacetamide (IAA) after cystine reduction results in the covalent addition of a carbamidomethyl group that prevents the formation of disulfide bonds. Then, overnight digestion of the proteins using trypsin or trypsin/Lys-C
Annotation of genetic variants
Tools such as ANNOVAR, Variant Effect Predictor (VEP), or SnpEff annotate the genetic variants (SNPs, indels, CNVs, etc.) present in a VCF file. These tools add the annotations to the INFO column of the original VCF file.
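Because the annotations end up in the INFO column, reading them back is mostly a matter of splitting that column. The sketch below parses the INFO field of a single VCF data line into a dictionary. The record shown is made up for illustration, and the layout of the ANN value is a simplified stand-in: the actual keys and sub-fields depend on the annotation tool and its settings.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flags map to True."""
    info = {}
    if info_field == ".":  # "." marks an empty INFO column
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def info_from_vcf_line(line):
    """INFO is the eighth tab-separated column of a VCF data line."""
    columns = line.rstrip("\n").split("\t")
    return parse_info(columns[7])


# Hypothetical annotated record (not real data).
record = ("chr1\t12345\t.\tG\tA\t50\tPASS\t"
          "DP=14;ANN=A|missense_variant|MODERATE|GENE1;DB")
info = info_from_vcf_line(record)
effect = info["ANN"].split("|")[1]
```

For anything beyond a quick look, a dedicated VCF parsing library is the safer choice, since it also handles the header definitions and value types.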
Single cell gene expression data analysis on Cluster : 10X Genomics, Cell Ranger
Cell Ranger can be run in cluster mode, using a job scheduler such as Sun Grid Engine (SGE) or Load Sharing Facility (LSF) as the queuing system, which allows highly parallelizable jobs.
Taxonomic and diversity profiling of the microbiome : 16S rRNA gene amplicon sequence data
The 16S ribosomal RNA (rRNA) gene of Bacteria codes for the RNA component of the 30S ribosomal subunit. Bacterial species carry one or more copies of the 16S rRNA gene, each with nine hypervariable regions, V1-V9. High-throughput sequencing of 16S rRNA gene (a “marker gene”) amplicons has become a widely used method to study
Taxonomic and functional profiling of the microbiome – whole genome shotgun metagenomics
This workflow consists of taxonomic and functional profiling of shotgun metagenomics sequencing (MGS) reads using MetaPhlAn2 and HUMAnN2, respectively. To perform taxonomic (phylum, genus, or species level) profiling of the MGS data, the MetaPhlAn2 pipeline was run in a high-performance multicore cluster computing environment.
Spatial gene expression data analysis on Cluster : 10X Genomics, Space Ranger
Space Ranger can be run in cluster mode with Sun Grid Engine (SGE) as the queuing system. There are two steps to analyze spatial RNA-seq data. Step 1: spaceranger mkfastq demultiplexes raw base call (BCL) files generated by Illumina sequencers into FASTQ files. Step 2: spaceranger count takes FASTQ files from spaceranger mkfastq and performs alignment, filtering, barcode counting,