Bioinformatics • Genomics • AI for Biological Systems

Bioinformatics portfolio across genomics, RNA structure prediction, biomarker analysis, and machine learning.

Bioinformatics graduate with research experience in computational genomics, sequencing-data analysis, RNA biology, and machine learning. I build scalable HPC workflows for WGS, fragmentomics, epigenomics, and AI-driven structure prediction using Python, C++, Bash, and deep-learning frameworks.

Education

Academic background

Master of Science in Bioinformatics

Saint Louis University, USA | May 2026 | GPA: 3.97/4.00

Coursework: Bioinformatics I & II, Genomics, Machine Learning, Deep Learning, Algorithms for BCB, Biochemical Pharmacology.

Bachelor of Engineering in Biomedical Engineering

L.D. College of Engineering, India | June 2023 | GPA: 3.8/4.0

Coursework: Python, C++, Embedded Systems, AI Fundamentals, CircuitPython, Signal Processing, Diagnostic Instrumentation.

Workflow Thinking

How I approach biological data problems

1. Biological question and dataset definition
2. QC, preprocessing, alignment, or motif extraction
3. Feature engineering, templates, embeddings, or statistics
4. Modeling, cohort comparison, or structural evaluation
5. Visual reporting, documentation, and reproducible GitHub output

Featured Projects

Selected research and applied bioinformatics work

Projects span cancer genomics, biomarker discovery, RNA structure prediction, tool development, sequencing pipelines, multi-omics analysis, and biological machine learning.

🧬
GitHub

RNAMotifDB

RNA motif database, website, and template-search tool

Built a VFold-style RNA motif database and toolset: motif extraction, sequence grouping, RMSD-based structural clustering, representative-template selection, and a template-search tool for RNA 3D modeling.

Organized 500,000+ motif structures into a searchable, clustered database with a reproducible HPC build pipeline.

Template search: R1116 decomposed into 17 motifs, each matched to PDB templates (1txs, 1c2w, 6mtb, ...)

RNA motifs • Database • Tool development • C++ • Python • VFold
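RMSD-based structural clustering with representative-template selection can be illustrated with a short Python sketch. It is not the RNAMotifDB implementation: it assumes each motif in a sequence group is already an (N, 3) NumPy array of matched atoms (for example, one C1′ atom per nucleotide), computes RMSD after Kabsch superposition, runs a greedy leader-style clustering pass, and picks each cluster's medoid as its representative. The function names and the 2.0 Å cutoff are hypothetical.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)                # center both sets on their centroids
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                           # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # rotation that maps P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def greedy_cluster(coords: list[np.ndarray], cutoff: float = 2.0) -> list[list[int]]:
    """Leader-style clustering: a structure joins the first cluster whose leader is
    within `cutoff` Å RMSD; otherwise it starts a new cluster.
    The 2.0 Å default is a hypothetical value, not an RNAMotifDB setting."""
    clusters: list[list[int]] = []        # index 0 of each list is the cluster leader
    for i, xyz in enumerate(coords):
        for members in clusters:
            if kabsch_rmsd(xyz, coords[members[0]]) <= cutoff:
                members.append(i)
                break
        else:                             # no existing leader is close enough
            clusters.append([i])
    return clusters

def representative(coords: list[np.ndarray], members: list[int]) -> int:
    """Medoid: the member with the smallest summed RMSD to the other members."""
    return min(members, key=lambda i: sum(kabsch_rmsd(coords[i], coords[j]) for j in members))
```

A greedy pass compares each structure only with existing cluster leaders, which is cheaper than a full pairwise RMSD matrix. The trade-off is that the clusters depend on input order, which a production build may handle differently.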
🧠
GitHub

RNA-Structure-AI

Template-guided RNA structure prediction

Modified OpenFold3 (9 source files) and Boltz-2 (2 source files) to accept RNA templates, a capability the stock models lack, and built a secondary-structure-driven synthetic-MSA generator. Benchmarked all approaches against the unmodified baselines using C1' RMSD.

Synthetic MSA improved best-case RMSD from 22.5 Å to 3.5 Å and reached sub-Ångström accuracy (0.9 Å) on favorable RNAs. Failure cases are reported openly alongside the successes.

1E7K: baseline 8.98 Å vs synthetic MSA 0.94 Å (overlay: reference black, prediction green)

OpenFold3 • Boltz-2 • RNA 3D • Templates • Synthetic MSA • Benchmarking
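The idea of a secondary-structure-driven synthetic MSA can be sketched in Python as well. This is a hypothetical stand-in, not the author's generator. It parses a dot-bracket string and emits sequence variants in which paired positions change together to another canonical or G-U pair, mimicking compensatory mutations, while unpaired positions mutate independently. The pair alphabet, mutation rate, row count, and function names are assumptions. How the rows are passed to OpenFold3 or Boltz-2 is not shown.

```python
import random

# Assumed pair alphabet: Watson-Crick plus G-U wobble pairs
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]
BASES = "ACGU"

def parse_dot_bracket(db: str) -> dict[int, int]:
    """Map each paired position to its partner using a stack; '.' marks unpaired bases."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unmatched '(' in dot-bracket string")
    return pairs

def synthetic_msa(seq: str, db: str, n_rows: int = 64, rate: float = 0.3,
                  seed: int = 0) -> list[str]:
    """Return n_rows sequences (query first) that keep the base pairs of `db`,
    assuming the query's own pairs are canonical or G-U. A selected base pair is
    replaced by a random pair from PAIRS, and a selected unpaired base by a random
    nucleotide. All defaults are hypothetical."""
    seq = seq.upper().replace("T", "U")
    if len(seq) != len(db):
        raise ValueError("sequence and structure lengths differ")
    rng = random.Random(seed)
    pairs = parse_dot_bracket(db)
    rows = [seq]                           # the query is the first row, as in FASTA/A3M MSAs
    for _ in range(n_rows - 1):
        row = list(seq)
        for i in range(len(seq)):
            j = pairs.get(i)
            if j is not None and j < i:
                continue                   # 3' partner: already handled with its 5' partner
            if rng.random() >= rate:
                continue                   # leave this position (or pair) unchanged
            if j is None:
                row[i] = rng.choice(BASES)
            else:
                row[i], row[j] = rng.choice(PAIRS)  # compensatory change keeps the pair valid
        rows.append("".join(row))
    return rows
```

Because every row preserves the pairing pattern, paired columns covary across the alignment. This is the kind of signal that MSA-based structure predictors are designed to exploit. The sketch makes no claim about how the real generator chooses substitutions.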
🩸
GitHub

Fragmentomics Biomarker

Cancer genomics and cfDNA biomarker workflow

Built a two-stage FASTQ-to-BAM WGS pipeline (BWA + SAMtools workers on SLURM/LSF), with fragment-length extraction, 4-mer/5-mer end-motif analysis, multi-tier QC, and cancer-vs-healthy cohort comparison across Streck/EDTA tubes and WGS/HMC assays.

Processed cancer and healthy cohorts through a reproducible HPC pipeline with fragment- and motif-level QC, plus group-comparison visualizations for biomarker exploration.

Violin plot of the WGS short/long fragment ratio: cancer cohort shifted higher than healthy

WGS • cfDNA • Cancer genomics • BWA • SAMtools • End motifs

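Once reads are aligned, fragment-length and end-motif features reduce to simple per-read arithmetic. The Python sketch below is illustrative and is not the pipeline's code. `Read` is a hypothetical stand-in whose field names mirror `pysam.AlignedSegment` attributes (`template_length`, `is_reverse`, `query_sequence`, `is_proper_pair`). The 100–150 bp "short" and 151–220 bp "long" windows are assumed. End motifs are taken from read sequences rather than from the reference genome.

```python
from collections import Counter
from typing import Iterable, NamedTuple

class Read(NamedTuple):
    """Hypothetical minimal aligned-read record; field names mirror pysam.AlignedSegment."""
    template_length: int       # TLEN: positive for the leftmost mate, negative for its partner
    is_reverse: bool
    query_sequence: str
    is_proper_pair: bool = True

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(s: str) -> str:
    return s.translate(_COMPLEMENT)[::-1]

def short_long_ratio(reads: Iterable[Read], short_bp=(100, 150), long_bp=(151, 220)) -> float:
    """Short/long fragment ratio; each fragment is counted once via its positive-TLEN mate.
    The size windows are assumptions, not the pipeline's settings."""
    n_short = n_long = 0
    for r in reads:
        if not r.is_proper_pair or r.template_length <= 0:
            continue
        size = r.template_length
        if short_bp[0] <= size <= short_bp[1]:
            n_short += 1
        elif long_bp[0] <= size <= long_bp[1]:
            n_long += 1
    return n_short / n_long if n_long else float("nan")

def end_motifs(reads: Iterable[Read], k: int = 4) -> dict[str, float]:
    """Relative frequency of the k-mer at each fragment 5' end (k=4 or k=5)."""
    counts = Counter()
    for r in reads:
        seq = r.query_sequence
        if not r.is_proper_pair or seq is None or len(seq) < k:
            continue
        # BAM stores reverse-strand reads reverse-complemented, so the original 5' end
        # is the reverse complement of the last k stored bases
        motif = revcomp(seq[-k:]) if r.is_reverse else seq[:k]
        if "N" not in motif:
            counts[motif] += 1
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()} if total else {}
```

Counting only positive-TLEN mates keeps each fragment from being counted twice in the length ratio. End motifs are collected from both mates, so each fragment contributes one motif per end.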
Experience Direction

Research experience

WashU (Maher Lab) | Fragmentomics & cfDNA: Built FASTQ-to-BAM WGS pipelines on LSF, with fragment-length and 4-mer/5-mer end-motif analysis, multi-tier QC, and cancer-vs-healthy cohort comparison across Streck/EDTA tubes and WGS/HMC assays.

SLU (Hou Lab) | RNA structure prediction: Modified OpenFold3 and Boltz-2 at the source level to support RNA templates, built a synthetic-MSA generator, and benchmarked predictions on a 22-RNA set, reaching sub-Ångström accuracy on favorable cases.

ML project: Combined ESM-2 protein language model embeddings with engineered biological features in an ensemble classifier (ROC-AUC ≈ 0.80) and used SHAP to interpret feature contributions.

Technical Skills

Methods, tools, and platforms

Bioinformatics and Genomics

NGS analysis • WGS • Fragmentomics • 5hmC epigenomics • Bulk RNA-seq • scRNA-seq analysis • Alignment workflows • Genome assembly • GWAS • TCGA • SRA

Bioinformatics Tools

BWA • SAMtools / BAM • STAR • Bowtie2 • IGV • UCSC Genome Browser • BLAST • Clustal Omega • Scanpy

Machine Learning and Deep Learning

GNN message passing • DDPM diffusion • Transformer encoders • RiNALMo • ESM-2 • XGBoost • scikit-learn • SHAP

Programming and Infrastructure

Python • C++ • R • Bash / Shell • MATLAB • SQL • MySQL • Linux / UNIX • Docker • SLURM • LSF • Nextflow • Snakemake

Structural Biology

RNA motif modeling • RMSD • TM-score • Kabsch alignment • PDB / CIF handling • PyMOL • Chimera

Visualization and Analysis

Matplotlib • Seaborn • ggplot2 • Jupyter Notebook • Statistical summaries • Cohort visualization

Publication

Peer-reviewed publication

Smart Stethoscope

IEEE ASIANCON 2023: IoT-enabled smart stethoscope with biomedical signal acquisition, filtering/amplification circuitry, and Arduino-based instrumentation.

Research Interests

Areas I want to keep building in

Computational genomics
Cancer genomics and biomarkers
Sequencing-data analysis
Epigenomics and 5hmC
RNA biology and structure
Transcriptomics and scRNA-seq
Genome regulation
Machine learning for biology
Multi-omics analysis
Bioinformatics tool development
Scalable workflow development
Clinical and translational data analysis

Contact

Open to computational biology, genomics, and bioinformatics opportunities

I am interested in roles involving sequencing-data analysis, computational genomics, RNA biology, cancer genomics, biomarker analysis, bioinformatics tool development, and machine learning for biological systems.