About
During my Ph.D., I immersed myself in advanced scientific research, tackling complex problems with a focus on data-driven discovery. I developed data analysis pipelines where there were none, and I am the first dry-lab graduate (no wet-lab work) in my lab's 30+ year history. My work centered on developing novel machine learning models to analyze intricate datasets and uncover patterns with significant biological implications. The infrastructure and knowledge base I built have led to many publications long after my departure from the lab.
In my role as a senior bioinformatician at SickKids, I spent countless hours designing predictive algorithms to achieve high accuracy and efficiency. This hands-on experience deepened my understanding of how to transform raw data into meaningful insights, grounding my approach in rigorous scientific methodology. My work there has ranged from building pipelines to diagnose previously undiagnosable diseases to practically eliminating tedious manual work for injury surveillance systems.
Beyond modeling, I’ve built and optimized data pipelines using tools like Snakemake, enabling efficient processing of large-scale datasets. My projects often involved SQL-driven data exploration to identify trends and test hypotheses, ensuring robust and reproducible results. One of my key achievements was developing a scalable framework for real-time data analysis that processes clinical notes with 99%+ accuracy and significantly improved the speed and reliability of the resulting insights. This work wasn’t just about crunching numbers; it was about asking the right questions and letting the data guide the answers.
I’ve also led collaborative efforts, working with diverse teams to integrate data science into broader scientific investigations. Whether it was refining algorithms to predict outcomes in complex systems or mentoring peers on statistical techniques, I focused on bridging the gap between technical execution and scientific inquiry. My contributions have consistently aimed at advancing knowledge, from crafting visualizations that make complex findings accessible to publishing research that pushes the boundaries of data science applications. I’m driven by the challenge of using data to solve tough problems and the satisfaction of seeing those solutions make a tangible impact.
Resume
Summary
Experienced researcher with a proven track record and publication record. Extensive expertise in data science, omics, high-performance computing and experimental design.
Work Experience
Senior Bioinformatician
Hospital for Sick Children, Toronto, ON (2020-Present)
- Leading programming teams for pipeline development for omics data
- Leading analytics teams in omics and clinical data analysis projects
- Creating data infrastructures for large clinical research datasets
- Mentoring junior staff in omics data processing and statistical analysis
- Hosting institution-wide and city-wide seminars on data analysis, from omics to data visualization to statistical models and machine learning
- Leading data analysis efforts in local, national and international collaborations with researchers, clinicians and pharmaceutical companies
- Working in close collaboration with clinicians and scientists to analyze large clinical and genomic datasets
- Published findings in numerous peer-reviewed journals
- Performed data analysis on diverse datasets, from NLP to omics to computer vision
Bioinformatician
Hospital for Sick Children, Toronto, ON (2017-2020)
- Used different programming paradigms to automate data processing, analysis and report generation
- Performed statistical analysis and trained machine learning models on diverse datasets, including but not limited to genomics, survey data, clinical data and computer vision
- Consulted with scientists on study and experimental design during grant writing
- Created web applications using Shiny to communicate clinical data to physicians and help them better understand their data
- Published findings in numerous peer-reviewed journals
Co-Investigator (Ontario HIV Treatment Network Endgame Grant)
Toronto, ON (2019-Present)
- Lead quantitative data expert; assisted with study design and the grant writing process
- Responsible for statistical analysis of patient-reported outcomes survey data and mixed-methods data analysis
Molecular Data Management Specialist
Indoc Research, Toronto, ON (2017-2018)
- Worked with software developers to generate user interfaces for data visualization
- Managed genomics data (microarray, NGS) for a diverse group of scientists
- Built web applications and R and Python packages to help researchers query, download or upload genomics data and run ETL pipelines in an automated manner
- Consulted during development efforts and provided guidance on data analysis and machine learning workflow development
- Assumed leadership positions on grant applications and provided expertise on genomics and machine learning efforts
Ph.D. Thesis Research
UMASS Medical School Graduate School of Biomedical Sciences, Worcester, MA, USA (2008-2017)
- Led the establishment of data analysis infrastructure and pipelines for 100+ next-generation sequencing datasets
- Performed statistical analyses (parametric, non-parametric population comparisons, regression analysis) of these gene expression profiles and discovered novel insights about cellular regulatory mechanisms
- Developed novel data analysis pipelines for handling large datasets with high speed, high precision and low memory footprint
- Presented findings at local, national and international professional seminars and conferences to a large community of scientists, and published findings in peer-reviewed journals
- Mentored several graduate and undergraduate students in various projects and internships from molecular biology to bioinformatics
Statistician
Massachusetts General Hospital Department of Psychiatry, Boston, MA, USA (2015-2019)
- Worked in collaboration with physicians in a multi-site, multi-specialty cohort study to address effects of mindfulness and stress on physician well-being and burnout
- Consulted on study design and implementation
- Analyzed survey data from a multi-site, multi-specialty waitlist-controlled study
- Analyzed data from health tracking devices to assess the effect of burnout on behavior
- Published results in peer-reviewed journals
- Collaborated with wearable device manufacturers in algorithm testing, data wrangling and analysis
Education
Ph.D. in Bioinformatics/Molecular Biology (2008-2017) UMASS Medical School Graduate School of Biomedical Sciences
- Thesis title: mRNA Decay Pathways Use Translation Fidelity and Competing Decapping Complexes for Substrate Selection
BSc. in Economics/Molecular Biology (2004-2008) Brown University
- Karen T. Romer Undergraduate Teaching and Research Award (2006-2007)
Skills
Programming
- Python: pandas, PyTorch, Keras, NumPy, SciPy, scikit-learn, OpenCV, scikit-image, Hugging Face, spaCy, SQLAlchemy
- R: Bioconductor, tidyverse, Shiny, data.table
- Tools: Git, Docker, SQL, Linux, Snakemake, WDL, Bash, LSF, Moab, Slurm
Data Science
- Regression analysis, statistical testing, MCMC
- Machine learning, SVM, tree-based models
- ANNs for computer vision, NLP
- Expert in experimental/study design
- Experience in handling multi-modal datasets with complex data protocols
Communication
- Effective, engaging presenter
- Extensive experience working with technical and non-technical clients
OMICS
- WGS, WES: SNV and SV calling, annotation, statistical testing
- RNA-Seq: differential expression, pathway enrichment, novel splicing event detection
- Single-cell methods: scRNA-Seq and scATAC-Seq data processing and analysis
- CyTOF: data processing, gating, differential expression/abundance
NLP
- Rule-based and model-based analysis
- Transfer learning using LLMs
- Context-aware feature extraction, target detection
Computer Vision
- Processing clinical images and videos
- Segmentation, movement detection, background subtraction
- Object detection
Statistical Modelling
- Regression analysis, generalized linear models, mixed-effects models, MCMC
- Statistical testing: parametric and non-parametric comparisons
- Time series analysis
- Supervised and unsupervised modelling, clustering
Data Infrastructure
- Experience using HPC environments and job scheduling systems
- Pipeline development (Snakemake, WDL)
- Creating and querying SQL databases (SQLite, PostgreSQL)
- Version control (Git), containers (Docker) and Linux systems