Imran S. Haque, PhD
Contact
Email: ish AT ihaque.org
Twitter: ImranSHaque
Mastodon: @ihaque@genomic.social
GitHub: ihaque
LinkedIn: imranshaque
Bio
By training, I’m an engineer and computer scientist focused on machine learning and fast, scalable code. By inclination, I’m a computational biologist interested in (epi)genetics, structural biology and biophysics, computational chemistry, and drug discovery.
Since 2019, I have worked at Recursion, leading the data science department (as VP, Data Science, and, since 2023, as SVP, AI and Digital Sciences). We are an interdisciplinary group of data scientists, machine learning scientists, computational biologists, and computational chemists working to transform drug discovery by combining high-throughput experiments with large-scale computational modeling. Before Recursion, I served as Chief Scientific Officer at Freenome from December 2016 to October 2018. There I supervised Freenome’s R&D, from molecular biology and assay development through computational biology and machine learning, to develop blood-based early cancer diagnostics. Before Freenome, from 2011 through 2016, I held the roles of VP, Scientific Affairs; Director, Research; and Senior Software Engineer at Counsyl (acquired by Myriad Genetics in summer 2018; now Myriad Women’s Health). At Counsyl I managed research in early technology development, clinical development, scientific publications, and wetlab-focused software engineering and computational assay development.
I completed my PhD in Computer Science at Stanford University in June 2011, co-advised by Vijay Pande and Daphne Koller. My research focused on methods for large-scale machine learning for drug discovery (thesis: Accelerating Chemical Similarity Search Using GPUs and Metric Embeddings). I also did non-thesis work in protein folding, high-performance computing, and computer architecture. During my PhD, I interned at Vertex Pharmaceuticals with Brian Goldman and Pat Walters, developing methods for computational drug discovery. In 2006, I graduated with highest honors from UC Berkeley with a B.S. in Electrical Engineering and Computer Science (Go Bears!). I did undergraduate research with Profs. Kathy Yelick, Bora Nikolic, and John Wawrzynek. I was also a member of the Berkeley Mu Chapter of HKN (Eta Kappa Nu), and served as an officer for several semesters. Even further back, I graduated from Bellarmine College Preparatory in San Jose (Go Bells!). I doubt any high school students will care to read this page, but if you do, I strongly encourage you to do speech and debate, as I did. Without a doubt, the skills I gained there have been extremely useful to me.
Publications and Preprints
2024
- High-resolution genome-wide mapping of chromosome-arm-scale truncations induced by CRISPR-Cas9 editing
Lazar NH, Celik S, Chen L, …, Haque IS. Nat Genet 1–12. (2024).
2023
- RxRx3: Phenomics Map of Biology
Fay MM, Kraus O, Victors M, …, Haque IS, Mabey B. bioRxiv 2023.02.07.527350. (2023).
- RxRx1: A dataset for evaluating experimental batch correction methods
Sypetkowski M, Rezanejad M, Saberian S, …, Haque I, Earnshaw B. In proceedings of CVPR 2023: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVMI workshop). 4284–4293. (2023).
2022
- Biological Cartography: Building and Benchmarking Representations of Life
Celik S, Huetter J, Melo-Carlos S, …, Haque I. bioRxiv 2022.12.09.519400. (2022).
2021
- WILDS: A Benchmark of in-the-Wild Distribution Shifts
Koh PW, Sagawa S, Marklund H, …, Haque IS, Beery S, Leskovec J, Kundaje A, Pierson E, Levine S, Finn C, Liang P. Proc 38th Intl Conf Mach Learning (ICML) 139:5637–5664. (2021).
- Enhanced DNA libraries for methylation analysis (News and Views)
Haque IS. Nat Biomed Eng 5:490–492. (2021).
2020
- Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery
Cuccarese MF, Earnshaw BA, Heiser K, …, Haque IS, Chong YT, Gibson CC. bioRxiv 2020.08.02.233064v2. (2020).
- Identification of potential treatments for COVID-19 through artificial intelligence-enabled phenomic analysis of human cells infected with SARS-CoV-2
Heiser K, McLean PF, Davis CT, …, Haque IS, Low AS, Gibson CC. bioRxiv 2020.04.21.054387v1. (2020).
- Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines
Kaseniit KE, Haque IS, Goldberg JD, Shulman LP, Muzzey D. Genet Med 22:1694–1702. (2020).
2019
- Screening for Tay-Sachs disease carriers by full-exon sequencing with novel variant interpretation outperforms enzyme testing in a pan-ethnic cohort
Cecchi A, Vengoechea ES, Kaseniit KE, …, Haque IS, Moyer K, Page PZ, Muzzey D, Grinzaid KA. Molec Genet Genom Med 7:e836. (2019).
- Spatial co-fragmentation pattern of cell-free DNA recapitulates in vivo chromatin organization and identifies tissues-of-origin
Liu Y, Liu T, Weinberg DE, …, Haque IS. bioRxiv 564773. (2019).
- Current and future perspectives of liquid biopsies in genomics-driven oncology
Heitzer E, Haque IS, Roberts CE, Speicher MR. Nat Rev Genet 20:71–88. (2019).
- Machine learning enables detection of early-stage cancer by whole-genome sequencing of plasma cell-free DNA
Wan N, Weinberg D, Liu T, …, Haque IS. BMC Cancer 19:832. (2019).
2018
- METCC: METric learning for Confounder Control: Making distance matter in high-dimensional biological analysis
Manghnani K, Drake A, Wan N, Haque IS. ML4H workshop at NeurIPS 2018. arXiv cs.LG:1812.03188. (2018).
- Validation of an Expanded Carrier Screen that Optimizes Sensitivity via Full-Exon Sequencing and Panel-wide Copy Number Variant Identification
Hogan GJ, Vysotskaia VS, Beauchamp KA, …, Haque IS, Mar-Heyming R, Kang HP, Muzzey D. Clin Chem 64(7):1063–1073. (2018).
- Systematic Design and Comparison of Expanded Carrier Screening Panels
Beauchamp KA, Muzzey D, Wong KK, …, Haque IS. Genet Med 20(1):55–63. (2018).
- Clinical Utility of Expanded Carrier Screening: Reproductive Behaviors of At-Risk Couples
Ghiossi C, Goldberg JD, Haque IS, Lazarin GA, Wong KK. J Genet Counsel 27:616–625. (2018).
2017
- Challenges in Using ctDNA to Achieve Early Detection of Cancer
Haque IS, Elemento O. bioRxiv 237578. (2017).
- Noninvasive Prenatal Screening at Low Fetal Fraction: Comparing Whole-Genome Sequencing and Single-Nucleotide Polymorphism Methods
Artieri CG, Haverty C, Evans EA, …, Haque IS, Yaron Y, Muzzey D. Prenat Diagn 37(5):482–490. (2017).
- Development and validation of a 36-gene sequencing assay for hereditary cancer risk assessment
Vysotskaia VS, Hogan GJ, Gould GM, …, Haque IS. PeerJ 5:e3046. (2017).
- The population genetics of human disease: the case of recessive, lethal mutations
Amorim CE, Gao Z, Baker Z, …, Haque IS, Pickrell J, Przeworski M. PLoS Genetics 13(9):e1006915. (2017).
- Smith-Lemli-Opitz syndrome carrier frequency and estimates of in utero mortality rates
Lazarin GA, Haque IS, Evans EA, Goldberg JD. Prenat Diagn 37(4):350–355. (2017).
2016
- Modeled Fetal Risk of Genetic Diseases Identified by Expanded Carrier Screening
Haque IS, Lazarin GA, Kang HP, Evans EA, Goldberg JD, Wapner RJ. JAMA 316(7):734–742. (2016).
- Design and validation of a next generation sequencing assay for hereditary BRCA1 and BRCA2 mutation testing
Kang HP, Maguire JR, Chu CS, Haque IS, Lai H, Mar-Heyming R, Ready K, Vysotskaia VS, Evans EA. PeerJ 4:e2162. (2016).
- Tay-Sachs Carrier Screening by Enzyme and Molecular Analyses in the New York City Minority Population
Mehta N, Lazarin GA, Spiegel E, …, Haque IS, Wapner R. Genet Test Molec Biomarker 20(9):504–509. (2016).
- Group Testing Approach for Trinucleotide Repeat Expansion Disorder Screening
Kaseniit KE, Theilmann MR, Robertson A, …, Haque IS. Clin Chem 62(10):1401–1408. (2016).
- Expanded carrier screening: A review of early implementation and literature
Lazarin GA, Haque IS. Semin Perinatol 40(1):29–34. (2016).
2015
- SSCM: A method to analyze and predict the pathogenicity of sequence variants
Vikram S, Rasmussen MD, Evans EA, Haque IS. bioRxiv 021527. (2015).
2014
- Systematic Classification of Disease Severity for Evaluation of Expanded Carrier Screening Panels
Lazarin GA, Hawthorne F, Collins NS, …, Haque IS. PLoS One e114391. (2014).
- A Fast 3 × N Matrix Multiply Routine for Calculation of Protein RMSD
Haque IS, Beauchamp KA, Pande VS. bioRxiv 008631. (2014).
2013
- SCISSORS: Practical Considerations
Kearnes SM, Haque IS, Pande VS. J Chem Inf Model 54(1):5–15. (2013).
- An empirical estimate of carrier frequencies for 400+ causal Mendelian variants: results from an ethnically diverse clinical sample of 23,453 individuals
Lazarin GA, Haque IS, Nazareth S, Iori K, Patterson AS, Jacobson JL, Marshall JR, Seltzer WK, Patrizio P, Evans EA, et al. Genet Med 15(3):178–186. (2013).
2011
- Knowledge and attitudes regarding expanded genetic carrier screening among women’s healthcare providers
Ready K, Haque IS, Srinivasan BS, Marshall JR. Fertil Steril 407–413. (2011).
- Anatomy of high-performance 2D similarity calculations
Haque IS, Pande VS, Walters WP. J Chem Inf Model 51(9):2345–2351. (2011).
- Error bounds on the SCISSORS approximation method
Haque IS, Pande VS. J Chem Inf Model 51(9):2248–2253. (2011).
- Large-Scale Chemical Informatics on GPUs
Haque IS, Pande VS. In GPU Computing Gems: Emerald Edition (Ed: W.-M. W. Hwu). (2011).
- MSMBuilder2: Modeling Conformational Dynamics at the Picosecond to Millisecond Scale
Beauchamp KA, Bowman GR, Lane TJ, …, Haque IS, Pande VS. J Chem Theor Comput 3412–3419. (2011).
- Copernicus: A new paradigm for parallel adaptive molecular dynamics
Pronk S, Larsson P, Pouya I, …, Haque IS, Beauchamp K, Hess B, Pande VS, Kasson PM, Lindahl E. In proceedings of SC11: 2011 Intl Conf High Perf Comput, Network, Storage and Analysis. 60. (2011).
2010
- Hard data on soft errors: A large-scale assessment of real-world error rates in GPGPU
Haque IS, Pande VS. In proceedings of CCGrid 2010: 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. 691–696. (2010).
- SIML: a fast SIMD algorithm for calculating LINGO chemical similarities on GPUs and CPUs
Haque IS, Pande VS, Walters WP. J Chem Inf Model 50(4):560–564. (2010).
- SCISSORS: a linear-algebraical technique to rapidly approximate chemical similarities
Haque IS, Pande VS. J Chem Inf Model 50(6):1075–1088. (2010).
- PAPER – accelerating parallel evaluations of ROCS
Haque IS, Pande VS. J Comput Chem 31(1):117–132. (2010).
- Current status of the AMOEBA polarizable force field
Ponder JW, Wu C, Ren P, …, Haque I, Mobley DL, Lambrecht DS, DiStasio Jr RA, et al. J Phys Chem B 114(8):2549–2564. (2010).
2006
- Absence of reptation in the high-temperature folding of the trpzip2 β-hairpin peptide
Pitera JW, Haque I, Swope WC. J Chem Phys 124:141102. (2006).
Selected Invited Talks
- Decoding Biology with Data Abundance.
Broad Institute Machine Learning in Drug Discovery Symposium, Oct 2023. Slides. Video.
- Thanks, I hate it a little less! An update from the world of machine learning.
CUP XXII, Mar 2023. Slides.
- Mapping and navigating biology and chemistry with genome-scale imaging.
The Royal Society: Machine Learning and AI in Biological Science, Drug Discovery, and Medicine, Mar 2023. Slides. Video.
- Biological Cartography: Building and Benchmarking Representations of Life.
Learning Meaningful Representations of Life (LMRL) @ NeurIPS 2022, Dec 2022. Slides.
- Mapping Biology With a Unified Representation Space for Genomic and Chemical Perturbations to Enable Accelerated Drug Discovery.
Learning Meaningful Representations of Life (LMRL) @ NeurIPS 2021, Dec 2021. Slides. Video.
- Zero to potential COVID-19 treatments in under 4 weeks with deep-learning driven drug screens.
GPU Technology Conference (GTC), Apr 2021. Slides.
- Applying AI to Accelerate Assay Development to Pandemic Speed.
SLAS Transformed, Jun 2020. Slides.
- We Are Legion: Statistics and Generalization from Cells to Populations.
(Keynote Presentation) Cancer Research UK 3rd Int’l Symp. on Oesophageal Cancer, Apr 2019. Slides. Blog Post.
- (How to fix) the very reasonable ineffectiveness of machine learning in biomarker discovery.
Molecular Medicine Tri-Con, Mar 2019.
- Thanks, I Hate It: Why your biological machine learning model probably won’t work & what to do about it.
OpenEye CUP XIX, Mar 2019. Slides. Blog Post.
- Making hay of needles: Connecting clinical and physical parameters in the search for early cancer.
AACR Special Conf. on Convergence: AI, Big Data, and Prediction in Cancer, Oct 2018. Slides. Blog Post.
- The Reasonable Ineffectiveness of Biological Data.
Early Detection of Cancer, Oct 2018. Slides.
- (What to do) when gradient descent digs too deep, and too greedily.
DeepChem User Group Meeting, Jul 2018. Slides.
- Embracing heterogeneity: The freenome, information, and early disease detection.
(Keynote Presentation) Cancer Crosslinks, Oct 2017. Slides.
- 1 in 550: Using 346,790 expanded carrier screens to estimate the risk of Mendelian conditions.
Society for Maternal-Fetal Medicine Annual Meeting, Jan 2017. Slides.
- Overcoming artificial selection to achieve the promise of inherited cancer screening.
AGBT Precision Health Meeting 2016, Aug 2016. Slides.
- “Rare” disease is common: results from 388,994 expanded carrier screens of up to 108 genes.
American Society for Reproductive Medicine Annual Meeting, Sep 2015.
- Can/Do: the disconnect between what we can do and what we do in perinatal precision medicine.
Stanford ChildX 2015, Apr 2015. Video.
- Beyond the dict: Python tools for data wrangling.
PyData 2013, Nov 2013. Slides.
- Folding@Everywhere: Computational Biochemistry in the New Era of HPC.
(Keynote Presentation) Hyperience: 5th National Dutch Informatics Congress, Nov 2010. Slides.
- Hard Data on Soft Errors: A Global-Scale Assessment of GPGPU Memory Soft Error Rates.
Resilience Workshop @ CCGrid 2010, May 2010. Slides.
- LINGOs and GPUs.
OpenEye CUP XI, Mar 2010. Slides.
- “Rare” disease is common: results from 388,994 expanded carrier screens of up to 108 genes.
Festival of Genomics 2015, Jun 2015.
- Mapping the Undiscovered Country – 100,000 Pan-Ethnic Clinical Screens: New tools to detect genetic disease.
Association of Clinical Scientists Annual Meeting, May 2013.
- Sequencing makes Moore’s Law Look Slow: Lessons from Counsyl’s first 100,000 Medical Genomes.
UC San Diego Center for Networked Systems, Jan 2013.
- Hybrid Vigor: Using Heterogeneous HPC to Accelerate Chemical Biology.
Workshop on Bio-Molecular Simulations on Future Computing Architectures, Sep 2010. Slides.
- Of Jacquard looms and Jaccard coefficients: multithreading biomolecular simulations in a GPU world.
NSF-NAIS Workshop on Intelligent Software, Oct 2009. Slides.
Selected Posters
2024
- Phenomics-enabled discovery and optimization of small-molecule RBM39 degraders as an alternative to CDK12 targeting in high-grade serous ovarian cancer (HGSOC)
Neumann C, Shankaran H, Nadella K, …, Haque I, Donnella H, Cuccarese M, Evangelista M. Poster at Am Assoc Cancer Res (AACR) 2024.
2023
- A Phenomics Platform Combining Imaging and Artificial Intelligence for Rapid Validation and Advancement of Novel Oncology Targets
Rudnick J, Nadella K, Rajan M, …, Haque I, Donnella H, Cuccarese M, Evangelista M. Poster at Am Assoc Cancer Res (AACR) 2023.
2022
- Identification and optimization of novel small molecule modulators of immune checkpoint resistance with a unified representation space for genomic and chemical perturbations
Bhandari A, Cuccarese MF, Fales K, …, Haque I, Alfa R, Rinaldi J. Poster at Am Assoc Cancer Res (AACR) 2022.
2018
- Early-stage colorectal cancer detection using artificial intelligence and whole-genome sequencing of cell-free DNA
Niehaus K, Wan N, Weinberg D, …, Haque IS, Putcha G. Poster at Am Coll Gastroenterol (ACG) 2018.
- Multi-analyte profiling reveals relationships among circulating biomarkers in colorectal cancer
Delubac D, Ariazi E, Berliner J, …, Haque IS. Poster at Am Assoc Cancer Res (AACR) 2018.
2017
- Copy number variant calling on a 176-disease expanded carrier screening panel including DMD
Beauchamp KA, Grauman P, Hogan GJ, …, Haque IS, Muzzey D. Poster at Intl Soc Prenat Diagn (ISPD) 2017. Top Five Poster.
- Duplication tag SNP g.27134T>G should not be considered diagnostic of SMA carrier status
Davison D, Kaseniit KE, Haque IS. Poster at Am Coll Med Genet (ACMG) 2017. Top Rated Abstract.
- ClinVar submitter list leaderboard obscures extensive variation and bias in submission types
Kaseniit KE, Karczewski K, Haque IS. Poster at Am Coll Med Genet (ACMG) 2017. Top Rated Abstract.
- Copy number variant calling on a 176-condition expanded carrier screening panel including DMD
Beauchamp KA, Grauman P, Hogan GJ, …, Haque IS, Muzzey D. Poster at Am Coll Med Genet (ACMG) 2017. Top Rated Abstract.
2016
- Computing confidence intervals on positive predictive value for non-invasive prenatal screening
Haque IS, Haverty C, Goldberg JD, Evans EA. Poster at Am Soc Hum Genet (ASHG) 2016. Top 10% Abstract.
- Pre-test genetic counseling as a requirement for germline hereditary cancer testing: what do patients do?
Lazarin GA, Sedgwick K, Doyle D, Haque IS, Ready K. Poster at Am Soc Hum Genet (ASHG) 2016. Top 10% Abstract.
- Clinical utility of expanded carrier screening: reproductive behaviors of at-risk couples
Ghiossi C, Ready K, Lieber C, …, Haque IS, Wong KK. Poster at Am Coll Med Genet (ACMG) 2016. Top Rated Abstract.
- Putting guidelines into action: accurate computation of individualized positive predictive value for aneuploidy screening in cell-free DNA
Lo C, Evans EA, Schmitt C, …, Haque IS, Goldberg JD. Poster at Am Coll Med Genet (ACMG) 2016. Top Rated Abstract.
2015
- Carrier screening of 346,790 individuals reveals greater risk of severe recessive disease than of Down syndrome or NTDs
Haque IS, Lazarin GA, Raia M, Bellerose H, Muzzey D, D’Auria K, Kang HP, Evans EA, Goldberg JD. Poster at Am Soc Hum Genet (ASHG) 2015. Reviewer’s Choice Abstract.
- Expanded carrier screening of 322,484 individuals: the case for going beyond CF
Haque IS, Lazarin GA, Raia M, Bellerose H, Evans EA, Goldberg JD. Poster at Eur Soc Hum Genet (ESHG) 2015. Best Poster Award candidate.