Budget: $245,000.00

Start date: 19 March 2018
End date: 18 September 2020



For this project, we implemented several machine learning and deep learning methods to analyze data from a variety of omics technologies, including genomics, epigenomics, transcriptomics, proteomics and metabolomics. We explored several representations of genetic data based on encodings of whole sequences (RecDL), genetic variants (Diet Network), and the Gene Ontology (DeepSimDef). We also conducted metabolomics studies of heart disease, using machine learning approaches to investigate the impact of myocardial infarction on patients' metabolomes. Using unsupervised learning methods, we identified a fatty acid signature that differed depending on the drug administered. Using the XGBoost method, we also identified lignoceric acid, a metabolite potentially important in heart failure that is currently undergoing biological validation. Finally, we evaluated the generalizability of our approaches, including the Diet Network approach to predicting ethnicity from genomic data. We showed that this approach makes accurate predictions on independent datasets with different sets of genetic markers and different levels of missing data, a problem that is ubiquitous in omics data. Our work also highlighted the importance of biological interpretation in prediction, which will be the focus of our future work.


Lead Genome Centre: Génome Québec

Partner: IVADO


Simon Gravel, McGill University
Yoshua Bengio, Université de Montréal