
Differences in how patients experience disease can be explained in large part by their genomic differences. Enabling precision medicine, that is, tailoring treatment to the personal characteristics of patients, therefore requires identifying genomic features associated with disease risk, prognosis, or response to treatment. This is often achieved using genome-wide association studies (GWAS), which look for associations between single nucleotide polymorphisms (SNPs) and a phenotype. However, for many complex traits, the SNPs these studies uncover account for little of the known heritable variation. One key explanation for this missing heritability is that few of the established GWAS approaches account for the joint epistatic effect of multiple SNPs, even though several SNPs may act together towards a phenotype, for example by regulating multiple redundant parts of the same pathway. Moreover, GWAS are statistically underpowered, as the number of SNPs investigated is orders of magnitude larger than the sample size: only SNPs with a large effect size can be detected. This also creates a robustness issue, particularly when using complex models: the set of SNPs deemed associated with the phenotype can vary considerably across related datasets. This suggests that current approaches often capture spurious associations rather than truly relevant SNPs. SCAPHE is built on the hypothesis that part of the missing heritability can be discovered by combining GWAS data with established biological knowledge. We surmise that this calls for novel machine learning procedures that model non-linear interactions between genetic loci and compensate for the lack of statistical power due to relatively small sample sizes by incorporating multiple sources of evidence. More specifically, these sources include molecular networks and data collected for multiple related phenotypes.
SCAPHE proposes to develop novel machine learning algorithms for GWAS, cast as a feature selection problem, through three orthogonal research directions: (1) the development of methods for non-additive, multi-locus, network-guided GWAS; (2) the development of biomarker discovery algorithms explicitly designed for robustness, that is, to reliably return the same SNPs on overlapping subsets of the same data; and (3) the joint analysis of multiple related phenotypes. These three research directions will be complemented by three cross-cutting tasks, ensuring a focus throughout the project on the control of the false discovery rate, high-performance computing, and practical applications. To achieve its objectives, SCAPHE will build on a machine learning framework called regularized relevance. This framework formalizes the idea of encouraging the selected loci to be connected on a pre-defined biological network, on the assumption that SNPs along pathways or in a set of co-expressed genes are more likely to act together towards the phenotype of interest. It also allows for the combination of evidence from multiple datasets pertaining to related phenotypes, and for the inclusion of non-linear interactions between SNPs. SCAPHE will propose new tools that will benefit human geneticists and clinicians by providing novel precision medicine insights, potentially resulting in new diagnostic tools or therapeutic targets. Moreover, the application of feature selection methods to high-dimensional data, far from being restricted to genomic studies, is of broad interest in a variety of domains ranging from medical imaging to quantitative finance and climate science. To facilitate the dissemination of our work, the results of SCAPHE will be published in Open Access peer-reviewed publications, and we will put a strong emphasis both on Open Source code development and on facilitating usability via tutorials and user-friendly interfaces.
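
To make the idea of network-guided selection concrete, the following is a minimal sketch of one possible regularized relevance objective. It is not the project's implementation. It assumes each SNP has a precomputed association score, that the objective sums the scores of the selected SNPs, subtracts a sparsity penalty `eta` per selected SNP, and subtracts a connectivity penalty `lam` for every network edge with exactly one endpoint selected. The function names, the toy scores, and the toy network are all hypothetical, and the exhaustive search is only feasible for a handful of SNPs.

```python
from itertools import combinations


def regularized_relevance(selected, scores, edges, eta, lam):
    """Objective value of a candidate SNP set (assumed, illustrative form).

    relevance: sum of per-SNP association scores in the set
    sparsity:  eta * number of selected SNPs
    network:   lam * total weight of edges with exactly one endpoint selected
    """
    sel = set(selected)
    relevance = sum(scores[i] for i in sel)
    cut = sum(w for (i, j), w in edges.items() if (i in sel) != (j in sel))
    return relevance - eta * len(sel) - lam * cut


def select_snps(scores, edges, eta, lam):
    """Exhaustive search over all subsets; returns (best set, best value).

    The empty set (value 0) is the baseline, so a set is returned only if
    it strictly improves on selecting nothing.
    """
    n = len(scores)
    best_set, best_val = (), 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            val = regularized_relevance(subset, scores, edges, eta, lam)
            if val > best_val:
                best_set, best_val = subset, val
    return set(best_set), best_val


# Hypothetical toy data: five SNPs, indexed 0..4.
scores = [3.0, 0.4, 2.5, 0.2, 0.3]
# Hypothetical network: SNPs 0-1-2 form a path, SNPs 3-4 are linked.
edges = {(0, 1): 1.0, (1, 2): 1.0, (3, 4): 1.0}

with_network, _ = select_snps(scores, edges, eta=1.0, lam=1.0)
without_network, _ = select_snps(scores, edges, eta=1.0, lam=0.0)
print("network-guided:", sorted(with_network))
print("scores only:   ", sorted(without_network))
```

In this toy example, the weakly associated SNP 1 is selected only when the network penalty is active, because it connects the two strongly associated SNPs 0 and 2. For realistic numbers of SNPs, the exhaustive search would have to be replaced by an efficient solver.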