
CADA: phenotype-driven gene prioritization based on a case-enriched knowledge graph.

NAR genomics and bioinformatics

Authors: Chengyao Peng, Simon Dieck, Alexander Schmid, Ashar Ahmad, Alexej Knaus, Maren Wenzel, Laura Mehnert, Birgit Zirn, Tobias Haack, Stephan Ossowski, Matias Wagner, Theresa Brunet, Nadja Ehmke, Magdalena Danyel, Stanislav Rosnev, Tom Kamphans, Guy Nadav, Nicole Fleischer, Holger Fröhlich, Peter Krawitz

Many rare syndromes can be well described and delineated from other disorders by a combination of characteristic symptoms. These phenotypic features are best documented with terms of the Human Phenotype Ontology (HPO), which are also increasingly used in electronic health records (EHRs). Many algorithms that perform HPO-based gene prioritization have been developed; however, the performance of many such tools suffers from an over-representation of atypical cases in the medical literature. This is particularly the case if the algorithm cannot handle features that occur with reduced frequency in a disorder. With CADA, we built a knowledge graph based on both case annotations and disorder annotations. Using network representation learning, we achieve gene prioritization by link prediction. Our results suggest that CADA exhibits superior performance, particularly for patients who present with the pathognomonic findings of a disease. Additionally, information about the frequency of occurrence of a feature can readily be incorporated when available. Crucial to the design of our approach is the use of the growing amount of phenotype-genotype information that diagnostic labs deposit in databases such as ClinVar. As a result, CADA is an ideal reference tool for differential diagnostics in rare disorders that can also be updated regularly.
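The idea of ranking genes by link prediction on a graph that merges disorder annotations with case annotations can be illustrated with a small sketch. This is not the published implementation: the abstract describes network representation learning, whereas the sketch below substitutes a much simpler Adamic-Adar-style weighted overlap score. The function names, the toy gene labels, the placeholder phenotype identifiers and the choice to add one unit of edge weight per observed case are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_graph(disorder_annotations, case_annotations):
    """Merge both annotation sources into gene -> {phenotype term: edge weight}.

    disorder_annotations: iterable of (gene, term, frequency in [0, 1])
    case_annotations: iterable of (gene, list of terms seen in one solved case)
    """
    graph = defaultdict(lambda: defaultdict(float))
    for gene, term, frequency in disorder_annotations:
        graph[gene][term] += frequency
    for gene, terms in case_annotations:
        for term in terms:
            # Assumption: every observed case adds one unit of weight.
            graph[gene][term] += 1.0
    return graph


def prioritize(graph, patient_terms):
    """Rank genes by a weighted, Adamic-Adar-style overlap with the patient."""
    # Number of genes linked to each term; widely shared terms count less.
    term_degree = defaultdict(int)
    for terms in graph.values():
        for term in terms:
            term_degree[term] += 1

    scores = {}
    for gene, terms in graph.items():
        scores[gene] = sum(
            terms[t] / math.log(1 + term_degree[t])
            for t in patient_terms
            if t in terms
        )
    return sorted(scores, key=scores.get, reverse=True)


# Toy data with placeholder identifiers (not real genes or HPO terms).
disorder_annotations = [
    ("GENE_A", "HP:T1", 1.0),
    ("GENE_A", "HP:T2", 0.2),
    ("GENE_B", "HP:T2", 1.0),
    ("GENE_B", "HP:T3", 0.8),
]
case_annotations = [
    ("GENE_A", ["HP:T1", "HP:T4"]),
    ("GENE_B", ["HP:T3"]),
]

graph = build_graph(disorder_annotations, case_annotations)
ranking = prioritize(graph, ["HP:T1", "HP:T4"])
```

In this toy setting the term HP:T4 reaches GENE_A only through a case annotation, which is the kind of information a disorder-level annotation alone would miss. A learned embedding would replace the hand-written score while keeping the same ranking interface.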

© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.

PMID: 34514393
