A predictive model to infer drug phenotypic similarity by integrating seven different chemical features

Micarelli, E

doi:10.58015/micarelli-elisa_phd2019

The time and cost of the drug development process are constantly increasing. As a consequence, the drug-repurposing field is attracting growing interest from researchers and pharmaceutical companies. In my PhD project, I tackled the problem of clustering drugs according to their common perturbation of biological systems. In my thesis, I first offer a brief overview of the drug development pipeline to highlight the complexity of the process. I then discuss the wealth of data types that describe the properties of chemical compounds currently used in therapy, focusing on the potential of the resources and repositories that make this information available. However, most of this information, which is either derived from high-throughput experiments or extracted from literature reports, is noisy. I set out to use data integration to include, in a predictive model, information as diverse as possible in order to alleviate the problem of noisy data. Several computational methods to integrate multiple orthogonal types of chemical or biological information have been proposed. In the introduction of my thesis, I discuss these methods in detail, offering an overview of the state of the art in the field. Chemical screenings are an important step of the drug development pipeline. However, in complex assays, covering a large fraction of the chemical space is an intimidating task. Hence the need for computational methods that restrict the number of chemicals to be tested. Computational applications have already proved useful in designing biological experiments. My project aims at estimating the likelihood that two apparently unrelated chemicals have similar biological effects. However, the chemical perturbation induced by drug treatment often rests on complex mechanisms in which multiple factors are involved. 
Although drugs can be characterized and grouped according to structural properties, their biological effect cannot be directly inferred from this “straightforward” feature. In this respect, the question I posed at the beginning of my PhD project was whether hidden patterns of chemical properties exist that allow drugs with similar biological effects to be matched. To answer this question, the first part of my work focused on the retrieval and integration of several chemical features and on the exploitation of machine learning methods to estimate how well these features identify pairs of compounds with similar biological effects. In this process, I used as gold standard the drug bioactivity data annotated in the PubChem database. In my work, I propose a new approach that integrates different chemical features to produce a model optimized to infer pairs of compounds that are likely to have a similar impact on the phenotype of biological systems. To this end, I collected and organized different chemical features such as perturbation of gene expression (Lamb, 2007), chemical structure (Kim et al., 2016), KEGG drug information, ATC code, literature co-occurrence and drug bioactivity (Wang et al., 2009). For each feature, I defined a drug similarity score, producing one drug-similarity matrix per feature. I then explored several machine learning regression methods to integrate the different drug features in a single predictive model. The model with the best performance was used to predict the functional similarity of compound pairs, thus producing a drug similarity network. The analysis of the predicted similarity network shows that the chemical features that I retrieved and considered in my model contain specific chemical patterns, which do not occur by chance. Finally, I analyzed the inferred drug network by comparing it with the results of an experimental screening. 
The comparison highlighted that drugs with similar experimental phenotypes are separated by a shorter graph distance in the inferred network than random drug sets. Finally, the inferred network was also integrated with KEGG class information and ChEBI, to support the formulation of hypotheses about drug bioactivity and drug mechanism of action.
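
As an illustration of the graph-distance comparison mentioned above, the following is a minimal Python sketch and not the code used in the thesis. It assumes the inferred network is available as an unweighted adjacency list and that "graph distance" means shortest-path hop count; the function names, the toy network and the drug labels (`d1` to `d6`) are all hypothetical placeholders. The mean pairwise distance of a drug set is compared with that of randomly drawn sets of the same size.

```python
import random
from collections import deque
from itertools import combinations


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_pairwise_distance(graph, drugs):
    """Average shortest-path length over all connected pairs in `drugs`."""
    lengths = []
    for a, b in combinations(drugs, 2):
        d = shortest_path_lengths(graph, a).get(b)
        if d is not None:  # skip pairs with no connecting path
            lengths.append(d)
    return sum(lengths) / len(lengths) if lengths else float("inf")


def empirical_p_value(graph, drugs, n_random=1000, seed=0):
    """Fraction of random same-size drug sets at least as compact as `drugs`."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(graph, drugs)
    nodes = sorted(graph)
    hits = sum(
        mean_pairwise_distance(graph, rng.sample(nodes, len(drugs))) <= observed
        for _ in range(n_random)
    )
    return observed, hits / n_random


# Toy, made-up network (adjacency lists); labels are placeholders, not real drugs.
toy_network = {
    "d1": ["d2", "d3"],
    "d2": ["d1", "d3"],
    "d3": ["d1", "d2", "d4"],
    "d4": ["d3", "d5"],
    "d5": ["d4", "d6"],
    "d6": ["d5"],
}

# Assumed example: d1, d2 and d3 share an experimental phenotype.
observed, p_value = empirical_p_value(toy_network, ["d1", "d2", "d3"])
```

A small empirical p-value would indicate that the selected drugs sit closer together in the network than expected by chance. On a real network, the shortest-path lengths would normally be computed once per source node rather than once per pair.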

Micarelli, E. (2019). A predictive model to infer drug phenotypic similarity by integrating seven different chemical features [10.58015/micarelli-elisa_phd2019].