HiPPO and Panda: two bioinformatics tools to support analysis of high-dimensional mass cytometry data

Pirro', S

doi:10.58015/pirro-stefano_phd2016

Biological processes are often modulated by the interaction of different cell types, in a complex network of relations and dependencies. For this reason, biological research aims to increase both the number of cellular features that can be surveyed simultaneously and the resolution at which such observations are possible. High-dimensional mass cytometry is particularly well suited to tracking cells in complex tissues because more than 40 parameters can be monitored at the same time, on hundreds of thousands of cells per sample. Several computational approaches have been proposed to reduce the dimensionality of the datasets produced by this technology and to cluster events by their multi-dimensional similarity (e.g. SPADE and viSNE). To overcome some limitations of the available toolboxes, I developed two new bioinformatics tools named HiPPO (http://moleculargenetics.uniroma2.it/hippo) and PANDA (http://moleculargenetics.uniroma2.it/panda). HiPPO (High-throughput Population Profiler) takes advantage of a supervised quantitation approach to discretize the expression distribution curves generated for each intracellular and surface protein monitored in the experiments. Cells in the continuous, multidimensional dataset are converted into a two-dimensional matrix in which rows and columns are events (cells) and markers, respectively. To characterize cell populations, HiPPO queries PANDA, a manually curated database that stores expression profiles for selected markers of primary cells. Comparing the discrete expression profiles stored in PANDA with those identified in the populations under study makes it possible to monitor cell type abundance. Moreover, given a set of experiments in different conditions, HiPPO uses the non-parametric Kolmogorov-Smirnov test to evaluate the variation of protein expression levels for any identified population. The analysis is conducted interactively, through a user-friendly web application.
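As an illustration of the Kolmogorov-Smirnov comparison mentioned above, the following is a minimal Python sketch of a two-sample test applied to the intensities of one marker in one population under two conditions. It is not HiPPO's implementation: the function name `ks_two_sample`, the asymptotic p-value approximation and the example intensities are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test (illustrative sketch).

    Returns (D, p), where D is the largest absolute difference between the
    two empirical distribution functions and p is an asymptotic p-value
    from the Kolmogorov distribution (an approximation that is more
    reliable for large samples than for small ones).
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # The supremum of |F_a - F_b| is attained at one of the observed values.
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / n
        f_b = bisect_right(b, x) / m
        d = max(d, abs(f_a - f_b))

    # Asymptotic survival function of the Kolmogorov distribution.
    lam = d * math.sqrt(n * m / (n + m))
    if lam == 0.0:
        p = 1.0
    else:
        p = 2.0 * sum(
            (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, 101)
        )
    return d, min(1.0, max(0.0, p))


# Hypothetical intensities of one marker in the same population,
# measured in two conditions (values invented for the example).
condition_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
condition_2 = [6.0, 7.0, 8.0, 9.0, 10.0]
d_stat, p_value = ks_two_sample(condition_1, condition_2)
print(f"D = {d_stat:.3f}, p = {p_value:.4f}")
```

With hundreds of thousands of events per sample, even very small shifts can yield small p-values, so the statistic D is usually worth reporting alongside p.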
The robustness and reliability of HiPPO have been tested on two experimental datasets. In the first case, healthy human bone marrow samples (Bendall et al., 2011) were analyzed and the results compared with SPADE (Qiu et al., 2011), viSNE (Amir et al., 2013) and the manual gating performed by the authors. In the second test, I took advantage of the expertise on CyTOF technology in our laboratory and analyzed mass cytometry data of skeletal muscle mononuclear cells from healthy and dystrophic (mdx) mice, in order to quantify fibro-adipogenic progenitors (FAPs) and determine changes in population abundance in different conditions. In both cases, taking manual gating as the reference, HiPPO identified known cell populations more accurately than SPADE and viSNE. Unlike the tools that are currently available, HiPPO can also match the antigenic profile of the “quasi-homogeneous” populations identified by the cell clustering procedure to profiles of cell populations that have been characterized and described in the literature. For this task HiPPO takes advantage of PANDA, a second resource that I developed during my PhD. PANDA (Population Analysis Database) is the first manually curated database that aims at capturing the expression profiles of selected markers in primary cells by integrating multiple layers of information in a user-friendly web portal. The curation process is conducted by experts who retrieve and interpret expression data from the literature. At the time of writing, the curation effort in PANDA focuses mainly on the immune system and on the cell populations participating in skeletal muscle regeneration. PANDA annotates 32 different cell types in H. sapiens and M. musculus, and aims at increasing the amount of curated data by extending curation to other tissues, organs and organisms.
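The discretization and profile-matching steps described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used by HiPPO or the schema of PANDA: the binary (0/1) levels, the per-marker cutoffs, the exact-match rule, the function names `discretize` and `annotate`, and the marker and cell-type names are all hypothetical.

```python
from collections import Counter


def discretize(events, thresholds):
    """Turn continuous intensities into a cells x markers matrix of 0/1 levels.

    events:     list of dicts mapping marker name -> measured intensity
    thresholds: dict mapping marker name -> cutoff (assumed to be chosen by
                the user, one per marker)
    """
    markers = sorted(thresholds)
    matrix = [
        tuple(int(cell[m] >= thresholds[m]) for m in markers)
        for cell in events
    ]
    return markers, matrix


def annotate(markers, matrix, reference):
    """Count cells per reference cell type using exact profile agreement.

    reference: dict mapping cell-type name -> {marker: expected 0/1 level}.
    A cell is assigned only if exactly one reference profile agrees on all
    of that profile's markers; otherwise it is counted as "unassigned".
    """
    counts = Counter()
    for row in matrix:
        observed = dict(zip(markers, row))
        hits = [
            name
            for name, profile in reference.items()
            if all(observed.get(m) == level for m, level in profile.items())
        ]
        counts[hits[0] if len(hits) == 1 else "unassigned"] += 1
    return counts


# Invented example data: four cells, two placeholder markers.
thresholds = {"marker_a": 1.0, "marker_b": 2.0}
events = [
    {"marker_a": 3.2, "marker_b": 0.4},
    {"marker_a": 0.1, "marker_b": 5.0},
    {"marker_a": 2.5, "marker_b": 0.9},
    {"marker_a": 0.2, "marker_b": 0.3},
]
# Placeholder reference profiles standing in for database entries.
reference = {
    "type_X": {"marker_a": 1, "marker_b": 0},
    "type_Y": {"marker_a": 0, "marker_b": 1},
}

markers, matrix = discretize(events, thresholds)
abundance = annotate(markers, matrix, reference)
print(dict(abundance))
```

Running the same counting step on samples from different conditions gives per-population abundances that can then be compared across those conditions.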

Pirro', S. (2016). HiPPO and Panda: two bioinformatics tools to support analysis of high-dimensional mass cytometry data [10.58015/pirro-stefano_phd2016].