A novel approach to represent and compare RNA secondary structure

Mattei, E

doi:10.58015/mattei-eugenio_phd2014

The main aim of the work presented in this thesis is to improve structural alignments of RNAs. I approached this problem by exploring new ways to encode information about secondary structure elements (SSEs). This new representation laid the groundwork for the second part of the work, which consists of exploiting the encoding for RNA structural alignment and comparison. In more detail, in the first part of the work I describe a new approach to representing SSEs. The idea is to move from the widely used but scarcely informative dot-bracket notation for RNA secondary structure to a new encoding that uses a different set of characters for each type of secondary structure element (e.g. loop, stem, bulge, internal loop). This simple idea has never been tried on RNA secondary structures, despite having been used with satisfying results for proteins. The main approaches used to overcome the problem of RNA structural representation rely on different topological descriptions of the structure. For example, the string-based dot-bracket notation is replaced by a tree-based representation in which nodes and branches encode different substructures. With tree-based approaches the quality of the structural information increases, but so does the computational complexity of performing comparisons. I addressed the problem by developing BEAR (Brand nEw Alphabet for Rna), a structure-aware encoding with the same structural information content as a tree-based approach but expressed as a string of characters, like the classic dot-bracket notation.

In the second part of the work I explored the potential of BEAR; in particular, I showed that with this encoding it is possible to detect regularities in the pattern of substitution rates between BEAR-encoded structure elements of homologous RNAs stored in Rfam. In doing so, I showed that a substitution matrix capturing transition rates between SSEs (loop, stem, bulge, internal loop) can be computed.
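To make the idea of a per-element alphabet concrete, the sketch below labels every position of a pseudoknot-free dot-bracket string with the type of secondary structure element it belongs to, using a standard loop decomposition (each base pair closes a loop whose unpaired bases and inner helices determine its type). It is a minimal illustration under stated assumptions: the single-letter labels (S stem, H hairpin loop, B bulge, I internal loop, M multiloop, E external) are hypothetical, the actual BEAR alphabet, which uses a distinct set of characters for each element type, is not reproduced here, and the function names `pair_table` and `sse_labels` are invented for this example.

```python
def pair_table(db: str) -> list[int]:
    """Map each position to its pairing partner, or -1 if unpaired."""
    pt = [-1] * len(db)
    stack: list[int] = []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            pt[i], pt[j] = j, i
        elif c != ".":
            raise ValueError(f"unexpected character {c!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pt


def sse_labels(db: str) -> str:
    """Label each position with a hypothetical SSE letter (not BEAR's alphabet)."""
    pt = pair_table(db)
    # Paired bases are stem ('S'); unpaired bases default to external ('E')
    # and are relabelled below if some base pair encloses them.
    labels = ["S" if p != -1 else "E" for p in pt]
    for i, j in enumerate(pt):
        if j <= i:  # skip unpaired bases (j == -1) and closing brackets
            continue
        # Walk the loop closed by (i, j), jumping over nested helices.
        unpaired: list[int] = []
        branches: list[tuple[int, int]] = []
        k = i + 1
        while k < j:
            if pt[k] == -1:
                unpaired.append(k)
                k += 1
            else:
                branches.append((k, pt[k]))
                k = pt[k] + 1
        if not branches:
            tag = "H"  # hairpin loop: no inner helix
        elif len(branches) == 1:
            left = branches[0][0] - i - 1   # unpaired bases on the 5' side
            right = j - branches[0][1] - 1  # unpaired bases on the 3' side
            # One empty side is a bulge, two non-empty sides an internal loop.
            # (A stacked pair has no unpaired bases, so its tag is never used.)
            tag = "B" if left == 0 or right == 0 else "I"
        else:
            tag = "M"  # multiloop: two or more inner helices
        for u in unpaired:
            labels[u] = tag
    return "".join(labels)
```

For example, `sse_labels("((.((...)).))")` returns `"SSISSHHHSSISS"`: the single unpaired base on each side of the inner helix is recognised as part of an internal loop, whereas the dot-bracket string writes both simply as `.`, indistinguishable from the hairpin bases.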
The MBR (Matrix of BEAR-encoded RNA secondary structures) represents tolerated changes in SSEs in related RNAs, and it is the first matrix of this kind for RNA structural elements. I then tested the approach by analysing the contribution of the MBR to calculating RNA secondary structure alignments with a simple variant of the Needleman-Wunsch algorithm, and obtained, on different datasets, results comparable to those of other, computationally more complex, state-of-the-art methods (listed in the Materials and Methods section). Finally, in the last part of the work, I presented AMBeR, a new alignment tool that exploits BEAR and the MBR to perform fast and accurate local and global structural alignments. AMBeR has the lowest computational complexity and running time compared to other state-of-the-art methods and, more importantly, shows performance comparable to or higher than theirs.
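The alignment step can be illustrated with a textbook Needleman-Wunsch global alignment in which the usual nucleotide scores are replaced by a lookup in a substitution matrix over structure-element labels. This is a sketch under stated assumptions, not the variant used in the thesis or in AMBeR: the labels are hypothetical single letters, the scores in `toy_matrix` are invented placeholders (not MBR values), and a simple linear gap penalty is assumed.

```python
# Hypothetical single-letter element labels: S stem, H hairpin loop,
# B bulge, I internal loop, M multiloop, E external.
LABELS = "SHBIME"


def toy_matrix() -> dict[tuple[str, str], int]:
    """Invented placeholder scores; these are NOT the published MBR values."""
    scores = {}
    for a in LABELS:
        for b in LABELS:
            if a == b:
                scores[(a, b)] = 3    # same element type
            elif "S" in (a, b):
                scores[(a, b)] = -3   # paired vs unpaired: assumed costly
            else:
                scores[(a, b)] = -1   # one loop type for another: assumed mild
    return scores


def needleman_wunsch(x: str, y: str, sub: dict[tuple[str, str], int], gap: int = -2):
    """Global alignment of label strings x and y with a linear gap penalty.

    Returns (score, aligned_x, aligned_y), using '-' for gaps.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub[(x[i - 1], y[j - 1])],  # substitution
                F[i - 1][j] + gap,                            # gap in y
                F[i][j - 1] + gap,                            # gap in x
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax: list[str] = []
    ay: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub[(x[i - 1], y[j - 1])]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

With these toy scores, aligning `SSHHHSS` against `SSHHSS` scores 16: six matching positions (6 × 3) and one gap (-2). Because scoring is a plain matrix lookup, a matrix estimated from real alignments can be swapped in by replacing the dictionary, leaving the dynamic programme unchanged.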

Mattei, E. (2014). A novel approach to represent and compare RNA secondary structure [10.58015/mattei-eugenio_phd2014].