Next: First algorithm Up: RNA secondary structure analysis Previous: Contents

Introduction

Determining RNA and protein shapes has gained considerable importance in the last decade, as biologists became more involved in post-genomic issues; indeed, knowing the shape of these molecules is essential to understand their role within a cell. For instance, it is known that RNA molecules have coding and non-coding parts (called exons and introns); it is possible that these different components are distinguishable according to shape properties.

An RNA molecule is a single-stranded chain of nucleotides (which are designated by A, C, G or U). When left in its environment, this molecule will fold itself into its secondary structure, by creating base-pairs (an A with a U, a C with a G, or even a U with a G). Finding this secondary structure is the objective of the present work. A knot is defined by the presence of two base-pairs { r_i,r_j} and { r_k,r_l}, where i<k<j<l and where $r_{1}r_{2}\ldots r_{n}$ is the considered RNA molecule. We assume that no knots are created. With this assumption the secondary structure of the RNA molecule is planar.

A first solution to determine RNA secondary structures is to list all the solutions, and select the best one ; however, the number of configurations may be exponential with the number of nucleotides; for instance, a sequence of only 200 nucleotides may have over 10⁵⁰ different secondary structures (see [1]). So, this very simple way of proceeding is only suitable for short RNA molecules. There are of course much better approaches for solving the problem. The folding of RNA molecules obeys particular energy rules; these rules may be represented by using difference equations and dynamic programming (as is done here) or by constructing suitable context-free grammars; the latter method is not considered in this work. The accuracy of the final result mirrors the accuracy with which the energy rules are modelized. Of course, increasing the precision of these rules decreases the efficiency of the algorithm (in terms of complexity). Two different models are dealt with, as well as their parallel counterparts.

A novel part of the proposed approach is an attempt to provide the user of the program with the possibility of specifying parts of the secondary structure of a given RNA string. That specification is done using parentheses, brackets or curly braces that indicate the bindings of paired nucleotides.

Next: First algorithm Up: RNA secondary structure analysis Previous: Contents

Nicolas DeRobert
2000-06-29