Abstract [eng] |
Over the past decades, fundamental discoveries in molecular biology have made it a central discipline of the biological sciences. Attention has shifted from the identification of individual genes to the broader opportunities opened up by the sequencing of complete genomes. That, in turn, opened the door to the technologies of the so-called post-genomic era, which are often based on computer analysis of the entire genome, i.e. on bioinformatics. The numbers of nucleotides and amino acids stored in nucleotide sequence databases such as GenBank, DDBJ and EMBL have been increasing continuously and have become enormous. With such extensive and continuously growing volumes of data available, recognising biological signals in individual nucleotide sequences or in whole DNA, and determining their function, has become a complicated task and a relevant problem of bioinformatics. Until now, any randomised sequence of nucleotides or amino acids has been treated as a noncoding nucleotide or amino acid sequence. This work proposes and substantiates the view that a good prior understanding of biological “genetic noise” is necessary to detect a biological signal in DNA sequences. Hence the notion of “genetic noise” needs to be defined and formulated accurately. The statistical analysis carried out in this work reveals that the majority of nucleotide sequences, even noncoding ones, are not first-order Markov chains, which gives serious grounds for doubting the available models of nucleotide sequences, the assumptions on which they rest and the adequacy of their application. This means that, for example, comparing real sequences with sequences generated according to such models is not a reliable tool for finding either a biological signal or the biological function of a specific nucleotide (or amino acid) sequence. 
The same holds for the accuracy of phylogenetic trees reconstructed by means of these models. As an alternative to the existing models, a mathematical definition of a noncoding nucleotide sequence, in other words of “genetic noise”, is formulated and a model of it is proposed. The theory of discrete Markov fields is used to define a noncoding nucleotide sequence and to formulate its properties. The model of the noncoding sequence is verified by computer simulation of nucleotide sequence evolution. Correlation analysis and R/S (rescaled range) analysis are used to analyse DNA sequence structure and the dependence between nucleotides. To test whether a nucleotide sequence has the Markov property, loglinear and generalised logit models are fitted and the corresponding hypotheses are tested on their basis. |
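
The kind of Markov-property check mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the work, whose details the abstract does not give. It assumes the simplest loglinear formulation: in the table of overlapping nucleotide triplets, the first and third positions are conditionally independent given the middle one exactly when the sequence is first-order Markov. The function name `markov_order_g_statistic` is hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def markov_order_g_statistic(seq):
    """Likelihood-ratio (G) statistic for H0: `seq` is a first-order Markov
    chain, against a second-order alternative.

    Illustrative assumption: this is the conditional-independence test in the
    three-way table of overlapping triplets (i, j, k), not the author's code.
    """
    # Observed counts n_ijk of overlapping triplets.
    triples = Counter(zip(seq, seq[1:], seq[2:]))

    left = Counter()    # n_ij+ : margin over the third position
    right = Counter()   # n_+jk : margin over the first position
    middle = Counter()  # n_+j+ : margin over first and third positions
    for (i, j, k), n in triples.items():
        left[(i, j)] += n
        right[(j, k)] += n
        middle[j] += n

    g = 0.0
    for (i, j, k), n in triples.items():
        # Expected count under conditional independence given the middle base.
        expected = left[(i, j)] * right[(j, k)] / middle[j]
        g += 2.0 * n * math.log(n / expected)
    return g


# A sequence where the next base depends only on the current one ...
print(markov_order_g_statistic("ACGT" * 50))
# ... and one where it also depends on the base before that.
print(markov_order_g_statistic("AACC" * 50))
```

Under the null hypothesis, and provided all triplet cells are reasonably well populated, the statistic is approximately chi-square distributed with 4·3·3 = 36 degrees of freedom for the four-letter DNA alphabet. Large values therefore speak against a first-order Markov model. For short or compositionally biased sequences the approximation is poor and the result should be read with caution.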