Title Complexity estimation of genetic sequences using information-theoretic and frequency analysis methods
Another Title Genetinių sekų sudėtingumo įvertinimas naudojant informacijos teorijos ir dažnių analizės metodus.
Authors Damaševičius, Robertas
DOI 10.15388/Informatica.2010.270
Is Part of Informatica. Vilnius : Matematikos ir informatikos institutas. 2010, vol. 21, iss. 1, p. 13-30. ISSN 0868-4952. eISSN 1822-8844
Keywords [eng] genetic sequence ; DNA analysis ; entropy ; complexity ; frequency analysis ; bioinformatics
Abstract [eng] Genetic information in cells is stored in DNA sequences, each represented as a string over four letters, where each letter corresponds to a specific type of nucleotide. Genomic DNA sequences are rich in periodic patterns, which play important biological roles. The complexity of genetic sequences can be estimated using information-theoretic methods. Low-complexity regions are of particular interest to genome researchers because they indicate sequence repeats and patterns. In this paper, the complexity of genetic sequences is estimated using Shannon entropy, Rényi entropy and relative Kolmogorov complexity. Structural complexity based on periodicities is analyzed using the autocorrelation function and time-delayed mutual information. As a case study, we analyze human chromosome 22 and identify periodicities of 3 and 49 bp.
Published Vilnius : Matematikos ir informatikos institutas
Type Journal article
Language English
Publication date 2010
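
The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, the entropies are computed from single-nucleotide frequencies only, and the autocorrelation is assumed here to be the fraction of positions whose symbol matches the symbol `lag` positions later, which is one simple choice for symbolic sequences.

```python
import math
from collections import Counter


def symbol_probabilities(seq):
    """Relative frequency of each distinct symbol in a non-empty sequence."""
    counts = Counter(seq)
    total = len(seq)
    return [c / total for c in counts.values()]


def shannon_entropy(seq):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in symbol_probabilities(seq))


def renyi_entropy(seq, alpha):
    """Renyi entropy of order alpha in bits: log2(sum(p**alpha)) / (1 - alpha).

    The order alpha = 1 is handled as the Shannon limit.
    """
    if alpha == 1:
        return shannon_entropy(seq)
    power_sum = sum(p ** alpha for p in symbol_probabilities(seq))
    return math.log2(power_sum) / (1 - alpha)


def match_autocorrelation(seq, lag):
    """Fraction of positions i with seq[i] == seq[i + lag].

    Assumes 0 < lag < len(seq). A peak at some lag hints at a periodicity
    of that length.
    """
    n = len(seq) - lag
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n
```

For a uniformly mixed sequence such as `"ACGT"` both entropies reach the 2-bit maximum of a four-letter alphabet, while a perfect tandem repeat such as `"ACGACGACG"` gives a match fraction of 1 at lag 3. Scanning a range of lags on a real sequence would be the analogous way to look for periodicities like those the abstract reports.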