Abstract [eng] |
The genetic information of all life forms is stored in long sequences of nucleotides that make up an individual's deoxyribonucleic acid (DNA). The length of the nucleotide sequence depends in part on the complexity of the organism. The ever-increasing number of newly sequenced genomes of various organisms opens up opportunities for new research, but the unequal lengths of the sequences, the large amount of information and the way it is presented make it difficult to analyze genome sequences with numerical algorithms. A simpler and faster approach to sequence comparison and analysis could be to represent the genome on a two-dimensional surface. Mapping the sequence onto a plane could make it possible to see, in a single picture, the whole structure of the sequence and the frequencies and regularities of single nucleotides and their combinations. This paper examines three visualization methods (the chaos game, chaos game frequencies and matrix representation) and the influence of their parameters on the resulting image. The work presents visualizations of DNA sequences of human chromosomes, of randomly generated sequences and of sequences obtained by multinomial logistic regression. Image contrast adjustment is used to bring out the patterns seen in the images. The structural similarity index, the Pearson correlation coefficient and image subtraction were selected for image comparison. Logistic regression was used to assess whether there is a dependence between data units and to determine the suitability of the Markov chain model for predicting sequences. The results show that, for some sequence representations, the structure of the resulting image has fractal properties. Fractal structures can be observed in the image when the selected parameter of the visualization methods (the sequence fragment length) is greater than 3. 
It was observed that the chaos game method is not suitable for visualizing long sequences if the goal is to depict the resulting fractal structures; in that case, point frequencies must be calculated additionally. Comparing randomly generated sequences, sequences obtained by logistic regression and real DNA sequences revealed that the structure of DNA is unique: although in most cases a nucleotide can be predicted from its neighboring nucleotides, the visualizations of sequences obtained by logistic regression do not match those of real DNA sequences. |
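The chaos game and chaos game frequency representations named above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The corner assignment in `CORNERS`, the function names `cgr_points`, `fcgr` and `pearson`, and the assumption that the input contains only the letters A, C, G and T are all illustrative choices that the abstract does not specify.

```python
import numpy as np

# Assumed corner assignment (a common convention); the abstract does not specify one.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Chaos game: start at the centre of the unit square and, for each
    nucleotide, move halfway toward that nucleotide's corner."""
    pts = np.empty((len(seq), 2))
    x, y = 0.5, 0.5
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        pts[i] = (x, y)
    return pts


def fcgr(seq, k):
    """Chaos game frequencies: count every fragment of length k in the cell
    of a 2^k x 2^k grid where the chaos game point of that fragment falls."""
    n = 2 ** k
    grid = np.zeros((n, n), dtype=int)
    for i in range(len(seq) - k + 1):
        fragment = seq[i:i + k]
        if any(base not in CORNERS for base in fragment):
            continue  # skip fragments with unknown symbols such as N
        row = col = 0
        for j, base in enumerate(fragment):
            cx, cy = CORNERS[base]
            col |= cx << j  # later nucleotides set the more significant bits
            row |= cy << j
        grid[row, col] += 1
    return grid


def pearson(a, b):
    """Pearson correlation coefficient between two equally sized images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
```

Under these assumptions, `fcgr(seq, k)` returns a count matrix that can be displayed as a grayscale image, and `pearson` compares two such matrices built with the same `k`. Counting fragments directly instead of plotting individual points is what keeps the picture readable for long sequences, where the plotted points would otherwise overlap.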