Markov Model for Full Genome Sequence Generation

Authors

  • Foo Weng Lim
  • Yong Kheng Goh

Abstract

This work introduces a Markov chain method for generating long sequences over the four-letter DNA alphabet: Adenine (A), Cytosine (C), Guanine (G) and Thymine (T). The algorithm can be used to generate a new genomic DNA sequence that preserves the statistical properties of the original sequence for N-grams of any length. An N-gram is a subsequence of length N in the genomic DNA. By counting the occurrences of different N-grams, a signature vector of a genetic text, called the contrast value vector, is constructed. Using the contrast value vectors, with correlation as the distance measure, a phylogenetic tree is then constructed. The resulting phylogenetic trees group the organisms according to their kingdoms, which is consistent with the commonly accepted phylogenetic tree.
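The general idea of N-gram counting and Markov chain generation can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the function names (`count_ngrams`, `build_transitions`, `generate`), the fixed chain order, the restart rule for unseen contexts, and the toy input are all assumptions introduced here, and the contrast value itself is not computed because the abstract does not define it.

```python
import random
from collections import Counter, defaultdict


def count_ngrams(seq, n):
    """Count every overlapping subsequence of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def build_transitions(seq, order):
    """Map each context of length `order` to counts of the letter following it."""
    transitions = defaultdict(Counter)
    for i in range(len(seq) - order):
        transitions[seq[i:i + order]][seq[i + order]] += 1
    return transitions


def generate(seq, order, length, seed=0):
    """Generate a sequence of `length` letters from an order-`order` Markov chain.

    Assumption: transition probabilities are the empirical frequencies
    observed in `seq`; the abstract does not specify how they are estimated.
    """
    if order < 1 or len(seq) <= order:
        raise ValueError("need order >= 1 and len(seq) > order")
    rng = random.Random(seed)
    transitions = build_transitions(seq, order)
    contexts = list(transitions)
    out = seq[:order]  # start from the first context of the source
    while len(out) < length:
        followers = transitions.get(out[-order:])
        if not followers:
            # Context never seen with a successor: restart from a random
            # observed context (an illustrative choice, not from the paper).
            out += rng.choice(contexts)
            continue
        letters, weights = zip(*followers.items())
        out += rng.choices(letters, weights=weights)[0]
    return out[:length]


source = "ACGTACGGTCAACGTTAGCATCGGATACCGTA"  # toy input, not real genomic data
synthetic = generate(source, order=2, length=50)
print(synthetic)
print(count_ngrams(synthetic, 3).most_common(5))
```

With a real genome as input, comparing `count_ngrams(source, n)` against `count_ngrams(synthetic, n)` gives a rough check of how well the generated sequence reproduces the N-gram statistics of the original.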

Published

2020-04-16

Section

Articles