Abstract
The data from a genomic library can be sorted into the frequencies of every possible tetranucleotide in the sequence. This tabulation, a short sequence distribution, contains the frequency of occurrence of each of the 256 tetranucleotides and thus seems to serve as a vehicle for averaging sequence information. Two such distributions can be readily compared by correlation. Reported here are correlations (Spearman's rs) between the distributions from all of the genomic libraries in GenBank 44.0 with sizes equal to or larger than that of Salmonella typhimurium, excluding the mouse and human data. All of the organisms examined showed highly significant correlations between the two DNA strands (a correlation distinct from the complementarity expected from base pairing). Of 155 comparisons between libraries, 132 showed significant correlations at the 99% confidence level. When the correlation coefficients were used as a similarity matrix, a phenogram clustered most organisms in a pattern consistent with other hypotheses. This suggests a highly conserved pattern that underlies all other genetic information in cellular DNA and affects both DNA strands, perhaps caused by interaction with conserved factors necessary for DNA packaging.
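The tabulation and comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code, and it rests on assumptions the abstract does not state: tetranucleotides are counted in overlapping 4-base windows, windows containing non-ACGT symbols are skipped, and the second strand is represented by the reverse complement read 5′ to 3′. All function names (`tetranucleotide_counts`, `reverse_complement`, `spearman_rs`) are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# All 4**4 = 256 tetranucleotides in a fixed lexicographic order (assumed ordering).
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetranucleotide_counts(seq):
    """Return the 256-entry short sequence distribution of one strand.

    Assumption: overlapping windows; windows with non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # only pure ACGT words are keys
            counts[word] += 1
    return [counts[w] for w in TETRAMERS]


def reverse_complement(seq):
    """The opposite strand, read 5' to 3' (non-ACGT symbols pass through)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rs is undefined when one distribution is constant")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    import random

    random.seed(0)
    # Toy random sequence for illustration only; not GenBank data.
    toy = "".join(random.choice(BASES) for _ in range(5000))
    forward = tetranucleotide_counts(toy)
    opposite = tetranucleotide_counts(reverse_complement(toy))
    print("strand-vs-strand rs:", spearman_rs(forward, opposite))
```

Because the opposite strand's count for a word equals the forward count of that word's reverse complement, comparing the two vectors position by position tests whether a word and its reverse complement occur at similar frequencies, which is not implied by base pairing alone. Where SciPy is available, `scipy.stats.spearmanr` computes the same tie-corrected coefficient along with a p-value.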
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 24-30 |
| Number of pages | 7 |
| Journal | Journal of Molecular Evolution |
| Volume | 32 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - Jan 1991 |
| Externally published | Yes |
Keywords
- Asymmetric nucleotide sequences
- Averaged sequence
- Evolution
- Evolutionary constraints
- GC content
- Sequence constraints
- Sequence structure
- Short sequence distribution
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- Genetics