Dissertation
USING CLUSTERING AND NETWORK SCIENCE TECHNIQUES TO UNDERSTAND THE RELATIONSHIPS AMONG BACTERIA AND AMONG CORONAVIRUSES
Washington State University
Doctor of Philosophy (PhD), Washington State University
01/2021
DOI:
https://doi.org/10.7273/000005417
Handle:
https://hdl.handle.net/2376/119318
Abstract
Recently, high-throughput approaches to DNA sequencing, such as massively parallel sequencing, have resulted in the availability of a vast number of whole-genome sequences. This availability has presented scientists with an unprecedented opportunity to gain knowledge by means of data mining and data analysis. It has allowed the investigation of the relationships among organisms by building and analyzing networks of complete genomes, forming sequence similarity networks, and reconstructing and visualizing phylogenetic relationships among living organisms. Building phylogenetic trees is a fundamental challenge because not all organisms share the same genes. As a result, the first phylogenetic visualizations employed a single gene, e.g., an rRNA gene, sufficiently conserved to be present in all organisms but divergent enough to provide discrimination between groups. As more genome data became available, researchers began concatenating different combinations of genes or proteins to construct phylogenetic trees believed to be more robust because they incorporated more information. However, the genes or proteins were chosen using ad hoc approaches. Here, we present a systematic approach for constructing a phylogenetic tree based on simultaneously clustering the complete proteomes of bacterial species. The clusters are also used to create a network from which bacterial species with horizontally transferred genes from other phyla are identified. We built clusters using a fast and accurate software tool, pClust, to group protein sequences into homologous clusters.
However, pClust has several parameters with values that must be chosen, and the choice of these values affects the accuracy of the clustering results. We studied the most significant parameters (alignment length, match similarity, and optimal score) and explored local and semi-global alignments. The last part of our study provides a comprehensive analysis of the relationships among coronaviruses at several levels. Mutation and recombination enable coronaviruses to adapt to new environments, resulting in host range and tissue tropism expansion. Therefore, health threats from coronaviruses are constant, and there is always the possibility of a virus crossing the species barrier into humans, causing outbreaks of severe and often fatal respiratory diseases. Understanding the relationships among coronaviruses enables us to study their virology and control their spread, which contributes significantly to global health and economic stability.
Details
- Title
- USING CLUSTERING AND NETWORK SCIENCE TECHNIQUES TO UNDERSTAND THE RELATIONSHIPS AMONG BACTERIA AND AMONG CORONAVIRUSES
- Creators
- Ehdieh Khaledian
- Contributors
- Shira L Broschat (Advisor); Kelly A Brayton (Committee Member); Assefaw H Gebremedhin (Committee Member)
- Awarding Institution
- Washington State University
- Academic Unit
- School of Electrical Engineering and Computer Science
- Theses and Dissertations
- Doctor of Philosophy (PhD), Washington State University
- Publisher
- Washington State University
- Number of pages
- 110
- Identifiers
- 99900591957401842
- Language
- English
- Resource Type
- Dissertation