Machine Learning-Based Prediction of Bacteriocins via Feature Evaluation
Suraiya Akhter
Washington State University
Doctor of Philosophy (PhD), Washington State University
05/2024
DOI: https://doi.org/10.7273/000006506
Files and links (2)
pdf
Suraiya_dissertation_v7, 2.61 MB
CC BY V4.0, Embargoed Access, Embargo ends: 06/24/2026
zip
Supplemental Figures, 862.71 kB
Image: Supplemental Figures
CC BY V4.0, Embargoed Access, Embargo ends: 06/24/2026
Abstract
Keywords: antimicrobial peptides; antimicrobial resistance; bacteriocin prediction; feature selection; software and web development; artificial intelligence; machine learning
Antibiotic resistance is a major public health concern worldwide. As a result, researchers continually search for new compounds from which to develop antibiotic drugs that combat antibiotic-resistant bacteria. Bacteriocins have emerged as a promising basis for such drugs, given their ability to kill bacteria with either broad or narrow natural spectra of activity. There is therefore a strong need for an accurate and efficient computational model to predict novel bacteriocins. Machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture with sequence matching-based methods, which makes it a potentially superior choice for accurate prediction. Our objective was to develop a machine learning-based software tool and a machine learning-based web application that use selected features to detect bacteriocin protein sequences precisely. To achieve this, we first extracted candidate features from known bacteriocins and non-bacteriocins, considering physicochemical, structural, and sequence-profile attributes of the protein sequences. We then refined the feature set using various feature evaluation methods, including statistical justifications, recursive feature elimination techniques, an alternating decision tree, a genetic algorithm, a linear support vector classifier, and cross-validated and hypergraph-based techniques. We trained several widely used machine learning models, and our best predictive model achieved an accuracy of 99.11% and an AUC of 0.9984 on the testing dataset. Furthermore, we analyzed feature contributions directly from the predictive model using Shapley values. Our software tool offers prediction results based on the best predictive model, built with a statistically justified feature set. In contrast, our web application gives prediction results using the best models created with the feature sets selected by each of the methods we considered, including the one used to build the software tool.
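As a rough illustration of the pipeline the abstract describes (extract features from sequences, then evaluate them statistically), the sketch below computes amino acid composition and ranks each residue by Welch's t statistic between two classes. This is a minimal sketch under stated assumptions, not the dissertation's method: the record does not specify the actual features or tests, so the choice of amino acid composition, the choice of Welch's t, and the function names `composition`, `welch_t`, and `rank_features` are all hypothetical.

```python
from collections import Counter
from math import copysign, inf, sqrt
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def welch_t(xs, ys):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    diff = mean(xs) - mean(ys)
    se = sqrt(variance(xs) / len(xs) + variance(ys) / len(ys))
    if se == 0.0:
        # Both samples are constant: no evidence if equal, unbounded otherwise.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def rank_features(positives, negatives):
    """Rank residues by |t| between the two classes, largest first.

    Assumption: composition features and a t statistic stand in for the
    (unspecified) features and statistical tests used in the dissertation.
    """
    pos = [composition(s) for s in positives]
    neg = [composition(s) for s in negatives]
    scores = []
    for i, aa in enumerate(AMINO_ACIDS):
        t = welch_t([v[i] for v in pos], [v[i] for v in neg])
        scores.append((aa, t))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)
```

Calling `rank_features(positives, negatives)` with two lists of protein sequences returns `(residue, t)` pairs, so a feature set could be chosen by keeping residues whose |t| exceeds a threshold. A real pipeline would add the structural and sequence-profile features mentioned in the abstract and would correct for multiple testing.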
Details
Contributors
John H. Miller (Chair) - Washington State University, School of Engineering and Applied Sciences (TRIC)
Shira L. Broschat (Committee Member)
Mohamed A. Osman (Committee Member) - Washington State University, School of Engineering and Applied Sciences (TRIC)
Awarding Institution
Washington State University
Academic Unit
School of Electrical Engineering and Computer Science
Theses and Dissertations
Doctor of Philosophy (PhD), Washington State University