Accepted manuscript
Imbalanced Class Learning in Epigenetics
Journal of Computational Biology, Vol. 21(7), pp. 492-507
07/01/2014
Handle:
https://hdl.handle.net/2376/116162
PMCID: PMC4082351
PMID: 24798423
Abstract
In machine learning, a balanced dataset is one of the important criteria for high classification accuracy. Datasets with a large ratio between the majority and minority classes hinder learning with any classifier. Datasets in which the classes of the target concept differ by a magnitude in the number of instances have an imbalanced class distribution. Such datasets arise in biological data, sensor data, medical diagnostics, and any other domain where labeling instances of the minority class is time-consuming or costly, or where the data are not easily available. The current study investigates a number of imbalanced class algorithms for handling the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently contain few differentially DNA methylated regions (DMR) and a larger number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that learning on an imbalanced dataset can achieve accuracy similar to that of a regular learner on a balanced dataset.
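As a rough illustration of the class imbalance described in the abstract, the following minimal Python sketch computes the majority-to-minority ratio of a label set and scores a trivial classifier that always predicts the majority class. The label counts (950 non-DMR, 50 DMR) and the helper names `imbalance_ratio` and `majority_baseline_scores` are hypothetical, chosen only for illustration; they are not taken from the study and do not reproduce any of the algorithms it compares.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def majority_baseline_scores(labels, minority):
    """Score a trivial classifier that always predicts the majority class.

    Returns (accuracy, recall on the minority class).
    """
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    predictions = [majority] * len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    # Minority instances that the baseline labels correctly.
    minority_hits = sum(p == y == minority for p, y in zip(predictions, labels))
    recall = minority_hits / counts[minority]
    return accuracy, recall


# Synthetic labels with assumed counts (not data from the study).
labels = ["non-DMR"] * 950 + ["DMR"] * 50

ratio = imbalance_ratio(labels)
accuracy, minority_recall = majority_baseline_scores(labels, "DMR")
print(ratio, accuracy, minority_recall)
```

On labels this skewed, the baseline scores well on overall accuracy while identifying no minority (DMR) instances at all, which is one reason imbalanced class learners are usually judged on more than overall accuracy.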
Details
- Title
- Imbalanced Class Learning in Epigenetics
- Creators
- M. Muksitul Haque - Center for Reproductive Biology, School of Biological Sciences
- Michael K Skinner - Center for Reproductive Biology, School of Biological Sciences
- Lawrence B Holder - School of Electrical Engineering and Computer Science
- Publication Details
- Journal of Computational Biology, Vol. 21(7), pp. 492-507
- Academic Unit
- Biological Sciences, School of; Electrical Engineering and Computer Science, School of
- Publisher
- Mary Ann Liebert, Inc; 140 Huguenot Street, 3rd Floor, New Rochelle, NY 10801, USA
- Identifiers
- 99900547743901842
- Language
- English
- Resource Type
- Accepted manuscript