Doctor of Philosophy (PhD), Washington State University
07/2024
DOI:
https://doi.org/10.7273/000007041
Files and links (1)
pdf
FF_dissertation_v3 (7.84 MB)
Embargoed Access, Embargo ends: 10/11/2026
Abstract
Complex Data Modeling Machine Learning
This collection of articles explores the application of machine learning (ML) and stochastic processes across diverse domains, including agronomic science, aerosol dynamics, and traffic modeling. Despite apparent differences, these fields share common challenges related to data complexity, modeling intricacies, and the integration of statistical techniques.
The first paper delves into precision agriculture, focusing on the prediction of Russet potato clone suitability for advancement in breeding trials. Using data from trials conducted in Oregon, the paper investigates a variety of state-of-the-art binary classification models. Through comprehensive analysis, including preprocessing, feature engineering, and imputation, top-performing models such as the multi-layer perceptron classifier (MLPC), histogram-based gradient boosting classifier (HGBC), and support vector machine classifier (SVC) demonstrate significant results. Variable selection further enhances model performance, emphasizing the potential of ML in streamlining potato variety selection and informing breeding programs.
In the second paper, a graph network simulator (GNS) framework tailored for aerosol chemistry is presented. Utilizing data simulated from the PartMC-MOSAIC model, the framework accurately predicts particle chemical dynamics, showcasing efficient training and generalization across different scenarios. By categorizing features and leveraging standard neural networks and k-nearest neighbors algorithms, the GNS demonstrates robustness and adaptability in modeling aerosol chemistry, offering insights into atmospheric processes and climate dynamics.
The third paper focuses on urban transportation and emissions modeling, proposing a mathematical model to evaluate customer waiting times, customer traveling times, and total vehicle emissions for various Park-and-Ride usage ratios and public transportation operation policies. Through a case study of Tsukuba city in Japan, the integrated queueing and emissions model reveals intriguing trade-offs between waiting times for Park-and-Ride and traffic congestion-induced emissions from private cars. The results provide insights into optimizing public transportation capacity and frequency, offering solutions to mitigate urban congestion and reduce emissions.
By addressing specific challenges and methodologies in agriculture, aerosol chemistry, and urban transportation, these papers contribute to interdisciplinary research, highlighting the diverse applications and benefits of statistical methods in addressing real-world problems. Future research on integrated stochastic and ML approaches in these disciplines will help address critical issues related to climate change, food security, and urban sustainability.
Details
Title
TRIALS, TURBULATIONS, AND INFERENCES OF ML ON COMPLEX DATA
Creators
Fabiana Ferracina
Contributors
Bala Krishnamoorthy (Chair)
Nairanjana Dasgupta (Committee Member)
Yuan Wang (Committee Member)
Awarding Institution
Washington State University
Academic Unit
Department of Mathematics and Statistics
Theses and Dissertations
Doctor of Philosophy (PhD), Washington State University