SCORING MOTIVATIONAL ASSESSMENTS WITH FREE RESPONSES VIA TRANSFORMERS
Cihan Demir
Doctor of Philosophy (PhD), Washington State University
07/2025
DOI: https://doi.org/10.7273/000007859
Files and links (1): PDF — Demir Dissertation (4.48 MB). Embargoed Access; embargo ends 10/13/2027.
Abstract
Keywords: large language models; motivation; scoping review; validity; education
Student motivation is a well-established predictor of academic engagement and achievement. Despite its importance, motivation is difficult to assess accurately, as it is shaped by multiple, overlapping constructs and contextual factors. This complexity continues to pose challenges for researchers and practitioners seeking measurement tools with valid and reliable scores. While open-ended responses provide meaningful information about students’ motivation, traditional human scoring is time-consuming, subject to inconsistencies, and not easily scalable. Advances in natural language processing, especially with transformer-based large language models, offer promising new approaches for automated scoring. However, their use in scoring student motivation assessments remains largely undocumented, and practical applications are in the early stages of development. This dissertation addresses this gap through two studies. The first study presents a scoping review of transformer-based approaches to motivation scoring, examining model types, scoring methods, target constructs, and available psychometric evidence. Guided by the PRISMA-ScR protocol, the review identified only nine studies, underscoring the limited research in this area. The second study empirically evaluates transformer models for motivation assessment, comparing fine-tuned encoder models (BERT, DeBERTa) with rubric-prompted decoder models (GPT-3.5 Turbo, GPT-4.1). Model performance is evaluated through a psychometric lens, including analyses of human–machine agreement, construct validity, subgroup fairness, and predictive validity with academic outcomes such as GPA and retention. Notably, DeBERTa outperformed human raters in predicting academic outcomes, suggesting that certain transformer models may capture motivational signals with greater predictive value—even when agreement with human scores is modest. 
However, validity analyses revealed important limitations: models demonstrated inconsistent construct validity and notable subgroup disparities, raising concerns about fairness. Overall, while transformer-based approaches, specifically DeBERTa, hold promise for scalable motivation assessment, challenges remain in ensuring interpretability, validity, and equity. This study also discusses the challenges and limitations encountered, outlines practical implications for educational assessment, and identifies directions for future research in developing robust and fair applications of transformer models.
Details
Contributors
Brian F French (Chair)
Olusola Adesope (Committee Member)
Shenghai Dai (Committee Member)
W. Holmes Finch (Committee Member)
Awarding Institution
Washington State University
Academic Unit
Department of Kinesiology and Educational Psychology