SCORING MOTIVATIONAL ASSESSMENTS WITH FREE RESPONSES VIA TRANSFORMERS
Cihan Demir
Doctor of Philosophy (PhD), Washington State University
07/2025
DOI: https://doi.org/10.7273/000007859
Files and links (1): PDF — Demir Dissertation (4.48 MB). Embargoed Access; embargo ends 10/13/2027.
Abstract
Keywords: large language models; motivation; scoping review; validity; education
Student motivation is a well-established predictor of academic engagement and achievement. Despite its importance, motivation is difficult to assess accurately, as it is shaped by multiple, overlapping constructs and contextual factors. This complexity continues to pose challenges for researchers and practitioners seeking measurement tools with valid and reliable scores. While open-ended responses provide meaningful information about students’ motivation, traditional human scoring is time-consuming, subject to inconsistencies, and not easily scalable. Advances in natural language processing, especially with transformer-based large language models, offer promising new approaches for automated scoring. However, their use in scoring student motivation assessments remains largely undocumented, and practical applications are in the early stages of development. This dissertation addresses this gap through two studies. The first study presents a scoping review of transformer-based approaches to motivation scoring, examining model types, scoring methods, target constructs, and available psychometric evidence. Guided by the PRISMA-ScR protocol, the review identified only nine studies, underscoring the limited research in this area. The second study empirically evaluates transformer models for motivation assessment, comparing fine-tuned encoder models (BERT, DeBERTa) with rubric-prompted decoder models (GPT-3.5 Turbo, GPT-4.1). Model performance is evaluated through a psychometric lens, including analyses of human–machine agreement, construct validity, subgroup fairness, and predictive validity with academic outcomes such as GPA and retention. Notably, DeBERTa outperformed human raters in predicting academic outcomes, suggesting that certain transformer models may capture motivational signals with greater predictive value—even when agreement with human scores is modest. 
However, validity analyses revealed important limitations: models demonstrated inconsistent construct validity and notable subgroup disparities, raising concerns about fairness. Overall, while transformer-based approaches, specifically DeBERTa, hold promise for scalable motivation assessment, challenges remain in ensuring interpretability, validity, and equity. This study also discusses the challenges and limitations encountered, outlines practical implications for educational assessment, and identifies directions for future research in developing robust and fair applications of transformer models.
Details
Contributors
Brian F French (Chair)
Olusola Adesope (Committee Member)
Shenghai Dai (Committee Member)
W. Holmes Finch (Committee Member)
Awarding Institution
Washington State University
Academic Unit
Department of Kinesiology and Educational Psychology