
Assessing Critical Thinking in Open-ended Answers: An Automatic Approach

Author(s): Antonella Poce, Francesca Amenduni, Carlo De Medio, Alessandra Norgini
Subject(s): Social Sciences, Education, Higher Education
Published by: European Distance and E-Learning Network
Keywords: Big Data in Education and Learning Analytics

Summary/Abstract: The role of Higher Education (HE) in promoting Critical Thinking (CT) is increasingly acknowledged. Constructed-response tasks (CRT) are recognized as necessary for CT assessment, although they present problems related to scoring quality and cost (Ku, 2009). Researchers (Liu, Frankel, & Roohr, 2014) have proposed using automated scoring to address these concerns. The present work compares the features of different Natural Language Processing (NLP) techniques adopted to improve the reliability of a prototype designed to automatically assess six sub-skills of CT in CRT: use of language, argumentation, relevance, importance, critical evaluation and novelty (XXX, 2017). We present the first (1.0) and the second (2.0) version of the CT prototype and their respective reliability results. Our research question is the following: What level of reliability is shown by the 1.0 and 2.0 automatic CT assessment prototypes, respectively, compared with expert human evaluation? Data were collected in two phases, to measure the reliability of CT prototypes 1.0 and 2.0 respectively, from a total of 264 participants and 592 open-ended answers. Two human assessors rated all of these responses on each of the sub-skills (XXX, 2017) on a scale of 1-5. Similarly, NLP approaches were adopted to compute a feature for each dimension. Quadratic Weighted Kappa and Pearson product-moment correlation were used to evaluate between-human agreement and human-NLP agreement. Preliminary findings based on the first data set suggest an adequate level of between-human rating agreement and a lower level of human-NLP agreement (r > .43 for the Relevance and Importance subscales). We are continuing the analysis of the data collected in the second phase and expect to complete it in June 2020.
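
The abstract names two agreement statistics: Quadratic Weighted Kappa for between-human agreement on the 1-5 rating scale and Pearson product-moment correlation for human-NLP agreement. The snippet below is a minimal sketch of how such metrics can be computed with standard Python libraries; it is not the authors' code, and the ratings shown are hypothetical.

```python
# Minimal sketch of the agreement metrics named in the abstract.
# Ratings are illustrative (hypothetical), on the paper's 1-5 scale.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Hypothetical ratings for one CT sub-skill (e.g. Relevance)
human_rater_1 = [4, 3, 5, 2, 4, 3, 5, 1]
human_rater_2 = [4, 4, 5, 2, 3, 3, 4, 1]
nlp_scores    = [3.8, 3.1, 4.6, 2.4, 3.5, 2.9, 4.2, 1.5]

# Between-human agreement: Quadratic Weighted Kappa
qwk = cohen_kappa_score(human_rater_1, human_rater_2, weights="quadratic")

# Human-NLP agreement: Pearson product-moment correlation
r, p_value = pearsonr(human_rater_1, nlp_scores)

print(f"QWK (human vs human): {qwk:.2f}")
print(f"Pearson r (human vs NLP): {r:.2f} (p = {p_value:.3f})")
```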

  • Issue Year: 2020
  • Issue No: 1
  • Page Range: 109-116
  • Page Count: 8
  • Language: English