TEACHERS OR CHATGPT: THE ISSUE OF ACCURACY AND CONSISTENCY IN L2 ASSESSMENT by Ramy Shabara, Khaled ElEbyary & Deena Boraie


Author(s): Ramy Shabara, Khaled ElEbyary, Deena Boraie
Subject(s): Social Sciences, Education, Distance learning / e-learning
Published by: IATEFL Poland Computer Special Interest Group and The University of Nicosia
Keywords: ChatGPT; accuracy; consistency; intra-rater reliability; inter-rater reliability

Summary/Abstract: Although there are claims that ChatGPT, an AI-based language model, is capable of assessing the writing of L2 learners accurately and consistently in the classroom, a number of recent studies have shown discrepancies between AI and human raters. Furthermore, there is a lack of studies investigating the intra-reliability of ChatGPT scores. Accordingly, this study aimed to examine the accuracy and consistency of ChatGPT compared to teachers, as well as with itself, after being trained on a rubric. To accomplish this goal, the study adopted a quantitative correlational non-experimental design. A dataset of 100 writing assignments, submitted by a cohort of B1-level students at an international branch university in Egypt, was analyzed quantitatively. These assignments were initially evaluated and moderated by trained teachers (n=11), and subsequently, the same assignments were also assessed twice by ChatGPT. The findings indicated that teachers’ scores exhibited a higher level of accuracy compared to those generated by ChatGPT. The results also revealed that ChatGPT exhibits a moderate, yet questioned, level of intra-rater reliability. The weak-to-moderate correlations between ChatGPT and teacher scores raise concerns about the accuracy and consistency of ChatGPT’s scoring of writing assignments. The implications of the findings highlight the potential applications and limitations of ChatGPT in L2 writing assessment. This study contributes to the ongoing discourse on the use of AI technologies in language education and provides insights into the accuracy and reliability of ChatGPT as an evaluation tool for L2 writing.
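
Illustrative note: the abstract describes a correlational design in which the same assignments are scored by teachers and twice by ChatGPT. The record does not say which statistic the authors used, so the sketch below assumes a plain Pearson correlation purely for illustration. The function name `pearson_r` and all score values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy scores for six assignments (NOT data from the study)
teacher_scores = [12, 15, 9, 14, 11, 16]
chatgpt_pass_1 = [13, 14, 11, 12, 12, 15]
chatgpt_pass_2 = [12, 15, 10, 13, 13, 14]

# Intra-rater consistency: ChatGPT's first scoring pass against its second
intra_rater = pearson_r(chatgpt_pass_1, chatgpt_pass_2)

# Agreement with teachers: teacher scores against ChatGPT's first pass
inter_rater = pearson_r(teacher_scores, chatgpt_pass_1)

print(f"intra-rater r = {intra_rater:.2f}")
print(f"inter-rater r = {inter_rater:.2f}")
```

A rank-based or agreement-based statistic (for example Spearman's rho or an intraclass correlation) could be substituted for `pearson_r` without changing the overall shape of the comparison.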

  • Issue Year: 24/2024
  • Issue No: 2
  • Page Range: 71-92
  • Page Count: 22
  • Language: English