
Kuidas hinnata suure panusega testide hindajaid
How to assess the raters of high-stakes tests

Author(s): Hille Pajupuu
Subject(s): Language and Literature Studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: rater evaluation; rater consistency; rating; double marking; Estonian

Summary/Abstract: In a situation where the proportion of high-stakes tests among all tests is constantly growing, where their results are increasingly used to pass judgement not only on examinees but also on teachers and teaching quality, and where many such tests have become obligatory, high demands must be set on the sense of responsibility of everyone involved in test development and test use. Special attention should be paid to the quality of subjective ratings, as the writing and speaking parts of a test may account for half of the total score. If testees score lower or higher than their competence warrants, it may change their lives as well as those of other people. The commonly used simple statistics (calculating the differences between the marks awarded by two raters, inter-rater correlation) may fail to detect many misawarded marks when raters are numerous and inadequately prepared. To reduce unfair assessment, a method is suggested for identifying poorly performing raters and reassessing their results in good time. The method is intended for use with double marking. A quality index is computed to show the degree of similarity between the marks given by the rater being assessed and those of an expert rater, even if the two have never worked as a pair. It is assumed that the higher the similarity, the fairer the marks. The article describes the general principles of the suggested method, pointing out its advantages over some other simple methods used for the same purpose.

  • Issue Year: 2007
  • Issue No: 3
  • Page Range: 221-233
  • Page Count: 13
  • Language: Estonian