Human Vs. Machine Assessment of Essay Writings of B.A. Students of English Language Translation

Document Type : Original Article


1 M.A. in English Language Translation, Graduated,

2 Faculty Member of the Department of Computational Linguistics, Regional Information Center for Science and Technology (RICeST), Shiraz, Iran


This study compared human and machine assessment of essays written by B.A. students of English Language Translation (ELT) in an EFL setting. Essays by 30 female students were collected through availability sampling. Each essay was corrected once by the class instructor and once by PaperRater in terms of spelling, grammar, word choice, style, and overall grade. Scoring followed PaperRater's system. The Wilcoxon signed-rank test and tests of correlation were used to analyze the significance of the difference and the relationship between the machine and human scorings. For spelling errors, no significant difference was observed between the human rater and the machine. For style and overall grade, the scores assigned by the human rater were significantly lower than those assigned by PaperRater. For grammar, the human rater found significantly more errors. For word choice, the human rater assigned significantly higher scores than PaperRater. As for correlation, the findings revealed a very strong positive correlation between the two raters for spelling and a rather strong positive correlation for grammar errors. For overall grade, the correlation was weak but positive. For word choice and style, no significant correlation was found. The findings of this research are in line with those of [33] and [34], who found agreement between machine and human scorings of essays. In conclusion, despite its usefulness for teachers, machine assessment needs further research to increase its validity.
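
The paired comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis script: the function names are hypothetical, the scores are made-up illustrative values rather than data from the study, and Spearman's rank correlation is assumed here only because the abstract does not name the correlation test that was used. The sketch computes the Wilcoxon signed-rank sums and the rank correlation for one scoring category, and leaves out the p-value calculation.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_rank_sums(x, y):
    """Return (W+, W-) for paired samples; zero differences are dropped."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical overall grades for six essays (NOT data from the study).
human = [70, 65, 80, 75, 60, 85]
machine = [78, 70, 79, 83, 69, 88]

w_plus, w_minus = wilcoxon_rank_sums(human, machine)
print("W+ =", w_plus, "W- =", w_minus)
print("Spearman rho =", round(spearman_rho(human, machine), 3))
```

A small W+ relative to W- indicates that the human scores tend to fall below the machine scores, while the rank correlation shows whether the two raters order the essays similarly. The two statistics answer different questions, which is why a category can show a significant difference together with a positive correlation.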