The automated essay scoring engine behind Criterion, called e-rater, has been used to score more than 1.5 million essays on the Graduate Management Admission Test, or GMAT, in tandem with human readers. The machine score and the human score are in agreement 97 percent to 98 percent of the time. —For Student Essayists, an Automated Grader (NY Times)
Last fall, the Criterion people contacted me and asked me to participate in a test. I never got a real sense of how well the algorithm worked, because the interface was so buggy: about a third of the students reported some problem, including their text window blanking out when they used the spell-check or dictionary. The person running the test admitted (via e-mail) that they were having big problems, and that the other tests they had run weren’t showing much correlation between the human-assigned and computer-assigned scores.
I was supposed to mark the papers and submit the scores to Criterion, which makes perfect sense, but quite frankly the students were so stressed by the experience that I had to tell them midway through that the assignment wouldn’t count as a grade. Upon hearing that, some students stopped taking the assignment seriously. Furthermore, once I realized that the Criterion system was recording my students’ real names in its internal database, I didn’t like the idea of telling the company what grades my students were getting; that would be a violation of the students’ privacy. And, since so many students reported problems with the interface, the assignment wasn’t really worth my time to evaluate, so I just treated it as one of the many “did they do it or not” exercises that make up the class participation score.
I can see this being a useful tool in huge lecture courses where it’s impossible for one person to read all the essays, in which case the tool could be used to normalize the scores (that is, to tell graders whether they have a tendency to give unusually high or low marks).
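For what it’s worth, the normalization I have in mind is nothing fancy. Here is a rough sketch in Python (the grader names and scores are invented for illustration) that compares each grader’s average mark to the group average, so the unusually easy and unusually harsh graders stand out:

    # Rough sketch: flag graders whose average marks run unusually high or low.
    # The grader names and scores below are made-up examples.
    from statistics import mean, pstdev

    scores_by_grader = {
        "Grader A": [88, 92, 85, 90, 87],
        "Grader B": [72, 70, 75, 68, 74],
        "Grader C": [80, 82, 79, 81, 83],
    }

    all_scores = [s for scores in scores_by_grader.values() for s in scores]
    overall_mean = mean(all_scores)
    overall_sd = pstdev(all_scores)

    for grader, scores in scores_by_grader.items():
        offset = mean(scores) - overall_mean
        # Report how far each grader's average sits from the group average,
        # in standard deviations.
        print(f"{grader}: average {mean(scores):.1f}, "
              f"{offset / overall_sd:+.2f} SD from the group mean")

In a real course you would also want to check that each grader’s stack of papers is comparable before reading too much into the numbers.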
Link found via Slashdot, where at least one poster says that knowing an essay will be computer-scored provides a good rationalization for submitting computer-generated essays. Of course, programming essay-generation software would probably take a lot more effort than just writing the damn essay, but a true hacker doesn’t care about mundane stuff like that.
The professor of a huge art history course or a history or philosophy survey (where the point of the course is to communicate a lot of facts in the hopes that the students will be able to relate them and synthesize them during the course and perhaps build on them later in more advanced classes) might use a tool like this to help students practice working all the names and dates into coherent narratives. Although multiple-choice tests are easy to grade, they are a lot harder to create than a couple of short-essay prompts — so I can imagine using such a tool to help me evaluate short quizzes that are designed to ensure that students have done the assigned readings.
Composition teachers and creative writing teachers do so much more than mark errors in grammar and punctuation. There’s little danger that these teachers will turn to software like Criterion for any heavy-duty assignments.
I’d much rather see a tool that trains students to evaluate their peers’ papers. Someday maybe I’ll ask for a sabbatical to develop it as an open source project, but until then I’ll just dream.