Teachers As Raters = Success
We get calls from schools around the country every day. But this one was different.
“What have you done to Marie-Pierre?” asked the principal. “She is a totally different teacher. Her classroom is so alive and her program is growing. Whatever you are doing is working.”
At first, the question baffled Maury Ennis, who manages our raters, Marie-Pierre among them. They hadn’t done anything special with Marie-Pierre, just run her through the Rater Connection program so that she could accurately score STAMP 4S speaking and writing responses. And they certainly hadn’t told her how to teach French.
On the other hand, the essence of standards-based teaching is a set of commonly understood outcomes. Scoring thousands of student speaking and writing responses had given Marie-Pierre a deep, concrete understanding of what proficiency really looks like. For years she had read the words on paper: “familiar situations,” “generally comprehensible,” “connected discourse.” But it wasn’t until she sat down to rate all those French responses that the abstract concepts came to life. Rating responses from such a diverse range of students forced her to calibrate her judgments with raters around the country.
Reliable Proficiency Ratings Are Our Mission
“Our mission and our business depend on providing reliable proficiency ratings for every student. All of our raters have to be on the same page,” says our founder, David Bong.
“We rate about 500,000 responses a year,” says Maury. “My job is to make sure that they are all graded quickly and accurately.”
With students and raters scattered all over the world, she has a tough job. To make it possible, our software engineers teamed up with Maury to build Rater Connection, which was based on a prototype from the University of Oregon’s Center for Applied Second Language Studies (CASLS). Rater Connection is an online system for training, certifying, and monitoring raters who score those half million responses per year.
Let’s take a look behind the curtain to see how Maury and her team pull off this monumental feat. The system consists of four key components: learning, practicing, certifying, and monitoring.
The Four Elements of Rating Language Proficiency
Learning: Making criteria specific
Maury starts by orienting new raters to the STAMP 4S criteria, which are derived from the ACTFL Proficiency Guidelines. Just as OPI (Oral Proficiency Interview) training gives raters techniques and criteria not directly stated in the Guidelines, Avant gives raters specific criteria that make the abstract Guidelines concrete. For example, the Guidelines for Intermediate-High state that students must produce “connected sentences into paragraphs using a limited number of cohesive devices.” That can mean many things to many people, so Maury introduces raters to a rule of thumb that is easier to interpret consistently: a response should contain dependent clauses. The rule distinguishes between a response like “My friend and I went to the store. We bought ice cream. We ate it on the way home.” (no dependent clauses, so it does not meet the “connected sentences” criterion) and one like “My friend and I ate the ice cream that we bought at the store.” (a dependent clause, so it meets part of the “connected sentences” criterion).
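To show how concrete such a rule of thumb can be, here is a deliberately simple sketch in Python. It is purely illustrative and not part of Avant’s scoring software: it scans an English response for common subordinators and relative pronouns, and the word list and the has_dependent_clause function are invented just for this example.

```python
# Hypothetical illustration only (not Avant's scoring logic): a crude,
# English-only heuristic for the "contains a dependent clause" rule of thumb,
# keyed on common subordinating conjunctions and relative pronouns.
import re

SUBORDINATORS = {
    "that", "which", "who", "whom", "whose", "because",
    "although", "while", "when", "after", "before", "if", "since",
}

def has_dependent_clause(response: str) -> bool:
    """Return True if the response appears to contain a dependent clause."""
    words = re.findall(r"[a-z']+", response.lower())
    return any(word in SUBORDINATORS for word in words)

# The two example responses from the paragraph above:
print(has_dependent_clause(
    "My friend and I went to the store. We bought ice cream. "
    "We ate it on the way home."))                                    # False
print(has_dependent_clause(
    "My friend and I ate the ice cream that we bought at the store."))  # True
```

A real rater, of course, weighs far more than a word list; the point is only that the rule gives everyone the same, checkable starting question.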
Practicing: Applying the criteria to student responses
Once raters understand the criteria, they begin applying that knowledge to practice items. Each time they rate a response, they get immediate feedback on whether they scored it correctly, along with an explanation of the correct rating. By repeating this process across hundreds of responses, raters internalize the criteria. Excellence becomes a habit.
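The feedback loop itself is simple. The sketch below is a minimal illustration, assuming a hypothetical PracticeItem record that pairs each response with a master-agreed rating and rationale; it is not the actual Rater Connection interface, and the names and sample data are invented.

```python
# Hypothetical sketch of the practice feedback loop: each practice item carries
# a master-agreed rating and an explanation, shown to the trainee immediately
# after they submit a score. Illustrative only.
from dataclasses import dataclass

@dataclass
class PracticeItem:
    response: str
    key_rating: str   # rating agreed upon by master raters
    rationale: str    # explanation shown to the trainee

def practice(item: PracticeItem, trainee_rating: str) -> None:
    if trainee_rating == item.key_rating:
        print("Correct!")
    else:
        print(f"Not quite: the key rating is {item.key_rating}.")
    print("Why:", item.rationale)

item = PracticeItem(
    response="My friend and I ate the ice cream that we bought at the store.",
    key_rating="Intermediate-High",
    rationale="Contains a dependent clause, meeting part of the "
              "'connected sentences' criterion.",
)
practice(item, "Intermediate-Mid")
```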
Certifying: Assuring the quality of raters
When they are ready, new raters can take a certification test. If they rate at least 90% of student responses correctly, they become certified Avant raters. If they fail to reach the 90% level, they can go back, practice some more and take the certification test again.
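The certification check itself is straightforward arithmetic. In the sketch below, only the 90% threshold comes from the process described above; the is_certified helper and the sample numbers are hypothetical.

```python
# Hypothetical sketch of the certification check: a trainee is certified if at
# least 90% of their ratings match the master raters' key ratings.
def is_certified(trainee_ratings, key_ratings, threshold=0.90):
    matches = sum(t == k for t, k in zip(trainee_ratings, key_ratings))
    return matches / len(key_ratings) >= threshold

# Example: 46 of 50 responses rated correctly -> 92% -> certified.
print(is_certified(["A"] * 46 + ["B"] * 4, ["A"] * 50))  # True
```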
Monitoring: Maintaining excellence
Even after they are certified, Avant raters have to keep proving that they are consistent and reliable. Rater Connection inserts one anchor item, agreed upon by master raters, into every batch of 25 real responses. This shows Maury how different raters score the same anchor item and ensures that they remain true to the standard over time. In addition, one out of every five items is randomly chosen for double rating. If the first two raters disagree, the item goes to a third rater, and master raters carefully review every tie-breaking decision.
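Here is a rough sketch of how such a monitoring scheme could be assembled. The batch size of 25, the one-in-five double rating, and the third-rater tie break come from the process above; the build_batch and resolve_disagreement helpers, and the sample data, are hypothetical.

```python
# Hypothetical sketch of the monitoring scheme: hide one master-agreed anchor
# item in each batch of 25 real responses, flag roughly one in five real
# responses for a second rating, and let a third rater break any ties.
import random

BATCH_SIZE = 25          # real responses per batch (from the post)
DOUBLE_RATE_PROB = 0.2   # one out of five items is double rated (from the post)

def build_batch(real_responses, anchor_item):
    """Assemble one rater's batch and pick responses for double rating."""
    assert len(real_responses) == BATCH_SIZE
    batch = list(real_responses)
    batch.insert(random.randrange(len(batch) + 1), anchor_item)  # hide the anchor
    double_rated = [r for r in real_responses if random.random() < DOUBLE_RATE_PROB]
    return batch, double_rated

def resolve_disagreement(first_rating, second_rating, third_rating):
    """If the first two raters disagree, the third rater's score decides
    (with master raters reviewing the tie-breaking decision)."""
    return first_rating if first_rating == second_rating else third_rating

# Illustrative usage with made-up response IDs:
batch, recheck = build_batch([f"response_{i}" for i in range(25)], "anchor_item")
```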
Providing accurate proficiency-based ratings to language learners at a reasonable cost is one of our contributions to the field. But the greatest value of this system may be its power to improve teaching and learning. That is what happened to Marie-Pierre.
“Of course, we were thrilled to hear that,” says Maury. “But it was an unintended consequence. After all, we are an assessment company, not a professional development company.”
In our next blog post, learn how we have evolved Rater Connection into ADVANCE, a tool that schools, districts, and universities use to help their teachers transform their teaching the way Marie-Pierre did.