Introduction
This is a useful trick I learned a couple of years ago, for working with course grades. The distribution of raw grades is usually censored (i.e. clipped) at the top end. This happens if admissions standards and grading standards are such that a large fraction of students will succeed in the classroom (receive As). The trick is to recalculate grades to get more information out of them.
Method
Statistically, it's annoying to have a censored distribution as an outcome variable, but there's a way out of it. If we assume that it's harder to earn an A in some classes than in others, then we can do the following to estimate course section difficulty:
- Assume that a student's total cumulative GPA is a reasonable measure of academic ability A.
- Within a given course section, the expected GPA of all students receiving a grade would be the average ability of those students: \(E(A)\). That is, we just find the cumulative GPA for each student, then average those to get an expected GPA for the class.
- A given class's "grade difficulty" is \( E(A) - GPA_{class} \). In other words, we subtract the actual grade average of the class from the expected average. If this is less than zero, that means the assigned grades were higher than expected, so the difficulty is lower than average.
That method allows us to recalculate grade averages for course sections, if we want to do a section analysis, or see if one subject is more difficult than another subject.
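Here is a minimal sketch of the section-difficulty steps above in base R. The data frame and its column names (StudentID, SectionID, Points, CumGPA) are made up for illustration, and the numbers are toy values, not real grades.

```r
# Hypothetical toy data: one row per student per section.
grades <- data.frame(
  StudentID = c("s1", "s2", "s3", "s1", "s2", "s4"),
  SectionID = c("A", "A", "A", "B", "B", "B"),
  Points    = c(4.0, 4.0, 3.0, 3.0, 2.0, 3.0),  # grade earned in the section
  CumGPA    = c(3.6, 3.2, 3.0, 3.6, 3.2, 3.4)   # cumulative GPA, the proxy for ability A
)

# Expected GPA per section: mean cumulative GPA of the enrolled students, E(A).
expected <- tapply(grades$CumGPA, grades$SectionID, mean)

# Actual grade average per section, GPA_class.
actual <- tapply(grades$Points, grades$SectionID, mean)

# Difficulty = E(A) - GPA_class; negative means grades ran higher than expected.
difficulty <- expected - actual
print(round(difficulty, 2))
```

With this toy data, section A comes out at about -0.40 (grades higher than expected) and section B at about 0.73 (grades lower than expected).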
To work with individual student grades, e.g. to predict first-year grades as part of an intervention program, we can recalculate them in a similar way.
- For each class, calculate the average grade assigned \( GPA_{class} \).
- Subtract it from each student's grade in that course: \( GPA_{student} - GPA_{class} \). If this is greater than zero, the student out-earned other students in the class on average.
- For each student, average those differences over all of the student's classes to get a new estimate of A. The result is a difference between grades, not a grade. If you want to recast it as a kind of GPA, add the average grade over all classes and students.
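The three steps above can be sketched in base R as follows. Again, the data frame and column names are hypothetical, and the grades are toy values chosen so the arithmetic is easy to follow.

```r
# Hypothetical toy data: one row per student per section.
grades <- data.frame(
  StudentID = c("s1", "s2", "s3", "s1", "s2", "s4"),
  SectionID = c("A", "A", "A", "B", "B", "B"),
  Points    = c(4.0, 4.0, 3.0, 3.0, 2.0, 3.0)
)

# Step 1: average grade assigned in each section (ave() defaults to the mean).
grades$ClassMean <- ave(grades$Points, grades$SectionID)

# Step 2: each student's grade minus the section average.
grades$Diff <- grades$Points - grades$ClassMean

# Step 3: average the differences per student to estimate A.
ability <- tapply(grades$Diff, grades$StudentID, mean)

# Optional: shift back onto a GPA-like scale by adding the overall mean grade.
ability_gpa <- ability + mean(grades$Points)
print(round(cbind(ability, ability_gpa), 2))
```

In this example student s2 earns a 4.0 in the easier section A and a 2.0 in the harder section B, so the difference score is slightly negative (about -0.17) even though the raw average is 3.0.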
Results
Figure 1. The effect of recalculating student grades by the difference method.
The distributions in the figure show the effect of the second method: recalculating student grades based on course difficulty. Applying this calculation before doing a regression analysis helps get better estimates of A and increases the explanatory power of the model.
Discussion
In R you can simultaneously calculate section difficulty and student grade-earning ability using a random effects model. Some statisticians I compare notes with don't bother to do this, but it's an option if you have time for the model to run (it can take a while with a lot of data).
I use library(lme4) to load the lme4 package, which provides the lmer function, and use a formula like
Points ~ 0 + (1|StudentID) + (1|SectionID)
This returns a model with all the random effects, which are estimates for each student and each section. For the students it's a difference score that estimates grade-earning ability, with a distribution like the one in Figure 1. For sections it's a recalculated grade average that can be used to identify course or subject difficulty.
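As a rough sketch of how that might look end to end, the snippet below fits the formula above to simulated data and pulls out the two sets of random effects with ranef. It assumes the lme4 package is installed. The simulated data, the column names, and the variance settings are all invented for illustration, so treat it as a starting point and not as the exact model I ran.

```r
library(lme4)

# Hypothetical simulated data: 200 students, 20 sections, 8 grades per student.
set.seed(1)
n_students <- 200
n_sections <- 20
student <- rep(seq_len(n_students), each = 8)
section <- sample(seq_len(n_sections), length(student), replace = TRUE)

true_ability <- rnorm(n_students, mean = 0, sd = 0.5)  # student grade-earning ability
section_mean <- rnorm(n_sections, mean = 3, sd = 0.3)  # section grade level

grades <- data.frame(
  StudentID = factor(student),
  SectionID = factor(section),
  Points    = true_ability[student] + section_mean[section] +
              rnorm(length(student), sd = 0.4)
)

# Crossed random intercepts for students and sections, no fixed intercept.
fit <- lmer(Points ~ 0 + (1 | StudentID) + (1 | SectionID), data = grades)

# ranef() returns one data frame of estimates per grouping factor.
effects <- ranef(fit)
student_effects <- effects$StudentID[["(Intercept)"]]
section_effects <- effects$SectionID[["(Intercept)"]]

summary(student_effects)
summary(section_effects)
```

Because the formula has no fixed intercept, the overall grade level has to be absorbed by the random effects, which is why the section estimates read like grade averages. Writing 1 instead of 0 in the formula would be a different design choice: it adds a fixed overall mean and leaves both sets of effects as deviations from it.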