Monday, December 02, 2019

Finding Latent Mean Scores

Introduction

Treating ordinal responses as scalar values was on my list of problems with rubric data. I showed how to estimate an underlying "latent" scale for ordinal data here and elaborated on the methods here. These methods cast the response values into a normal (or logistic, etc.) distribution, from which we can derive a latent mean. By itself, though, this isn't very interesting, because we usually want to compare the means of groups within the data. Do seniors score higher than freshmen?

This post shows how to make such comparisons by combining the latent scale model with a matrix of predictor variables to create a unified model from which to pull means.

For example, if we wanted to compare senior scores to freshman scores, it wouldn't do to split the data into two sets and calculate a mean for each; it's much better to keep everything in one data set and use class status as an explanatory variable within a common model. This idea can then be extended to create hierarchical or mixed-effects models.

Writing Scores

Each fall and spring semester, our Assessment Office asks faculty for evaluations of student writing, if they have had an opportunity to observe it. The details of this data collection and the reliability and validity characteristics we've uncovered so far can be found here, here, and here.

The resulting scores range from 0 = below minimum standards for college work, to 4 = writing at the level we expect of our graduates. Notice that this addresses problem 2 in my earlier post by intentionally pinning the scale to progress. The most stellar freshman is very unlikely to receive a 4.

The output below is based on about 17,000 data points gathered over four years.

A Proportional Odds Model

Our prior research has shown that cumulative GPA is important to understanding the scores, and we hope that time is as well, because we want to understand student development. We'll measure time as the number of fall/spring terms a student has been at college (TermNum). Finally, we want to understand the relationship between general academic ability (GPA) and growth, so we'll include an interaction term between grades and time.

As it turns out, the time variable by itself was not statistically distinguishable from zero, so I'll just show the final model here, with GPA and GPA:TermNum as inputs.

One way to induce a latent scale is to use a proportional odds model, which can be accomplished in R using the MASS package's polr function. 


# Fit a proportional odds model with a probit link; store the fit as writing_model for later use
writing_model <- MASS::polr(as.factor(Rating) ~ GPA + GPA:TermNum, data = dasl, method = "probit")

Model Output

The results look like this.


Call:
MASS::polr(formula = as.factor(Rating) ~ GPA + GPA:TermNum, data = dasl, 
    method = "probit")

Coefficients:
             Value Std. Error t value
GPA         0.7130   0.015964   44.66
GPA:TermNum 0.0788   0.001188   66.31

Intercepts:
    Value   Std. Error t value
0|1  0.9424  0.0488    19.3201
1|2  2.2105  0.0485    45.5799
2|3  3.5963  0.0513    70.0575
3|4  4.7040  0.0542    86.8262

Residual Deviance: 40453.60 
AIC: 40465.60 

We can see from the standard errors that both coefficients are convincingly non-zero. Unlike ordinary least squares, where we usually get a single intercept, here we get four of them: one for each cut-point on the scale. So the 0|1 intercept is the dividing line between ratings of 0 and ratings of 1. The scale's units are z-scores (the x-axis of the standard normal density function).

These z-score cut-points stay fixed; what shifts for each student is the linear predictor, built from GPA and the GPA-by-TermNum interaction according to the coefficients above. A student's predicted probability for each rating comes from where that linear predictor falls relative to the cut-points.
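To make that concrete, here's a minimal sketch of the arithmetic, assuming the model was stored as writing_model as in the call above. For a probit link, P(Rating <= j) = pnorm(zeta_j - eta), where eta is the linear predictor built from the two coefficients:

# By-hand rating probabilities for a 4.0-GPA student at term 8
b     <- coef(writing_model)               # GPA and GPA:TermNum coefficients
eta   <- b["GPA"] * 4.0 + b["GPA:TermNum"] * 4.0 * 8  # linear predictor
zeta  <- writing_model$zeta                # the four cut-points (0|1 ... 3|4)
cum_p <- pnorm(zeta - eta)                 # P(Rating <= 0) ... P(Rating <= 3)
probs <- diff(c(0, cum_p, 1))              # P(Rating = 0) ... P(Rating = 4)
round(setNames(probs, 0:4), 3)

With the coefficients above, eta works out to about 5.4, well past the 3|4 cut-point at 4.70, so most of the latent mass for these students falls in the 4 region.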

Simulating Output

Once we have the model, we can plug values in to see what the modeled distribution looks like. The one below is for 4.0-GPA students at term 8, which is normally the last term before graduation.
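In R, plugging values in is a one-line predict() call on the fit (the writing_model object from above):

# Predicted rating probabilities for 4.0-GPA students at term 8
predict(writing_model, newdata = data.frame(GPA = 4.0, TermNum = 8), type = "probs")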


We can see here that even the best graduating seniors are not all getting 4s in this model. Over half of them will, however, and almost all receive either a 3 or a 4, though there's a small chance of a 2 showing up.

For 3.0 students, the curve is shifted to the left, which we get by plugging in GPA = 3 and TermNum = 8.
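Assuming the same writing_model object, that's the same predict() call with the GPA changed:

# Predicted rating probabilities for 3.0-GPA students at term 8
predict(writing_model, newdata = data.frame(GPA = 3.0, TermNum = 8), type = "probs")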

There's significantly less chance of a 4, and even a small chance that a 1 will be assigned.

Comparing Means

Notice in each of the graphs above that the average latent scale value can be estimated from the picture: find the mode (highest point) and read off the x-value at the bottom. Since the latent distribution is normal, the mode and the mean coincide. For the 4.0-GPA students, it's something like 3.75, and for the 3.0-GPA students, it's about 3. We can extend this idea to look at a cross-section of GPAs for all eight terms.
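Here's a sketch of how such a cross-section can be computed from the fit. It works on the latent z-score scale (the figure below maps this back to the rating axis), and the GPA grid is an assumed illustration rather than the exact groups plotted:

# Latent means (linear predictor, z-score scale) for a grid of GPA groups
# over eight terms, using the writing_model fit from above
library(ggplot2)

grid <- expand.grid(GPA = c(2.0, 2.5, 3.0, 3.5, 4.0), TermNum = 1:8)
b    <- coef(writing_model)
grid$latent_mean <- b["GPA"] * grid$GPA +
                    b["GPA:TermNum"] * grid$GPA * grid$TermNum

ggplot(grid, aes(TermNum, latent_mean, color = factor(GPA))) +
  geom_line() +
  labs(x = "Term", y = "Latent mean (z-score)", color = "GPA")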


Unlike ordinary least squares on the raw scores, which pretends the ordinal ratings are scalar, the results here aren't all straight lines. We can see the ceiling effect of the scale limit on the highest-GPA students, whose trajectory bends away from a line.

The interaction term between grades and time causes the "fan" shape here, where the distance between student groups on the right side of the graph is larger than on the left. This suggests that a Matthew Effect is happening with writing.

Note that most of our students have GPAs of 3.0 or higher, so there aren't many students on those lower two trajectories. Still, the results here are meaningful for thinking about how college students learn academic writing skills.

Discussion

With easily accessible software tools like R and the MASS package, it's straightforward to create models of ordinal data, check them for fit, and make comparisons between groups. The challenge in using these tools is having a conceptual understanding of what's going on (which is the purpose of these articles) and enough comfort with R (or another package) to do the work efficiently. If you're turned off by the cryptic-looking syntax you've seen, I recommend Hadley Wickham's book R for Data Science as a modern way to use R that takes advantage of accessible language features--after a bit of practice it's easy to read and write.

