Saturday, November 09, 2019

Using Grade Data for Improvement

Introduction

This article follows the theme of the last one by taking a close look at course grade data and what it can tell us about student learning and success in college. I'll use a real example that you can replicate with your own data, probably learning something useful in the process.

You may be familiar with the idea of institutional effectiveness (IE), which advertises a decision-making process that comprises:
  1. setting goals relative to metrics,
  2. measuring suitable indicators of success,
  3. analyzing the data, and
  4. taking appropriate action.
I have gradually picked up on a pattern in human affairs that should have been obvious long ago: institutions choose the most convenient solutions to problems up to the point where being wrong is too painful (a less quotable version of Mencken's "there is always a well-known solution to every human problem—neat, plausible, and wrong").

Anne Applebaum's Red Famine describes how Bolshevik ideology led to collective farms and kept those policies in place long after it was obvious that they were counterproductive. Luke Dittrich's Patient H.M. chronicles how many years it took to convince psychosurgeons that they shouldn't routinely lobotomize their patients. And Siddhartha Mukherjee's The Emperor of All Maladies describes how hard it was to uproot an erroneous theory about cancer and prevent the harm it was causing through radical surgery.

The attraction of a good story dressed up as a theory is perhaps humanity's Achilles' heel. In the case of IE, there are several problems with the theory. This will all matter in a minute, so bear with me.

Analyzing Data

A couple of years ago I was preparing graphs of learning outcome ratings for departments, like the one below, which I turned into a presentation slide.

Figure 1. Success rates in general education foreign language.


The graph is typical of what's seen in assessment reports, in alignment with the IE philosophy. Our nominal goal for general education outcomes like this was to have the "Doesn't meet expectations" rate under 10%. This one's too high by that standard. 

The problem is that, assuming we trust the data, we still don't know why students are not measuring up; we lack a causal model to explain the data. The usual advice from people who call themselves assessment experts is to extend the IE reductionism and look more closely at the work students are producing. This is one of those too-simple ideas, because it assumes that all learning problems are granular--that there's some subset of the material that is the problem, and if we just "add more problems to the syllabus" on that topic, we'll fix it. This constitutes the majority of the assessment "actions for improvement" that I see.

Reality is more complicated than that. Quite by accident I made the graph below while mass-producing these for all our outcomes. Most are developmental, meaning we expect to see growth over four years, so for a threshold scale [doesn't meet/meets expectations], it isn't obviously useful to disaggregate the data by the student's year in college. 

Figure 2. Average success rates by when students take the course.

However, when we plot the rating averages this way, something interesting appears. It looks like students who wait to take the introductory foreign language courses don't do as well in them. Is this because of the waiting time, or a selection effect where less-prepared students put off a hard class? A regression analysis suggests that both factors are involved.
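To show what that regression looks like in practice, here's a minimal sketch in Python. The file name (ratings.csv) and columns (Meets coded 0/1, YearTaken, HSGPA) are placeholders for whatever your assessment extract actually contains; the point is just to check whether the year taken still predicts the outcome after controlling for incoming preparation.

import pandas as pd
import statsmodels.formula.api as smf

# Placeholder file: one row per student rating, with Meets coded 0/1,
# YearTaken (1-4), and HSGPA as a proxy for incoming preparation.
ratings = pd.read_csv("ratings.csv")

# Logistic regression: log-odds of meeting expectations as a function of
# when the course was taken and high school GPA.
model = smf.logit("Meets ~ YearTaken + HSGPA", data=ratings).fit()
print(model.summary())

# If both coefficients matter, waiting and weaker preparation each
# contribute to the pattern in the figure.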

Grade Analysis

After I found the result above, it occurred to me that I could probably find the same pattern by looking at course grades, which I have a lot more of than assessment data. The analysis went like this:
  1. Get course grade data for the last several years and create a column that joins the subject and number, e.g. "Bio 101". We want grade points, not letters. You can either drop the Ws or turn them into zeros, as you like.
  2. Get a list of entering freshman cohorts and inner join to the grade data (i.e. keep only entering freshmen, because transfers will muddy the waters). 
  3. Identify the 100 or so most common courses taken and filter the data to those. These will include common general education courses and gateways for big majors. A big school would need more than 100 here, probably, to be sure to get everything interesting.
  4. For each of those courses, use the entering date of each student to add a column to the data that tells us what year in college he or she took each course. Our data columns now look like Course Type, StudentID, GradePoints, YearTaken. 
  5. For each course type, correlate GradePoints with YearTaken. (Alternatively, subtract the averages of year 1 grades from years 2+ average grades).
  6. Sort by largest negative correlations (or differences). 
When I did this, I got a list of courses with math, sciences, and foreign languages at the top. To take the next step, create linear models for each of these, with GradePoints as the dependent variable and YearTaken and high school GPA (or some other academic predictor like SAT/ACT) as independent variables. This should help sort out the selection question; a sketch of the whole pipeline follows below.
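Here is a minimal sketch of those steps in Python with pandas and statsmodels. The file names and column names (grades.csv, cohorts.csv, StudentID, Subject, Number, GradePoints, TermYear, CohortYear, HSGPA) are stand-ins for whatever your registrar's extract uses.

import pandas as pd
import statsmodels.formula.api as smf

# 1. Course grades, with subject and number joined into one course label
#    and grades already converted to grade points (Ws dropped or zeroed).
grades = pd.read_csv("grades.csv")     # StudentID, Subject, Number, GradePoints, TermYear
grades["Course"] = grades["Subject"] + " " + grades["Number"].astype(str)

# 2. Keep only entering freshmen by inner-joining to the cohort list.
cohorts = pd.read_csv("cohorts.csv")   # StudentID, CohortYear, HSGPA
df = grades.merge(cohorts, on="StudentID", how="inner")

# 3. Restrict to the ~100 most commonly taken courses.
top_courses = df["Course"].value_counts().head(100).index
df = df[df["Course"].isin(top_courses)]

# 4. Year in college when the course was taken (1 = freshman year).
df["YearTaken"] = df["TermYear"] - df["CohortYear"] + 1

# 5-6. Correlate grade points with year taken within each course and sort,
#      largest negative correlations first.
corrs = (df.groupby("Course")
           .apply(lambda g: g["GradePoints"].corr(g["YearTaken"]))
           .sort_values())
print(corrs.head(20))

# Follow-up: a linear model per flagged course, controlling for preparation.
for course in corrs.head(20).index:
    sub = df[df["Course"] == course].dropna(subset=["GradePoints", "YearTaken", "HSGPA"])
    fit = smf.ols("GradePoints ~ YearTaken + HSGPA", data=sub).fit()
    print(course, round(fit.params["YearTaken"], 3), round(fit.pvalues["YearTaken"], 4))

A course where the YearTaken coefficient stays negative after controlling for high school GPA is a candidate for the advising change discussed below; one where the coefficient washes out is more likely showing a selection effect.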

Note: I am recoding this analysis for my data now, to organize the functionality into a reproducible research format. When I'm finished, I'll post the package on github so you can modify it for your own use. I'll post an update when that's done.

Getting to Action

My conclusion from the grade analysis was that different advising could raise grades in the identified courses by getting students into them sooner. Notice that this is a long way from "adding another problem set on subject-verb agreement," which would be a likely outcome of only having Figure 1 to consider.

It seems very rare in institutional research to actually find a problem that can be clearly linked to a likely solution. When it does happen, the follow-up can be discouraging, because it's hard to change things. Machiavelli said it best in The Prince:

It ought to be remembered that there is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things. Because the innovator has for enemies all those who have done well under the old conditions, and lukewarm defenders in those who may do well under the new. This coolness arises partly from fear of the opponents, who have the laws on their side, and partly from the incredulity of men, who do not readily believe in new things until they have had a long experience of them.
In this case, the solution is complicated, because it's not really just about advising. It's also about course capacity and student agency. It's about institutional culture.

My guess is that if you do this analysis yourself, you'll get similar findings. If you can then implement changes so that the most vulnerable students get into the risky classes earlier, with proper support, it could lift success rates and--incidentally--generate additional revenue via retention.

Discussion

The IE model has two simplifications, illuminated in this article, that create critical weaknesses in the theory's usefulness. The first is the implicit idea that just having some data is enough to identify a way to improve your situation. In reality, we have to have a way to use the data to predict what will happen in the future--a predictive model. That's difficult and often impossible when we just don't know enough about the domain, or when it's inherently random (e.g. predicting the stock market is difficult despite all the data available, and weather forecasts more than a few days out are useless).

The other oversimplification is the idea that people make data-driven decisions. As Cassie Kozyrkov at Google puts it: "Businesses hire data scientists in droves to make rigorous, scientific, unbiased, data-driven decisions. And then they don't." This is one of several articles I've come across attempting to explain why data science often doesn't lead to different decision-making. It's a human problem, involving leadership style, psychology, and institutional culture. Stalin knew collectivization was leading to mass starvation, but the program was too associated with him to reverse.

Some problems that Kozyrkov observes:
  • Many people only use data to feel better about decisions they’ve already made.
  • The more ways there are to slice the data, the more your analysis is a breeding ground for confirmation bias.
  • When decision-makers lack fundamental skills, there’s no math in the world that can fix it.
Another take on the same problem comes from an article by Jan Bosch, who lists:
  • bad data,
  • wrong interpretation,
  • confirmation bias,
  • and the opposite: rejecting data that discredits a pet theory.
Jan closes with a subtle problem stemming from too much ambition, political necessity, and statistics:
Finally, when a company starts with a data-driven initiative, the initial focus is on a big, strategically important topic that has many contributing variables. The topic is selected in order to garner the necessary support. However, the first initiatives to use data with the intent to influence the strategic goal have too little power to move the needle on the measured output data. The effect of the input variables is too little to push the output variables outside the noise range. Therefore, the initiative is easily categorized as a failure as the effects were too small to influence the selected output variable with statistical significance. In effect, the initiative was set up for failure from the beginning.
In statistics, we do (or should do) a power analysis at the beginning, to see if the proposed data have enough statistical power to answer the research question. Otherwise, we can do a lot of work to gather data and then get an inconclusive result even when the intervention worked.
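As a concrete illustration, here's a sketch of that kind of check using the power calculator in statsmodels; the effect size and targets below are made-up numbers for illustration, not anything from a real initiative.

from statsmodels.stats.power import TTestIndPower

# How many students per group would we need to detect a small effect
# (standardized difference of 0.2) with the usual alpha and power targets?
needed_n = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"Students needed per group: {needed_n:.0f}")   # roughly 394 per group

If the answer comes back larger than the number of students the initiative could plausibly touch, it is, as Bosch says, set up for failure from the beginning.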

The lesson is that if you're serious about using numbers together with goals, the statistical models should be designed at the outset, alongside the metrics and the goals themselves.

Which brings us to the most significant problem with the IE theory that I'll mention: many problems can't realistically be solved via statistical modeling. There are other kinds of leadership that are just as important, and the most effective leaders are those who can make good decisions with only poor information. How that's possible will have to wait for another day.
