Wednesday, March 25, 2009

Where are the error bars?

False certainty is a devil or an angel, depending on whether or not you're in for a surprise. If you assume the world won't end today, life is a lot easier--no messy running to the store to stock up on staples and digging a cellar for the winter. On the other hand, that certainty is almost always misplaced to some extent (solar explosions, calderas, accidentally dropped nukes,...). Still, we'd all be nervous wrecks if we hedged our bets on every conceivable contingency. Much better to be certain about most things and accept a few surprises. You can think of this as a kind of low-pass filter that removes the noise of everyday probabilities.

Sometimes, however, we go too far. We apply categories like 'provisional student' that carry expectations and assumptions. These may or may not be valid, and it's in our best interest to let some of these conversations be more complex than our artificial certainty allows.

I've argued before that intervention, for example, is better than prediction. If a student is having trouble, we can see that in performance. But predicting whether or not that will happen is fraught with uncertainties. Nevertheless, many institutions categorize students as 'risky' and put them in special programs, ignoring the fact that such predictions are probably right only about half the time (in my experience), and that there are many other students who should be in those programs but never get the label applied.
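
To put rough numbers on that, here is a minimal sketch in Python. The function name (flag_performance) and every figure in it are invented for illustration, not taken from any real early-alert system; the point is only how quickly a plausible-sounding flag accumulates mislabeled students.

# A toy calculation with made-up numbers showing how a 'risky student' flag
# can look respectable and still be wrong about half the students it touches.

def flag_performance(n_students, base_rate, sensitivity, specificity):
    """Return counts for a hypothetical risk flag applied to one cohort."""
    truly_at_risk = n_students * base_rate
    not_at_risk = n_students - truly_at_risk

    true_positives = truly_at_risk * sensitivity        # flagged and really struggle
    false_positives = not_at_risk * (1 - specificity)   # flagged but would have been fine
    false_negatives = truly_at_risk * (1 - sensitivity) # needed help, never flagged

    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, false_negatives, precision

# Illustrative inputs: 1000 students, 20% actually run into trouble, and a
# flag that catches 70% of them while wrongly flagging 15% of the rest.
tp, fp, fn, precision = flag_performance(1000, 0.20, 0.70, 0.85)
print(f"flagged correctly: {tp:.0f}, flagged wrongly: {fp:.0f}, missed: {fn:.0f}")
print(f"share of flagged students who actually needed help: {precision:.0%}")

# With these assumptions, only about 140 of the 260 flagged students really
# needed the program, and 60 who did were never flagged at all.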

Even more subtle is the effect of false certainty on learning outcomes. A casual reader of advertisements for products like the Collegiate Learning Assessment (CLA) will be impressed by the certainty implied*.
[A]fter testing of seniors in the spring, you receive a full Institutional Report that evaluates your school's value-added on a comparative basis. Testing every year allows you to measure for effects of changes in curriculum or pedagogy.
and
Performance-based assessments are anchored in a number of psychometric assumptions different from those employed by common multiple-choice exams. As such, initiatives like the Collegiate Learning Assessment (CLA) represent a paradigm shift, the technical underpinnings of which remain unfamiliar to many faculty and institutional researchers.
There's not much evidence of modesty in these claims. To be fair, this is pretty typical even for conference presentations. We may put literal error bars on some statistical data (although even that seems to be rare), but there's little discussion about the actual uncertainty inherent to the process of assessing learning. The most powerful kind of validity information would be predictive--what do your test results say about future performance in some realm that we care about?
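
For contrast, here is what literal error bars might look like on a 'value-added' claim. Everything below is simulated in Python with numpy; the scores are not CLA data, and the freshman/senior means are invented. The only point is that a reported gain ought to arrive with an interval attached.

# Simulated test scores for two cohorts and a bootstrap interval on the
# 'value added' between them. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
freshmen = rng.normal(1050, 150, size=120)   # pretend freshman scores
seniors = rng.normal(1100, 150, size=120)    # pretend senior scores

gains = []
for _ in range(5000):                        # simple bootstrap of the mean gain
    f = rng.choice(freshmen, size=len(freshmen), replace=True)
    s = rng.choice(seniors, size=len(seniors), replace=True)
    gains.append(s.mean() - f.mean())

low, high = np.percentile(gains, [2.5, 97.5])
print(f"estimated gain: {np.mean(gains):.0f} points "
      f"(95% interval {low:.0f} to {high:.0f})")

# If that interval turns out to be wide, or to include zero, the headline
# 'value added' deserves much less certainty than the brochure implies.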

In practice, the value of learning assessment is that it generates a useful dialogue that potentially produces improvement and innovation. Unfortunately, that's not what the grand overseers of higher education want--rather they desire the artificial certainty of a real metric: an educational gas gauge.

So accreditors and administrators fuss around trying to produce certainty. Now that there's a huge economic and status incentive to produce the stuff, certainty in learning outcomes measurement is cranked out like sausages. There's also little defense against the "show me" argument, so as a profession we run around in circles manufacturing certainty where it doesn't really exist. We employ a positivist-type philosophy of definition/observation/classification, but only in the sense that a cargo cult builds an airplane. That's a bold claim, so allow me to elaborate.

An example of this latter assertion is the way rubrics are constructed and used. In the interests of creating a kind of measurement of learning, standard divide-and-conquer reductionism is employed. To assess writing samples, we might use a rubric that contains several elements including correctness and style. These would be defined in levels of increasing accomplishment, perhaps from 'underperforms' to 'overperforms' on an ordinal scale. There seems to be an overwhelming certainty that this approach is a good one. As I mentioned, the results could indeed be wonderful for sparking a dialogue between professor and student or among professional colleagues as part of an ongoing program review. But as real measurement it fails miserably. The complete catalog of reasons why will have to wait for another day, but one suffices.

Not everything in the world comprises linear combinations of other things. To create a 'composite' or 'holistic' score from a rubric, the individual dimensions must be combined somehow. Sometimes the scores for correctness, style, and so forth are simply summed up to give a total. This total is the 'measured' quality of writing. Implicit in this construction is the idea that if we get the details right, then a picture of the whole will emerge naturally. This is a nasty little enthymeme; in this case false certainty is the devil in the details.
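
A toy example makes the problem concrete. The dimensions, the 1-to-4 ordinal scale, and the two papers below are hypothetical, but the arithmetic is exactly what a summed rubric does.

# Summing ordinal rubric ratings into a 'holistic' score, and what gets
# lost in the process. All dimensions and ratings are invented.

rubric_dimensions = ["correctness", "style", "organization", "evidence"]

def composite(ratings):
    """Sum ordinal ratings (1 = underperforms ... 4 = overperforms)."""
    return sum(ratings[d] for d in rubric_dimensions)

paper_a = {"correctness": 4, "style": 4, "organization": 1, "evidence": 1}
paper_b = {"correctness": 2, "style": 2, "organization": 3, "evidence": 3}

print(composite(paper_a), composite(paper_b))   # both print 10

# The totals are identical, but one paper is polished and incoherent while
# the other is plodding and well argued. The sum asserts an equivalence
# that nothing in the rubric ever established.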

For some odd reason, subjective assessments of learning are discouraged in the big picture, but accepted without comment at the reduced level. That is, the assessment of the elements of the rubric is almost always subjective, but a subjective assessment at a larger scale of resolution is met with raised eyebrows. I know this because when I present the Assessing the Elephant model at conferences, I get amazed looks. You can get away with assessing a student's performance as a whole??? This contradiction goes largely undiscussed.

What is the predictive validity of rubric use? If we're going to use linear combinations of subjectively rated sub-skills as a 'measurement' for some whole, how do we know we're getting the right answer? If we are, there should be a natural test. I have my own ideas about what this natural test is, but there ought to be a debate on the question. The misplaced certainty that current 'measurement' practices have a sound epistemology is damaging because it can easily create a new phrenology. It already has.
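
Here is one way such a test could be framed, again as a sketch: the rubric composites and later outcomes below are simulated stand-ins, and the correlation check is the simplest one imaginable, not a full validity study.

# Does the rubric composite predict a later outcome we care about?
# Simulated data only; a real check would use actual rubric scores and,
# say, later course grades or portfolio ratings.
import numpy as np

rng = np.random.default_rng(1)
n = 200
ability = rng.normal(0, 1, size=n)                     # unobserved writing ability

rubric_composite = ability + rng.normal(0, 1, size=n)  # noisy readout of ability
later_outcome = ability + rng.normal(0, 1, size=n)     # also driven by ability

r = np.corrcoef(rubric_composite, later_outcome)[0, 1]
print(f"predictive correlation: r = {r:.2f}")

# By construction the underlying correlation here is 0.5; a single cohort's
# estimate scatters around that value, and the width of that scatter is
# precisely the error bar the debate ought to be about.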

The reality is that whatever philosophical mish-mash is employed to produce numbers, these numbers are often taken seriously, and become a monological definition for learning. This applied definition may have little to do with the common (dialogical) definition. This is a recipe for a very expensive irrelevance. We need less certainty. I'm sure of it.


*I was amazed to see that a Google search has a hard time even finding an official CLA website. I ended up using a cached one.
