Sunday, May 17, 2009

Assessment and Automation

A January article in InsideHigherEd ponders the worth and future of assessment, including mention of the creation of a National Institute for Learning Outcomes Assessment, staffed by such luminaries as NSSE creator George Kuh. As usual, the comments after the article are as enlightening as the formal presentation. It's hard to know if comments are in earnest or simply for the lulz, but some are certainly provocative. Take Robert Tucker's comment (if it is indeed Mr. Tucker of InterEd, as advertised--the comments are self-identified):
Sidebar to whiney profs. Should you ever need brain surgery (or a car, or HDTV, etc.), let us know, we’ll hook you up with someone who shares your beliefs; i.e., outcomes and impact can’t be measured (except by you when you decide how well your students are doing) and inputs are sufficient to assess quality.
I couldn't find any assessment advice at InterEd's site, but the company advertises itself as a higher ed consulting business (apparently specializing in adult ed). Taking the comment at face value, it asks us whiney profs to understand that clear outcomes are determinable and desirable when employing a neurosurgeon or when buying durable goods. That's certainly true enough. The implication is that this should also be true of educational products. "Steve's" addendum to Mr. Tucker's comment extends this dubious logic. If you look closely, you can see the flailing arms.
[E]ven those who insist that what Higher Ed does cannot possibly be assessed still seem to want to have doctors who are licensed, lawyers who've passed the bar, and plumbers who've demonstrated their competence.
This is a common confusion, I think, and the reason for highlighting these embarrassing passages. Passing a bar exam and demonstrating competence are two different things. Would you rather fly with a pilot who's passed all the fill-in-the-bubble tests, or one who's landed at your airport successfully many times? I'm quite sure all of the Enron accountants were fully certified, and that malpractice is committed by doctors who passed their boards. I'm sure there are incompetent lawyers who've passed the bar. A plumber is competent if he or she makes enough money at it to stay in the plumbing business. A test result may have some correlation with demonstrated competence, but they are not the same thing.

"Steve's" closing comment is sadly indicative of the general lack of critical thinking exhibited in the debate on assessing critical thinking (and the like):
20 years from now: Consumer Reports will be assessing the quality of BA degrees, right alongside washing machines and flying-mobiles. Parents will ask, "why should we pay 3 times the cost when Consumer Reports says that there is only a 2% increase in quality?!"
This is one of those statements that it makes no sense to even dissect; if you believe it to be true, then no amount of argument is going to change your mind. This sort of epistemological black hole seems common in American society. If you haven't read it, take a look at Susan Jacoby's The Age of American Unreason--an excellent book on the declining rationality of the body politic.

The kind of "measurement of learning" that could actually lead to accurate statistics in Consumer Reports is a very high bar. Let's agree to call that the Hard Assessment Problem. This is in contrast to the sort of thing that goes on in the classroom, where local assessments are used to improve pedagogy, content selection, and program design and coordination. Let's call that the Soft Assessment Problem. There are many successful examples of the latter. For hard assessment, I think we should be a lot more cautious about making claims. In an Education Week article (available here without subscription) Ronald A. Wolk, Chairman of the Big Picture Learning Board, lists five assumptions he claims are the root of education's ills. The first two are related to hard assessment and more generally a positivist approach to education:
Assumption One: The best way to improve student performance and close achievement gaps is to establish rigorous content standards and a core curriculum for all schools—preferably on a national basis.

Assumption Two: Standardized-test scores are an accurate measure of student learning and should be used to determine promotion and graduation.
Proponents of hard assessment ought to be informed by other kinds of "measurement," like the ratings of companies and their bond issues by large, well-paid firms like Standard & Poor's. Their project involves mainly crunching numbers to see what the prognosis is--a far more quantitative exercise than assessing learning. You'd think the success rate would be pretty good. I think the recent financial mess indicates clearly that such ratings are not very good.

If hard assessment is possible, why don't we test parents on their parenting skills? Anyone who flunks gets their kid hauled off to the orphanage. Is there any test that we would invest that much confidence in? I hardly think so.

I actually sat down to write about something tangential, so here's the segue: think of the hard assessment problem backwards. If we can truly understand what goes into the measurement of a phenomenon, then we should be able to reproduce that phenomenon by varying inputs. For example, we can put a sack of potatoes on a scale and weigh them. We can equally well add sand to the scale until it weighs the same amount--we can reproduce the measurement through artificial means.

To see where this is going, let me ask this: is being able to do integral calculus a skill that requires critical thinking ability? Before you answer, consider the contrapositive: without critical thinking ability, there is no solving integrals. Not coincidentally, this train of thought occurred just as Wolfram Alpha launched. I asked it what the integral of 1/(x^2+1) was:

The mindless web service came back with the correct answer, a couple of graphs, and a Taylor series for the solution, just in case I want a polynomial approximation. Very nice! Was critical thinking involved? Well, not unless you want to concede that computers can think.
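For the curious, the same mindless computation can be reproduced in a few lines of SymPy, an open-source symbolic math library for Python. This is my own sketch, not anything from Wolfram Alpha's actual machinery, and it assumes SymPy is installed:

```python
# Reproduce the "mindless" symbolic integration locally with SymPy.
import sympy as sp

x = sp.symbols('x')

# The antiderivative of 1/(x^2 + 1):
antiderivative = sp.integrate(1 / (x**2 + 1), x)
print(antiderivative)  # atan(x)

# The Taylor-series approximation Alpha also offered:
print(sp.series(sp.atan(x), x, 0, 8))  # x - x**3/3 + x**5/5 - x**7/7 + O(x**8)
```

No thought occurs anywhere in this process; the library simply applies the same symbolic rewriting rules a student would.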

You may complain that I've cheated--that of course there was critical thinking involved in the construction of the program, just as a reference book cannot itself think but is the product of many hours of cogitation. The computer itself is unthinking, and hence cannot think critically or in any other way. But in this case, I think you would be wrong. In essence, the programmers have taught the computer to operate just like first-year math students. The processes (algorithms) are the same; it's just that the computer is very fast at executing them. Either the CPU qualifies as a critical thinker, or critical thinking is not required to solve the problem.
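To make the "same algorithms as a first-year student" point concrete, here is a toy sketch of a purely mechanical integrator. The table entries and function name are my own invention for illustration, not anything drawn from a real computer algebra system:

```python
# A toy integrator that works by pure table lookup--the memorized
# antiderivative table a first-year calculus student applies.
# No judgment or "critical thinking" is involved at any step.
TABLE = {
    "1/(x^2+1)": "atan(x) + C",
    "cos(x)":    "sin(x) + C",
    "e^x":       "e^x + C",
    "1/x":       "ln|x| + C",
}

def integrate(expr: str) -> str:
    """Return the antiderivative if the pattern is in the table."""
    return TABLE.get(expr, "pattern not in table")

print(integrate("1/(x^2+1)"))  # atan(x) + C
```

Real systems add pattern matching and rewriting rules on top of such tables, but the procedure remains mechanical all the way down.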

Please note that it's not necessary to think at all to create complex machines that can think. Evolution is the most powerful problem-solver yet evidenced. It's very slow, horribly wasteful, not goal oriented, and completely thoughtless, and yet it made us. To me, one of the central mysteries of the universe is how deductive processes created inductive processes, but that takes us too far afield.

This hypothesis (viz., if we can measure it, we can create it artificially) actually sounds too strong to me. But I think the distinction is useful nevertheless. We could agree to these conditions for hard assessment:
  • If I can assess some phenomenon quantitatively AND reproduce it artificially, then it is a hard assessment (not necessarily a measurement, for other reasons I've blogged about).
  • If I can assess it quantitatively BUT NOT reproduce it artificially, then let's call it an observation.
As an example, we can observe happiness but not assess it. We can observe critical thinking but not assess it (at least until AI develops). We can assess the ability to do multiplication tables or integral calculus. In the health fields, we can assess some conditions but only observe others.

This idea is a bit out of the box, and is probably useful only in limited contexts. But I think it is useful to draw attention to the transparency or opacity, and the complexity, of assessment tasks. Focusing on reproducibility is one way to do that. On the other hand, I may wake up tomorrow and think "what a stupid idea I've advertised on my blog," but that's the nature of complex debate, I fear. Even in one's own mind.


  1. David,

    As far as I can see, we share perspectives related to the importance of balancing the Janus-headed ideals of precision and realism at the conceptual and methodological levels of assessment. We also share the perspective that it can be difficult (or more) to operationalize and measure some phenomena that we can nonetheless reliably apprehend at the subjective level (although my experiments show the reliability to be less than we believe and often unacceptably low).

    We appear to take separate roads here: You seem to impute a level of inscrutability to at least some of what there is to mean by higher education's outcomes. I believe (as was overheard in a behind-the-counter employee training session at McDonald's), "This stuff isn't rocket surgery." I am less metaphysical than you appear to be when it comes to framing challenges to measuring higher education's outcomes. For the most part, what I see out there in the classroom is a lack of creativity fueled by defensiveness, arrogance, and a corresponding lack of skills in measurement science. Even some of your comments related to my post seem ad hominem. I see no need for this.

    As members of the professoriate (in my case, former), should we not be honest in acknowledging that most of our colleagues assess with a series of essays in which the T/R coefficient is considered stellar if it reaches 0.65, or with multiple-choice items, 45% of which do not pass elementary validity standards, administered too infrequently, under too few contexts, and largely inauthentic to the generalizations we would hope our students derive? What exactly are you defending here?

    Of course there are philosophical issues beneath the surface; some of them are unresolved and discussing them is a source of great pleasure to me. This does not change the fact that the majority of the "challenges" we face in becoming more accountable are pedestrian at the scientific level and material only at the level of psychological defensiveness. Don't we want to know measurable things such as, can students analyze the mid-term financial risks in a P&L and suggest a course of remedial action; can they identify the psychological defense mechanisms illustrated in Shakespeare's works and apply them to their case load; can they explain the differences in the early and later Wittgenstein and link each proposition directly to a theory of meaning, showing its influence on modern measurement constructs? And so on . . .

    I'm sorry but as I learned at McDonald's, it ain't rocket surgery.

    Robert W Tucker
    InterEd, Inc.

    P.S. You began this post with a quote from me (I must have been sore headed that day) but your response seems to be exclusively directed at what a "Steve" said in response to me.

  2. Dear Robert, Thanks for the comment. If you browse my other musings on this topic, you'll see that I try to distinguish between assessing learning that is largely algorithmic (deductive reasoning), and CAN be assessed pretty easily with some confidence, versus learning that is more complex, like creative/inductive thinking that is not so amenable to standardized-type tests.

    There is no doubt that there is defensiveness about what goes on in the classroom, and there is definitely a need for more transparency and uniformity. I'm not defending the status quo. On the other hand, I don't think the solution is as easy as conjuring a desired measurement tool regardless of whether one actually exists.

    Brain surgery is relatively easy to assess--if the patient survives and becomes conscious afterwards, that's a big plus. By comparison, a lit crit analysis isn't so cut and dried, because there is no 'right' answer. The assessments will necessarily be subjective judgments, case by case--not of the right/wrong variety appropriate to deductive problems, but rather of the 'your thinking conforms to current consensus' variety. Actual new knowledge has to break out of this (at least per Kuhn). Two years ago a test of the question "can students analyze the mid-term financial risks in a P&L and suggest a course of remedial action" would (I hope) look different from the same one today--the 'correct' answers would probably not rely so heavily on credit swaps, I imagine.

    The bottom line is the connection to the real world, and unless learning outcomes assessments can actually make and support that connection, they have no claim to be 'measurements', and we should be rightfully dubious of their claims. Who was right: early Wittgenstein or later Wittgenstein? Is that question assessable within the realm of science?

    I've looked at my post and don't see anything that appears to be ad hominem, but I apologize if anything appears that way. It certainly wasn't my intention.