Thursday, April 16, 2009

More Thoughts on Moravec's Paradox

A couple of weeks ago, I wrote about Moravec's Paradox: that many of the mental functions we consider difficult are actually computationally easy, but some things we find easy are very hard to replicate computationally. You're probably wondering what this has to do with anything. It has everything to do with outcomes assessment. Bear with me for a moment.

Let's consider a computational task, such as solving a two-dimensional homogeneous linear system of differential equations. This may sound terribly complicated if you've never been trained to do it, but it's entirely a cookbook exercise. Provided you know all the rules, you can proceed step by step from problem to solution. This doesn't change the fact that many students find it hard to master the calculus and linear algebra needed to get to that point, but that difficulty is exactly what the paradox describes: the task is computationally easy, yet it seems hard because our brains didn't evolve to solve ODEs. It's unfair to expect biology to have pre-adapted our brains for such computations, since the computations themselves have only been around for a few hundred years.
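The "cookbook" character of the problem is easy to see in code. Below is a minimal Python sketch for the system x' = ax + by, y' = cx + dy with given initial values, assuming the coefficient matrix has two distinct real eigenvalues (the complex and repeated-root cases need extra recipe steps and are simply rejected here). The function name `solve_linear_system_2x2` is hypothetical; each numbered comment is one step of the recipe.

```python
import math


def solve_linear_system_2x2(a, b, c, d, x0, y0):
    """Solve x' = a*x + b*y, y' = c*x + d*y with x(0) = x0, y(0) = y0.

    Sketch only: covers the case of two distinct real eigenvalues.
    Returns a function t -> (x(t), y(t)).
    """
    # Step 1: characteristic polynomial lambda^2 - tr*lambda + det = 0
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det
    if disc <= 0:
        raise ValueError("sketch only covers distinct real eigenvalues")

    # Step 2: the two eigenvalues from the quadratic formula
    root = math.sqrt(disc)
    lams = ((tr + root) / 2, (tr - root) / 2)

    # Step 3: an eigenvector for each eigenvalue, orthogonal to a row of A - lambda*I
    def eigvec(lam):
        if b != 0:
            return (b, lam - a)
        if c != 0:
            return (lam - d, c)
        # A is diagonal: the eigenvectors are the coordinate axes
        return (1.0, 0.0) if abs(lam - a) < abs(lam - d) else (0.0, 1.0)

    v1, v2 = eigvec(lams[0]), eigvec(lams[1])

    # Step 4: constants k1, k2 from k1*v1 + k2*v2 = (x0, y0), by Cramer's rule
    m = v1[0] * v2[1] - v2[0] * v1[1]
    k1 = (x0 * v2[1] - v2[0] * y0) / m
    k2 = (v1[0] * y0 - x0 * v1[1]) / m

    # Step 5: assemble x(t) = k1 e^(l1 t) v1 + k2 e^(l2 t) v2
    def solution(t):
        e1 = k1 * math.exp(lams[0] * t)
        e2 = k2 * math.exp(lams[1] * t)
        return (e1 * v1[0] + e2 * v2[0], e1 * v1[1] + e2 * v2[1])

    return solution


# x'' + 3x' + 2x = 0 written as a first-order system, starting at (1, 0)
sol = solve_linear_system_2x2(0, 1, -2, -3, 1, 0)
print(sol(0))  # → (1.0, 0.0)
```

Nothing in the function requires insight; it is the same five steps every time, which is the sense in which the task is computationally easy.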

On the other hand, it's very easy to grade problems of this sort. Because the computation is simple, each step can be checked for correctness in turn. In summary: in this example, learning is hard but assessment is easy. More specifically, assessment is easy because the whole task (solving systems of differential equations) can be logically broken into smaller tasks, each of which is easy to assess. We can't really call this phenomenon "reductionism," because that word means something else in the scientific context. Let's call it "componentism" instead. Yes, it's an ugly word. No staring, move along... Nothing to see, folks, just an unfortunate etymological accident.
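To make "componentism" concrete, here is a minimal Python sketch of step-by-step grading for a system x' = Ax with A = [[a, b], [c, d]]. The rubric items (eigenvalues, eigenvectors, initial condition) and the function name `grade_steps` are hypothetical; the point is only that each step can be verified on its own, so partial credit falls out naturally.

```python
def grade_steps(a, b, c, d, eigenvalues, eigenvectors, constants, x0, y0, tol=1e-9):
    """Check each step of a submitted solution to x' = Ax, A = [[a, b], [c, d]].

    Hypothetical rubric: every component is verified independently,
    so a mistake in one step does not hide the correctness of the others.
    """
    tr, det = a + d, a * d - b * c
    results = {}
    # Step 1: each eigenvalue must be a root of lambda^2 - tr*lambda + det
    results["eigenvalues"] = all(
        abs(lam * lam - tr * lam + det) < tol for lam in eigenvalues
    )
    # Step 2: each eigenvector must be nonzero and satisfy A v = lambda v
    results["eigenvectors"] = all(
        (v[0], v[1]) != (0, 0)
        and abs(a * v[0] + b * v[1] - lam * v[0]) < tol
        and abs(c * v[0] + d * v[1] - lam * v[1]) < tol
        for lam, v in zip(eigenvalues, eigenvectors)
    )
    # Step 3: the constants must reproduce the initial condition at t = 0
    (k1, k2), (v1, v2) = constants, eigenvectors
    results["initial_condition"] = (
        abs(k1 * v1[0] + k2 * v2[0] - x0) < tol
        and abs(k1 * v1[1] + k2 * v2[1] - y0) < tol
    )
    return results


# A student who gets everything right except the sign of one constant
report = grade_steps(0, 1, -2, -3,
                     eigenvalues=(-1, -2),
                     eigenvectors=((1, -1), (1, -2)),
                     constants=(2, 1), x0=1, y0=0)
print(report)  # → {'eigenvalues': True, 'eigenvectors': True, 'initial_condition': False}
```

Each check is mechanical and local, which is why assessment of this kind of task is easy even when learning it is not.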

What about easy problems? Consider those sorts of things that our brain does for us without much effort, like recognizing letters of the alphabet, regardless of font or even noise that may obscure part of the letter. Moravec's thesis is that functions like visual processing are ancient bits of mental plumbing, honed over millions of years of evolution to a shiny efficiency.

This list of highly optimized, highly complex abilities must surely include social judgments. I'm not sure at what point our ancestors started living in groups, or exactly when speech became intelligible, but I would guess that the former goes back millions of years and the latter at least hundreds of thousands. Whatever consciousness is, I would suppose it emerged during that time frame. Moreover, there would have been heavy evolutionary pressure favoring genes that helped us navigate social interactions. Navigating those interactions includes the ability to make subjective assessments of another person's social status, strength, health, age, personality, and intelligence.

This is a speculative argument, then, that the subjective judgments we humans make about one another's intelligence are deeply wired and very complex. When we as instructors, employers, colleagues, or supervisors make subjective judgments about another person's academic abilities, we're tapping into this wiring.

Complex systems are not always componentized. There is a burgeoning science of complex systems that demonstrates:
1. Interesting and useful behavior can be obtained from nearly chaotic systems.
2. One cannot tinker with such systems the way one can with componentized systems. They behave unpredictably when you try.
As an example of the first, some researchers are building chaotic systems to serve as what computer engineers call flip-flops. The advantage is that these new flip-flops can hold multiple states, whereas traditional ones have only two: on and off. To illustrate the second point, it's possible to create networks of electrical components through an evolutionary process and get the result to perform some useful function. But reverse engineering the resulting circuit is practically impossible: any tweak sends it careening off into some new functionality.

It seems reasonable to me that our ability to assess thinking skills in others is complex and not componentized. We may posit relationships between categories of thought, like
Effective writing = addresses purpose + addresses audience + correct language + good style + interesting content
but that does not mean that the part of our brain that assesses effective writing can be broken down into components this way. As a trivial example, not all jokes are funny to all people. Individual tastes and proclivities vary greatly, and a component analysis has no room for this sort of thing.
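Read literally, the formula above is an additive rubric. The Python sketch below takes it at face value, scoring each component on a 1-4 scale and summing. The component names come from the formula, but the scale, the function name `rubric_score`, and the two sample essays are invented for illustration.

```python
# Components taken from the formula; the 1-4 scale is a hypothetical choice.
COMPONENTS = ("purpose", "audience", "language", "style", "content")


def rubric_score(ratings):
    """Additive rubric: the whole is just the sum of the component ratings."""
    missing = [name for name in COMPONENTS if name not in ratings]
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sum(ratings[name] for name in COMPONENTS)


# Two invented essays with very different profiles
dutiful = {"purpose": 3, "audience": 3, "language": 3, "style": 3, "content": 3}
brilliant_but_rough = {"purpose": 4, "audience": 2, "language": 1, "style": 4, "content": 4}

print(rubric_score(dutiful), rubric_score(brilliant_but_rough))  # → 15 15
```

Nothing in the sum distinguishes these two essays, and nothing in it can represent the fact that different readers will respond to them very differently.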

The movie industry is a good example. There is a huge economic incentive to make a movie that will sell. There is a decades-long history of what works and what doesn't. There is a well-established theory of how to tell a story (usually in three acts, in a 120-page script). Undoubtedly there are component analyses of everything that goes into making a movie, and yet few movies become blockbuster hits. Why is that? It seems to me that it's the same problem: what we imagine to be the components only poorly approximate the effect of the whole experience of watching a movie.

If this conclusion is valid, and a subjective judgment supplied effortlessly by our onboard wiring bears only a loose relationship to the components that formal assessments try to assemble it from, what then? First, note that this does not really diminish the importance of the components. Style is still important to writing, even if we can't quantify exactly how. And when we teach, we have to teach something specific. So it's all well and good to dwell on components and build rubrics from them if you find that useful. But we should not then assume that we have "solved" the writing assessment problem, because that problem can likely be addressed only as a whole.

We might agree to agree. By working hard, we can increase the inter-rater reliability of our social judgments by rating a set of components and averaging those ratings (shudder). But agreeing to agree means subverting our original impression, sacrificing validity for reliability.
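The standard psychometric reason averaging raises reliability is the Spearman-Brown prophecy formula. Assuming the k averaged ratings behave like parallel measures, each with reliability r (a strong assumption for components as different as "style" and "content"), the reliability of their mean is:

```latex
\rho_k = \frac{k\,r}{1 + (k-1)\,r}
\qquad\text{e.g. } r = 0.5,\; k = 5:\quad \rho_5 = \frac{2.5}{1 + 4(0.5)} = \frac{2.5}{3} \approx 0.83
```

The formula speaks only to consistency. It says nothing about whether the averaged score still measures what the original holistic impression did, which is the validity cost described above.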

What do you suppose the inter-rater reliability is in the wild? Ask three people who know a friend of yours to rate his or her ability to drive. There will probably be a consensus, but not a perfect one. It doesn't need to be perfect: if we insisted on high inter-rater reliability for everything we talked about in ordinary life, it would be a disaster. Part of the reason is that we don't need anywhere near perfect correlation in order to communicate effectively. The other, equally important part is that disagreement is where new information comes from. This presupposes that we can hold a dialogue to synthesize our differing points of view, which should be a strength of the academy (even if it's not).
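To put a number on "consensus, but not perfect," suppose the three acquaintances each rate the same handful of drivers on a 1-10 scale. The Python sketch below computes the average pairwise Pearson correlation, one common (though not the only) index of inter-rater reliability. The ratings, rater names, and helper names are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_correlation(ratings_by_rater):
    """Average Pearson r over every pair of raters."""
    pairs = list(combinations(ratings_by_rater.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Invented 1-10 driving ratings of five drivers by three acquaintances
ratings = {
    "rater_a": [8, 5, 7, 3, 9],
    "rater_b": [7, 6, 8, 4, 9],
    "rater_c": [9, 4, 6, 5, 7],
}
print(round(mean_pairwise_correlation(ratings), 2))  # → 0.72
```

With these made-up numbers the individual pairs range from about 0.49 to 0.94: broad agreement on who drives well, with plenty of room for the disagreements that carry new information.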

We already come wired with the ability to deal with problems of differing complexity. Algorithmic problems, like completing the 1040 tax form, may make our brains hurt, but we can wend our way through the components step by step to find the refund amount. On the other hand, we have wonderfully complex and intricate ways of making quick decisions about fuzzy, subjective things, like how well someone drives, how good they are at solving problems, or how reliable they are as a friend. If we don't acknowledge the difference between these two sorts of assessment problems, we are relegated to a kind of low-complexity shadow world, and we are likely to be surprised when the rest of the world (outside academia) doesn't see the value in our fancy numbers and charts depicting a student's ability to, say, resolve ethical dilemmas.

There is an opportunity here to measure the difference between actual assessments (outside the academy, where they matter) and our component analyses. This sort of research would lend credibility to the enterprise, and we might find that a few components do come close to approximating the holistic subjective ratings we make every day as a matter of course. But I find it impossible to accept that as an article of faith.