Saturday, February 07, 2009

A Category is a Convenient Fiction

Negotiating any profession's important issues and solving the problems encountered in its employ is complex. Higher education is particularly complex because of the political hothouse nature of academic thinking and structure, its mind-boggling variety of missions and methods, and the tension between resistance to change and the need to meet new challenges. As I've taken on different roles over the years, and been presented with new problems to solve, I've found that some of the most useful, most universally applicable tools are these:
  • Theory of evolution
  • Game theory
  • Information theory
I'll briefly touch on the first two, and illustrate a particularly nasty error that can be detected with the third--and explain the title in the process.

Evolution in the broad sense means that things are the way they are now because of processes and conditions that have existed in the past. Ontology recapitulates philology, perhaps not in the literal biological sense, but in essence. This perspective allows one to look under the surface of things and ask the five whys about an existing situation. This is a kind of archaeological dig, or sequencing of DNA (whichever metaphor you prefer).

The results of evolution are messy, redundant, and complex. There's a good reason for this (see my article on survival, for example)--it works. The tax code is a good example of an evolved artifact. Organizational structure is another.

Game theory [wiki] is a master key for helping to understand human interactions. In combination with evolutionary ideas it's even more powerful (see The Selfish Gene by Richard Dawkins). Understanding a process like budget allocation is much easier, I believe, if one thinks of the transactions as formalized in a game theory context. For example, no matter how idealistic a process for dividing the loot is, it will always be easier to give away money than to get it back. Therefore budget directors will always be trying to pad their budgets. Rather than try to combat this, it's better to put people you trust in charge. This is the same debate as planned economy vs free market on a micro scale. (Maybe that's a bad example these days.)

Information theory has many aspects, but the most charming is the idea that in a given language some statements are concise and others are verbose. This is probably obvious, but what may not be so is that we can actually measure this. In common language, frequently used words tend to be short (yes and no, for example). Imagine if 'the' were a three-syllable word!
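One way to see the "we can actually measure this" point is Shannon's ideal code length: a symbol that turns up with probability p ideally costs about -log2(p) bits, so frequent words deserve short codes. Here is a minimal sketch. The word probabilities are made-up illustrative values, not counts from any real corpus, and `ideal_bits` is a hypothetical helper name.

```python
import math

def ideal_bits(probability):
    """Ideal code length, in bits, for a symbol with the given probability."""
    return -math.log2(probability)

# Hypothetical relative frequencies, chosen only to illustrate the idea.
word_probability = {
    "the": 1 / 16,
    "student": 1 / 1024,
    "provisional": 1 / 65536,
}

for word, p in word_probability.items():
    print(f"{word!r}: about {ideal_bits(p):.0f} bits")
```

The rarer the word, the more bits it can afford to spend, which is the same pattern as short common words and long rare ones.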

Bureaucracies speak in languages of their own, with processes and documents comprising the grammar and vocabulary, and they use these in conjunction with local languages to facilitate transactions. If I say a student was admitted provisionally because of his high school GPA, someone familiar with higher education will understand because of his or her familiarity with the semantic field of higher ed speak.

Some of the lessons I've learned from using information theory in the wild are:
  • Generally, simplicity is better than complexity. Complexities are inefficient, and as a result there is an evolutionary tendency to simplify them through compression. As a trivial example, I started calling the "Administrative Council" AdCo in my emails because I was tired of typing all that. More serious examples include abbreviating some workflow process to cut out complexities. Unfortunately, the steps that get cut may be important, so it's better to simplify deliberately than to leave it to the whims of evolutionary practice.

  • As a corollary, it's nice to be able to recognize randomness when one sees it. Data is random if it can't be compressed further. Usually we reserve the word for high complexity randomness. For example, recognizing that the fate of students who come to the university is largely random is important. More about that in a minute. I would also argue that the result of most committee interactions is largely random. Different people at a different time would produce different results most of the time. This realization leads me to believe that committees are best employed when one wants to get something/anything done, and NOT when one wants to get a particular thing done. That provides a natural litmus test for assigning a group of people or a single director to a project.

  • Finally, understanding a bit of information theory allows one to recognize when data has* been compressed. I'll explain further.
Every noun represents compressed data. Plato's ideals are just that--but they exist only in our minds, approximating the real world in its forbidding detail. Without such approximations, it would be impossible to process thoughts. Every contingency would breed more until every neuron clamored for attention with some interesting tidbit, and we drooled ourselves into the shallow end of the gene pool.
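The earlier remark that data is random if it can't be compressed further can be tried out in a rough way with an ordinary compressor. The sketch below uses Python's standard `zlib` module as a stand-in; a practical compressor is only a proxy for the theoretical idea, and `compression_ratio` and the sample inputs are invented for illustration.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly patterned input: the same phrase repeated many times.
repetitive = b"Administrative Council " * 400
# Same length, but drawn from the operating system's random source.
noisy = os.urandom(len(repetitive))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"random:     {compression_ratio(noisy):.3f}")
```

The repeated phrase should shrink to a small fraction of its size, while the random bytes should stay at roughly their original length.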

A useful idea is that of low vs high frequency components of a message. This is intended to be a metaphor here, but it's very literal in the world of communications engineering. Low frequencies are the bits that don't change much. You're safe in assuming them. If I say "I saw a cat," you'll imagine a four-legged critter of a particular shape, with fur. That's the low frequency component. If you want to talk like a geek you can call it the "DC bias." The high frequency bits are the details. You might ask me what color or how big the cat was, or what it was doing, or where I saw it. This is the detail that gets compressed away when we want to communicate quickly. Some people, unfortunately, have a communication style that ladles on information regardless of frequency. This kind of thing: "I was going to the mall, driving on the interstate, when the phone rang. It was Lois. I talked to her for a bit about the birthday plans for Saturday, and when she was about to hang up, the car started driving funny. So I pulled over and found I have a flat tire. Can you come and get me?" I'll leave it to the reader to distinguish the DC bias (the most important bit) from the high frequency chatter.

It's not by coincidence that high frequency information is often referred to as noise. The signal-to-noise ratio is a measure of how much patience you have to have to get what you're interested in.
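The DC bias metaphor can be made literal with a little arithmetic: the average power of a signal (the mean of its squared samples) splits exactly into the squared mean, which is the DC part, plus the variance, which is everything else. Here is a minimal sketch; the function name and the two sample lists are invented for illustration.

```python
def dc_and_ac_power(samples):
    """Split average power into the DC part (mean squared) and the AC part (variance)."""
    n = len(samples)
    mean = sum(samples) / n
    dc_power = mean ** 2
    ac_power = sum((x - mean) ** 2 for x in samples) / n
    return dc_power, ac_power

steady = [10, 11, 9, 10, 10, 11, 9, 10]   # mostly bias, very little detail
jumpy = [1, -8, 9, -7, 8, -9, 7, 7]       # small bias, large swings

for name, samples in (("steady", steady), ("jumpy", jumpy)):
    dc, ac = dc_and_ac_power(samples)
    print(f"{name}: DC power {dc}, AC power {ac}")
```

For the steady list the mean tells nearly the whole story. For the jumpy list the mean is almost beside the point, and summarizing it by its average throws away most of what is going on.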

Categories are convenient fictions because they summarize a description of data into compact form. A category is the DC bias for some more complex concept. And categories (or nouns in general) are all hooked together so that it's easy to make predictions or reach conclusions given some simple set of information. For example: "Johnny is a lazy student." Reading this, we can reach all kinds of natural conclusions about Johnny. He's less likely to be on the Dean's list, probably doesn't spend much time studying, etc. We extrapolate all this high frequency stuff from the simple description. It may not be true.

Therein lies the rub. What if the DC bias isn't very large in comparison to other parts of the spectrum? That is, what if the high frequency bit is more important than the bias? Here's a very practical example.

A student who fails to meet admissions standards can sometimes be admitted as a provisional student. Typically these students are watched carefully and given lighter loads than regularly admitted students. The provisional students are put in this category because of their high school grades, ACT/SAT or other information used by admissions. But how important is the information contained in this categorical description, compared to all the other things we might want to know about the student?

One form of validity for a concept is predictive validity. Does the application of this label to this situation help you understand its future evolution? Does an astrological sign help you understand the professional future of a student? No--this method has no predictive validity. So the astrological sign as a low frequency piece of information is useless. But because we're so used to dealing in compressed data, it's easy to fool ourselves. Back to the example of the provisional admit.

In fact, about half of the provisional admits (in my research) will do quite well. And if you compare them to the regular admits at the bottom end of the qualification range, their futures are comparable--about half of the latter will fail. So the information conveyed by the category "provisional" isn't nearly as useful as it seems. The reason is that there is little predictive validity for the term. High school grades plus standardized test scores might, on a good day, explain 30-40% of first year grade variance in students. What about the rest? This is high frequency data missed by assigning the category.
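To get a feel for what "explains 30-40% of the variance" buys you, here is a small worked calculation. It treats the 30-40% figure as the R-squared of an ordinary least-squares fit, which is an assumption (the figure's derivation isn't given here); under that assumption the leftover spread in grades is sqrt(1 - R^2) times the original spread. The function names are hypothetical.

```python
import math

def unexplained_fraction(r_squared):
    """Share of the variance that the predictors leave unexplained."""
    return 1.0 - r_squared

def residual_sd_fraction(r_squared):
    """Residual standard deviation as a fraction of the original standard deviation."""
    return math.sqrt(1.0 - r_squared)

for r2 in (0.30, 0.40):
    print(
        f"R^2 = {r2:.2f}: "
        f"{unexplained_fraction(r2):.0%} of variance unexplained, "
        f"residual spread is {residual_sd_fraction(r2):.1%} of the original"
    )
```

Under that assumption, knowing the admissions data shrinks the typical prediction error by only about 16 to 23 percent.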

With this perspective, a better solution is obvious. Instead of creating special programs for provisional students, half of whom don't need them, and missing all the regular admits who do, it's much better to have a solid intervention program that can accommodate any student in trouble. After all, it is more likely to be for non-cognitive reasons that a student fails (the 60-70% of unexplained variance). A bad roommate, sour family situation, poor work habits, etc. may be more important. It's just as valuable to catch the 1800 SAT student who has problems as it is the provisional one.

The general lesson is to occasionally look under the hood of categorical descriptions to see how important that DC bias really is. It's convenient to talk in compressed symbols, but maintaining a somewhat skeptical attitude about conclusions based on this syntax is recommended. Keep in mind that the predictive validity may be quite low, and it may serve your purpose to delve into the high frequency part of the spectrum now and then.

There is a downside to this approach. Debating predictive validity is sometimes done unproductively in committees--especially large bodies like a faculty senate. There's a tendency to examine details when they're not important. A good policy may be debated to death and tabled because of a random impromptu investigation of all possible contingencies. This is paralysis through analysis. You can 'what-if' any proposal to extinction--it's an easy game to play, and a common political tactic to delay or kill one. I don't know of a good defense against it, but at least I can recognize it for what it is: noise.


* Yes, 'data' is plural, but nowadays it's interchangeable with 'information', which has no plural, so what are we to do? "Data is" sounds better to my ears than "data are." Don't hold me to that, however. As Emerson noted, a foolish consistency is the hobgoblin of little minds. As a humorous aside on the topic, I saw an advertisement for a GPS once that boasted that the unit could hold "1024 datums".
