Higher Ed/: Assessments, Signals, and Relevance

In "Tests and Dialogues" I promised to address the use of rubrics, which I will get to. But before I do, I want to extend the ideas presented in the last few articles. By coincidence, my daughter provided an example the same day I wrote the article.

My daughter Epsilon had a math test and a French test yesterday, so naturally I asked how they went. She had spent quite some time reviewing (with my imperfect help in the French class), and said that the tests were easy except that she forgot what the degree of a polynomial is (ugh!). She said she was able to guess at some things she didn't know, which made my eyebrows rise. Guess? Sure, she says, it's almost all multiple choice. Here I began to sputter. What?? Algebra and French...multiple choice? Yes, says she, it's because of the EOCs. That would be the local name for the monological "End of Course" state tests. Since the EOCs are multiple choice, and there is so much weight put on them, it makes economic sense to optimize all testing to resemble "the ones that matter." She's 14, and this is old hat to her by now.

The wrong assessments plus a factory mentality optimize local relevance at the cost of global relevance. David Kammler wrote a marvelous parable in this vein, "The Well Intentioned Commissar." Achieving goals can be inherently very complex. When we try to grasp their workings by simplifying cause and effect (e.g., in order to manage like a factory), we can lose important information. This is detrimental when optimizing the simplified problem is not the same as optimizing the original problem. The impact is not merely academic. I read a story in The Economist years ago that went like this:

A state in the US was spending more on road repair than it thought reasonable, and sought to make the situation more equitable by passing costs on to the owners of the heavy trucks that were doing the most damage. So they instituted an axle fee--the more axles the truck had, the higher the cost to the truck owner to use the roads. This was a simple approximation: the heavier the truck, the more axles. What could go wrong? The outcome was that truckers, not being stupid, started using trucks that carried just as much weight, but on fewer axles. This increased the ground pressure of the trucks (same weight over less area) and damaged the roads even more than before. In this case they didn't merely optimize irrelevance, but actually exacerbated the problem they were trying to fix.
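The arithmetic behind the axle story can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the 36-tonne truck, the fee of 100 per axle, and the function names are made up, and the damage index (load per axle raised to the fourth power, summed over axles) is only a commonly cited rule of thumb that I am assuming for the sake of the example, not something taken from the Economist story.

```python
def axle_fee(axles, fee_per_axle=100.0):
    """Fee under the simplified rule: proportional to the number of axles."""
    return axles * fee_per_axle


def load_per_axle(total_weight, axles):
    """Weight carried by each axle, assuming the load is spread evenly."""
    return total_weight / axles


def relative_damage(total_weight, axles, exponent=4.0):
    """Toy damage index: each axle contributes (load per axle) ** exponent.

    The exponent is an assumed rule of thumb, not a measured value.
    """
    return axles * load_per_axle(total_weight, axles) ** exponent


weight = 36.0  # tonnes, hypothetical truck

for axles in (6, 3):
    print(
        f"{axles} axles: fee={axle_fee(axles):.0f}, "
        f"load per axle={load_per_axle(weight, axles):.1f} t, "
        f"damage index={relative_damage(weight, axles):.0f}"
    )
```

Under these assumptions, moving the same load from six axles to three halves the fee while the toy damage index goes up eightfold. The proxy (axle count) and the goal (less road damage) point in opposite directions.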

Even being irrelevant has an associated opportunity cost. The time spent learning how to game multiple-choice tests could be better spent. We can only imagine what the long-term cost is, when students finally figure out that real problems don't come with a built-in 20% chance of guessing the right answer.
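To see how much that built-in 20% is worth, here is a minimal sketch, assuming five answer choices per question and a student who knows some fraction of the material and guesses blindly on the rest. The function name and the sample fractions are hypothetical.

```python
def expected_score(known_fraction, choices=5):
    """Expected fraction correct when unknown items are guessed at random."""
    guessed_fraction = 1.0 - known_fraction
    return known_fraction + guessed_fraction / choices


for known in (0.0, 0.5, 0.8):
    print(f"knows {known:.0%} -> expected score {expected_score(known):.0%}")
```

A student who knows nothing still expects 20%, and one who knows half the material expects 60%. The format inflates the signal most for the students who know the least.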

It is not a coincidence that all of this ties together with the idea in "Self-Limiting Intelligence," where the problem I tried to illuminate about intelligent systems is that self-change easily turns into self-deception. My last couple of articles have mostly ignored the fact that there are powerful motivations lurking behind official definitions. Here's an example of how motivation subverts definitions from theNewspaper.com, which calls itself "a journal of the politics of driving."

Automated ticketing vendor American Traffic Solutions (ATS) filed suit Tuesday against Knoxville, Tennessee for its failure to issue tickets for turning right on a red light -- and that is costing the company a lot of money. A state law took effect in July banning the controversial turning tickets, but the Arizona-based firm contends the law should not apply to their legal agreement with the city, which anticipated the bulk of the money to come from this type of tickets.

If this seems silly, here's a more disturbing example:

Judge Mark A. Ciavarella and former Senior Judge Michael T. Conahan are accused of taking $2.6 million for sending children to two [correctional] facilities owned by Pittsburgh businessman Greg Zappala.

Privatizing a corrections facility attached an economic value to each criminal offender, and the supply of offenders increased as judges applied the monological standard more often. This is what juries are there to prevent: they provide a dialogical check.

In higher education, the definition of educational success adopted by the policymakers is "enrollment" and "graduation," for which the state pays plenty. See my previous article "Flipping Colleges for Profit" for how that turns out in the hands of private investors who seek to maximize dollars/student. We maximize enrollment and (perhaps) graduation at very large monetary expense to the taxpayer in grants and loans to students who default at high rates. This counterproductive effect is evidence of an over-simplified index of success.

It would be understandable if you took away from the discussion so far that monological = bad and dialogical = good, but that's not the case. Systems have to function monologically most of the time. I base this on a simple argument:

Systems of all kinds exist. Some work, some don't. The ones that survive do so in part because they are motivated to survive. The actions a system takes to survive can ultimately be reduced to binary "do this/don't do that" decisions. Motivations drive those decisions based on information from the internal and external environment. This reduction of complex data into a simple binary decision we might call an assessment, and the result is a "signal." Pain when you stub your toe is a "don't do that" signal corresponding to the implicit motivation to avoid bodily harm.

If we put together all these signals, they comprise a language. If it works, it models the environment and allows the system to survive (eat this, don't eat that). The diagram that goes with a motivation-driven decision loop is the one I discussed in "Self-Limiting Intelligence."

The point is not that monological motivation-driven signals are bad for us; it is that we have to use the right ones if we want to succeed. Sometimes dialogues get turned into signals, as in a plebiscite or anywhere else where public opinion matters. Marketing is another example, but in reverse--working from a motivation to try to affect dialogue so someone can sell more soap. In those cases, lots of energy is spent in trying to affect conversations. The BBC's recent article "Fake forum comments are 'eroding' trust in the web" is an example.

Part of the decision about what assessment to use to create signals should be driven by the consideration that weighing it down with economic value will probably degrade the quality. This problem is ubiquitous. It includes counterfeiting, cheating, and corruption of all sorts. It even shows up in natural selection, as Darwin figured out, in sexual selection--explaining peacock feathers, for example. It is perhaps embodied in the advice "you have to fake it to make it."

In order to make a decision, a system has to process a potentially infinite amount of data for a few clues as to what will accomplish its goal (fulfilling motivations). This assessment is a massive data compression that, if it's done well, describes in signal-language the important elements of the environment relative to motivation.

An example will illustrate the point. First, let's look at the role of complexity and assessment. I took the photos below at the Cape Fear Serpentarium in Wilmington, North Carolina. (It's a fantastic place to visit, along with Fort Fisher and the nearby aquarium if you're in the area.) This is the Gaboon Viper, and I first read about it in a zoo--I think in Columbia, South Carolina. It's a big, slow snake that prefers to sit and wait for lunch to walk or hop by. If you're a small animal, the picture below shows your perspective:

The Gaboon Viper presents a high-complexity look to prey.

In its natural environment, the snake's colors perfectly fit into the surrounding forest floor. The complex patterns of light and dark break up the shape of the form, so that a rodent is unlikely to correctly assess this information and form the signal SNAKE! The snake presents a high-complexity visual presentation to the world, and is rewarded for this concealment by a reduced probability that a rodent will correctly assess the situation.

And a low-complexity "here I am!" to large beasts.

However, the snake has another problem. It's got a great lifestyle, sitting around waiting for dinner to walk by, but there are also large hooved beasts that are far too large to eat and impossible to get out of the way of when they come ambling by. It's a good thing to be hidden from small critters that one might consume, but quite another to be hidden from a huge monster that might step on you and break your spine! So the presentation for a viewer looking down on the serpent needs adjustment. Instead of concealment, it wants to create an instant assessment in the bovine brain of SNAKE! As you can see in the photo on the right, this is accomplished (via natural selection, of course) by white stripes that look like the center of a highway. I looked for research that actually demonstrates that cows can see these snakes better than ones without such coloration, but didn't come up with anything. So treat this as informed speculation rather than fact, unless someone can point me to an authoritative source. But the effect is real. In the family of large cats, some females have white dots on the backs of their ears so they can present a low-complexity "follow me" sign to their kittens in low light. Military vehicles do something similar so they don't run into one another in the dark, but also don't make good targets.

To continue the example:

Suppose you are going out for an afternoon hike in a tropical jungle. You check into the matter and see that there are deadly venomous snakes in the area. Fortunately, there is a guy whose job it is to monitor the jungle nearby for such threats, and post a sign at the trailhead with a warning when appropriate. You may rightfully be dubious. There is a very large tract of undeveloped forest out there, and how plausible is it that this guy--who may be the governor's favorite nephew, for all you know--could have checked for every possible snake? So you are not reassured when you see a big green NO SNAKES TODAY THANK YOU sign nailed to a tree. In effect, you've rejected the data-reducing assessment and have decided to create your own signals. That is, the final assessment for "is there a snake here?" remains pending. You carefully watch where you step, tying up a large part of your mind to continually test the environment against your snake-matching perception. This is a lot of work and quite stressful, so you give it up and go to lunch instead.

The example shows the trade-off between assessing data into a signal early and assessing it late. Early signals make subsequent decisions easier. That's why most businesses like stability.

So the big question for a complex system is when to do the assessment for a given motivation. There are symmetrical arguments for and against early decisions based on limited data:

Early assessment from data to signal:

Pro: If we assess early, we reap an economic benefit similar to that of mass production. All decisions that depended on the first one can now go about their business. Mandating 21 as the legal age to drink alcohol creates a simple environment for liquor stores, as opposed to, say, administering an on-the-spot test for "responsible drinking" for each customer.

Con: Creating the signal greatly simplifies the actual state of the world. If subsequent decisions need detailed information, it won't be available. Worse, the signal may be wrong entirely. "Housing prices never fall" was an early assessment that led to a lot of unfortunate consequences.

Con: Signal manipulation for economic benefit (what we would call corruption in a government) can cause a widespread disconnect from reality. Adopting unproven test scores as the measure of educational success creates a false economy and doesn't reflect the actual goal.

Late assessment from data to signal:

Pro: Information isn't lost before it's needed. The example of walking in the jungle illustrates this.

Pro: Local corruptions of signals have only local effects. From a virus's point of view, its ever-changing protein coat means that it can't be intercepted easily. On the other side, those who have to decide on a vaccine have a difficult decision to make about which variants to target.

Con: It's more costly because decisions are made individually instead of in mass production style. Every state has a different set of paperwork for allowing truckers to use the roads, which impedes commerce. Conversely, internet sales are aided by not having to worry about every locality's rules.

There's no one right answer. The cost for deferring assessment can be very high. The railroad owners finally created time zones to solve the problem of every town running on a different clock. A common currency is obviously good for everyone. Standardized traffic laws are a boon. Formalized ownership of property is a prerequisite for a modern society. All of these involve monological definitions that are somewhat based on early assessment of evidence and are somewhat arbitrary. In some cases, it can be completely arbitrary and still immensely helpful (e.g. which side of the road we drive on). Imagine what a mess it would be if we had to survey the dialogical landscape every morning to see whether most people were driving on the right or left, stopping at green lights or red, and make our adjustments accordingly.

Applying this to Higher Education
Education at all levels has to process so many students that good organization is essential. So it has to be a system, and there are going to be a lot of semi-arbitrary decisions made just to provide a workable system language. There are many, many of these, and those of us who labor within the system take it for granted that these things exist:

Courses
Grades
Credit-hours
Grade levels or course level designations
Set times for instruction (e.g. one hour lecture, which is really fifty minutes)
Set curricula
Degrees and diplomas

None of these have much directly to do with learning. The pressure to define what a credit hour means for online courses shows the rigidity of the system. It's worth a moment to look at one of these in detail, so let's follow that train of thought into the dialogical tunnel. Compare the Carnegie fifty-minute block class with how we naturally learn.

The Khan Academy comprises a large group of tutorials on YouTube started by Sal Khan (who left a hedge fund job to do this), and now operated by an impressive team. The home page claims that over 87 million lessons have been delivered, with more than 2,700 videos on offer covering a range of academic subjects and levels. The one shown in the picture is about long division. You can see that it's a little less than 10 minutes long. Why ten minutes? Why not fifty minutes? Some are longer and some shorter, depending on the needs of the subject.

The curriculum in the Khan Academy is not a set of courses with prerequisites. It's much more natural than that, using a "knowledge map" to show connections between the ideas taught in the videos. Here's a sample:

There are suggestions for prerequisites, but they are per topic, not per course. Each of these has associated problems to be solved, challenges, and badges to earn. There are individualized feedback reports, and the ability for coaches to be involved.

I recently signed up for a free course on Machine Learning taught out of Stanford University. The lectures were online with a built-in quiz in each one. The videos are of varying length, but none close to fifty minutes. I skipped stuff I was already familiar with and browsed topics I didn't know as much about. Once I had to go back to the introductory material to clarify a point, had my "aha!" moment, and then forged ahead. To me this is a very natural way to learn. It's not at all systematic. I would never have had the time to sit through a traditional lecture course on the subject, but with the browsing ability I have with the online presentation, I can choose what I want off the menu and maximize the use of my time.

What's the point?
If we create systems that make early decisions for learners so that we can make the logistics work, we save time in administration but suffer opportunity cost for each student. This "early assessment" model leads to preemptive decisions solidified into policy and practice: courses, semesters, grades, and so on.

A "late assessment" version would be to provide just-in-time instruction for each student so that it could be used in some authentic learning situation. By authentic I mean anything that doesn't cause the student to think "when will I ever use this?" For example, individual or group projects that require learning the content, but perhaps only a piece at a time on demand.

The early collapse of complexity into a simple bureaucratic language includes the factory-like quality checks that occur along the way in the form of (increasingly standardized) tests. This early assessment presents problems for consumers of the product: employers, the graduates themselves, and society as a whole. The educational system doesn't give much information about what the students have learned other than these formalized assessments. It's like the NO SNAKES TODAY THANK YOU sign.

Before the Internet, late assessment methods would have been too expensive to use for the whole system. Not anymore. We have the opportunity to adopt a new model whereby we coach students on how to learn for themselves. So much the better if this learning is in the context of some interesting project that the student can show off to peers and (if desired) the world. The creation of a rich portfolio along the way allows employers or anyone else with access to make their own assessments.

I do not think that the massive higher education system will change this radically anytime soon. Nor are we ready to make such a change. Something like the Khan Academy for every academic discipline would be a massive undertaking. Textbook publishers, if they are forward thinking, may lead the way. Imagine an online "textbook" that was actually a web of interrelated ideas mapped out transparently, with video lectures and automated problem solvers associated. This frees up what I once called the low bandwidth part of the course and allows for more creative official class time. Effectively, it offloads all the monologue to out-of-class time and lets you get to the dialogue directly.

There is an opportunity for programs here and there to begin to experiment with this transition. As the examples in my previous articles show, this is already happening. In the most optimistic case, this prevents further solidification of the factory mentality in higher education by showing valid alternatives.

Wednesday, November 23, 2011
