Thursday, June 18, 2009

Assessment from the Faulty (sic) Perspective

Faculty members, like people, come in types. These various species are described in administrative handbooks kept locked away from prying eyes—the same tomes that carry the secret works of wesaysoism. This ancient practice is rumored to have started with Julius Caesar as he crossed the Rubicon and continues to this day in modern garb. Those who govern, tax, administer, accredit, and enforce abide by these proven precepts, modernized and abstracted as they have been by the likes of von Neumann and Nash in the last century.

According to dusty volumes, one faculty archetype is the curmudgeon, described as “thou who disdaineth” in a heavily inked codex, one of the few works to escape the flames at Alexandria. Administrators spend a lot of time complaining about these robed and hooded stumbling stones—almost as much time as the faculty spends complaining about wesaysoism and the lack of parking spaces combined. This is all for good reason, as the established practice of the academy has an even older and hoarier history, stretching back to Plato and beyond. Thus does irresistible force come to bear upon the deeply philosophical and generally immovable objection. Perhaps no better contemporary example of this conflict exists than that caused by the mandate of accountability, which implies the ability to justify one’s practices by the outcomes they produce. To which the curmudgeon may look up from his brewing treatise on the transcendental ontology of post-feminist non-Euclidean hypersquares and say “well, we made you, didn’t we?”

For the representative of the administration, it is important to understand the perspective of these faculty members, so that great things may be accomplished. In particular, deep knowledge from the codex is essential to the Assessor, that species of administrator who is tasked with grinding documentation of learning outcomes into report sausage to be consumed by accreditors. It is essential to the Assessor’s mental health that he have good rapport with at least some of the teaching faculty, and this includes the ability to communicate effectively. Unfortunately, this is often hampered by conflicting motivations, expectations, and even vocabulary. Fortunately, the recently recovered codex entitled “Assessment from a Faulty Perspective” (one presumes that a tired scribe misplaced the ‘c’) has elucidated some of these tensions. The topics forming the glossary found below were carefully chosen from the codex and updated for modern sensibilities. For example, ‘Assessor’ now replaces ‘Inquisitor,’ but could equally well be Assessment Director or Institutional Effectiveness Director. Experts, using a rubric from the codex itself, have rated the translation 4.5 and guaranteed its validity. In layman’s terms we might consider the contents New and Improved. Without further explanation, we present:

GADOT. Government, Administration, Deans, and Other Trouble. Any government or other regulatory body, university committee, task force, or bureaucracy that makes trouble for faculty. To the faculty this is generally a source of frustration, as in “waiting for GADOT”. To the Assessor, this is the font of wesaysoism, the ultimate authority to be invoked, wisely or not, to get things done.

Academic freedom. Academic freedom is to faculty what statistical goo is to the Assessor—a universal way of making annoying problems from GADOT go away. Its effectiveness varies, however, and tends to weaken when used repeatedly.

Assessment. This word is almost never defined before being used, probably on the assumption that everyone knows what it means. In practice, some will use it interchangeably with ‘measurement’, and others to imply a fuzzier sort of observation. The very use of the word often distinguishes an act as a separate activity, as in “I was teaching, but now I’m doing assessment.” As a result, its use should be limited, except as an irritant. For example, “have you done your assessment report?” is particularly good at annoying someone. This is because, as the Assessor, you almost certainly would have known if it were finished, and so it’s a passive-aggressive way of saying “go do your assessment report.” Faculty will often use the word as an expletive, often combining it with other words to resemble actual bureau-speak, as in “That belated assessment expects it in by Friday!” The etymology of the word is interesting, having begun as an onomatopoeia for a sneeze (ah-SEZment), which was replaced by “achoo,” “hazie,” and other contemporary versions only recently.

In order to minimize the Assessor’s pain in what will be—quite frankly—a long period of suffering in assembling even rudimentary learning outcomes reports, the word ‘assessment’ should only be whispered, if it needs to be used at all. The intent is to encourage faculty to think:

assessment = teaching + paying attention,

which can be easily remembered with the mnemonic “Auntie Theresa prefers artichokes.” Paying attention means setting clear expectations and then trying to figure out later if they were met. Commonly, this takes the form of writing down on slips of paper what the purposes of a course or curriculum are.

As a concrete example, Calculus I has a long list of content topics: limits, sketching slopes, computing derivatives, related rates problems, and so on. Paying attention in this case might mean that the teaching faculty coordinate to choose a textbook, syllabus, and final exam. At a minimum they discuss what the course should deliver. Ideally this would be written down somewhere in a list. After they’re finished, the Assessor can cleverly copy and paste this list into a document and write “learning objectives” across the top. Above that he or she can write “assessment plan.”
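If the faculty’s list already lives in a file somewhere, the Assessor’s clever contribution amounts to something like the following sketch (the topic list and file name here are invented for illustration):

```python
# Hypothetical topic list agreed on by the Calculus I faculty.
calculus_topics = [
    "limits",
    "sketching slopes",
    "computing derivatives",
    "related rates problems",
]

# The Assessor's contribution: paste the list under two impressive headings.
with open("calculus_assessment_plan.txt", "w") as plan:
    plan.write("ASSESSMENT PLAN\n")
    plan.write("Learning objectives:\n")
    for topic in calculus_topics:
        plan.write("  - " + topic + "\n")
```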

Of course, this is only half of paying attention—we might call it assessment foreplay. The other half takes place after at least some of the course has been delivered. Here the objective is to see if students performed better than soap dishes would have in a controlled study (some use boxes of rocks instead for comparison). At this point, a terrible migraine-inspiring fog can easily set in for the faculty, and the Assessor must be most adroit. To describe and understand the consummate act of assessment, however, we must be familiar with a few more vocabulary terms. So we will consider those first, and then return to the topic of “closing the loop” to rejoin Auntie Theresa.

Measurement. To the faculty, who are often highly educated and have knowledge of science and other practical experience, “measurement” connotes units and standards, universal methods of comparison, and a datum that relates the physical world to a number in some reproducible way. For example: “I measured the wine in the bottle, and it was only 740ml.” To the assessment profession, it means an estimate of a parameter of some (generally unknown) probability distribution underlying a set of observations. To this parameter, the Assessor often assigns an imaginative name like “thinking skill.” Mereological fallacies such as this (considering a part as the whole phenomenon) are often the result of over-reaching “measurements.”

The common definition of measurement and the one used in assessment are so far apart that there are bound to be digestive problems on both ends of the conversation. It is therefore recommended that the Assessor only use this word in situations where the audience will not ask too many discerning questions—board meetings or legislative councils, for example. In those cases, it sounds better to say “we measured our success at 90%” than to say “we can say with utter statistical confidence that we may have made a difference.”

The real problem with thinking of assessment as measurement is that it implies that it is a scientific endeavor. The intrepid Assessor who strikes a path through the waters of empiricism will quickly find out what centuries of human doubt and superstition attest to: science is damned hard.

We might call measurement the “hard” assessment problem. It can be profitably used for modest assignments. For example, suppose a goal of a course is to teach students to swim. Counting how many laps students have swum without having to be hauled out by the lifeguard would be a useful number to know. We can call this a measurement without waking up with night sweats. The tip-off is that there are clear units we can assign: laps per student. This is a physically verifiable number. Similarly, attendance records, number of papers (or words) written, typing speed, and cranial circumference are all measurable in the usual sense. Note that these all have units.

Sometimes a commercial standardized test will be mentioned in the same sentence as measurement. Beware of such advertisements, as they are no longer talking about things that can be laid alongside a yardstick or hoisted on a scale or timed with a watch. The use of the word “measurement” in this case really should come with a long list of disclaimers like a drug ad for restless fidget syndrome—very likely to cause cramps when used in outcomes assessment. Although toxicity varies, and in moderation these instruments may be used for useful purposes, they are by and large producers of statistical goo, which we address next.

Statistical Goo. The topic of statistical goo is large enough for its own book, so we will confine ourselves here to the main applications. There is almost no data problem that statistical goo cannot fix, and the reader may want to consult Darrell Huff’s classic How to Lie with Statistics for a more comprehensive treatment. It is a double-sticky goo, this numerical melange, both useful and a hindrance in turn. First we shall see how it can be a problem for the Assessor.

Goo generally results when numbers are averaged together, although this by no means exhausts the methods by which to squeeze all life from a data set and leave an inert glob in its place. The goo most familiar to faculty members is the venerable grade point average. The fundamental characteristic of goo is that by itself it is only an informational glob, and not useful for Auntie Theresa’s assessment purposes. We have not yet gotten to the topic of closing the loop, but that paying-attention activity hinges on the usefulness of the data at hand. And goo is just not very handy for this purpose.

As an example of how faculty can easily run afoul of goo, consider the following tale. Following the Assessor’s prescription, Alice and Bob discuss course objectives and agree on a common final exam. Once the exams have been scored, they sit down again and compare results. Alice’s average is 67% and Bob’s is 75%. What do they learn from this?

Well, in a word: nothing. Clearly, the goo has got the best of them if this is the level of detail they are working with. Even looking at individual student test scores is little help. What’s really called for is a more detailed analysis of student performance on different types of problems, viz. those tied to the original objectives. Goo occludes detail like a milkshake on your windshield. The lesson for the Assessor is to think thrice before averaging anything. The assessment professional who takes goo seriously as useful data is likely to have a career that is nasty, brutish, and short.
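To see the difference concretely, here is a minimal sketch with invented scores: the single section average is goo, while a breakdown by objective points at what actually needs attention.

```python
# Invented results: each student's proportion correct on each exam objective.
alice_results = {
    "limits":        [0.9, 0.8, 0.7, 0.9],
    "derivatives":   [0.8, 0.9, 0.7, 0.8],
    "related rates": [0.3, 0.2, 0.4, 0.3],  # the actual trouble spot
}

# Goo: one number that hides the trouble spot entirely.
all_scores = [s for scores in alice_results.values() for s in scores]
print("section average:", round(sum(all_scores) / len(all_scores), 2))

# Paying attention: the same data broken out by objective.
for objective, scores in alice_results.items():
    print(objective, "->", round(sum(scores) / len(scores), 2))
```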

On the other hand, goo can be useful to the Assessor when writing up reports that won’t actually be used by anyone. Top level administrators, accreditors, and the rest of GADOT generally don’t have a lot of time for details, so feeding them a diet of goo is a convenience. If the resulting graphs have lines going up, the Assessor can claim credit for this putative success. If, on the other hand, the graph heads down or meanders, the Assessor can point to this as an area that needs improvement, and therefore a triumph of his cleverly designed assessment process in identifying this “problem.” Goo is like WD-40 and duct tape combined—there’s really no argument you can’t make, given enough data.

Later on we’ll take up the more advanced topic of anti-goo.

Learning outcomes. This term is viewed with suspicion by faculty, who feel that they already know what learning outcomes are and that they see them every day in the classroom. Terrible confusion lurks here for the unguided faculty member who tries to implement “assessment.” In the worst case, he or she will expend a lot of effort and produce little to show for it, and feel punished later when the Assessor breaks the news. The key idea is that the outcomes have to be observable in some way or another, and eventually the faculty have to sift through these observations for pearls of wisdom. The more specific and detailed the outcomes are, the easier this is, but also the more objectives there must be.

Specificity is attractive because learning is “local.” For example: you can’t teach someone to “do math;” that’s far too general. You can teach someone to factor quadratics if they have the right background. Without further resolution “math” is like statistical goo—a generalization that is useful in some circumstances but not others. The trick to making useful learning outcomes is to find ones that are general enough to permit a small enough number to manage, but not so general as to become too gooey to be of use.

The danger to the Assessor is surplus enthusiasm. In his drive to reach the top of the GADOT org chart, and perhaps awakening a latent case of positivism, he may turn the identification of learning outcomes into a behemoth of bureaucracy, creating “alignments” and levels of competency and purchasing expensive data systems to instantiate this educational cartography in inexorable logic. There is a certain attraction to this sort of thing—and it’s probably related to the same overreaching that leads to dreams of measurement for things that can’t be assigned units in the usual sense of length or weight standardization. With patience and a proper diet, the Assessor can overcome this condition. As someone once said: “theory and practice are the same in theory but not in practice.”

Alignment. Having broached this topic in the previous entry, we should address the topic of curriculum alignment properly. This is almost always evidenced by a so-called matrix—a grid with courses along one side and learning outcomes along the perpendicular. Creating the matrix may be referred to as curriculum mapping. On the surface, this seems like a puzzling idea. After all, if a learning outcome is assigned to one course, why should it appear in another course? Is that not a simple admission of failure? And if the learning outcome represents a more advanced level in one course than the other, then isn’t it qualitatively different and deserving of a new name? These are some of the paradoxes that can confront the unwary Assessor who tries to explain curriculum alignment to the faculty. As usual, these matters are best explored with simple examples.

A simple outcome like, say, “factoring quadratics” clearly should not appear as a learning objective in more than one sequence without some explanation. If the math curriculum contained this objective in every major course, it would be terribly inefficient.

On the other hand, “analytical thinking” might well appear in every course in the math curriculum, just like “effective writing” might appear in every course in History. The difference is the generality of the objective, of course. The distinction between specific and general objectives is a very important one, because they have to be treated differently in both assessment and in closing the loop.
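For the record, the matrix itself is nothing exotic. A minimal sketch, with invented course names and outcomes, shows a general outcome appearing everywhere and a specific one appearing only once:

```python
# Invented curriculum map: the outcomes each course claims to address.
curriculum_map = {
    "MATH 101": {"factoring quadratics", "analytical thinking"},
    "MATH 201": {"analytical thinking"},
    "MATH 301": {"analytical thinking"},
}

outcomes = sorted({o for claimed in curriculum_map.values() for o in claimed})

# Print the alignment matrix: courses down the side, outcomes across the top.
print("course".ljust(10), " | ".join(outcomes))
for course, claimed in curriculum_map.items():
    row = " | ".join(("X" if o in claimed else " ").center(len(o)) for o in outcomes)
    print(course.ljust(10), row)
```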

Rubric. To the faculty member, a rubric is often considered an accoutrement to a course, or perhaps a spandrel—an architectural detail that supports no structure, but is at best merely decorative. To the Assessor it is the sine qua non of classroom assessment—the results to be pored over to divine the minds of a mass of learners. At worst, rubrics become just another kind of grading. At best, they guide the intelligent dialogue that is the engine of change.

Dialogue. This is a better word than “meeting,” which will cause half of your audience to sneak quietly out to the nearest bar, and the other half—who love meetings—to sharpen their digressions and sidle closer. Whatever the name of it, focused discussion among peers with a common interest is the best assessment tool in the Assessor’s kit. As it pertains to the previous topic, dialogue can help identify objectives that are too general. The crux of the matter is that discussion among experts creates a social engine for intelligent change. Fostering this dialogue is the primary goal of the Assessor.

Hard and soft assessment. We are now ready to return to the topic of assessment with a better perspective, to consider what it is that “paying attention” means in two contexts: hard assessment (which we can sometimes legitimately call measurement) and soft assessment (which is subjective and prospers through dialogue).

We should acknowledge that everyone does soft assessment all the time. Sentiments like “he seems sad today,” or “Jonah gets seasick,” or “that waiter was mean to me” are all soft assessments—even though no one calls them that—which convey meaning because of the wonders of language. That is, even for abstract or difficult concepts like love, hate, intelligence, and happiness, these words communicate real information to the listener. This can only be possible because there is enough reliability (agreement among observers) for information to be communicated. Moreover, this reliability suggests that there is some connection between concepts and reality. This can be interesting and complicated, as in “Paris Hilton is famous,” which is only true because we agree it’s true. But for purposes of this discussion, let’s make the leap of faith that if we observe that “Stanislav is a poor reader,” and we get some agreement among other observers, that this is a valid observation. It is, after all, this sort of informal assessment that allows society to function.

This approach has some almost miraculous advantages. First, the Assessor doesn’t have to explain the method—faculty already do it. If you ask them to list their top students, a group of professors can generally sketch out who should be on that list—provided that they all have experience with the students. If you’ve ever witnessed a whole department trying to grant the “best” student an award, and watched faculty members try to describe how good a student is to someone who hasn’t had direct experience, you’ll understand the paucity of our ability to communicate the rich observations that we assimilate.

Soft assessment is the primary means of closing Auntie Theresa’s loop, which we will come to in a moment.

Hard assessment could be measuring things—counting, weighing, timing, and so forth. As we’ve seen in the section on measurement, much of what passes for measurement is really just a sterilized type of observation, which falls far short of measurement as it’s commonly thought of, but simultaneously lacks the experiential richness of direct personal observation. But even if we don’t call it measurement, we can call it hard assessment to reflect the fact that it is difficult, brittle, and inflexible. It still can be useful, however. If we think of hard assessment for the moment as a standardized test or survey, we can use such instruments for general or specific learning outcomes. The key is the complexity of what’s being tested or surveyed. Relating the results back to the classroom is generally the hard part of hard assessment. Suppose you know how many books students check out of the library on average—an indirect measure of literacy, perhaps. How do you improve your teaching based on that? This problem is common, and a good example of:

Zza’s Paradox: We can assess generally, but we can only teach specifics.

There is an inherent asymmetry between teaching and assessing—we can observe far more kinds of things than we can change.

Anti-goo. A type of hard assessment is a subjective survey about student performance: just ask faculty, in an organized way, what they think of the thinking and communication skills of their students, for example. Rather than reporting these out as averages, use frequencies and maximums and minimums. If averaging creates goo, these simpler statistical data are the anti-goo.
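A sketch of what the anti-goo report might look like, with invented survey responses on a 1-to-4 scale:

```python
from collections import Counter

# Invented faculty ratings of students' communication skills, on a 1-4 scale.
ratings = [3, 2, 4, 2, 3, 1, 3, 2, 4, 3]

# Goo would be sum(ratings) / len(ratings). The anti-goo report instead:
frequencies = Counter(ratings)
print("frequencies:", dict(sorted(frequencies.items())))
print("minimum:", min(ratings), "maximum:", max(ratings))
```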

Grades. To faculty, grades are a necessary evil to make GADOT happy, but one with a long tradition, and now accepted as a given. To Assessors, grades are useless, unless validating statistics are needed for a new standardized test, in which case correlations with grades get printed on the marketing material.

Reliability and validity. Some faculty members may use these words to annoy assessors by pointing out the flaws in designs of tests and surveys. Often assessors will use the same terms as a stick to advertise some standardized instrument, for which these words are the equivalent of “new and improved” for consumer products. Except for the simplest objects of study, reliability almost always comes at a cost in validity. You can train raters to call a dog a cat and get high reliability, but what about those who aren’t in on the secret? How will they know that dogs are now cats? They don’t, in fact, so the new private language has limited utility. Faculty members, being intelligent people, intuit that we get along just fine in normal language without any attempts to improve inter-rater reliability, so will view suspiciously any attempts to subvert their onboard judgments with agree-to-agree agreements.

Ratings. Faculty look at average scores and shrug. What good does it do to know that sophomores are .2 points better at “subversive thinking” than freshmen? To the Assessor this is often important, so he or she can create graphs with positive slopes to highlight on reports. It makes him or her look good. But ratings can be reported out usefully—just avoid averages. Use proportions instead. Say that two-thirds of Tatianna’s results were considered remedial, rather than that she scored 1.7 relative to a group average of 2.2.
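As a sketch of the proportion-style report (Tatianna’s category labels and counts are invented here for illustration):

```python
# Invented categorical ratings for one student's rated work samples.
tatianna_ratings = ["remedial", "remedial", "proficient", "remedial",
                    "proficient", "remedial"]

remedial_share = tatianna_ratings.count("remedial") / len(tatianna_ratings)
print("{:.0%} of Tatianna's results were rated remedial".format(remedial_share))
```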

Closing the loop. To faculty, this means scrambling at the end of the year to put together reports that will make the annoying emails from administrators stop piling up in their inbox. To Assessors, closing the loop is the holy grail—a chalice of continual improvement, as it were. In her haste to nudge the faculty in the right direction, the Assessor will often emphasize the wrong things, such as rubrics, ratings, and hard assessments, and try to push the faculty into the mindset of a pickle factory trying to increase output. This is annoying to the faculty and ultimately self-defeating. The result the Assessor really wants is evidence of change. The assessment data will almost always be fuzzy; there will very rarely be a headline screaming across the IR report: DO THIS! Here’s the step-by-step process:

1. Get faculty to sit down together and talk about the learning objectives. Call them goals, so as not to arouse their suspicions, and have plenty of alcohol on hand.

2. Provide someone to take minutes if you can, or be there yourself. In any event, someone needs to record the salient bits.

3. For each goal, dig out and discuss any information—opinions, ratings, grades, or other evidence—that bears on the goal. Let the hive mind do its job and come up with suggestions. Some won’t be very practical, but don’t worry about that.

4. Sit with the chair or program director and see if you can get agreement on a few items for action. Most likely the faculty will have decided this themselves.

5. The hard part will always be to get them to write this stuff down, even when they do it successfully. Provide all the support you can there for the lazier groups.

This works for a model of continuous improvement. If you are stuck with a ‘minimum standards’ model, where ‘measurement’ is taken seriously, then all is probably lost. Get a prescription for Xanax and listen to soft jazz while you read the want ads.

Conclusion. It’s really in everyone’s best interests if the faculty and the Assessor are not at war with one another, deploying the nukes of wesaysoism and academic freedom. There is a moderate path that creates the least pain all around, and as a bonus has a good chance of improving teaching and learning.

[Note: This was originally written as a chapter in a book, but ended up here instead. I never did final editing on it, and didn't add hyperlinks.]
