Sunday, May 16, 2010

Time Flies Like an Arrow

...and fruit flies like a banana.

Words are tricky things.  Jerome Kagan puts it much more elegantly in his book What Is Emotion?  He takes issue with loose language in scientific writing, such as the claim that "yeasts cooperate":
Poets possess the license to use a predicate any way they wish, as long as an aesthetic effect is produced.  Sylvia Plath can write the swans are brazen and pots can bloom, but scientists are denied this creative freedom because the meaning of a predicate for a process, action, or function always depends on the objects participating in the process.
He elaborates on this point, using the example of psychologists attributing "fear" to mice that have been exposed to unpleasant stimuli, and shows convincingly that a physiological response is not the same as an emotion.  The larger point is that processes matter more than outcomes for scientific inquiry.  Labeling an emotional state "afraid" is not as meaningful or useful as describing the process that produced the physiological reaction we attach the label to.

It's the identification and study of processes, not outcomes, that makes the hard sciences effective.  Anyone can see that the apple fell off the tree.  Counting and sorting the apples doesn't help us understand what happened. 

Both of these subtleties bedevil the business of learning outcomes assessment.  For the most part, we only study the results (the outcomes) of the educational enterprise.  There are some studies that try to identify better pedagogies (like 'active' vs. 'passive' learning), but even with these it's not possible to build a procedure for predicting learning.  The only predictions that come out of learning assessment are of the sort "we saw x in the past, therefore we think we'll see x in the future."  If a student aces the calculus final, we believe that she will be able to do integrals in her engineering classes next term.

Even this simple inductive reasoning is subverted by what we call 'feature creep' in programming.  Tests and other assessments can never actually cover the whole range of the learning we would like to assess, so we assume we can broaden the scope of the claim anyway.  Sometimes this is partially justified with other research, but usually not.  The inductive assumption becomes:
We saw x on our assessments, and we predict that we will see X in the future.
Here, X is some broadened version of x that uses the flexibility of language to paper over the difference.  I made a particular argument about this concerning the CLA in a recent post.  An analogy will make this clear.
A test for flying a plane is developed, with the usual mix of theory and practice.  As part of the assessment, the test subject Joe has to solo in a single-engine Cessna.  This is what I've called x, the observation.  We now certify that "Joe can fly."  This is X, our generalized claim.  Clearly, this is misleading.  Joe probably can't fly anything except the Cessna with any aptitude.  Would we trust him with a helicopter, a jet fighter, or a 747?  Hardly.
Now the same thing in the context of writing:
A test for writing compositions is developed and standardized with rubrics and a 'centering' process to improve interrater reliability.  Joe scores high on the assessment and is labeled 'a good writer.'  But can we really claim that Joe can write advertising copy, or a lab report, or a law brief, or a math proof?  Hardly.
Generalizing beyond the scope of what has actually been observed is misleading unless one has good data to build an inductive argument from.  We can predict the orbits of celestial bodies we see for the first time because we have a model for the process.  Is there a model that tells us the relationship between writing an essay and writing a contract for services?

And yet if we look at the goals set out by well-respected organizations like AAC&U, we see (from the LEAP Initiative) these goals for learning:
  • Inquiry and analysis
  • Critical and creative thinking
  • Written and oral communication
  • Quantitative literacy
  • Information literacy
  • Teamwork and problem solving

All of these are general, just like "flying" or "writing."  This is where we begin to get into trouble.  The list is fine--the trouble comes when we start to imagine we can assess these and use the results for prediction.  General skills require general assessments.  Judging from his autobiography, I think Chuck Yeager was pretty good at "flying" in a general sense--he spent his whole life doing it.  But in four years (or two years) of college we have neither the time nor the mission to teach a student everything there is to know about even one of the items on the bullet list.  The list above only starts to make sense if we narrow each item to an endeavor that's achievable in our limited time frame.  For example:
  • Inquiry and analysis in world history
  • Critical and creative thinking in studio art
  • Written and oral communication in presidential politics
  • Quantitative literacy in basic statistics
  • Information literacy in using electronic resources
  • Teamwork and problem solving in basic circuit design
And of course, this is the normal way assessment works for improvement of a course or curriculum.  If we get enough data points of this specificity, we can study how generalizable these skills are.  We can compare notes between disciplines and see if critical thinking really means the same thing or not.  This is the Faculty of Core Skills approach.

But big general tests, big general rubrics, and calls for accountability at the institutional level all assume that x implies X.  This creates confusion and tension.  Victor Borden wrote recently about this in "The Accountability/Improvement Paradox":
In the academic literature and public debate about assessment of student learning outcomes, it has been widely argued that tension exists between the two predominant presses for higher education assessment: the academy's internally driven efforts as a community of professional practitioners to improve their programs and practices, and calls for accountability by various policy bodies representing the “consuming public.” 
I don't see this as a paradox.  If you want a paradox, here's one from last week.  A paradox is a logical contradiction--this situation is merely an illogical one.  Borden gives a succinct summary of it (perhaps quoting Peter Ewell):
Assessment for improvement entails a granular (bottom-up), faculty-driven, formative approach with multiple, triangulated measures (both quantitative and qualitative) of program-specific activities and outcomes that are geared towards very context-specific actions. Conversely, assessment for accountability requires summative, policy-driven (top-down), standardized and comparable (typically quantitative) measures that are used for public communication across broad contexts.
This is a brilliant observation.  What sorts of big, complex enterprises are measurable in this way?  Could you do this with a large business?  Sure--they are called audits.  A swarm of accountants goes through all the books and processes, and a stack of big documents is produced.  Or a bond rating agency like Standard & Poor's does something similar and arrives at a single summative grade for your company--the bond rating.  We can argue about whether or not this is something accreditors do (or should do), but these processes look nothing at all like what passes for generalized outcomes assessment, for example in the Voluntary System of Accountability (NSSE, CLA...).  That would be the equivalent of giving the employees at Enron a survey on business practices, and extrapolating from a 10% response rate.
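To see why that kind of extrapolation is so shaky, here is a minimal sketch in Python (with invented numbers, not data from any real survey): if the 10% who respond differ systematically from the 90% who don't, the reported average can be well off the mark no matter how precisely it is stated.

    import random

    random.seed(1)

    # Hypothetical example: 1,000 employees would rate "business practices"
    # on a 1-5 scale.  The true average is mediocre.
    population = [random.choice([1, 2, 2, 3, 3, 3, 4]) for _ in range(1000)]
    true_mean = sum(population) / len(population)

    # Non-response bias: the chance of answering rises with the rating,
    # which works out to roughly a 10% overall response rate.
    respondents = [score for score in population if random.random() < 0.04 * score]
    survey_mean = sum(respondents) / len(respondents)

    print(f"responses: {len(respondents)} of {len(population)}")
    print(f"true mean rating:     {true_mean:.2f}")
    print(f"reported survey mean: {survey_mean:.2f}")

The reported number looks precise, but mostly it measures who chose to answer.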

It's processes, not outcomes, that matter.  We only look at outcomes to get a handle on processes.  Audits look directly at processes for a good reason.  A giant fuzzy averaged outcome is worse than useless because it gives the illusion of usefulness thanks to the flexibility of the language.  The problem derives from misplaced expectations at the top:
Nancy Shulock describes an “accountability culture gap” between policy makers, who desire relatively simple, comparable, unambiguous information that provides clear evidence as to whether basic goals are achieved, and members of the academy, who find such bottom line approaches threatening, inappropriate, and demeaning of deeply held values.
As I have mentioned before, if policy makers want simple, comparable, unambiguous assessments of goals, they need to specify goals that can be assessed simply, comparably, and unambiguously.  So I went looking at the Department of Education to see if I could find them.  I came across a strategic plan, which seemed like the best place to look.  In the mission statement:
[W]e will encourage students to attend college and will continue to help families pay college costs. The Department intends that all students have the opportunity to achieve to their full academic potential. The Department will measure success not only by the outcomes of its programs, but also by the nation's ability to prepare students to succeed in a global economy as productive and responsible citizens and leaders.
That last bit is probably the most pertinent to this discussion. We could extrapolate these goals:
  • graduates should be competitive in the global economy
  • graduates should be productive
  • graduates should be good citizens
  • graduates should be leaders
What direct measures of these things might we imagine?  Here are some suggestions:
  • Employment rates
  • Salary histories
  • Voting records, percentage who hold public office, percentage who serve in the military
  • Percent who own their own businesses
I had trouble with the leadership one.  You can probably think of something better than all of these.  But notice that not a single one has anything to do with learning outcomes as we view them in higher education.  You can stretch and say we have citizenship classes, or we teach global awareness, but it's not the same thing.

I find the goals wholly appropriate to a view from the national level.  These are strategic, which is what they ought to be.  They're still too fuzzy to be much good, but that's just my opinion.  I'd like to see goals targeted at science and technology, like the ones we had during the space race, but they didn't put me in charge.

Digging into the document, we find actual objectives spelled out.  Let's see how well I did at guessing.  It starts on page 25.
Objective 1: Increase success in and completion of quality postsecondary education.
To ensure that America’s students acquire the knowledge and skills needed to succeed in college and the 21st-century global marketplace, the Department will continue to support college preparatory programs and provide financial aid to make college more affordable. Coordinated efforts with states, institutions, and accrediting agencies will strengthen American higher education and hold institutions accountable for the quality of their educational programs, as well as their students’ academic performance and graduation rates.
The last phrase is hair-raising.  Hold higher education accountable for students' academic performance and graduation rates?  That's like holding the grocery store accountable for my cooking.  Sure, they have something to do with it--if the melon is rotten or whatever--but don't I bear some of the responsibility too?

That's just the objective.  Next come strategies for implementation.
Strategy 3. Prepare more graduates for employment in areas of vital interest to the United States, especially in critical-need languages, mathematics, and the sciences.
The Department will encourage students to pursue course work in critical-need foreign languages, mathematics, and the sciences by awarding grants to undergraduate and graduate students in these fields. The SMART grant program will award grants to Pell-eligible third- and fourth-year bachelor’s degree students majoring in the fields of the sciences, mathematics, technology, engineering and critical foreign languages. In addition, priority will be given to those languages and world regions identified as most critical to national interests.
Well, I got what I wished for.  This outlines objectives that are clearly strategic in nature and whose success is relatively easy to measure.  We can count the number of students who learn Arabic or graduate in engineering.
Strategy 5 mentions learning outcomes for the first time, but the hammer falls in the next one:
Strategy 6. Expand the use of data collection instruments, such as the Integrated Postsecondary Education Data System (IPEDS), to assess student outcomes.
The Department will collaborate with the higher education community to develop and refine practices for collecting and reporting data on student achievement. To encourage information sharing, the Department will provide matching funds to colleges, universities, and states that collect and publicly report student learning outcomes.
We're going to put learning outcomes in IPEDS???  Didn't Conrad write about this?  Where's all that useful stuff I dreamed up in one minute about employment, salaries, citizenship, and such?  Those pieces of the mission statement aren't even addressed.  Or maybe I missed them due to the lack of oxygen in my brain after reading strategy six.  It seems perfectly obvious that IPEDS will want comparable learning outcomes, which can only mean standardized tests like the ones the VSA requires.  If so, it won't be the end of the world, but it will be negative.  Beyond the expense and missed opportunity, we will institutionalize the kind of misplaced reasoning that in the earlier example would lead one to believe that Joe the new pilot can fly an SR-71 Blackbird.

Coming full circle, Kagan remarks in his book that
A scientific procedure that later research revealed to be an insensitive source of evidence for a concept, because the procedure was based on false assumptions, is analogous to a magical ritual. 
He gives the example of the Rorschach inkblot test.  That's hardly an apt analogy for standardized tests reported nationally as institutional measures of success, however; inkblot tests are too subjective.  I suggest phrenology as a better comparison.

Postscript. As I was writing this, my daughter Epsilon was studying for her French "End of Course" standardized test (see this post).  I went down and reviewed a few sections with her.  She didn't know the names of the months very well, so I suggested making some flashcards.  She told me this: "I don't need to be able to spell them or tell you what they are, I only need to be able to recognize them.  It's multiple-choice."  The test only assesses reading and listening.  Speaking and writing are not part of it.  Since speech production and understanding are handled in different parts of the brain, we actually have some clue about the process in this case, and it tells us that the test is inadequate.  This may be okay--maybe those parts are covered somewhere else.  The point is that the subtleties matter.  If I said "Epsilon aced her French final," you might be tempted to think that she knows the names of the months.

She's down there making flash cards now. 
