Sunday, March 27, 2011

Complexity as Pedagogy

Learning an academic subject usually goes like this. First you have to get used to a new language and ideas expressed in that language. At first you're quizzed on meaning, but you're increasingly required to actively use the new concepts (using a different part of your brain to do so). When you get far enough along, you can begin to critique work using the new ideas. Ideally you critique your own work as you produce it. I think of this as the analysis and creativity cycle. The former demands use of the language (terms, ideas, grammar and syntax), whereas the latter depends on insight, trial and error, and imagination.

The problem is that it usually takes a long time to get any good at a new language before you can effectively be creative in it. Take speaking in a foreign language, for example. You can't just make up words and sentences that 'sound German' and hope for it to be meaningful. A German friend gave a good example of this in reverse, when his sister tried to order a sandwich in New York:
Using "Ich will einen Hamburger bekommen"  (I want to get a hamburger.) to create the English approximation: "I will become a hamburger." 
But 'will' and 'become' mean very different things in the two languages, and there's no way to approximate the knowledge.
You have to understand how verbs are consummated and whatnot...then you can express yourself.

All this preparation is a hindrance if you just want to introduce a student to a subject. One of my professors, David Kammler, exposed me to the interesting idea that math should be taught like MacArthur's island-hopping campaign in the Pacific in WWII. That is, don't try to do everything; just teach strategic material that an able student can use as a basis to explore other parts as needed. This has the effect of lessening the language burden.

There is an advantage to that. Being creative is like play, or it can be. It reinforces the language and gives students the thrill of discovery. This is how games work. If you make the learning curve too steep for beginners, they may not even make it through the rule book. Moreover, it's not necessary to have a lot of language/rules to have an interesting game. Chess, for example, or Backgammon, or Poker. The trick is to create a tight, unambiguous language, thereby providing a platform for exploration whose moves can easily be checked against the rules.

Math and computer science are particularly amenable to this approach. My colleague Soumia Ichoua has a grant to do operations research activities, and part of it is an outreach project for middle or high school students. We are working with the university's Upward Bound program to coordinate logistics, and have created several types of logistical games for the kids.

The idea is to create a simple system that has these three properties:
  1. It's easy to understand the rules and check them. (low complexity rules = simple descriptive language)
  2. It's difficult to solve the problem. (high complexity solution = perfect solution method has a long description, and is unlikely to be found.)
  3. There is a wide range of possible solutions. (expressive language, allowing heuristics to be effective)
As it turns out, there is a major class of algorithmic challenges in computer science with exactly these properties: the NP-hard problems, whose candidate solutions are easy to check but which appear to be very hard to solve optimally. One example is the traveling salesman problem, where you try to find the shortest circuit that visits a list of destinations. The rules are very simple: the length of a circuit is just the sum of the lengths of its legs (each transit from one location to the next). Finding the shortest circuit, however, is very hard. So it's a perfect environment to employ creativity on a real discipline-based problem without having to learn a bunch of language and rules first.


The image above is taken from Wolfram Alpha, and shows a minimal distance circuit for the given points. The space of possible solutions is vast, but humans can use heuristics to find reasonably good solutions.
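To make this concrete, here is a minimal Python sketch (my own toy example, not part of the outreach materials): it scores a circuit by summing leg lengths, then compares a simple nearest-neighbor heuristic, the kind of strategy students tend to invent on their own, against the brute-force optimum over a handful of made-up points.

```python
import math
import itertools

# Made-up destinations; a game board would supply real coordinates.
points = [(0, 0), (2, 1), (5, 0), (6, 4), (3, 5), (1, 4)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tour_length(order):
    """The 'rules': total distance is just the sum of the legs of the circuit."""
    return sum(dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor(start=0):
    """A human-style heuristic: always go to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist(points[order[-1]], points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

heuristic = nearest_neighbor()
best = min(itertools.permutations(range(len(points))), key=tour_length)  # brute force; only feasible for tiny inputs
print(f"heuristic circuit: {tour_length(heuristic):.2f}")
print(f"optimal circuit:   {tour_length(best):.2f}")
```

Checking a proposed circuit takes a few lines; finding the best one by brute force blows up factorially, and that asymmetry is exactly what the activity exploits.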

One of the game boards was created by my daughter, and is shown below. There's a story behind it about King Lolly, who needs to keep his people fed in winter; the game involves moving food markers around from the granary and the castle to the villages. There are several scenarios to keep it interesting.


Other logistics games include air-dropping supplies into a remote region, delivering hot dogs to hot dog vendors in a city, and planning for natural disasters. The last one is the focus of Dr. Ichoua's research, which has her visiting FEMA to get real data.

A draft of the rules for Lollyland is found below. The plan is to use undergraduate helpers to assist rising 9th and 10th graders in pairs. We are developing assessments for both cognitive and non-cognitive outcomes.



Rules


Food travels from the castle and keep to the villages over the roads. If you have to move the food through the castle, the hungry people will eat half of whatever you send through (rounded up). Each food marker is enough to satisfy one hungry villager.
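For anyone checking the scenarios by hand, here is a tiny sketch of my reading of that rule (an assumption on my part; the game itself may count differently):

```python
import math

def delivered_through_castle(sent):
    """Food routed through the castle loses half, rounded up, to the hungry townsfolk."""
    eaten = math.ceil(sent / 2)
    return sent - eaten

print(delivered_through_castle(9))  # 5 markers eaten en route, 4 arrive
```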

Winter in Lollyland


In the small kingdom of Lollyland, King Lollygaster has to plan well to feed his people during the winter. Being wise, he has constructed a grain tower in his castle and another one at a distant keep. Between these two sources, he must distribute emergency food if the winter is harder than expected. Besides the town surrounding the castle, which has its own store and can take care of itself, there are four large villages that have no such protection. It is these that the king must provide for in time of famine.

Scenario 1.


The ground stayed frozen so long that the crops were put out late. The king must distribute food to the four villages to keep them from starving. He isn’t good with numbers, so he puts you in charge and entrusts you to get his people fed.

Set up: 25 food in the castle, and 11 in the keep. Each village gets 9 hungry people.


Questions:
  1. Can all the villagers be fed?
  2. How many ways are there to distribute the food to feed them?
Scenario 2.

After many happy years King Lollygaster became too frail to rule effectively, and turned the kingdom over to his son Lollyfright, who promptly dumped the old man down a well. Lollyfright neglected to lay up much in the way of food storage, assuming the winter would be mild. You are now the Minister of Happiness, and tasked with the following situation:

Set up: 18 food in the castle and 8 in the keep. Each village gets 9 hungry people. All of the roads are closed due to lack of interest in clearing snow from them. Choose four roads to open and deliver food; the others will remain closed.

Questions:
  1. What is the greatest number of villagers that can be fed?
  2. What are the best roads to open?
Scenario 3.

After the disaster, the merchant class of Lollyland dethrone Lollyfright and send him into exile. They form a governing council and put you in charge of planning for the next famine.
Set up: Assume there will be 7 hungry villagers in each village, the population having declined of late. With this decline there is less ability to clear roads of snow. Assume that you can open only three roads in the worst case.

Question: How much food do you need to put at the castle and the keep?

Friday, March 25, 2011

Why Assessment?

When the word came filtering down through the academic rumor enhancement facility (an endowed building, thankfully) that we'd have to do assessment--this happening in a past demarcated by one (1) millennium post, one (1) century post, and two (2) decades--we academics ran for cover. Some are still down there in their fox holes, waiting for the war to be over, not having access to the headlines proclaiming victory for the other side.

I have to wonder why the emphasis was ever put on assessment in the first place. It's certainly a bad word to build a PR campaign around, like trying to advertise lamprey on the menu. Anyway, assessment isn't the point at all. This has only recently occurred to me, which may be symptomatic of all those sedimentation layers from said rumor enhancement facility that piled 'assessment' on top of 'measurement' and so on and on. You could make chalk out of the stuff if you had the patience.

When I finally made the synaptic leap, it was a shock. But perhaps I can be excused, given the sustained emphasis on assessment in conferences, publications, and public discourse.

Assessment is one component of a theoretical institutional effectiveness process, but it's not the most important one. The spiral of excellence is to be climbed by:
  1. Setting goals
  2. Finding assessments for same
  3. Gathering data and analyzing it
  4. Making changes that may improve things
The first step and the last are the most important, and if you could only pick one, number four would be it. If the mandate were to set goals and try to achieve them, anyone taking that seriously would pretty quickly figure out some sort of assessment was in order. So why do we start with assessment first? Historical accident?

Compounding the premier position of assessment is the way we sometimes talk about the activity, as if we were doing science in order to do engineering. What I mean here is that I characterize science as pinning down cause and effect, whereas engineering is the use of those principles to accomplish some aim. This is all mixed up in learning outcomes assessment, because it's often expected to do both at the same time. Here again, the heavy emphasis on assessment gets in the way. To find a link between cause and effect, we have to vary starting conditions, and compare ending conditions. Then we have to hope that the universe is reasonable and lets us get away with inductively assuming that because it worked last time it will work again the next time.

Suppose we want to test some variations of fertilizers on corn to see which one works best. We'd try to keep everything constant except our treatment (type of fertilizer), which varies in some suitable way. We'd be careful to randomize over plots of ground to average out soil or topographical variance, and so on. We would carefully document all the conditions (water, acidity, parasites, etc.) during the experiment, and finally assess the yield at the end. Now all that stuff I just reeled off is part of the experiment, but with learning outcomes we typically wish everything away except for the last bit--assessing the yield. I have yet to see an effectiveness model that has a box asking for the experimental conditions during the teaching process. Maybe that's because few would do it.
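By contrast, even the smallest piece of that experimental discipline, randomly assigning treatments to plots, is easy to write down explicitly. A toy sketch, with made-up numbers of plots and fertilizers:

```python
import random

plots = list(range(12))            # twelve plots of ground
treatments = ["A", "B", "C"] * 4   # three fertilizers, four plots each

random.shuffle(treatments)         # randomize so soil and topography average out
assignment = dict(zip(plots, treatments))
print(assignment)
```

Learning outcomes assessment rarely records even this much about who was "treated" with what.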

The effect of this casual approach to experimentation is to make the assessment less valid. This may be remedied to some extent by bringing the context back into the light, which is exactly why having teaching faculty involved in broad discussions that include but are not limited to assessment results is so important. Given the typical (and understandable) lack of rigor for most learning assessments, the results are more like a psychic reading than science. And that's fine. It should be told that way, so that faculty members don't get so stressed out about a "scientific" use of the results. Focusing more heavily on goals and improvements makes it easier to engage faculty, and puts the emphasis where it ought to be: taking action.

Thursday, March 24, 2011

Memory as a SLO

Why don't we teach memory skills as learning outcomes? Given the amount of memorizing required for many college courses, wouldn't it make sense to have a first-year course on the general subject: theory and application? Or better yet, integrate techniques within classes. I can remember my uncle, who is a doctor, reciting a long list of bones from some mnemonic device he'd used back in school, decades before. The subject has a long, rich history. The Romans practiced memory techniques, and probably considered them essential (see Wiki article "Art of Memory" for sources and details). The book Moonwalking with Einstein: The Art and Science of Remembering Everything is currently the number four best-seller on Amazon.com.

Memory shows up as the base of the pyramid on the revised Bloom's Taxonomy (taken from here).


I'm guessing that there are institutions out there who do this, but I haven't heard about them (or, ironically, have forgotten about them). Does anyone have a "memory across the curriculum" program? It seems like this skill does belong at the base of cognitive skills, and moreover is transferable from one discipline to another. Simple things could easily be done, like teaching students how big flash-card decks should be and how to time the reinforcements, or providing online resources like SuperMemo:
SuperMemo is based on the insight that there is an ideal moment to practice what you've learned. Practice too soon and you waste your time. Practice too late and you've forgotten the material and have to relearn it. The right time to practice is just at the moment you're about to forget.
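For the curious, here is a rough Python sketch of that scheduling idea, loosely patterned on the published SM-2 spacing algorithm rather than on SuperMemo's actual current implementation (so treat the constants as illustrative):

```python
def next_review(interval_days, easiness, grade, repetition):
    """One graded review (grade 0-5); returns (new_interval, new_easiness, new_repetition)."""
    if grade < 3:                  # forgot it: start the schedule over
        return 1, easiness, 0
    easiness = max(1.3, easiness + 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
    if repetition == 0:
        interval_days = 1
    elif repetition == 1:
        interval_days = 6
    else:
        interval_days = round(interval_days * easiness)
    return interval_days, easiness, repetition + 1

# A card answered well three times in a row drifts out to roughly a two-week gap.
interval, easiness, repetition = 0, 2.5, 0
for grade in (5, 4, 4):
    interval, easiness, repetition = next_review(interval, easiness, grade, repetition)
    print(interval, round(easiness, 2))   # 1 day, then 6, then about 16
```

The point isn't the particular constants; it's that the reinforcement schedule stretches out as recall stays easy, which students can exploit with nothing fancier than dated flash-card piles.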
Here's another tip: move your eyes
If you’re looking for a quick memory fix, move your eyes from side-to-side for 30 seconds, researchers say. Horizontal eye movements are thought to cause the two hemispheres of the brain to interact more with one another, and communication between brain hemispheres is important for retrieving certain types of memories.
Finally, here's a very cool way to remember numbers called pegging:
Basically what pegging does is to turn a number (any number), into a set of phonetic sounds or letters. These sounds are then joined together to form words, and these words may then be linked together to form a series of images. Finally these images may then be committed to memory. This enables an individual to recall numbers of up to (and above) 100 digits, with relative ease.
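Here is a toy illustration of the first step, using the "Major system" digit-to-consonant mapping that many pegging schemes are built on (my example, not the quoted author's):

```python
# One common digit-to-consonant-sound table (the "Major system").
MAJOR = {
    "0": "s/z", "1": "t/d", "2": "n", "3": "m", "4": "r",
    "5": "l", "6": "j/sh/ch", "7": "k/g", "8": "f/v", "9": "p/b",
}

def number_to_sounds(number):
    """Turn each digit into its consonant sound; vowels are added freely to make words."""
    return [MAJOR[d] for d in str(number)]

print(number_to_sounds(31415))  # ['m', 't/d', 'r', 't/d', 'l'] -> e.g. "My TuRTLe"
```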
It seems like helping students improve memory would have broad positive consequences for the rest of their learning. They could spend less time in the process, stop cramming vocabulary just before the test (and forgetting it just after), and take away a useful life skill. We could ask students to memorize presentations instead of reading them.

Memory as an outcome is attractive because it's easy to assess, there are straightforward techniques that are known to work, and it can be incorporated into the curriculum. The biggest barrier might be faculty who don't know and don't want to learn the techniques themselves.

Undoubtedly the drama department already does some of this--you can't take the flash cards on stage with you for the performance. Music too, maybe. Can we broaden the scope and take similar advantage institutionally? I'd love to know if someone out there is doing this already.

Wednesday, March 23, 2011

Compare and Contrast

These pairs of quotes come from different viewpoints about how we know what we know, and how we should conduct higher education. Write a 3000-word essay in which you reflect on how these statements fit into the practice of teaching and learning as you currently understand it. Due Friday.
If we take in our hand any volume, of divinity or school metaphysics for instance, let us ask, Does it contain any abstract reasoning concerning quantity or number? No. Does it contain any experimental reasoning concerning matter of fact and existence? No. Commit it then to the flames, for it can contain nothing but sophistry and illusion. -- David Hume, An Enquiry Concerning Human Understanding, ed. L. Selby-Bigge, p. 163, taken from here.
Versus:
Epistemological anarchism is an epistemological theory advanced by Austrian philosopher of science Paul Feyerabend which holds that there are no useful and exception-free methodological rules governing the progress of science or the growth of knowledge. It holds that the idea that science can or should operate according to universal and fixed rules is unrealistic, pernicious and detrimental to science itself. --Facebook Group on Epistemological anarchism(!)
And
We say that a sentence is factually significant to any given person, if and only if, [she or] he knows how to verify the proposition which it purports to express—that is, if [she or] he knows what observations would lead [her or him], under certain conditions, to accept the proposition as being true, or reject it as being false. – A. J. Ayer, Language, Truth, and Logic
 Versus:

[T]he meaning of a word is its usage in the language. – L. Wittgenstein
And
The CLA measures were designed by nationally recognized experts in psychometrics and assessment, and field tested in order to ensure the highest levels of validity and reliability.-- Advertising flyer for the Collegiate Learning Assessment
Versus:
It is a common misconception that validity is a particular phenomenon whose presence in a test may be evaluated concretely and statistically. One often hears exclamations that a given test is “valid” or “not valid.” Such pronouncements are not credible, for they reflect neither the focus nor the complexity of validity. – College BASE Technical Manual
And finally this pair of quotes from a New York Times article:
In a talk to the nation's governors earlier this month, [Microsoft founder] Mr. Gates emphasized work-related learning, arguing that education investment should be aimed at academic disciplines and departments that are "well-correlated to areas that actually produce jobs." 
Versus:
At an event unveiling new Apple products, [Apple CEO] Mr. Jobs said: "It's in Apple's DNA that technology alone is not enough -- it's technology married with liberal arts, married with the humanities, that yields us the result that makes our heart sing and nowhere is that more true than in these post-PC devices."

What positions on the red/blue spectrum are more consistent with your own practices and beliefs? There are no right answers. Well, unless you believe there are, I guess.

Tuesday, March 15, 2011

Improving the Principles of Accreditation

If you are in the SACS/CoC region, you know all about the Principles of Accreditation, the document that outlines accreditation standards, and which every institution must use to report compliance every ten years and (with fewer sections) every five years in between.  Until March 31, the Commission is accepting comments that will inform a review of said document. This is an excellent opportunity to make your voice heard in what is, after all, a peer-review process.

I have some observations I will post here for comment before sending them off to the Commission, in order to see if others agree or can suggest better approaches. I will restrict my comments to the institutional effectiveness (IE) sections. So here goes.

Note: CR = Core Requirement, CS = Comprehensive Standard, and FR = Federal Requirement

CR 2.10  The institution provides student support programs, services, and activities consistent with its mission that promote student learning and enhance the development of its students. (Student Support Services)
Although this isn't in the IE sections formally, it has a requirement that student support services "promote student learning and enhance the development of its students." This is a clear IE requirement, and is exceptional in that no other functional units, including academic programs, are required to pass this level of detailed IE review as a core requirement. Taken literally, an institution can be sanctioned severely for not assessing learning for student support services, which seems out of line with the more strategic level requirements that comprise the CR sections. I would suggest moving the IE language to 3.3.1.1, quoted below in the current version:
3.3.1 The institution identifies expected outcomes, assesses the extent to which it achieves these outcomes, and provides evidence of improvement based on analysis of the results in each of the following areas: (Institutional Effectiveness)
3.3.1.1 educational programs, to include student learning outcomes
3.3.1.2 administrative support services
3.3.1.3 educational support services
3.3.1.4 research within its educational mission, if appropriate
3.3.1.5 community/public service within its educational mission, if appropriate
Note that "student support services" doesn't appear. There are administrative and educational support services, but not student support services. The nomenclature needs to be cleaned up so we know exactly what we're talking about. One approach is to assume that any service that has learning outcomes is an educational support service, and therefore the modification could be as follows:
  1. End the statement of CR 2.10 after "mission," omitting the IE component.
  2. Change 3.3.1.3 to read "educational support programs, to include student learning outcomes."

CS 3.5.1  The institution identifies college-level general education competencies and the extent to which graduates have attained them. (College-level competencies)
This is the general education assessment requirement. Note that it doesn't appear in 3.3.1 unless general education is defined as a program by the institution. On the other hand, 3.5.1 is NOT an IE requirement--there is no statement about use of results to improve, just that you assess the extent to which students meet competencies. This is a knotty puzzle, so let me take it one part at a time.

First, the phrase "the extent to which" is ambiguous. Does it mean relative to an absolute standard or a relative one? This is by no means splitting hairs. If it means the former, then the institution MUST define what an acceptable competency is for each outcome, and presumably report out percentages that meet the standard. If it's a relative "extent to which" then simply reporting raw scores of a standardized test against national norms would work.
Example (absolute): Graduates will score 85% or more on the Comprehensive Brain Test. In 2010, 51% of graduates met this standard.

Example (relative): In 2010, graduates averaged 3.1 on the Comprehensive Brain Test, versus a national average of 2.9.
The standard is silent about the complexities of sampling graduates too. Are ratings from all graduates to be included? I would assume not, since a requirement like that generally isn't applied to IE processes because of the impracticality of it.

My sense of this is that we should allow institutions to define success in either absolute or relative terms, as best suits them, and include this standard with the other 3.3.1 sections so that it formally becomes part of IE. This will resolve the ambiguity about whether or not general education is a program, and require that general education assessments actually be used for improvements. It also would broaden the scope to include students generally, not just graduates, who may have been out of general education courses for two years by then.

The modification could simply be to:
  1. Add: CS 3.3.1.6 general education, to include learning outcomes
  2. Delete: CS 3.5.1

Finally, let's look at 3.3.1.1 itself. The first issue is subtle. It concerns the meaning of the language "provides evidence of improvement based on analysis of the results". This can mean two different things, and I've seen it interpreted both ways, causing confusion.

The first interpretation is that it means that you have to demonstrate that improvement happened. This is a very high standard. It means things like benchmarking before changes are made, and then assessing the same way later on to see what impact occurred. When I hear speakers talk about QEP assessment, this is generally assumed, but it leaks over into the other IE areas too. Anytime you take a difference between two measurements it amplifies the relative error--this is a basic fact from numerical analysis (a short numerical illustration follows the list below). So you have to have very good assessments and they have to be objective (for reliability) and numerous (small standard error) and scalar (so you can subtract and still have meaning). Also, pre-post tests are the only method that can be used with any resemblance to scientific method. That is, you can't survey two different populations to compare unless you think you can explain all the variance between the populations (read Academically Adrift to see how problematic that is even for educational researchers). My objections to this are:
  1. No small program could ever meet this standard; the N will never be big enough.
  2. Many subjective assessments are very valuable, but are useless in this interpretation
  3. We don't yet have the technology to create scientific scalar indices of things like "complex reasoning" or other fuzzy goals, despite the testing companies' sales literature.[1]
  4. Random sampling is often impossible, which introduces biases that are probably not well understood
  5. Pre-post testing is quite limited, and severely restricts the kinds of useful assessments we might employ
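To make the error-amplification point concrete, here is a back-of-the-envelope sketch with purely hypothetical numbers:

```python
# Hypothetical scores: each measurement is only trusted to within +/- 5 points.
pre, post, noise = 70.0, 74.0, 5.0

gain = post - pre          # the "improvement" we want to report: 4 points
gain_noise = 2 * noise     # worst-case absolute error of a difference of two measurements

print(f"relative error of a single score: {noise / pre:.0%}")        # about 7%
print(f"relative error of the gain:       {gain_noise / gain:.0%}")  # 250%
```

A measurement that is respectable on its own becomes nearly meaningless once you subtract two of them and the difference is small compared to the noise.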
The other interpretation is simply that we use analysis of assessment data to take actions that would reasonably be expected to improve things, but we don't have to prove it did. This is the standard I've seen most widely applied, except perhaps for QEP impact. Arguably the higher standard should apply to QEP, but  that's beyond the scope of my comments here.

In practical terms, the second interpretation is the most useful. It has to be borne in mind that assessment programs are mostly implemented by teaching faculty, who are probably not educational researchers by training, and in my experience tend to become frozen with a sort of helplessness if they think they are expected to track learning outcomes like a stock ticker tracks equity prices. I blogged about this a while back. Too much emphasis on proofs of improvement is paralyzing and counterproductive. On the other hand, free-ranging discussions that include the meaning of results, subjective impressions from course instructors, and other information that speaks to the learning outcome under consideration are a gold mine of opportunities to make changes for the better.

The most powerful argument against the strict (first) interpretation, however, is that there is simply no way to guarantee improvement on some index unless (1) one cheats somehow, manipulating the index, or (2) the index reflects some goal that is so obviously easy to improve that it's trivial. Either way the meaningfulness of the program vanishes, and we are left with many programs that are out of compliance (not showing improvement) or in compliance in name only (showing fake improvement or showing trivial improvement). 

My recommendation is to clarify the meaning of the language to read (bold emphasizes the change):
CS 3.3.1 The institution identifies expected outcomes, assesses the extent to which it achieves these outcomes, and takes actions based on analysis of the results in each of the following areas: (Institutional Effectiveness)
It should be obvious that the actions are intended to effect positive change, since this is rather the whole point of IE.

There is another issue with 3.3.1 that deserves attention. This concerns goals or outcomes that are not about student learning. It seems to me from the interpretations I've seen of 3.3.1 by reviewers and practitioners that the methods that apply to learning outcomes leak over to administrative areas, and that the expectations for compliance are tilted out of plumb. For example, there seems to be an expectation that every unit do surveys and have goals related to those, and that moreover this is necessary and sufficient.

Part of the problem may be the language "the extent to which," which almost insists on a scalar quantity.  But in fact, the most important goals for a given unit's effectiveness may have little to do with surveys and not be naturally a scalar.

One example I saw was a review that took issue with a compliance report presenting "action steps" as goals. An action step might be "Approval of architectural drawings of the new library by 5/1/11." This sort of thing shows up all over Gantt charts:

(image courtesy of Wikipedia)

 This method of tracking complicated goals to completion is very common and very effective. The bars may or may not represent progress toward completion--they are simply timelines. So, for example, an OK stamp from the city engineer on your electrical plans is not a "percent to completion" item--it's either done or not. In other words it's Boolean.

Reviewers can be allergic to Boolean outcomes like:
  • Complete the library building on time and on budget.
  • Implement the new MS-Social Work program by Fall 2012
  • Gain approval via the substantive change process for a full online program in Dance by 2013 (good luck!)
  • Maintain a balanced budget every fiscal year.
For some reason, there seems to be a bias against this kind of goal, and I can't figure out why. These are obviously important operational items, key to the effectiveness of their respective units. But a unit that includes these and doesn't have a satisfaction survey may be cited, while a unit with only a satisfaction survey and none of these operational goals may pass without comment.

It may be that I'm making a mountain out of a termite hill here, but I think the language of "extent to which" could be changed to make it clearer that Boolean objectives are sometimes the natural way to express effectiveness goals.

For example, 3.3.1 could read (change bolded):
CS 3.3.1 The institution identifies expected outcomes, assesses success in a manner appropriate to each outcome, and takes actions based on analysis of the results in each of the following areas: (Institutional Effectiveness)
I know the non-parallel structure will offend the grammarians, but someone more expert can consider that conundrum.

One final nit-pick is that outcomes may not really be expected, but simply striven for--aspirational outcomes, in other words. If they are expected, then by nature they are less ambitious than they might otherwise be. I also put the odd dangling " in each of the following areas" at the front where it belongs, so my final version is this:
CS 3.3.1 In each of the following areas, the institution identifies aspirational outcomes, assesses success in a manner appropriate to each outcome, and takes actions based on analysis of the results: (Institutional Effectiveness)

[1] I've written much about assessing complex outcomes before, and this is not the page to rehash that issue. The short version is that learning happens in brains, and unless we understand how brains change when we learn, we are not able to speak about causes and effects as they relate to the physical world. See this article about London taxicab drivers for a study that links learning to physiological changes in the brain. I don't mean to imply that assessing complex outcomes is useless, just that we should be modest about our conclusions. It's called complex for a reason.

Monday, March 14, 2011

Academically Adrift

I've come up for air after posting part three of Life Artificial yesterday evening. Over the last few weeks I've had dozens of browser tabs open to write a post about, but have been too focused on getting particular projects done to write them.

I finished reading most of Academically Adrift last week, meaning I've started but not finished the Appendices. I downloaded the book from Amazon and read it on the iPad's Kindle application. This is very convenient, and the reading experience is fine, but it does have a significant drawback. Despite the ability to bookmark pages and leave yourself notes and highlights, it's not very easy for me to mark up the book in an easily accessible fashion for reference purposes. Normally I'd have sticky notes coming out of the leaves of the volume, and have text circled by scribbled notes in the margin. This by way of apology that I don't have many quotes in this post.

I've never been impressed with the CLA, and the research in Academically Adrift depends heavily on it. I hasten to add that I don't have any problems with the laudable aim of assessing "critical thinking, complex reasoning, and writing." I think the instrument itself can even be useful. It's the exaggerated claims of importance and over-statement of validity that seem unwarranted to me. Because so much of the book relies on CLA scores, let me elaborate.

First, the test is advertised as posing "real world" problems to test subjects. I'm not quite sure what this means, but it sounds good. An example is given on pages 22-23:

This seems to be a particularly poor exemplar of a real world problem. On the face of it: would you ask a twenty-something with two years of college to make a decision like this? Undoubtedly the aircraft is an expensive purchase that, more to the point, you'll be trusting your life to. So the way you make the decision is to give a non-expert a limited amount of documentation and ask for a report?  I don't think so.

Real life problems are not neatly defined by a few newspaper articles, FAA reports, and such. There's a vast body of knowledge to sort through and analyze if one cares to look for it. It is in the form of web searches, academic and professional articles, interviews with experts, and on and on. I understand that it would make the test much harder to administer to allow open-ended searches, and it would take a long time. Days or weeks. This just highlights how far this exercise is from a real "real life problem."

Real life problems are generally solved by people who know what they are doing. You don't ask the guy at the car wash to take out your gall bladder for a good reason. If you want to know about safety issues with a particular model of airplane, I think it would be really good to talk to a senior mechanic that works on such planes and to pilots that fly them. In fact, putting an expert in the field in charge of your investigation would be the best thing to do, no? This is one of the problems with assessments of "critical thinking" that try to ignore content knowledge. It doesn't work. Content is really important when you want to solve a problem, unless all you care about is sophistry.

Here's a real life problem:
You're the head mechanic who supervises a small fleet of helicopters for a rent-a-bird operation. The chief pilot is planning to fly in tomorrow to test out a new Bell Jet Ranger that you're getting ready for operations. But you and your team haven't been able to get the main rotor balanced, and you suspect that the part that came from Bell to do this is defective. But the pilot has a volatile temper, and is going to be very upset with you if the helicopter isn't ready for him to fly when he gets here. What do you do?
This requires technical knowledge, creativity, and judgment about social dynamics to get through, and there's probably no perfect solution. This was a real world problem I witnessed as a teenager. I saw the mechanics try to balance that rotor blade for hours. They finally gave up, put it on the helicopter anyway, and let the pilot try to take off with it that way! It was so out of balance that he could only wobble the aircraft around on the ground before giving up. He emerged looking like a ghost, too relieved to be alive to be mad at anyone. Probably not an ideal solution.

A problem with the analysis in the book is the emphasis on differences in scores and the way the information is represented. Others have mentioned (e.g. this article in The Chronicle) that the authors of Academically Adrift put a lot of importance on how many students haven't demonstrated significant learning. The literature I've read on the CLA says that it's not even designed for student-level analysis, but for comparing institutions. But never mind that. The statistical problem that has been pointed out by others is that a hypothesis test that does not find a difference in pre-post tests does not mean that there is no difference. This is from Stats 101. As an analogy, let's say my kid is getting ready for school and can't find her gym clothes. I don't have my glasses on yet and am bumbling around the house before the first cup of coffee has kicked in. But I look here and there for the gym clothes halfheartedly as I walk around. There are two possibilities:
  1. I see the distinctive purple bag. Even without my glasses on, there's a reasonably high probability that I've found the gym clothes.
  2. I look in the room but don't see the bag. I cannot say that the bag isn't in the room, only that I didn't see it. It may, in fact, be just behind the door.
This is the difference between rejecting the null hypothesis (#1) and not rejecting it (#2). Yet the authors lead us to believe through their narrative that not finding a difference means that there is no difference, and go on to claim that not much learning is happening during the first two years of college. 
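A quick simulation makes the gym-clothes point concrete. All the numbers below are made up, and I'm assuming SciPy only for the convenience of its t-test:

```python
import random
from scipy import stats

true_gain, noise, n = 0.15, 1.0, 30   # a real but modest effect, noisy scores, small sample

pre  = [random.gauss(0.0, noise) for _ in range(n)]
post = [random.gauss(true_gain, noise) for _ in range(n)]

t_stat, p_value = stats.ttest_ind(post, pre)
print(f"p = {p_value:.2f}")   # with samples this small and noisy, usually well above 0.05
```

The gain is real by construction, but the test usually fails to detect it; concluding "no learning happened" from that failure is exactly the bag-behind-the-door mistake.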

Although the authors wave at the notion of IQ late in the book, they never say anything about the relationship between what they are trying to measure and intelligence (as assessed by standard instruments). Psychologists assure us that IQ is largely static. It's also closely related to verbal skills, which is what the CLA assesses too. If IQ is largely immutable, don't we need to factor it out before making conclusions about improvement? I understand that this is a very sensitive issue because you ultimately would have to say that some people are smarter than other people (with all the fine print that goes with that), but to completely ignore the issue seems disingenuous.

We shouldn't necessarily reject all the findings of the book just because the analysis is flawed. The attention on how hard students work and the corresponding rigor of academics is an important discussion. Unfortunately, the way the book's headline claims overstate the findings on complex learning outcomes only feeds into the current of public discourse advertising that teachers are villains. Despite carefully worded disclaimers that general readers won't decode, the book practically screams that the first two years of college don't produce any meaningful learning. This is a disservice to all the course instructors out there who work hard to teach students calculus, computer programming, world history, composition, and so on.

The fact is that the content of the CLA doesn't correspond well to the course content of a general education curriculum. This could be remedied easily. I am quite sure that a one-semester prep course on items like the ones on the CLA would significantly increase the scores. Would this result in real, meaningful knowledge? Maybe, depending on the depth of the course. A pure test-prep course that teaches how to game the test is probably worthless. But perhaps the current curricula are out of date. Even if that's true, the CLA and its standardized measurement are not very important, because they are too simplified, rely too much (I assume) on IQ, and not enough on discipline-based knowledge. There are already recommendations (e.g. from the AAC&U) to increase complex problem solving skills, and many institutions are already struggling with how to teach and assess this. But almost by definition, complex problems aren't trivial to solve. If the CLA becomes the measure of these learning outcomes--if we really take analyses like Academically Adrift seriously--then we will shortly have another instance of Campbell's Law at work. See this Wall Street Journal article for a good example. Or this opinion about "impact factors."

Learning outcomes don't matter anyway, in the big picture. That is, test scores and other performance indicators taken during college aren't ultimately the measure of success of a program. It's what happens after graduation that ultimately matters. Are your graduates curious about the world? Do they read newspapers and vote intelligently? Are they engaged in service to their fellow humans? Do they contribute to the overall economy? Depending on the mission of the institution, these goals may vary (but always include "do they donate money?"). At the level of the Education Department, where the strategy should inform the national interests, perhaps the easiest metric of success would be income histories. The federal government has vast data stores on financial aid given. It also has all the IRS data. I asked an official from DoE about mashing these up to answer important questions like "what's the lifetime earnings benefit for attending a private vs public college?". I wasn't encouraged by the answer. No one cares enough or the problem seems too hard. And yet, I'm sure if we gave all the data to the guys at Facebook, they'd have it figured out over the weekend.