Sunday, November 20, 2011

The End of Preparation

A few days ago, I wrote "A Perilous Tail" about problems with the underlying distributions of measurements (in the sense of observations turned into numbers) we employ in education. I've shown previously that at least for the SAT, where there is data to check, predictive validity is not very good: we can only classify students correctly 65% of the time. When this much random chance is involved in decision-making, the effect is that we can easily be fooled. Here I cited the example that might lead us to believe that yelling at dice improves their "performance." It's also unfair to hold individuals accountable if their performance is tied to significant factors they can't control, and it invites cheating and finger-pointing.

As I mentioned in the previous article, I have a solution to propose. But I haven't really done the problem justice yet.  We encounter randomness every day, so how do we deal with it? Functioning systems have a hard time with too much randomness, so the systematic response is to manage it by reducing or avoiding uncertainties, and when that cannot be done, we might imagine it away (for example, throwing salt over one's shoulder to ward off bad luck). Many of our ancestors undoubtedly faced a great deal of uncertainty about what they would be able to eat, so much of our mental and physical activity and the very way our bodies are constructed, has to do with finding suitable food (which can be of various sorts) and consuming it effectively. Compare that with the homogeneous internal system of energy delivery that feeds and oxygenates the cells in our body via our circulatory system. Complex systems often take messy stuff from the environment and then organize it for internal use. I will use a mass-production line like the one pictured below for such a system.
Source: Wikipedia

The idea is to find materials in the environment that can be used to implement a plan, turning rocks into aluminum and tree sap into rubber, sand into glass, and heating, squashing, and otherwise manipulating raw natural resources until they come to create an airplane. This process is highly systematized so that parts are interchangeable, and the reliability of each step can be very high. Motorola invented the concept of Six Sigma to try to reduce the randomness in a manufacturing process to negligible amounts. This is at least theoretically possible in physical systems that have reliable mechanical properties.

What do we do when randomness can't be eliminated from the assembly line? One approach is to proceed anyway, because assembly lines have great economies of scale, and can perhaps be useful even if there are a large number of faulty items produced. Computer chip makers have to deal with a certain percentage of bad chips at the end of the line, for example. When chemists make organic molecules that randomly choose a left-right symmetry (i.e. chirality), sometimes they have to throw away half of the product, and there's no way around it.

The educational system in the United States has to deal with a great deal of variability in the students it gets as inputs and the processes individuals experience. It superficially resembles a mass production line. There are stages of completion (i.e. grade levels), and bits of assembly that happen in each one. There are quality checks (grades and promotion), quality assurance checks (often standardized tests), and a final stamp of approval that comes at the end (a diploma).

All this is accomplished while largely ignoring the undeniable fact that students are not standardized along a common design, and their mental machinery cannot be engineered directly the way an airplane can be assembled. In short, the raw material for the process is mind-bendingly more complex than any human-made physical device that exists.

Because of the high variability in outcomes, the means we use for quality assurance is crucially important, and this is where we have real opportunities to improve. This is the assessment problem. Current methods of assessment look a lot like factory floor assessments: the result of analyzing student performance is often a list of numbers that can be aggregated to show the executives how the line is working. Rewards and punishments may be meted out accordingly. In a real factory,  the current stage of production must  adequately prepare the product for the next stage of production. We must be able to correctly classify parts and pieces as "acceptable" or "not acceptable" according to whether or not they will function as required in the whole assembly. The odd thing about educational testing is that this kind of question doesn't seem to be asked and answered in a way that takes test error into account. Randomness is simply wished away as if it didn't exist. In the case of the SAT (see here), the question might be "is this student going to work out in college?" In practical terms this is defined as earning at least a B- average the first year (as defined by the College Board's benchmark). To their credit, the College Board published the answer, but this transparency is exceptional. Analyzing test quality in this way proceeds like this:
  1. State what the desired observable future effect of the educational component under review.
  2. Compare test scores with actual achievement of the outcome. What percentage succeeded at each score?
  3. Find a suitable compromise between true positives and true negative outcomes to use as your benchmark.
  4. Publish the true positive and true negative predication rate based on that benchmark.
To repeat, the College Board has done this, and the answer is that the SAT benchmark gives the right answer 65% of the time. This would make you rich if we were predicting stock prices, but it seems awfully low for a production line quality check.

Source: Wikipedia

Because we can't make the randomness go away, we imagine it away. So the assessments become de facto the measure of quality, and the quality of the tests themselves remains unexamined. In a real assembly line, an imperfect test would be found out eventually when the planes didn't perform as expected. Someone would notice and eventually track it down to a problem with the quality assurance program. There is so much uncertainty in education that this isn't possible, and the result is truly ironic: the deeper insinuation of tests that are unaccountable for their results. To be clear: any quality test that does not stand in for a clear predictive objective and provide research for its rate of correct classification in actual practice, is being used simply on faith. To be fair, it's virtually impossible to meet this bar. That excuse doesn't make the problem go away, however--it just makes it worse. One result is that test results distort perceptions of reality. If appearance is taken at face value for reality, then appearance has great economic value. There is incentive to do whatever it takes to get higher test scores, with unfortunate and predictable results.

To sum up the problem: variability is too high for good standardized tests to support an educational assembly line, and this fact is generally ignored for convenience.

I don't mean to imply that the major actors in education are incompetent. We are where we are because of historical factors that make sense as a system in evolution, and we have the means to take the next step.

The real world is not an assembly line. There is this expression we use when talking to students "when you get out into the real world...", as if the academy is a walled garden that excludes the vulgar world at large. This too, is a factory mentality. The students have heard it all before. From kindergarten on up, they hear stories about how hard it's going to be "when you get to high school," or "when you get to college." My daughter marveled at this during her first few weeks of high school. She was amazed and somewhat appalled that her middle school teachers had misled her about this. Of course it's not much harder--the system can only work with a high degree of integration, smoothing out the hard bits. If the wing assembly is slowing down the production line, then it needs attention. One could argue that the whole path from Kindergarten through Doctorate is becoming a smooth one for anyone who wants to trod it.

But the real world really is different. The assembly line stops at the hanger door, and the planes are supposed to be ready to fly. The tests don't matter anymore. No one is going to check their validity, nor delve too deeply into what a certification means after graduation. And in the real world, the factory mentality has to be unlearned: one cannot cram all night just before promotions are announced in order to game the system.

One solution is to try to change the real world to be more like the educational system. This is a practical choice for a military career, perhaps, where strict bureaucracy is essential to function. But it certainly is a mismatch for an entrepreneurial career, the engine of the nation's economic growth.

I believe it is now a reasonable and desirable choice to go the other direction, and change the assembly line to look more like the real world. The most important aspect to change is to admit uncertainty and begin to take advantage of it. This means we have to forget about the idea of standardizing and certifying. I will argue that we can do this to our great advantage, and introduce efficiencies into the economic structure of the nation in the process. Currently we pass our uncertainties on to employers. We hide behind test results and certificates, and leave it to employers to actually figure out what all that means. The result is that they have only very crude screening information at their disposal, and have to almost start from scratch to see what a graduate can actually do. The Internet can change all that. To my surprise, I discovered last week that it already is.

I had written a draft of this article last week, but when I ran it through my BS-detector, I couldn't bring myself to publish it. The reason is simple: it's just another untested idea, or so I thought. I hadn't actually employed the solution I will describe below, and so I didn't have anything concrete to show. But by coincidence, I saw exactly what I was looking for at the Virginia Assessment Group conference, at a presentation by Jeffrey Yan, who is the CEO of Digication. I didn't know about the company before last week.Vendors in higher education technology solutions may be excused perhaps for exaggerating the effectiveness of their products, and I generally think they are overpriced, too complicated, and too narrowly focused. I didn't have great expectations for Jeffrey's talk, but after about ten minutes I realized that he was showing off a practical application of what I was hypothesizing.

Digication is an eportfolio product. In what follows, I will not attempt to describe it as a software review would, but as it fits into the flow of ideas in this article.

The main idea is simple: instead of treating students as if we were preparing them for a future beyond the academy, treat them as if they were already there. In the real world, as it's called, our careers are not built on formalized assessments. To be sure, we have to deal with them in performance reviews or board certifications, but these are mostly barriers to success, not guarantees of it. Instead, it's the record of accomplishment we create as we go that matters. In many instances, promotions and accolades are inefficiently distributed, based on personal relationships and tenure, rather than merit, but this is not what we should aspire to. In fact, these imperfections are vulnerable to the sort of transparency that is within our grasp.

Senior Design Project at Stonybrook
In his presentation, Jeffrey showed examples of the sort of thing that's possible. Take a look at this senior design project at Stonybrook University. It's a real world project to design a new sort of sphygmomanometer (blood pressure meter). Quoting from the project page:
We aim to satisfy all the customer needs by designing a [blood pressure measuring] device that translates the vibrations into a visual indication of blood pulses, more specifically the first pulse to force its way through the occluded artery (systolic) and the last pulse detectable before laminar flow is regained (diastolic).
Another showcased student portfolio was from a second year student at the same institution, who created a public portfolio to tell the world about his interests and abilities. He shows how to solve what we call a difference equation (similar to a differential equation) using combinatoric methods here. This shows an interest in versatility in the subject that cannot be communicated with a few numbers in an assembly-line type report.

By concentrating on authentic evidence of accomplishment, rather than artificially standardized means of observation, we create an important opportunity: a public portfolio can be judged on its own merits, rather than via an uncertain intermediary. It's the difference between seeing a movie yourself and knowing only that it got three and a half stars from some critic.

The solution to the factory mentality presents itself. If students see that they are working for themselves and not as part of some unfathomable assembly process, accumulating what will become a public portfolio of their accomplishments, their learning becomes transparent. They can directly compare themselves to peers in class, peers at other institutions, graduates from all over, and professionals in the field. I imagine this leading to a day when it's simply unthinkable for any professional not to have an up-to date professional eportfolio linked to his or her professional social networking presence (see mathoverflow.net, Academia.edu, and LinkedIn.com as examples of such networks). Once started, the competitive edge by those with portfolios will become obvious--you can learn much more from a transparent work history than you can from a resume.

While in school, of course, some work, maybe much of it, needs to be private, to gestate ideas before presenting them to the world. But the goal should be for a forward-looking institution of higher education to begin to create public sites like the Stonybrook showcase and the one at LaGuardia Community College. Ultimately, universities need to hand the portfolios off to the students to develop as their respective careers unfold. I understand that graduates get to keep their portfolios and continue to develop them with Digication's license, as long as it is maintained.

Here's the manifesto version:
We don't need grades. We don't need tests or diplomas or certificates or credit hours. None of that matters except insofar as it is useful to internal processes that may help students produce authentic evidence of achievement. That, and that alone is how they should be judged by third parties.
Some advantages of switching from "assemble and test" to authentic work that is self-evidently valuable:
  1. We change student mentality from "cram and forget" to actual accomplishment. We can make the question "when will I ever really use this stuff?" go away.
  2. The method of assessing a portfolio is deferred to the final observer. You may be interested in someone else's opinion or you may not be. It's simply there to inspect. Once this is established, third parties will undoubtedly create a business out of rating portfolios for suitability for your business if you're too busy to do it yourself.
  3. Instead of just a certificate to carry off at graduation, students could have four years' worth of documentation on their authentic efforts. This idea is second nature to a generation who grew up blogging and posting YouTube videos.
  4. It doesn't matter where you learned what. A student who masters quantum mechanics by watching MIT or Kahn Academy videos might produce as good work as someone sitting in class. It makes a real meritocracy possible.
  5. Intermediate work matters. Even if someone never finishes a degree, they have evidence beyond a list of grades that they learned something. And it's in rich detail.
There's more than this, actually. The very nature of publishing novel work is changing. At present, the remnants of paper bound publication, with its interminable delays, exorbitant costs, virtual inability to correct errors, and tightly bound intellectual property issues, is still around. But it's dying.  A journal is nothing more than a news aggregator, and those are now ubiquitous and free. It's hard to say what the final shape of publishing will be, but something like a standardized portfolio will probably be front and center. When I say 'standardized', I mean containing certain key features like metadata and historical archive, so that you can find things, cross-reference, and track changes. As the professional eportfolio develops, it will need help from librarians to keep it all straight, but this can be done at a much lower cost than the publishing business now incurs in lost productivity, restricted access, and cost to libraries.

The focus will, I believe, shift from journals and other information aggregators, to the individuals producing the work. And institutions will share in some of the glory if part of the portfolio was created under their care.

All of this has been around for a while, of course. Eportfolios are almost old news in higher education, and I've blogged about them before. My previous opinion was that there was no need for a university to invest in its own portfolio software because everything you already need is on the web. If you want to put a musical composition on the web, just use Noteflight, and of course there's YouTube for videos, and so on. All that's needed is a way to keep track of hyperlinks to these in a way that can allow instructors to annotate as needed with rubrics and such. The demos convinced me, however, that having a standard platform that can be easily accessible for private, class-wide, collaborative, or public use is worth paying for. I don't know how much it costs in practice, but there is value beyond what one can get for free on the Internet.

Portfolios have been incorporated here and there as just another part of the machinery, amounting to a private repository of student work that can be used for rubric ratings to produce more or less normalized ratings of performance--an advanced sort of grading. This is useful as a formative means of finding all sorts of pedagogical and program strengths and weaknesses. The point of this article is not that portfolios are a better way to produce test-like scores, but that the test scores themselves will become obsolete as external measures of performance. For professors to get feedback on student performance, and for the students themselves to hear directly what the professors and their peers think is invaluable. It's essential for teaching and learning. But it's downright destructive to use this is as a summative measure of performance, for example for holding teachers accountable. The instant you say "accountability," no one trusts anyone else, and there really is no way to run the enterprise but as a factory, with inspectors enforcing every policy. It cannot work in the face of the uncertainties inherent to the inputs and outputs of education.

There is a history of tension in higher education between the desire for authenticity and the simultaneous wish for factory-like operational statistics that show success or failure. The Spellings Commission Report has a nice sidebar about Neumont University and mentions their portfolio approach (their showcase is here), but can't tear itself away from standardized approaches to learning assessment. Three years before, the Council for Higher Education Accreditation beautifully illustrated the tension:
[I]t is imperative for accrediting organizations–as well as the institutions and programs
they accredit–to avoid narrow definitions of student learning or excessively standardized
measures of student achievement. Collegiate learning is complex, and the evidence used
to investigate it must be similarly authentic and contextual. But to pass the test of public
credibility–and thus remain faithful to accreditation’s historic task of quality assurance –
the evidence of student learning outcomes used in the accreditation process must be
rigorous, reliable, and understandable.
This is from CHEA's 2003 paper "Statement Of Mutual Responsibilities for Student Learning Outcomes: Accreditation, Institutions, and Programs."  More recently, Peter Ewell wrote "Assessment, Accountability, and Improvement: Revisiting the Tension" as the first Occasional Paper for the National Institute for Learning Outcomes Assessment, in which he illuminates the game-theoretic problem I alluded to above:
Accountability requires the entity held accountable to demonstrate, with evidence, conformity with an established standard of process or outcome. The associated incentive for that entity is to look as good as possible, regardless of the underlying performance. Improvement, in turn, entails an opposite set of incentives. Deficiencies in performance must be faithfully detected and
reported so they can be acted upon. Indeed, discovering deficiencies is one of the major objectives of assessment for improvement.
In a real factory setting, tests of mechanical process can be very precise, eliminating the difference between what the assessment folks call formative (used to ferret out useful improvements) and summative (an overall rating of quality). If a machine is supposed to produce 100 widgets per hour, and it's only producing 80, it's clear what the deficit is, and the mechanic or engineer can be called in to fix it. But when one is held accountable for numbers like standardized test results that have a considerable amount of uncertainty (which itself is probably unknown, as I pointed out before), the game is very different. It is less like a factory and more like going to market with a bag of some good and some counterfeit coins, which I described in "The Economics of Imperfect Tests." One's optimal strategy has less to do with good teaching than with manipulating the test results anyway one can. Unfortunate examples of that have made national news in K-12 education.

My proposal is that we in higher education take a look at what Stonybrook and others are doing, and see if there is not merit to an emphasis on authentic student learning outcomes, showcased when appropriate for their and our benefit. That we don't consider a grade card and a diploma an adequate take-away from four years and a hundred thousand dollars of investment. That instead, we help them begin to use social networking in a professional way. Set them up with a LinkedIn account during the orientation class--why not? Any sea change from teach/test/rinse/repeat to more individual and meaningful experiences will be difficult for most, but I believe there will be a payoff for those who get there first. Showing student portfolios to prospective students as well as prospective employers creates a powerful transparency that will inevitably have valuable side effects. Jeffrey told said that some of the portfolios get millions of Internet views. How many views does a typical traditional assignment get? A handful at most, and maybe only one.

The odd thing is that this idea is already quietly in place and old hat in the fine arts, performing arts, and architecture departments, and there are probably some I'm not aware of. Who would hire a graphic designer without seeing her portfolio, even if she had a wonderful-looking diploma? This means that we probably have experts already on campus. Computer Science is a natural fit for this too, and there's already a professional social network set up at Stackoverflow.com.

A good first step would be to allow portfolio galleries to count for outcomes assessment results in the Voluntary System of Accountability (VSA). Currently, the only way to participate is to agree to use standardized tests. From the agreement's provision 17:
Participate in the VSA pilot project to measure student learning outcomes by selecting one of three tests to measure student learning gains. 
a) Collegiate Assessment of Academic Proficiency (CAAP) – two modules: critical thinking and writing essay - http://www.act.org/caap/. 
b) Collegiate Learning Assessment (CLA) – including performance task, analytic writing task - http://www.cae.org/content/pro_collegiate.htm. 
c) ETS Proficiency Profile (formerly known as MAPP) – two sub scores of the test: critical thinking and written communication - http://www.ets.org/. Either the Standard or the Abbreviated form can be used.
The VSA is a wonderful program, but it is handicapped by this requirement. If you already use one of these tests, that's fine, but it's expensive and a distraction if you don't find them useful. More to the point of this article, there is no option on the list to report authentic outcomes. Adopting another pilot project to see how far the public portfolio idea will sail would be a great addition.

[The next article in this series is "Tests and Dialogues"]

Acknowledgements: Thanks to Jeffrey Yan for letting me chew his ear off after his presentation. And thanks to the coordinators of the Virginia Assessment Group for putting that wonderful event together.

Disclaimer: I have no financial interest in any of the companies mentioned in this article.

No comments:

Post a Comment