Friday, June 10, 2011

An Addendum and Apology

On Wednesday I wrote about my take-aways from the AALHE meeting in Lexington, and drew on some remarks from Trudy Banta. Some of the responses I got were justly critical of the way I mangled Trudy's message about the wider importance of the SAT example. It's true--I botched it, and I'd like to set the record straight so that Trudy doesn't begin to doubt her communication skills.

In order not to further fold, spindle, or mutilate someone else's message, let me say that what follows is my interpretation, and any faults should be ascribed to me alone.

In a comment on the still-infantile state of the art of measurement, Trudy observed that even after more than 80 years of development and effort to improve the SAT for its intended purpose, there is as much disagreement as ever about its validity as a predictor of success in college. I intended to emphasize this point and what it portends for other projects that are even more ambitious, but in the original article I left the argument dangling rather ineptly. So here it is, starting with a closer look at the SAT, generalizing to the characteristics of industrialized tests (meaning massive, usually commercial, standardized instruments), and ending with how we can do better with more authentic assessments.

Even with all the research that has gone into the SAT, in absolute terms it still isn't very good. The 2008 validity study from the College Board gives us (pg. 2):
[T]he weighted average correlation between SAT writing scores and English composition course grades was 0.32, after correcting for range restriction.
This is the most direct link I could find in the study between the SAT and learning outcomes. Here we have actual writing in a standardized environment, correlated with course grades in coursework where writing is taught. If the course grade is related to how well students can write, then their potential as demonstrated by the SAT writing component ought to line up with it. And it does, but only to the tune of explaining 10% of the variance in grades (.32 squared). More generally, the variance in first-year college grades explained by all the components of the SAT combined is 25% (page 5, squaring the adjusted R).
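
To spell out the arithmetic, here's a trivial sketch. The 0.32 is quoted from the study; the 0.50 adjusted R is my back-calculation from the 25% figure, not a number copied from the report:

```python
# Variance explained is the squared correlation. The 0.32 comes from the
# 2008 College Board validity study quoted above; the 0.50 adjusted R is
# inferred from the reported 25%, not taken directly from the study.
r_writing = 0.32            # SAT writing vs. English composition grades
r_all = 0.50                # adjusted multiple R, all SAT components combined

print(f"writing component: {r_writing**2:.1%} of grade variance")  # ~10.2%
print(f"all components:    {r_all**2:.1%} of grade variance")      # 25.0%
```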

Any predictor with a nonzero R may be useful in the right context. But if we take the SAT's performance as an upper limit on what industrialized tests can do, it's a warning for current and contemplated high-stakes applications. This is a good place to segue to the misstep in my prior article.

I gave three advantages of standardized tests. Here they are:
  • They can have good reliability, so they seem to measure something.
  • Because of this, they can create self-sustaining corporations that provide a standardized service to higher education, so there are fixed points of comparison.
  • Even if the validity isn't stellar, some information is better than none, right?
I was still thinking mostly of the SAT when I wrote this, so you might consider it the best-case scenario. However, I intended it to be clear (it wasn't) that this is damning with faint praise: although these bullets describe how standardized tests have colonized an ecological niche successfully, these reasons aren't sufficient to warrant their use for high-stakes purposes. Like value-added comparisons of institutions, for example.

The missing "anti-bullets" that describe the corresponding disadvantages are respectively:
  • Depending solely on reliability is like searching for your lost keys under the lamp because that's where the light is. The drive for reliability creates artificial conditions like timed multiple-choice tests that have very little mechanical relationship to real-world application of knowledge. So reliability comes at a cost in validity. 
  • The size of the testing industry means that it can use its resources to circumvent professional review of its products and sell them directly to politicians. 
  • The validity comment is specious: "some" information isn't good enough in many circumstances. Imagine driving on a cliff's edge and only knowing where 10% of the road is. (An unfortunate real case is the publication of ranks of teacher "value-added" scores by the LA Times last year.) Exacerbating the general dearth of validity is the fact that validity has to be determined locally, after the test results are applied (that is, by deciding whether some proposition about the actual results can be supported or not). It's very hard to do, and most of the time we don't know if our applications are valid or not. A minimal sketch of what such a local check might look like follows this list.
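
For concreteness, here is the kind of local check I mean. The records and numbers are invented for illustration; a real study would need hundreds of cases and corrections for range restriction:

```python
import pandas as pd

# Invented local records: admitted students' SAT writing scores and the
# grades they later earned in the composition course the scores were
# supposed to predict. Ten rows is far too few for a real study.
df = pd.DataFrame({
    "sat_writing": [480, 520, 610, 450, 700, 560, 590, 630, 510, 540],
    "comp_grade":  [2.0, 2.7, 3.3, 2.3, 3.7, 3.0, 2.7, 3.3, 3.0, 2.3],
})

r = df["sat_writing"].corr(df["comp_grade"])
print(f"local r = {r:.2f}, variance explained = {r**2:.1%}")
```
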
I would like to apologize to Trudy for bungling the transition between my summary of her remarks and the bullets in the original. I detracted from the point and may have misled or confused some readers. Mea culpa.

We can now step beyond the problems and look for solutions; Trudy gave a list of some alternative approaches. Reliability isn't produced in a factory in Iowa City: we can get good reliability using rubrics on authentic student work. Additionally, the convenience of industrialized testing can nowadays be matched with local technological solutions. If we want to use portfolios, we don't have to have a thousand file cabinets, or mail copies to prospective employers; it's all on the web.
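
To make the rubric-reliability point concrete, here's a minimal sketch of how two readers' rubric scores on the same student work could be checked for agreement. The scores are made up, and Cohen's kappa is just one common choice of statistic:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(counts_a) | set(counts_b)) / n**2
    return (observed - expected) / (1 - expected)

# Two readers scoring the same ten portfolios on a 4-point rubric
# (scores invented for illustration).
reader1 = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
reader2 = [3, 2, 3, 3, 1, 2, 3, 4, 2, 4]
print(f"kappa = {cohens_kappa(reader1, reader2):.2f}")  # ~0.71 here
```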

The big win for local solutions is validity: with the right application of technique, we can link assessments solidly with pedagogy at many levels of resolution, from individual assignments up to the institutional level.

As noted above, the testing industry has a lot of ability to protect its own interests, including great access to policy makers. With that in mind, I should wrap back around to the theme of the policy conversations at the conference: the calls for accountability. It's not good enough to just resist the K-12-like solution; we have to offer an alternative we find acceptable, and there are some good candidates available. That, I hope, is a clear and unmangled statement of the main message.

I don't mean to demonize "big testing," by the way. Maybe the best solution would be if we could convince them that authentic assessment and electronic portfolios are profitable new markets. I don't know what the chances of that are.

Thursday, June 09, 2011

Inputs and Outputs

There's an interesting article in The Atlantic. It covers much more, but this quote gets to the heart of the experience for many classroom instructors at all levels, I imagine:
Professor X is shrewd about the reasons it’s hard to teach underprepared students how to write. “I have come to think,” he says, “that the two most crucial ingredients in the mysterious mix that makes a good writer may be (1) having read enough throughout a lifetime to have internalized the rhythms of the written word, and (2) refining the ability to mimic those rhythms.” This makes sense. If you read a lot of sentences, then you start to think in sentences, and if you think in sentences, then you can write sentences, because you know what a sentence sounds like. Someone who has reached the age of eighteen or twenty and has never been a reader is not going to become a writer in fifteen weeks.
The source here is the book by "Professor X" called In the Basement of the Ivory Tower.

In Seattle Education there is a three-parter on privatizing education. There's a great game-theory example I didn't know about:
Proponents claim that by encouraging competition, privatization can improve the efficiency of public services. But there can be serious drawbacks. For instance, before fire departments were publicly run, groups of firefighters sometimes set fires just to earn money by putting them out!
The evidence provided is more suggestive than compelling, but it's a nice business model (from the point of view of investors, anyway) to control both supply and demand. In education this would translate into institutions not caring much about education itself, but caring a great deal about credentialism, and then seeking to maximize the number of credentials available and perceived as necessary for employment. In other words, if education is a gate to the promised land, put up as many toll booths as possible on the road between here and there.

Wednesday, June 08, 2011

Policy Questions Raised at AALHE

I flew back yesterday from Lexington and the first AALHE conference. I found it very stimulating. I put faces to names from the ASSESS list server, which was delightful.

In the opening plenary, Trudy Banta gave us a broad perspective on the evolution of measurement and accountability, pointing out the weaknesses of value-added derivations and standardized tests in particular, and suggesting authentic assessment (e.g. portfolios and their analyses) as a useful alternative.

One point is particularly compelling to me. Trudy mentioned the pedigree of the SAT, and it's not hard to imagine the many hours and dollars that have gone into fine-tuning this test. These are smart people working with a will, for a long time, toward a well-defined purpose: predicting how well high school students will do in college. In my own experience as an IR person, the SAT does add some predictive strength to linear models, but not much once high school GPA is considered--a handful of percentage points of R^2 at best. At my present institution, it's virtually worthless as a predictor of first-year grades, which points also to the known biases of the test.
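
To illustrate what I mean by a handful of percentage points, here's a sketch of the nested-model comparison. The correlation structure is invented for illustration (roughly in the neighborhood of published figures), so treat the output as an illustration, not a finding:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented correlation structure; variable order is (HS GPA, SAT, first-year GPA).
cov = np.array([[1.00, 0.50, 0.45],
                [0.50, 1.00, 0.35],
                [0.45, 0.35, 1.00]])
hsgpa, sat, fygpa = rng.multivariate_normal(np.zeros(3), cov, 10_000).T

def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack((np.ones(len(y)),) + predictors)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1 - np.var(y - A @ beta) / np.var(y)

r2_hs = r_squared(fygpa, hsgpa)
r2_both = r_squared(fygpa, hsgpa, sat)
print(f"HS GPA alone: R^2 = {r2_hs:.3f}")      # ~0.20
print(f"HS GPA + SAT: R^2 = {r2_both:.3f}")    # ~0.22
print(f"SAT's increment: {r2_both - r2_hs:.3f}")
```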

In short, there is some usefulness to the SAT, but it may not warrant all the trouble and expense. And of course some schools are now SAT-optional. I've written before about how, as a market signal, SAT overprices some students and overlooks others, creating the opportunity to use other (e.g. non-cognitive) indicators to find good students.

Trudy's comments did not go this far, but it's not hard to connect the dots. It's an important point: if all this effort yields so little result, maybe we're doing it wrong. The alternative is to admit that maybe this is the best we can do, and that our ways of knowing will just never be much good.

It should be noted that the plenary was planned as a kind of debate with two viewpoints, but because of a cancellation by Mark Chun, only one side was presented. So in defense of big standardized tests, here are some advantages:

  • They can have good reliability, so they seem to measure something.
  • Because of this, they can create self-sustaining corporations that provide a standardized service to higher education, so there are fixed points of comparison. 
  • Even if the validity isn't stellar, some information is better than none, right?
The conversation is much more subtle than "down with standardized tests!" But you may have noticed that fine distinctions aren't part of the public debate on the quality of higher education. It would be great if Mark and Trudy and other experts introduced a public discussion like this on a message board or something--somewhere that the complexities of the issues could be fully explored and commented on over time.

[Update: see my "An Addendum and Apology" post that goes with this section.]

Bookending the conference, the closing plenary was given by David Paris, and it also included a historical and political overview of where we are now. He is the executive director of the New Leadership Alliance for Student Learning and Accountability. You can sign up for email updates on their website. Paraphrasing David: the Obama administration talks nicer, but wants the same things as the Bush administration. And what that seems to be is "accountability." One important take-away for me is that we (higher education) have a window of opportunity that will soon close. That is, we can "do it to ourselves" or "have it done to us." The 'it' is amorphous, but centers on accountability for the cost/learning trade-off. In other words, all this public noise about how bad college is will create changes, and we should try to shape those changes.

It's hard to disagree with that. One approach is the Alliance's Institutional Certification Program, which you can learn about here. I'd like to see some examples of how it works in practice, but on the face of it, it seems like a great idea, similar in some aspects to the Bologna 'centering' process in Europe. I would call it a kind of horizontal professionalization, complementary to the 'vertical' type provided by discipline-based accreditations and professional organizations. Overall, it seems like a serious and well-reasoned approach, and I look forward to finding out more about it.

There was a good Q&A afterwards, and I suggested that higher education needs access to data pertaining to what happens to students after graduation, and that the federal bureaucracy sits on a gold mine of such information. This didn't have a chance to become a real discussion because of the format, but my interpretation of David's response was that the student-level record project that was floated and shot down (with help from higher education lobbyists) would have solved that problem. So we are complicit in causing the problem.

I don't buy this. First of all, what we need most right now is a deep historical understanding of how higher education has affected lives after school (graduation or not), for individual institutions if not individual majors. I understand that the data are imperfect and that there are privacy issues to be solved. But I think 9/11 shows that any privacy concerns can be solved rather quickly if the motivation is there. Specifically, it seems like it should be possible to link student loan records, FAFSAs and Pell grant records, tax records, federal employee records (including military), and so on, to establish where students took instruction, what their demographics were, and what happened to them afterwards. There are plenty of other places to find data, I'm sure, like state system databases.
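
The linkage mechanics, at least, are not the hard part. Here's a toy sketch of privacy-preserving linkage on hashed identifiers; every agency file, field name, and number below is invented, and a real effort would face much messier identifiers and need a proper governance regime:

```python
import hashlib
import pandas as pd

SALT = "secret-shared-between-agencies"   # hypothetical shared secret

def pseudonym(ssn: str) -> str:
    """One-way key: agencies can join on it without ever exchanging SSNs."""
    return hashlib.sha256((SALT + ssn).encode()).hexdigest()

# Invented extracts from two agencies' files.
enrollment = pd.DataFrame({
    "ssn": ["111-11-1111", "222-22-2222"],
    "institution": ["Private U", "State U"],
    "pell_eligible": [True, False],
})
earnings = pd.DataFrame({
    "ssn": ["111-11-1111", "222-22-2222"],
    "wages_5yr_out": [61000, 48000],
})

# Replace the raw SSN with its pseudonym before any data leaves the agency.
for df in (enrollment, earnings):
    df["key"] = df.pop("ssn").map(pseudonym)

linked = enrollment.merge(earnings, on="key")
print(linked[["institution", "pell_eligible", "wages_5yr_out"]])
```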

The payoff for this is potentially enormous. Suppose we (higher education) can agree with the politicos that employment is a good outcome of a college education, and even talk about the details like what kind, how much pay, where geographically, and so on. Then we could look at what kinds of schools have what kinds of effects on what kinds of students. Is a biology program at my (private) university worth the cost premium over the public one down the road? What about access? Which schools are economic engines by taking low-income inputs and (eventually) producing high-income outputs?

The payoff is so great that I think that if the government really can't find a way to do this, we should figure out how to do it ourselves. 

What we have now is survey data showing that college generally pays for itself, but the price is going up. This stands in opposition to the cries of critics who say students aren't learning much, if anything.

Here's my analogy. It's imperfect, but it has the advantage of being vivid. Students are like movie scripts coming to our production companies. We have our specialties: niche art films or mass-market gloss, and so on. Each script has an individual experience, no matter what our organization is. They get cast differently (different instructors), and we usually do our best to make a good fit. Or maybe we just get the cheapest actors we can find and hope for the best. We always have to rewrite the script to some extent, and our mission--the hope that gets buried in the Sisyphean rolling of semesters up and down the calendar--is that the screenplay comes to life and realizes its potential.

It takes a lot of money to make this happen, and it comes from investors who are increasingly grumpy. They say the movies are no good. They're too expensive and nobody likes them. Film critics like Arum and Roksa used data from a large number of scripts in production to claim that a large portion weren't being significantly improved in the rewriting stage.

But we don't have any real information about what the audience thinks. We can analyze the heck out of our own productions, and do six-sigma and professionalize all we want, but until we understand what the audience thinks, we don't really know if we're doing any good. Maybe all those films get shown once, and then end up in a storage shed. 

Maybe some types of films shouldn't be made anymore. Maybe Forensic Dolphin Psychoanalysis shouldn't even be a major because there's no audience for it. 

The recent explosion of for-profits makes the problem more urgent to solve. Most industries have to depend on their products working. We don't have much direct evidence one way or the other in the kind of detail we need to make decisions. It's starting to happen, for example with student loan default rates getting attention. But we can go a lot further than that.

Imagine if we could sit down with the investors and agree on the long-term goals, have metrics that more or less tell us how well we're doing, and plan together how to get there. Are the goals strictly economic? Or do we want people to pursue happiness and bear arms? Even if it is just economics, it presents an opportunity to really understand how, say, liberal arts education matters in job mobility, lifetime income, and so on. Institutions could say "yes, you'll get a fine job, but in addition you'll have a fulfilling career." Or whatever their mission is. Instead of talking past each other about standardized assessments and authentic assessments, we could figure out how to work backwards from the real goals to real assessments that matter at the strategic level and then add institutional flavors that matter to us. That would be an exciting and productive conversation.

The alternative is grim. If we only focus on what goes on inside our production studios, the future of the nation is at the mercy of every critic with a theory or an agenda. I'm not sure which is worse: theories or agendas. Some will want to break down every step of movie making into a reductionist flow-chart, and create spreadsheets to show the rate of script-to-casting time, or use biometrics to calculate charisma factors for the actors. There's no bottom to this, because although movie scripts are only 120 pages, the individual brains of our students have perhaps 100 trillion synapses each. If each synapse holds, say, eight bits of information, that's on the order of a billion times the complexity of a movie script (the sketch below shows the arithmetic). Each one. Others will work backwards from agendas to create policies that make no sense in context.
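
For the curious, here's the back-of-envelope version of that comparison. Every figure is an assumption, and the multiplier swings by an order of magnitude depending on how much information you charge a script page with:

```python
# Back-of-envelope only: every figure here is an assumption.
synapses = 100e12                 # ~100 trillion synapses per brain
bits_per_synapse = 8              # the assumption in the text
brain_bits = synapses * bits_per_synapse               # 8e14 bits

pages = 120
words_per_page = 250              # assumed; screenplay pages are sparse
bits_per_word = 6 * 8             # ~6 characters per word at 8 bits each
script_bits = pages * words_per_page * bits_per_word   # ~1.4e6 bits

print(f"brain/script ratio ≈ {brain_bits / script_bits:.1e}")   # ~5.6e8
```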

Even if we think we've solved the problem and get universal agreement that our learning outcomes are being achieved at the university, how do we really know what the effect is after graduation unless we measure it? Maybe they're learning the wrong things. Maybe some small college has methods that work twice as well as ours. Maybe we can reduce costs and increase quality. The ingredients are wonderful: a diverse ecology of isolated experiments. We just can't see where the lab results are recorded, so we can't draw conclusions.

Yes, we need individual student records. But we can't wait for that; the problem is too important. Maybe tax records and FAFSAs can't be mashed up, for political or technical reasons (but do you really think Mark Zuckerberg couldn't figure this out in an afternoon?). If that's the case, then we have to find another way to measure historical and ongoing long-term outputs in such detail that it can inform institutional decisions.

What we're doing at present is creating our own dramatic screenplay: an epic version of "No Exit."