Wednesday, June 23, 2010

Proxy Problems

You may have seen this New York Times article about grades at Loyola Law School.  Quote:
The school is retroactively inflating its grades, tacking on 0.333 to every grade recorded in the last few years. The goal is to make its students look more attractive in a competitive job market.
Or from the same source, "Under Pressure, Teachers Tamper With Tests."  Quote:
The district said the educators had distributed a detailed study guide after stealing a look at the state science test by “tubing” it — squeezing a test booklet, without breaking its paper seal, to form an open tube so that questions inside could be seen and used in the guide.
Motivation?
Houston decided this year to use [standardized test] data to identify experienced teachers for dismissal
And then there's the metaphysics of time expressed with the concern "Credit Hours Should Be Worth the Cost, House Panel Members Say" in The Chronicle.
The standard of a credit hour, which is not actually a full 60 minutes in most cases, is deeply embedded in higher education as a benchmark for earning a degree. But the definition of what constitutes a credit hour has become muddled in recent years with the increase in online education.
Or an example from the People's Republic of China, courtesy of Yong Zhao at Michigan State, where standardized tests have very high stakes:
[The test] puts tremendous pressure on students, resulting in significant psychological and emotional stress. Imagine yourself as a 6th grader from poor rural village—how well you do one exam could mean bankrupt your family or lift your family out of poverty, give your parents a job in the city, and the promise of going to a great college.
About the tests themselves:
[T]he selection criterion is only test scores in a number of limited subjects (Chinese, Math, and English in most cases). Nothing else counts. As a result, all students are driven to study for the tests. As I have written elsewhere, particularly in my book Catching Up or Leading the Way, such a test-driven education system has become China’s biggest obstacle to its dream of moving away from cheap-labor-based economy to an economy fueled by innovation and creativity. The government has been struggling, through many rounds of reforms, to move away from testing and test scores, but it has achieved very little because the test scores have been accepted as the gold standard of objectivity and fairness for assessing quality of education and students (as much as everyone hates it).
Changing the topic yet again, there's a June 13th article in The Chronicle entitled "We Must Stop the Avalanche of Low-Quality Research."  The thesis is:
While brilliant and progressive research continues apace here and there, the amount of redundant, inconsequential, and outright poor research has swelled in recent decades, filling countless pages in journals and monographs.

Then there are presidential aims of doubling the number of college graduates by 2020, juxtaposed with potentially "toxic degrees" and alpine debt generated by for-profit colleges (see last post).  This tension, as well as speculation about what it means for all of higher ed, is explored in The Chronicle's "New Grilling of For-Profits could Turn Up the Heat for All of Higher Education."  Here's the bit where the writing on the wall appears:
Many of the issues at stake, however, could mean harsher scrutiny for all of higher education, as worries about rapidly growing costs and low-quality education in one sector could raise questions about long-accepted practices throughout higher education.
Congress and colleges still lack a firm sense of "what our higher education system is producing," said Jamie P. Merisotis, president of the Lumina Foundation for Education. "The model of higher education is starting to evolve, but it's not clear to us what that evolution looks like," he said.

If you didn't read that and say "uh-oh," take another look :-).  Here's a hint, from a May article:
The [Education] department is already working with the National Governors Association and its next chairman, West Virginia's governor, Joe Manchin III, a Democrat, to develop college-graduation-rate goals for each state and eventually, for each institution of higher education. That kind of push is necessary, Mr. Duncan said, if the country is to meet President Obama's goal for the United States to have the world's highest proportion of residents with a college degree by 2020.
What do all these have in common?  In each case, we look at one thing and imagine that it is something else.  Grades equate to academic performance, number of papers published equates to professional merit, credit hours equate to time and effort expended in learning, standardized test scores equate to academic potential for students and successful teaching for teachers, and graduation equates to (I presume) preparation for satisfying employment and a successful life.

All of these are proxies, and they each have problems. Anyone who works in assessment is in the business of creating proxies.  They might have problems too.  Problems are created, for example, if there is economic value associated with the outcome, which is true for all of the above. Dept of Ed is willing to pay for grads?  Hey, we'll give you grads.  Here's the bill.

Other problems are related to validity: how closely does the proxy track what you're actually interested in?  Note that for true statements of fact, validity is never an issue.  If I say "35 students took the survey," and if that's true, validity is not an issue.  When I start to talk about what the survey results mean, then I have to worry about the validity of statements like "students want a better salad bar in the cafeteria," or whatever.
There are some proxies that work pretty well. Money, for example.  A dollar bill is a proxy for anything you can buy for a dollar.  It's like a wild card. You'd think such a crazy idea would never work, but it mostly does.  Why?  Because it's very forcefully regulated. There's a strong incentive to go run off your own $20 bills on the color copier, but you'll end up in jail, so you probably don't do that.  Where this proxy fails is where it's not tightly controlled. Like in banks, which can print their own money (for all practical purposes, this is how banks work).  If they create too much and, say, cause a booming market in stuff we can't really afford, then problems occur.  Or if the government itself just starts printing the stuff wholesale. In short, if the scarcity of the proxy matches the scarcity of some set of important commodities, the buying power ought to behave itself.  (The Economist puts out a Big Mac index based on this idea of purchasing-power parity.)
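The purchasing-parity idea behind the Big Mac index can be sketched in a few lines: compare the exchange rate implied by burger prices to the actual market rate, and the gap shows how far the currency proxy has drifted. The prices and exchange rate below are made-up illustrative numbers, not real data from The Economist.

```python
def big_mac_valuation(local_price, us_price, market_rate):
    """Percent over(+)/under(-) valuation of a local currency,
    per the Big Mac index logic: compare the exchange rate implied
    by burger prices to the actual market exchange rate."""
    implied_ppp = local_price / us_price  # local units per USD implied by prices
    return (implied_ppp - market_rate) / market_rate * 100

# Hypothetical numbers: burger costs 13.2 units locally, 3.73 USD in
# the US, and the market rate is 6.78 local units per USD.
pct = big_mac_valuation(13.2, 3.73, 6.78)
print(f"local currency is {'under' if pct < 0 else 'over'}valued by {abs(pct):.0f}%")
```

The point of the analogy: the proxy (currency) is trusted only because an enforcement mechanism keeps its scarcity in line with what it stands for, and a simple cross-check like this one can reveal when it drifts.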

How would you put this kind of enforcement into practice in an assessment situation? First, you have to keep people from cheating--printing their own money, so to speak.  So researchers aren't allowed to break one long paper into two in order to get more publications.  Schools can't artificially inflate grades. Colleges can't crank out graduates who didn't satisfactorily complete a suitable curriculum.  Teachers can't see the standardized test before they're supposed to.  And so on.  This may not be possible for your assessment, in which case you should probably abandon it.  Like the first example: how in the world can you keep researchers from padding their vitae?

The harder problem is validity. With dollar bills, validity is enforced by law and custom.  It says "legal tender" on the bills, and while you could barter with tomatoes instead it would become impractical.

Part of the difficulty with validity is that it changes the moment something is announced to be a proxy.  For example, in the 17th century it might have made sense to rank researchers by their number of publications, because they were probably not using that particular yardstick as a measure of success.  But once it's announced that papers = tenure, any validity you might have assumed before can no longer be assumed.  Either economic motivation (leading to gaming the system) or the lack of it (apathy and inaccurately low performance) may affect validity.

A really bad proxy can, ironically, bring about the opposite of what you intend.  More on that here.

You would think with these problems that we would use proxies only as a last resort. But we seem to be hard-wired to want to use them.  Maybe there's some psychological reason for this.  Maybe it's part of the package for language-users. As exhibit A, I present one of the "solutions" to the posed problem of too many publications from the "low quality research" article above:
[M]ake more use of citation and journal "impact factors," from Thomson ISI. The scores measure the citation visibility of established journals and of researchers who publish in them. By that index, Nature and Science score about 30. Most major disciplinary journals, though, score 1 to 2, the vast majority score below 1, and some are hardly visible at all. If we add those scores to a researcher's publication record, the publications on a CV might look considerably different than a mere list does.
As some of the comments to this article point out, it's easier to inflate citations than it is even to crank out papers.  It's the siren song of the next, better proxy just...over...that...hill...  With a clever name like "impact factor," it must be good.

Moral: When you can, gather and analyze statements of fact about whatever it is you care about, rather than using a proxy.  More on this subject anon.

For other ideas about alternatives to grades and standardized tests, see fairtest.org, Joe Bower's blog, and Alfie Kohn's site.

Update: There's a fascinating article in Inside Higher Ed this morning called  "The White Noise of Accountability."  There are many proxies for "measuring accountability" mentioned, including this example:
The Louisiana Board of Regents, for example, will provide extra funding for institutions that increase not the percentage, but the numbers, of graduates by … allowing them to raise tuition.
The author's observation about student learning outcomes is (in my opinion) right on:
But if the issue is student learning, there is nothing wrong with -- and a good deal to be said for -- posting public examples of comprehensive examinations, summative projects, capstone course papers, etc. within the information environment, and doing so irrespective of anyone requesting such evidence of the distribution of knowledge and skills.
I interpret this as saying: just publish performance information.  If we could get everyone interested in actual performance rather than proxies, we'd be a lot better off.  See "Getting Rid of Grades" for an extreme version of that idea.

The IHE article cites (and criticizes) an Education Sector report "Ready to Assemble: A Model State Higher Education Accountability System" and summarizes thus:
By the time one plows through Aldeman and Carey’s banquet, one is measuring everything that moves -- and even some things that don’t.
I took a closer look at the article to see what it says about learning outcomes--the heart of the matter. It doesn't take long to find problems:
[S]everal nonprofits have developed promising new ways to measure higher education quality that have become widely accepted and implemented by colleges and universities. The Collegiate Learning Assessment (CLA), which measures higher-order critical thinking and analytic reasoning skills, and the National Survey of Student Engagement (NSSE), which measures effective teaching practices, are two examples.
Standardized proxies.  The first one has serious validity and game theory problems (see "Questions of Validity" and  these related posts*), and the NSSE doesn't directly look at outputs at all.  There's more stuff about the CLA that looks like it came straight from the marketing department, and this assertion:
Colleges are often ranked by the academic standing of the students they enroll. But measures like the CLA allow states to hold colleges accountable for how much students learn while they’re in college.
Really?  That's a lot of faith in a proxy that doesn't even test discipline-specific material (e.g., you could ace the test and still be a lousy engineer or biologist or whatever your major was). There are other tests mentioned, but all are standardized tests of general reasoning. Maybe the attraction of such things is that they give the illusion of easy answers to very difficult questions. As Yong Zhao put it in his article, describing the situation in the PRC:
The government has been struggling, through many rounds of reforms, to move away from testing and test scores, but it has achieved very little because the test scores have been accepted as the gold standard of objectivity and fairness for assessing quality of education and students (as much as everyone hates it).
The siren song goes on...

Related: "Fixing Assessment."

*These are all obviously my opinions. You can find out about the CLA yourself from their website. The test is unique and has some interesting merits as contrasted to standard fill-in-the-bubble tests.  My point is not that the test (or other standardized tests of general knowledge) can't be used effectively, but that assuming it's a suitable global measure for student learning at the college level is more weight than the proxy can bear. In the VSA, the "Ready to Assemble" article, the Spellings Report, and elsewhere, such tests are granted a status resembling an ultimate assessment of collegiate learning, which doesn't seem justified to me.
