Monday, June 28, 2010

Unintended Consequences

My post a couple of days ago on proxies shows some examples of unintended consequences when you put artificial conditions on performance. Sometimes it involves cheating, like the teachers 'tubing' tests. But sometimes emphasis on a proxy ironically brings about the opposite of what you'd expect. I explored that idea in "The irony of good intentions" a while back. Since the World Cup is on, here's a crazy football (soccer) example. The basic expectation in a football match is that teams try to score in their opponent's goal, NOT their own. Scoring an own-goal is a sure way to make the fans grumpy (or murderous in one tragic case).

The 1994 Shell Caribbean Cup had the odd twist that goals scored in overtime counted double for purposes of the tournament. I'm not sure what the motivation for that was, but it had an unexpected consequence. Here's the wiki description of the Grenada vs Barbados game:
Grenada went into the match with a superior goal difference, meaning that Barbados needed to win by two goals to progress to the finals. The trouble was caused by two things. First, unlike most group stages in football competitions, the organizers had deemed that all games must have a winner. All games drawn over 90 minutes would go to sudden death extra time. Secondly and most importantly, there was an unusual rule which stated that in the event of a game going to sudden death extra time the goal would count double, meaning that the winner would be awarded a two goal victory.

Barbados was leading 2-0 until the 83rd minute, when Grenada scored, making it 2-1. Approaching the dying moments, the Barbadians realized they had no chance of scoring past Grenada's mass defense, so they deliberately scored an own goal to tie the game at 2-2. This would send the game into extra time and give them another half hour to break down the defense. The Grenadians realized what was happening and attempted to score an own goal as well, which would put Barbados back in front by one goal and would eliminate Barbados from the competition.

However, the Barbados players started defending their opposition's goal to prevent them from doing this, and during the game's last five minutes, the fans were treated to the incredible sight of Grenada trying to score in either goal. Barbados also defended both ends of the pitch, and held off Grenada for the final five minutes, sending the game into extra time. In extra time, Barbados notched the game-winner, and, according to the rules, was awarded a 4-2 victory, which put them through to the next round.
Here's a grainy pre-HD video excerpt.

Wednesday, June 23, 2010

Proxy Problems

You may have seen this New York Times article about grades at Loyola Law School.  Quote:
The school is retroactively inflating its grades, tacking on 0.333 to every grade recorded in the last few years. The goal is to make its students look more attractive in a competitive job market.
Or from the same source, "Under Pressure, Teachers Tamper With Tests."  Quote:
The district said the educators had distributed a detailed study guide after stealing a look at the state science test by “tubing” it — squeezing a test booklet, without breaking its paper seal, to form an open tube so that questions inside could be seen and used in the guide.
Motivation?
Houston decided this year to use [standardized test] data to identify experienced teachers for dismissal
And then there's the metaphysics of time, expressed in The Chronicle's "Credit Hours Should Be Worth the Cost, House Panel Members Say."
The standard of a credit hour, which is not actually a full 60 minutes in most cases, is deeply embedded in higher education as a benchmark for earning a degree. But the definition of what constitutes a credit hour has become muddled in recent years with the increase in online education.
Or an example from the People's Republic of China, courtesy of Yong Zhao at Michigan State, where standardized tests have very high stakes:
[The test] puts tremendous pressure on students, resulting in significant psychological and emotional stress. Imagine yourself as a 6th grader from poor rural village—how well you do one exam could mean bankrupt your family or lift your family out of poverty, give your parents a job in the city, and the promise of going to a great college.
About the tests themselves:
[T]he selection criterion is only test scores in a number of limited subjects (Chinese, Math, and English in most cases). Nothing else counts. As a result, all students are driven to study for the tests. As I have written elsewhere, particularly in my book Catching Up or Leading the Way, such a test-driven education system has become China’s biggest obstacle to its dream of moving away from cheap-labor-based economy to an economy fueled by innovation and creativity. The government has been struggling, through many rounds of reforms, to move away from testing and test scores, but it has achieved very little because the test scores have been accepted as the gold standard of objectivity and fairness for assessing quality of education and students (as much as everyone hates it).
Changing the topic yet again, there's a June 13th article in The Chronicle entitled "We Must Stop the Avalanche of Low-Quality Research."  The thesis is:
While brilliant and progressive research continues apace here and there, the amount of redundant, inconsequential, and outright poor research has swelled in recent decades, filling countless pages in journals and monographs.

Then there are presidential aims of doubling the number of college graduates by 2020, juxtaposed with potentially "toxic degrees" and alpine debt generated by for-profit colleges (see last post).  This tension, along with speculation about what it means for all of higher ed, is explored in The Chronicle's "New Grilling of For-Profits could Turn Up the Heat for All of Higher Education."  Here's the bit where the writing on the wall appears:
Many of the issues at stake, however, could mean harsher scrutiny for all of higher education, as worries about rapidly growing costs and low-quality education in one sector could raise questions about long-accepted practices throughout higher education.
Congress and colleges still lack a firm sense of "what our higher education system is producing," said Jamie P. Merisotis, president of the Lumina Foundation for Education. "The model of higher education is starting to evolve, but it's not clear to us what that evolution looks like," he said.

If you didn't read that and say "uh-oh," take another look :-).  Here's a hint, from a May article:
The [Education] department is already working with the National Governors Association and its next chairman, West Virginia's governor, Joe Manchin III, a Democrat, to develop college-graduation-rate goals for each state and eventually, for each institution of higher education. That kind of push is necessary, Mr. Duncan said, if the country is to meet President Obama's goal for the United States to have the world's highest proportion of residents with a college degree by 2020.
What do all these have in common?  In each case, we look at one thing and imagine that it is something else.  Grades equate to academic performance, the number of papers published equates to professional merit, credit hours equate to time and effort expended in learning, a standardized test score equates to academic potential for students and successful teaching for teachers, and graduation equates to (I presume) preparation for satisfying employment and a successful life.

All of these are proxies, and they each have problems. Anyone who works in assessment is in the business of creating proxies.  They might have problems too.  Problems are created, for example, if there is economic value associated with the outcome, which is true for all of the above. Dept of Ed is willing to pay for grads?  Hey, we'll give you grads.  Here's the bill.

Other problems are related to validity: how closely does the proxy track what you're actually interested in?  Note that for true statements of fact, validity is never an issue.  If I say "35 students took the survey," and that's true, there is nothing to question.  It's when I start to talk about what the survey results mean that I have to worry about the validity of statements like "students want a better salad bar in the cafeteria," or whatever.


There are some proxies that work pretty well. Money, for example.  A dollar bill is a proxy for anything you can buy for a dollar.  It's like a wild card. You'd think such a crazy idea would never work, but it mostly does.  Why?  Because it's very forcefully regulated. There's a strong incentive to go run off your own $20 bills on the color copier, but you'll end up in jail, so you probably don't do that.  Where this proxy fails is where it's not tightly controlled. Like in banks, which can print their own money (for all practical purposes, this is how banks work).  If they create too much, and, say, cause a booming market in stuff we can't really afford, then problems occur.  Or if the government itself just starts printing the stuff wholesale. In short, if the scarcity of the proxy matches the scarcity of some set of important commodities, the buying power ought to behave itself.  (The Economist puts out a Big Mac index based on this idea of purchasing power parity.)
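
To make the purchasing-power-parity idea concrete, here's a minimal sketch of the Big Mac index arithmetic in Python. All of the prices and the exchange rate are invented for illustration; they are not The Economist's figures.

# A minimal sketch of the Big Mac index idea: compare the exchange rate
# implied by burger prices to the actual market rate.
# All prices and the exchange rate below are invented for illustration.
local_price = 13.2   # Big Mac price in some local currency
us_price = 3.7       # Big Mac price in US dollars
actual_rate = 4.5    # actual units of local currency per US dollar

implied_rate = local_price / us_price        # PPP exchange rate
over_under = implied_rate / actual_rate - 1  # positive = local currency overvalued
print(f"Implied rate {implied_rate:.2f}, actual rate {actual_rate:.2f}, "
      f"{over_under:+.0%} versus the dollar")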

How would you put this kind of enforcement into practice in an assessment situation? First, you have to keep people from cheating--printing their own money, so to speak.  So researchers aren't allowed to break one long paper into two in order to get more publications.  Schools can't artificially inflate grades. Colleges can't crank out graduates who didn't satisfactorily complete a suitable curriculum.  Teachers can't see the standardized test before they're supposed to.  And so on.  This may not be possible for your assessment, in which case you should probably abandon it.  Take the first example: how in the world can you keep researchers from padding their vitae?

The harder problem is validity. With dollar bills, validity is enforced by law and custom.  It says "legal tender" on the bills, and while you could barter with tomatoes instead, that quickly becomes impractical.

Part of the difficulty with validity is that it changes once something has been announced to be a proxy.  For example, back in the 17th century it might have made sense to rank researchers by their number of publications, because they were probably not using that particular yardstick as a measure of success.  But once it's announced that papers = tenure, any validity you might have assumed before can no longer be taken for granted.  Either economic motivation (leading to gaming the system) or the lack of it (apathy and inaccurately low performance) may affect validity.

A really bad proxy can ironically bring about the opposite of what you intend.  More on that here.

You would think with these problems that we would use proxies only as a last resort. But we seem to be hard-wired to want to use them.  Maybe there's some psychological reason for this.  Maybe it's part of the package for language-users. As exhibit A, I present one of the "solutions" to the posed problem of too many publications from the "low quality research" article above:
[M]ake more use of citation and journal "impact factors," from Thomson ISI. The scores measure the citation visibility of established journals and of researchers who publish in them. By that index, Nature and Science score about 30. Most major disciplinary journals, though, score 1 to 2, the vast majority score below 1, and some are hardly visible at all. If we add those scores to a researcher's publication record, the publications on a CV might look considerably different than a mere list does.
As some of the comments to this article point out, it's easier to inflate citations than it is even to crank out papers.  It's the siren song of the next, better proxy just...over...that...hill...  With a clever name like "impact factor," it must be good.

Moral: When you can, gather and analyze statements of fact about whatever it is you care about, rather than using a proxy.  More on this subject anon.

For other ideas about alternatives to grades and standardized tests, see fairtest.org, Joe Bower's blog, and Alfie Kohn's site.

Update: There's a fascinating article in Inside Higher Ed this morning called  "The White Noise of Accountability."  There are many proxies for "measuring accountability" mentioned, including this example:
The Louisiana Board of Regents, for example, will provide extra funding for institutions that increase not the percentage, but the numbers, of graduates by … allowing them to raise tuition.
The author's observation about student learning outcomes is (in my opinion) right on:
But if the issue is student learning, there is nothing wrong with -- and a good deal to be said for -- posting public examples of comprehensive examinations, summative projects, capstone course papers, etc. within the information environment, and doing so irrespective of anyone requesting such evidence of the distribution of knowledge and skills.
I interpret this as saying just publish performance information.  If we could get everyone interested in actual performance rather than proxies, we'd be a lot better off.  See "Getting Rid of Grades" for an extreme version of that idea.

The IHE article cites (and criticizes) an Education Sector report "Ready to Assemble: A Model State Higher Education Accountability System" and summarizes thus:
By the time one plows through Aldeman and Carey’s banquet, one is measuring everything that moves -- and even some things that don’t.
I took a closer look at the article to see what it says about learning outcomes--the heart of the matter. It doesn't take long to find problems:
[S]everal nonprofits have developed promising new ways to measure higher education quality that have become widely accepted and implemented by colleges and universities. The Collegiate Learning Assessment (CLA), which measures higher-order critical thinking and analytic reasoning skills, and the National Survey of Student Engagement (NSSE), which measures effective teaching practices, are two examples.
Standardized proxies.  The first one has serious validity and game theory problems (see "Questions of Validity" and  these related posts*), and the NSSE doesn't directly look at outputs at all.  There's more stuff about the CLA that looks like it came straight from the marketing department, and this assertion:
Colleges are often ranked by the academic standing of the students they enroll. But measures like the CLA allow states to hold colleges accountable for how much students learn while they’re in college.
Really?  That's a lot of faith in a proxy that doesn't even test discipline-specific material (e.g., you could ace the test and still be a lousy engineer or biologist or whatever your major was). There are other tests mentioned, but they are all standardized tests of general reasoning. Maybe the attraction of such things is that they give the illusion of easy answers to very difficult questions. As Yong Zhao put it in his article, describing the situation in the PRC:
The government has been struggling, through many rounds of reforms, to move away from testing and test scores, but it has achieved very little because the test scores have been accepted as the gold standard of objectivity and fairness for assessing quality of education and students (as much as everyone hates it).
The siren song goes on...

Related: "Fixing Assessment."

*These are all obviously my opinions. You can find out about the CLA yourself from their website. The test is unique and has some interesting merits compared to standard fill-in-the-bubble tests.  My point is not that the test (or other standardized tests of general knowledge) can't be used effectively, but that assuming it's a suitable global measure of student learning at the college level puts more weight on the proxy than it can bear. In the VSA, the "Ready to Assemble" article, the Spellings Report, and elsewhere, such tests are granted a status resembling an ultimate assessment of collegiate learning, which doesn't seem justified to me.

Tuesday, June 22, 2010

Flipping Colleges for Profit

Slate's The Big Money has a June 17th article entitled "Subprime for Students," which details how a not-for-profit college gets flipped into a raging money machine.  In this version of a now-familiar tale, a private concern bought out a small religious school for $9 million, renamed it, and turned it into an online school set up to harvest federal aid dollars.  The result: $454 million in tuition and fees last year (actually from two schools treated this way).  The company, BPE, spends $145 million on marketing and recruitment, according to the article.  It's a finely tuned machine:
At the end of last year, BPE had 1,175 recruiters responsible for drumming up business. The company plainly discloses in its annual report that it sets tuition (currently $7,860 a year) to stay within the loan limits of Title IV, the regulations governing financial aid. One Ashford recruiter Eisman quotes is more blunt: “They conveniently price tuition at the exact amount a student can qualify for in federal loan money. If a person has money available for school, Ashford finds a way to go after them. … It’s a boiler room, selling education to people who really don’t want it.”
 How many recruiters does your school have?  I bet you don't need a comma to write it down.

This situation gets compared to the sub-prime mortgage crisis.  Details are in the article, but one estimate of the loan defaults by 2020 puts the total in the hundreds of billions of dollars.

Tuesday, June 08, 2010

For-Profit and No-Profit

They may be converging. From The Chronicle:
While officials at for-profit colleges anxiously await the release of regulations that could limit borrowing by their students, rumors about the content and timing of the new rules abound.
Here's the picture to go with it. American Public's Walmart bounce is evident.

In other news,  alternatives to for-pay education are maturing. This New York Times piece sums up the choices, with links.

And this one sums up the Edupunk movement, including Peer-to-Peer University.  Quote:
P2PU’s mission isn’t to develop a model and stick with it. It is to “experiment and iterate,” says Ms. Paharia, the former executive director of Creative Commons. She likes to talk about signals, a concept borrowed from economics. “Having a degree is a signal,” she says. “It’s a signal to employers that you’ve passed a certain bar.” Here’s the radical part: Ms. Paharia doesn’t think degrees are necessary. P2PU is working to come up with alternative signals that indicate to potential employers that an individual is a good thinker and has the skills he or she claims to have — maybe a written report or an online portfolio.
Some of my own ideas about that last thought are found in "Getting Rid of Grades."

Monday, June 07, 2010

Shorting Higher Ed

There are stirrings lately that the Edu-bubble--putative over-leveraging of higher ed--is about to burst. For some time, the narrative in the press has been that a four-year degree is overpriced, typically with graphs that show average tuition and fees. Here's one from Money magazine last year (graph on left).
This one was cited by Glenn Reynolds in a Washington Examiner article, "Higher education's bubble is about to burst":
College has gotten a lot more expensive. A recent Money magazine report notes: "After adjusting for financial aid, the amount families pay for college has skyrocketed 439 percent since 1982. ... Normal supply and demand can't begin to explain cost increases of this magnitude."
I'm not sure he noticed the Money article was a year old. He gives no proximate cause for thinking the bubble is pop-worthy.

There have been others, for example in the New York Times. I summarized and commented on some of these in this post in November 2009.  It's a popular sport to find examples of recent graduates with mountains of debt and figure out where the blame goes. For example, this article in the New York Times:
[...] Ms. Munna, a 26-year-old graduate of New York University, has nearly $100,000 in student loan debt from her four years in college, and affording the full monthly payments would be a struggle. For much of the time since her 2005 graduation, she’s been enrolled in night school, which allows her to defer loan payments. This is not a long-term solution, because the interest on the loans continues to pile up. So in an eerie echo of the mortgage crisis, tens of thousands of people like Ms. Munna are facing a reckoning.
I speculated that at some point we'd start to see discount online colleges, and compared it to a merger of University of Phoenix and Wal*Mart. That doesn't seem to have happened yet, but Wal*Mart has bought into the current prices in bulk. In The Washington Post article "Wal-Mart partners with online school to offer college credit to workers," Ylan Mui writes that "[I]n classic Wal-Mart fashion, the company negotiated a 15 percent tuition reduction on other courses at APU in exchange for handling some administrative and marketing duties." This is American Public University, whose website quotes prices up to $300/credit, which is about the going rate for coursework.

Is this too high? If you imagine a three-credit course with 20 students enrolled, each paying $300 per credit, that's total revenue of $18,000 for the course. The instructor may cost $3,000, and the infrastructure a few hundred. A large amount goes to advertising, and there's some administrative overhead. The rest is profit. From the article:
American Public University is one of a growing number of so-called career colleges that operate on a for-profit model, rather than as state institutions or private foundations. APU's parent company is publicly traded and its reported revenue jumped 43 percent to $47.3 million during the most recent quarter, while profit rose 46 percent to $7.6 million.
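
To make my back-of-the-envelope course arithmetic concrete, here's a rough sketch in Python. The instructor and infrastructure figures are the guesses from the paragraph above, and the marketing share is an assumption loosely based on the $145 million of $454 million mentioned earlier--not APU's actual cost structure.

# Back-of-the-envelope economics for one online course, using the figures above.
# The cost estimates are rough guesses, not APU's actual numbers; the marketing
# share is assumed from the $145M-of-$454M ratio mentioned earlier in the post.
students, credits, price_per_credit = 20, 3, 300

revenue = students * credits * price_per_credit   # $18,000
instructor_cost = 3000
infrastructure_cost = 500                         # "a few hundred"
marketing_share = 145 / 454                       # ~32% of revenue, assumed
marketing_cost = marketing_share * revenue

margin = revenue - instructor_cost - infrastructure_cost - marketing_cost
print(f"Revenue: ${revenue:,}")
print(f"Rough margin: ${margin:,.0f} ({margin / revenue:.0%} of revenue)")
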
A large part of this profit comes from the federal government in the form of grants and guaranteed loans. (See this article for details.) There is noise now about changing the rules to cut off this geyser of public money going into the pockets of shareholders. Steve Eisman, who bet against the mortgage industry and won, is quoted in Mother Jones in "Steve Eisman's Next Big Short: For-Profit Colleges":
In a speech titled "Subprime Goes to College," delivered Wednesday at the Ira Sohn Investment Research Conference, Eisman blasted the for-profit education industry, likening these companies to the seamy mortgage brokers who peddled explosive subprime loans over the past two decades.
This is political, of course. The for-profits seem to have endeared themselves to the Republicans, who are perhaps hoping to privatize all of education. The Chronicle has an article about a review panel of 18 individuals...
[...] that reviews accrediting groups. And the stark divisions among those named to the panel does not bode well for a unified or harmonious approach to its task when it begins meeting again this year after a two-year hiatus.
Agendas are clear from the nominations:
Congressional Republicans, naming a third of the committee's members, mostly chose panelists from the business world or for-profit colleges but no one currently serving at a traditional nonprofit institution. The GOP appointments include Anne D. Neal, president of the American Council of Trustees and Alumni, who served on a previous version of Naciqi and was one of the most outspoken advocates for colleges to show greater evidence of student achievement and for accreditors to require such evidence in their standards.
Republicans also named two leaders of proprietary colleges to the committee: Arthur E. Keiser, chancellor of Keiser University, and William J. Pepicello, president of the University of Phoenix.

Congressional Democrats, by contrast, chose four of their six members from the ranks of public colleges, along with the former chancellor of the University of California at Davis.
So Republicans are for private, for-profit education and for standardized testing. Democrats are supporting the old creaky model of higher education. This doesn't bode well for reasoned debate. 


The Dept of Ed is casting about for new models.  In this article in The Chronicle, Secretary Duncan is quoted as saying that
his hope is to reallocate money from other federal programs, which he didn't name, and use it to offer financial incentives to colleges that show creativity in containing costs and improving their graduation rates.
A Huffington Post article lists one possible remedy:
In January, the Education Department suggested one answer: for a program to be eligible [for public money], a majority of its graduates' annual student loan payments under a 10-year repayment plan must be no more than 8 percent of the incomes of those in the lowest quarter of their respective professions. The earnings data would come from the Bureau of Labor Statistics.
This is the kind of measure I've been asking for for years--actual data on how students do in the marketplace after graduation.  This would do more than anything to illuminate the debate about the worth of education.
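
As a rough illustration of how such an 8 percent test might work, here's a sketch that amortizes a loan over a 10-year repayment plan and compares the annual payment to a lower-quartile income. The loan balance, interest rate, and income are hypothetical numbers, not figures from the article.

# A rough sketch of the proposed 8% test: amortize total debt over a 10-year
# fixed-rate loan and compare the annual payment to a lower-quartile income.
# The loan balance, interest rate, and income below are hypothetical.
def monthly_payment(principal, annual_rate, years=10):
    """Standard amortized payment on a fixed-rate loan."""
    r = annual_rate / 12            # monthly rate
    n = years * 12                  # number of payments
    return principal * r / (1 - (1 + r) ** -n)

loan = 30_000        # hypothetical total student debt
rate = 0.068         # assumed fixed interest rate
income = 28_000      # hypothetical 25th-percentile income for the profession

annual_payment = 12 * monthly_payment(loan, rate)
ratio = annual_payment / income
print(f"Annual payment ${annual_payment:,.0f} is {ratio:.1%} of income -> "
      f"{'passes' if ratio <= 0.08 else 'fails'} the 8% test")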

Thursday, June 03, 2010

Assessment Crosstalk

A recent InsideHigherEd article "The Faculty Role in Assessment" has stayed on the site's Most Popular list for several days.  In "Fixing Assessment" I posted my solution: differentiating clearly between strategic and tactical goals and pursuing those at an appropriate level.  It is evident from the comments on the IHE article that there is deep interest in the issue, and in this post I will take a stab at categorizing those comments.

There's no proof assessment does any good is the argument made by RSP.  This turns the tables on the psychometric-speak employed by the academic arm of the assessment profession (up to and including standardized test vendors).  The argument is natural: if you can measure learning, show me how much students have learned and how much assessment matters, qualitatively and rigorously. The community of assessment professionals is aware of this contradiction, but doesn't seem to take it seriously.  The obvious solution is to be more modest about claims.

Assessment is big business is the ad hominem comment by skeptical_provost.  It is ironically book-ended by John B. later on, who is selling the product. Now, an ad hominem isn't necessarily a bad argument.  If I learned you had invested with Mr. Madoff, I might have reason to worry about your finances without further information.  In this case, the industrialization of testing has produced undesirable side effects.  Marketing overstates what the tests can do, and this, along with politicization, seems to have cultivated the idea even at the Dept of Ed that we can measure minds like weighing potatoes.  This underlines the previous point: don't believe the advertising.  Do some critical thinking, right?

Oversimplification of assessments  is a problem first raised by Jonathan.  I've argued this point many times in this blog: the complexity of an assessment has to match the complexity of what you're assessing.  Or else you should be very suspicious of the results.  This relates to the previous comments in at least two ways.  First, the industrialization of assessment requires convenience (for economic reasons) and reliability (in order to make the argument that measurement is being done).  Both of these serve to reduce the complexity of assessments to standardized instruments and standardized scoring.  As a prosaic example, imagine hiring a babysitter.  Would you rely solely on some standardized test to tell you how good a candidate is?  Or would you want to interview the person, see if you can suss out how good his/her judgment is, how well the person communicates, how bright they seem?  You might want to do both to cover your bases, but the second (high complexity) step tells you things the low-complexity test cannot.  This problem raises a particular question for the more aggressive claims of assessment: how do you account for complexity?  Here's another example.
A limerick is a well-defined form, which you can read about here.  We can fairly easily create a rubric and standardized assessment for limericks.  Imagine you've done that, and then assess the following one (borrowed from John D. Barrow's Impossibility):

There was a young man of Milan
Whose rhymes they never would scan;
When asked why it was,
He said, 'It's because
I always try to cram as many words into the last line as ever I possibly can.'
Rated honestly, you'd have to flunk it, because it's not technically a limerick at all.  It fails the most basic requirements.  However, it could be considered a meta-limerick, which is arguably more interesting.  It's unlikely your rubric can handle this.  It might explode.
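
For fun, here's a sketch of what a crude, low-complexity "standardized" limerick check might look like. The criteria and thresholds are invented, and a real rubric would also need rhyme and meter checks, which are much harder to automate; the point is that the meta-limerick flunks on form, and the check has no way to notice that the failure is the joke.

# A toy "standardized" limerick check. The criteria and thresholds are invented,
# and crude on purpose: a real rubric would also need rhyme and meter checks.
def score_limerick(text):
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    word_counts = [len(ln.split()) for ln in lines]
    return {
        "exactly five lines": len(lines) == 5,
        # Lines 3 and 4 should be short, and line 5 roughly as long as line 1.
        "long-short-short-short-long shape":
            len(lines) == 5
            and word_counts[2] <= word_counts[0]
            and word_counts[3] <= word_counts[0]
            and abs(word_counts[4] - word_counts[0]) <= 2,
    }

meta_limerick = """\
There was a young man of Milan
Whose rhymes they never would scan;
When asked why it was,
He said, 'It's because
I always try to cram as many words into the last line as ever I possibly can.'"""

for criterion, passed in score_limerick(meta_limerick).items():
    print(f"{criterion}: {'pass' if passed else 'FAIL'}")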

It does make a difference, responds Jeremy Penn (to RSP's comment).  But there is a subtle shift here.  Jeremy references what happens in the classroom and instructional techniques, NOT assessment writ large.  This mismatch of meaning and vocabulary is at the heart of much misunderstanding.  Let me name these two points of view:
Assessment as Pedagogy (AP): use of subjective and some standardized techniques to improve the delivery of instruction at the course and program level, tied very closely to the faculty who teach the classes and the material that's being taught.  It doesn't need heavy bureaucracy or assumptions about reductionism.

Assessment as Science (AS): the idea that we can actually measure learning precisely enough to determine things like the "value added" by a general education curriculum.  The hallmarks of AS are large claims tied to narrow standardized instruments, theoretical models that pin parameter estimates on test statistics, and a positivist/reductionist approach.
So Jeremy is talking about AP (if I understand the comment), and RSP's comment is probably addressed at AS. 

Accountability and efficiency are mentioned by Luisa, who points out that bureaucracies are not known for either.  One fragment:
[I] become better at what I do through meaningful dialogue and narrative, rather than through inaccurate/invented data and endless bullet lists.

[...] What goes on in my classroom on a daily basis does not 'count.' What 'counts' is 'documented' learning, ie, the product-as-educational-widget.
This contrasts AP to AS.  Luisa also makes the distinction between outcome and process, which I wrote about here.

Grades are assessment, says Cal.  It is by now canon in the assessment world that grades are not assessments.  This does not stop researchers and test-makers from using GPA correlations as evidence of validity, but never mind.  If we look at grades through the lens of the AP/AS definitions, it's clear that grades aren't really either one.

Allow me to pull together an observation combining Luisa and Cal's comments:
Assessment as AP can allow us to overcome bureaucratic limitations natural to higher education.
Good assessment (as AP) can allow us to do things that wouldn't otherwise be thunk of.  Grades are statistical goo that don't tell us enough about what a student actually did. There are easy ways to get richer information (I mean really easy, and free too).  I recently helped our nascent performing arts programs work through their assessment plans. One of their goals was to develop creativity in students.  This is something that takes time to mature, and can best be observed outside the bureaucracy of class blocks.  It was a great discussion, which led them to create a feedback form for students, to be used across the curriculum.  A draft is shown below.

The idea is that students get feedback in an organized way throughout the whole curriculum.  This can't be done with grades.  It also doesn't require a big bureaucracy or expensive software to pull off.  Note also that this form is very much AP.  Students will get feedback in the form of written comments telling them why their presentation (for example) is at the Freshman/Sophomore level of expectation rather than Graduate (meaning ready to graduate).  It is not reductionist (as AS would be), and does not attempt to measure anything--it's a pure, subjective, professional assessment by faculty.  There is a rubric (not shown) that communicates basic expectations for each level, but it remains a very flexible tool for tracking progress.  I've used this technique in math too--it's a good exercise to have students rate themselves and then compare to your own ratings.  In the arts, critiques are important, so you can do peer ratings also.
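
Since the draft form itself isn't reproduced here, here is a purely hypothetical sketch of what one record on such a feedback form might contain. The field names are my invention, and the intermediate levels are a guess; only Freshman/Sophomore and Graduate appear in the description above.

# Purely hypothetical sketch of one record on a curriculum-wide feedback form.
# Field names are invented; the intermediate levels are a guess -- only
# Freshman/Sophomore and Graduate ("ready to graduate") appear in the post.
LEVELS = ["Freshman/Sophomore", "Junior", "Senior", "Graduate"]

feedback_entry = {
    "student": "Jane Doe",               # hypothetical student
    "program_goal": "creativity",        # the goal being tracked over time
    "artifact": "spring recital piece",  # what the faculty member observed
    "level": "Freshman/Sophomore",       # faculty's professional judgment
    "comments": ("Strong technique; the interpretive choices are still "
                 "imitative. Push toward making the piece your own."),
    "self_rating": "Junior",             # student's own rating, for comparison
}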

By way of contrast, the AS approach is to reduce learning into narrow bins (to increase reliability), assign ratings, and then aggregate to create dimension-challenged statistical goo.  This quickly becomes so disconnected from what's happening in the classroom that it's a head-scratcher to figure out what actions might be implied by the numbers (see this, for example).

Emphasis on input vs output is one of several points made by commenter G. Starkey.  This may be another way to refer to process versus outcomes.  Note that AP is (or can be) deeply involved with process, and its messy subjectivity.  AS tries to stand aloof from that by looking only at outcomes.  Starkey mentions "the hierarchical nature of assessment," which sounds like a reference to AS.  This is underlined by subsequent comments, so the criticism seems to me to be "AS is imposed from the top for accountability," which is an "inherently ill-defined problem."  Although AS produces simple, standardized results, the inputs (students, processes) are anything but standardized.

A charge of incompetence is leveled by Charles Bittner, who sees failure in public schools and links that to "failed methods and techniques" of education departments.  I presume this to refer to AS-type activities, since massive low-complexity standardized tests are in place in public education.

Fuzziness of rubrics and metrics is not to be taken seriously, says Gratefully uninvolved in a gratuitously elitist comment (referring to those who have never taught at "real universities").

Assessment makes a difference argues the aptly named Ms. Assessment, who supports this with anecdotes about curriculum maps and rubrics and feedback.  Although there are not enough details to be sure, it seems to me that she is referring to AP approaches that have direct influence in the classroom and across classes, not a grand AS approach with standard deviations to be moved.  But I could be wrong.

AP and AS are different, is the point of BDL, when translated into my definitions above.  BDL identifies the "meaning gap" better than I did, talking about the question "what are our students learning, and how do we know this?":
This is no simple bean-counting question for summative accountability. Instead it is an ongoing question of self-reflection that ought to be integral in everything from our curriculum planning, to our classroom practice, and whatever occurs between.
BDL also points at bureaucracy as being unhelpful:
And we continue to measure "student success" in opaque grades, with transcripts that like so many 19th century factories account, mostly, for how many hours students have "spent" in class, over how many years. What they've learned, we can only assert as a matter of faith and belief.
If this idea resonates with you, you might be interested in "Getting Rid of Grades."

AS is symbolic, AP is useful, is my translation of Charles McClintock:
While assessment clearly has a symbolic use to external audiences to signal that resources are being used wisely, it can have its greatest instrumental value for students.
He gives examples of AP being successful.  For example "The greatest value of these learning outcomes, in my view, is that they communicate to students what is expected of them in their graduate work."

AS is difficult or impossible, says Raoul Ohio.

Taking issue with Ms. Assessment's comment is Peter C. Herman, who says anonymous anecdotes are not evidence.  Here he assumes that she is talking about AS, whereas I assume she's talking about AP, in which case the question of evidence is moot.  He underlines this by talking about the problems with an AS program.  For example: "'Assessment' assumes that students are inert objects, to be made or marred by their professors, and that is just not a valid assumption."  Commenter Bear says "me too" to this idea.  Both seem to assume that Assessment = AS, and ignore AP.  This is not perhaps a conscious decision, because the context at their respective institutions may be a solely AS-oriented approach. 

The next comment, by midwest prof, also criticizes AS:
Standardized measures throughout became the benchmark of effective assessment, with resistance depicted as evidence that faculty either did not know how to teach or simply did not want to be held accountable.
midwest prof advises to "find value in what faculty are already doing and find a way to organize it in a way that those who feel assessment is important can utilize."  I don't know if this is cynical (tell the administration some good news and get on with things) or a constructive use of crowd-sourcing (see Evolutionary Thinking).  Another thread in this comment is the lack of preparation for assessment in graduate school; the commenter recommends mentoring.

It takes money, says Unfunded Mandate, including faculty time.  It's not clear what kind of assessment program is the subject here, but something time-consuming and expensive is a good guess.  This applies more to AS than AP, I think, although both require investment in staff positions and professional development. The next comment, by Wendell Motter, puts some numbers to the time spent doing classwork-based (probably AP) assessment, and makes the case that time is proportional to the number of students, and therefore there is a limit beyond which it is not reasonable to expect good results.

Decoupling is the result of bureaucratization, argues sk: that (presumably) institutional top-down AS isolates process from result.  I interpolated a lot there.

Administration detracts from teaching, asserts Professor C, and argues against "endless rubrics and data formats."  This seems like a criticism from the Assessment = AS perspective.


Rubrics are not a panacea, is the theme from an Alfie Kohn article linked by Professor D.  The crux of the criticism is perhaps in this paragraph:
Consistent and uniform standards are admirable, and maybe even workable, when we’re talking about, say, the manufacture of DVD players.  The process of trying to gauge children’s understanding of ideas is a very different matter, however. It necessarily entails the exercise of human judgment, which is an imprecise, subjective affair.   
Again, this is AS contrasted to AP.  Rubrics can be used for either--it depends on the approach and use of results.  Rubrics as scientific data-measuring are AS, rubrics as communications tools and guides are AP. The whole article is worth reading--it contrasts the two approaches to assessment in different language than I have used here, but it's the same issue.

AS isn't proven to work, is how I interpret the comment from John W. Powell, who writes:
Rubrics, simplistic embedded assessments constructed with an eye to efficient end-of-term review, and the resulting reports which overwhelm chairs and assessment coordinators, all betray a severely stunted vision of the multifarious answers to the question, what is education for?
John goes further, challenging the quality of debate: "[...Outcomes assessment] looks more and more like a fundamentalist religious faith." Given the nature of the argument (e.g. "its justifications abjure the science we would ordinarily require"), the critique is best addressed to an AS approach.

Inappropriate measures are a problem, says humanities prof, comparing actual learning outcomes:
I try to teach my students to systematically break down and evaluate complex, abstract and subjective questions.
to the official assessment:
 A four to five question multiple choice pre- and post-test consisting of basic terms and concepts.
Humanities prof blames this on the need to quantify and standardize, which is a charge that would only apply to AS.


Crosstalk is noted by Betsy Vane, who says that advocates are arguing about the reasons for doing assessment and opponents are criticizing the means.  Betsy suggests a rapprochement, but I don't see that as possible without deciding first to what extent assessment is going to be AS or AP.  I don't see faculty ever buying in to the pure form of AS, for example.  Administrations out there spending kilo-dollars to implement data-tracking systems based on the AS idea might want to think hard about this.

Stating the AS agenda is Brian Buerke, who advises that "The key for assessing higher-order skills is to break them down into fundamental, well-defined elements and then assessing each element separately."  So that: "[S]tudent progress on specific elements can be charted. Aggregating the results over multiple elements allows student learning to be assessed in the more complex areas."  I put this in the AS category.  Depending on how seriously you took the numbers generated, it's possible this could be AP.  But I doubt it. One of the problems I have with the sort of aggregation I imagine is meant here is that you end up averaging different dimensions and then calling the result some other (new) dimension.  For example, if you sum up the rubric scores for "correctness", "style",  and "audience" and call the result "writing", it's not much different from adding up height, weight, and hat size of a person and calling it "bigness."  Maybe you can abstract something useful out of bigness once in a while, but it's become statistical goo and suspect as information.
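
Here's a tiny illustration of the "bigness" problem, with made-up rubric scores: two very different writing profiles collapse to the same aggregate number, so the aggregate can't tell them apart.

# Two hypothetical rubric profiles on a 1-5 scale (numbers invented).
student_a = {"correctness": 5, "style": 1, "audience": 3}
student_b = {"correctness": 3, "style": 3, "audience": 3}

# Summing across dimensions ("writing" = correctness + style + audience)
# erases the difference between two very different writers.
print(sum(student_a.values()), sum(student_b.values()))  # 9 9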

The idea that you can take something complex like "critical thinking" and create linear combinations of other stuff to add up to it is suspect to me.  I've written a lot about that elsewhere.  I think part of the problem is that we get the idea of complexity backwards sometimes because of Moravec's Paradox.  For example, solving a linear system of differential equations is probably far lower complexity than judging someone's emotion by looking at their face, even though the first may be hard for us and the second easy.

Arguing against the validity of AS is FK, who writes:
Many of the processes that permeate our teaching, in the humanities and in the natural and social sciences, cannot be encapsulated in how we measure students’ learning of facts or methods of analysis through tests or papers, exams or presentations, etc.
I interpret this as a complexity argument, which is a theme here.  FK describes the practical problem this poses.
If part of the “assessment” is describing what our learning outcomes are, and there are no definitive ways of measuring these outcomes after a course or within the 4-years of the students’ academic career, do we de-value these outcomes and only focus on what is measurable—even when certain courses, especially in the humanities, involve a lot more intangible outcomes than tangible ones?

Note:  So you know what my stake is in this, I'll describe my own bias.  I'm responsible for making sure that we do assessment and the rest of institutional effectiveness at a small liberal arts school.  All my direct experience is with such schools.  I've tried and failed to make AS work both as a faculty member and now as admin. I can see the usefulness of an AS approach in very limited circumstances, but I find the claims (for example) of the usefulness of standardized tests of critical thinking to be dubious.  On the other hand, I've had good success with AP methods in my own classroom and in working with other programs.  So I have a bias in favor of AP because of this.  Given the comments I've tried to synthesize, it seems that I'm not alone.

Tuesday, June 01, 2010

Getting Rid of Grades

Here's a thought experiment. Imagine for a moment that there were no end-of-semester grades.  No mad rush to see who graduated so we can get the diplomas right, no agonizing phone calls from tearful students about to lose their financial aid. No stress from failing too many (or too few).

Also feel free to imagine that students have to find something to care about other than the bureaucracy of hurdles and credentials that "education" has become.  Alfie Kohn writes in this article about rubrics that
[R]esearch shows three reliable effects when students are graded:  They tend to think less deeply, avoid taking risks, and lose interest in the learning itself. The ultimate goal of authentic assessment must be the elimination of grades.
I know it sounds crazy, but bear with me for a moment, because it's already being done successfully.  As I've noted before, WGU uses assessments instead of grades, but that's not what I mean.  No, I'm talking about sports.  In intercollegiate sports, student-athletes create a performance history that takes different forms, depending on the sport.  These records, videos, and other evidence show the results of performance.  Something as artificial as a grade is unnecessary.  But this is a very different model from what happens in the classroom. In business, a company is ultimately judged by its bottom line, not by its bond rating.  The value of stock shares is not based on ratings by stock-pickers but on the number on the ticker.  In other words, actual performance is more valuable than someone's opinion about it.

What would performance look like for academics?  I borrowed the idea the other day that understanding process is more important than looking at outcomes. It's a different, perhaps radical, departure from the way we think about classroom education.

At present, we give assignments, evaluate them according to some scale, and assign a grade.  Then that's the end of it for most assignments.  We accumulate those in some way and crank out an average to go on the grade sheet.  These document outcomes: our assessment of work done.  What would it look like to document process instead?

Before mulling that over, let's follow the rabbit further down the hole.  If students didn't have grades to worry about, what would they do?  It seems obvious that the diploma is just a "big grade" that they get at the end, so we have to get rid of that too.  Does the NBA go looking for a "certificate of participation" from the NCAA or the university when recruiting a college graduate?  Somehow I doubt it.

So no grades, no diplomas, no degree programs, no graduates.  The only thing left is direct evidence of participation and performance.  This is a frightening thing from the perspective of a university administrator; half the bureaucracy just went out the window. What's left? 

Building a culture of accomplishment would take time and restructuring.  Instead of absolute ratings in the form of grades, we'd have student portfolios from the past that showcase what is possible with work and wit: not just finished products but also the process of creating them.  I gave an example the other day of what this might look like.  I assume that performance evidence would be housed in public portfolios that live in the cloud.  These portfolios would have to look different from the ones we have now.

The sole purpose of coursework would be to lead students to demonstrate performance.  Without grades or credentials acting as a proxy between the student and potential employers (for example), only actual evidence of having accomplished something is under consideration.  In this utopia, rather than trying to find last-minute extra credit to bump their grade up, students would be demanding opportunities to show off accomplishment.  Imagine that.

On the other hand, there are problems with this idea too.  What's to prevent someone from padding their portfolio with stuff they didn't really do?  Or simply buying a complete portfolio from a third world "portfolio farm"?  The problem, represented schematically, is
Actual -> Representation -> Observer
If there's any room to pretty up a representation, then the observer can be fooled.  It's impossible for an NCAA basketball player to fake performance because it's on TV and in official records.  So maybe the bureaucratic role of the university is simply to certify that, yes, this student really did this work, and to provide context for it. In this way, the university would build up a corpus of official student accomplishment that it can show off.  Of course, that only brings other trouble.  Now it's the institutions themselves who want their product to look good, and they could tilt the scales in any number of ways.  Still, this may be the best compromise.

Then there's the problem of evaluation.  Diplomas and GPAs are easy to assess; portfolios are a big mess from a prospective employer's viewpoint.  But there's always the possibility that third-party industrial psychology types could provide search services that would ultimately be much more useful than relying on credentials and grades. 

If all this talk of change makes you grumpy, you can always go in the other direction.  Here's some advice on how to make grading even more bureaucratic.