Saturday, February 11, 2006

Letter to the Commission

I wrote letters to several of the members of the Commission on the Future of Higher Education. It's not all that easy to find their email addresses, and I found only six in the time I spent looking. Here's the letter.

I'm writing to you regarding the progress of the Commission on the Future of Higher Education. I applaud the efforts of your group to improve our educational system, but I have some concerns based on news reports I've read.

It seems that the Commission is considering using standardized testing--instruments like the Collegiate Learning Assessment (CLA)--as a metric to compare institutions. If so, this is a seriously flawed approach that could become an expensive lost opportunity.

Standardized testing is convenient, but not at all suitable for judging complex skills. In contrast, the National Council of Teachers of English (NCTE) has a very good set of guidelines for assessing writing. For example, students are ideally taught to write deliberately, to revise, and to use social connections to improve their work (short of plagiarism, obviously). None of this is possible in a standardized testing environment. Standardized tests may be convenient and reliable, but they suffer from a lack of validity. Sometimes test makers try to assure us that the "guys in white coats" have done all that technical stuff, but such assurances miss the very notion of validity--the meaning of the results. In fact, from a teacher's point of view, it's hard to figure out what a standardized test score *means* about the curriculum, unless the curriculum simply addresses the very narrow skills that the test focuses on.

There are better, more valid ways to assess student learning. For example, many colleges and universities now use student portfolios in association with specific learning objectives in order to set goals and assess them. This has obvious validity, since it links outcomes to a huge database of direct evidence, and at Coker College we have found that inter-rater reliability is acceptable as well. Of course, comparing such measures across institutions would be a nightmare. But the truth is that parents don't care about *learning* per se (for the most part), and perhaps your Commission shouldn't either.

What parents care about is how much earning potential their sons and daughters will have as graduates. This is far easier to measure than "learning", and it lends itself to the kinds of calculations that can validate or invalidate public financial aid. Are the costs of student loans justified by the increased earning power of graduates (after subtracting the opportunity costs)? There's no way to answer that question based on averages of standardized test scores.

Since the US government already has on hand tax data and financial aid data, one would imagine that very detailed studies by institution, by state, by demographic, by major, and so on, could be accomplished relatively cheaply. Conventional wisdom is that a college degree is the best investment a young man or woman can make. A definitive study of that would be very enlightening and could guide policy. Conversely, we could maximize scores on some computer-scored test and end up with a generation of students who can only write poetry. (Not that there's anything wrong with poetry, but we need engineers and doctors too.)

I'm a mathematics teacher by training, and I still believe that a good education is about learning, not money. That's a healthy attitude from the perspective of the classroom. At the national level, however, when the hard decisions concern how much financial aid is "worth" it, or which institution does a better job of preparing students for an economically productive career, we're not really talking about learning anymore, but about the *effects* of learning. From public documents, it seems to me that the Commission may be conflating the two. I hope you will reconsider before advocating more standardized testing.
It will be interesting to see if I get a response.
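
To make the earning-power question in the letter concrete, here's roughly the calculation I have in mind, sketched in Python. Every name and number below is made up for illustration; a real study would use actual tax and aid data, discount future earnings, and control for who chooses to attend college in the first place.

    # Back-of-the-envelope payoff of a degree: extra lifetime earnings
    # minus direct costs and the wages forgone while enrolled.
    # All figures are hypothetical placeholders, not data.
    def degree_payoff(tuition_and_loans, years_in_college, forgone_wage,
                      grad_wage, nongrad_wage, working_years):
        direct_cost = tuition_and_loans
        opportunity_cost = forgone_wage * years_in_college
        extra_earnings = (grad_wage - nongrad_wage) * working_years
        return extra_earnings - direct_cost - opportunity_cost

    # Invented inputs, purely to show the shape of the question.
    print(degree_payoff(tuition_and_loans=40000, years_in_college=4,
                        forgone_wage=25000, grad_wage=50000,
                        nongrad_wage=30000, working_years=40))
    # -> 660000: positive here, but the verdict depends entirely on the inputs.

Whether that bottom line is positive, and for whom, is exactly the question real data could settle.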

Friday, February 10, 2006

Higher Profile for Testing Proposal

Currently the most emailed New York Times article (subscription may be required) is entitled "Panel Explores Standard Tests for Colleges." A quote:
A higher education commission named by the Bush administration is examining whether standardized testing should be expanded into universities and colleges to prove that students are learning and to allow easier comparisons on quality.
This is of course the Commission on the Future of Higher Education, chaired by Charles Miller. Oddly, he's quoted as saying "There is no way you can mandate a single set of tests, to have a federalist higher education system." I agree completely with that statement, but I'm not sure how they propose to achieve their stated goal of comparing institutions if each uses a different metric.

The article quotes Leon Botstein and David L. Warren staking out a reasonable position from the perspective of those actually doing the educating, opposing "a single, national, high-stakes, one-size-fits-all, uber-outcome exam."

Predictably, standardized test fans like the proposal. Here's a sample:

Any honest look at the new adult literacy level data for recent college grads leaves you very queasy. And the racial gaps are unconscionable. So doing something on the assessment side is probably important. The question is what and when.
And there's a connection to the CLA, which I've written about before here.

Mr. Miller, in his recent memorandum and in the interview, pointed to the recently developed Collegiate Learning Assessment test as a breakthrough. The exam, developed by the Council for Aid to Education, a former division of the RAND Corporation, asks students to write essays and solve complex problems.
The central problem is that in order for a uniform exam to have high reliability, the scoring has to be uniform. In the best case this is done by a computer. Unfortunately, this algorithmic 'right or wrong' scoring is not aligned with higher education's typical goal of developing higher-order cognitive skills. What is 2+2? Correct answers include 4, IV, 100 (binary), vier (German), and 1 (modulo 3). Standardized testing, by its very nature, penalizes any imagination that makes its way into an answer whenever that imagination introduces unwanted variability (which seems terribly likely).
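
To see the problem in miniature, here's a deliberately naive scorer in Python--not any real test's algorithm--that accepts only the one expected string:

    # A naive machine scorer: a response is "correct" only if it exactly
    # matches the single expected answer. Hypothetical, for illustration.
    def naive_score(response):
        return response.strip() == "4"

    # All of these are defensible answers to "What is 2+2?"
    for answer in ["4", "IV", "100 (binary)", "vier", "1 (modulo 3)"]:
        verdict = "correct" if naive_score(answer) else "marked wrong"
        print(answer.ljust(14), "->", verdict)

Only the first answer survives. The scoring is perfectly reliable, and it marks four imaginative students wrong.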

See my post on validity for more. The National Council of Teachers of English has a very well-considered set of standards for judging writing. There's no way to translate them into standardized testing, however, which means they would effectively be ruled out under the proposal being discussed.

Critical thinking is one of the abilities that some standardized tests purport to assess. Unfortunately, the decision seems to be driven by the economics of test-giving (it's easy to score bubble sheets) and by a misplaced emphasis on reliability. One would hope that the Commission will apply some critical thinking to this problem.

Sunday, February 05, 2006

Economics and Higher Education

Daniel S. Hamermesh at UT-Austin writes about the economic value of a college degree. On his blog at NCTE, Paul Bodmer comments that
He is an economist, so he gives economics answers. And that is good, particularly from my point of view. The quick answer to the first question is that the payoff is better for students and the public now than in the 1970’s.
This strikes me as a good starting place for taking a broad look at college education.

Validity in Assessment

At a round table discussion at the SACS conference last year, an Institutional Effectiveness guy was expressing the same kinds of frustrations we all feel in trying to comply with requirements to assess everything. He was advocating the use of convenient off-the-shelf solutions in the form of standardized tests. What got my attention was his statement to the effect that "these things have already been proven valid."

Validity, of course, is the notion that an assessment should measure what it purports to. Despite the simplicity of the idea, it's actually quite subtle. After all, smart people have been arguing about epistemology for millennia. Assessment is no different from any other kind of science (if the assessment is done with serious intent)--there will often be areas of doubt where individual judgment prevails. This is particularly true when using someone else's design to assess your own students, programs, and so on.

My response to the IE guy was this: the validity of an instrument depends on what it's used for. The SAT may be marginally useful in predicting success in college, but it is of no use at all in predicting how well one plays baseball. Test makers' claims to validity therefore need to be examined carefully, not accepted at face value. As a case in point, see the Applied Research Center's argument that California's CBEST test is not being used in a valid way.

Validity is particularly hard to show for tests that assess complex skills like writing. Interestingly, although course grades are often dismissed as poor assessment data, it seems to be common to use grades to validate these tests. Examples include the GRE-W writing assessment, as described here.
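
"Validating against grades" usually boils down to a correlation like the one below. This is a minimal sketch with invented scores and GPAs; real validation studies are more elaborate, but the core statistic is the same.

    # Pearson correlation between test scores and GPAs -- the statistic
    # behind "the test correlates with grades" claims. Numbers invented.
    from statistics import mean, stdev

    def pearson_r(xs, ys):
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
        return cov / (stdev(xs) * stdev(ys))

    scores = [480, 520, 610, 650, 700]  # hypothetical test scores
    gpas = [2.4, 2.9, 3.1, 3.4, 3.8]    # hypothetical course grades
    print(round(pearson_r(scores, gpas), 2))

The irony stands: if grades are too noisy to count as assessment data, they make a strange yardstick for certifying a test as valid.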

In the Atlantic article "What Does College Teach," there's a pretty strong condemnation of grades:
[T]here is the direct assessment of student learning that takes place constantly on college campuses, usually symbolized by grades and grade point averages. For our purposes these are nearly useless as indicators of overall educational quality—and not only because grade inflation has rendered GPAs so suspect that some corporate recruiters ask interviewees for their SAT scores instead. Grades are a matter of individual judgment, which varies wildly from class to class and school to school; they tend to reduce learning to what can be scored in short-answer form; and an A on a final exam or a term paper tells us nothing about how well a student will retain the knowledge and tools gained in coursework or apply them in novel situations.
In the printed literature for the Collegiate Learning Assessment, which the author is hawking, we're assured that
The CLA measures were designed by nationally recognized experts in psychometrics and assessment, and field tested in order to ensure the highest levels of validity and reliability.
That's apparently all the public information there is about the validity of their instrument.

In contrast to these standardized tests stands a serious set of standards for writing assessment, published by the National Council of Teachers of English, some of which the CLA obviously violates. If you take a look at these, you'll see that they are quite sensible. The only problem (from a particular point of view) is that they are not easily used to assess legions of students in a short time (sorry, even timed essays are out). I'm embarrassed to say that I had not seen these standards when we developed our own assessment of writing and other skills. We did, however, reach most of the same conclusions independently.

The NCTE standards and our own model both attempt to ensure that context and authenticity are paramount. For us this means that assessment of student learning is based on data taken from actual coursework, saved in a portfolio system for longitudinal assessment. I sketch what I mean just below.
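
Here's a rough sketch in Python of the kind of record such a portfolio system implies; the field names are hypothetical, not our actual schema. Each piece of coursework is tagged with the learning objective it is evidence for, so scores can be followed over time.

    # Minimal portfolio data model: coursework artifacts tagged with
    # learning objectives and rubric scores, queryable over time.
    # Field names are hypothetical, for illustration only.
    from dataclasses import dataclass, field

    @dataclass
    class Artifact:
        course: str        # the course the work came from
        term: str          # when it was produced
        objective: str     # learning objective it is evidence for
        rubric_score: int  # a rater's score on the shared rubric

    @dataclass
    class Portfolio:
        student_id: str
        artifacts: list = field(default_factory=list)

        def trajectory(self, objective):
            # Scores over time for one objective: the longitudinal view.
            return [(a.term, a.rubric_score)
                    for a in self.artifacts if a.objective == objective]

The point of the structure is the context it preserves: every score stays attached to the authentic work and the objective that gave it meaning.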

Authentic data gathered in a meaningful context goes a long way toward validating sensible uses of an assessment. In the end, there are no shortcuts to making the crucial judgment about the results: "what does it really mean?" That's just not something that can be shrink-wrapped.