Validity, of course, is the notion that an assessment should measure what it purports to measure. Despite the simplicity of the idea, it's actually quite subtle; after all, smart people have been arguing about epistemology for millennia. Assessment is no different from any other kind of science (if the assessment is done with serious intent): there will often be areas of doubt where individual judgment prevails. This is particularly true when you use someone else's design to assess your own students, programs, etc.
My response to the IE guy was this: the validity of an instrument depends on what it's used for. The SAT may be marginally useful in predicting success in college, but it's of no use at all for predicting how well someone plays baseball. Test-makers' claims to validity shouldn't simply be accepted; they need to be examined carefully. As a case in point, see the Applied Research Center's argument that California's CBEST test is not being used in a valid way.
Validity is particularly hard to show for tests that assess complex skills like writing. Interestingly, although course grades are often dismissed as poor assessment data, it seems to be common to use grades to validate these tests. Examples include the GRE-W writing assessment, as described here.
In the Atlantic article "What Does College Teach," there's a pretty strong condemnation of grades:
[T]here is the direct assessment of student learning that takes place constantly on college campuses, usually symbolized by grades and grade point averages. For our purposes these are nearly useless as indicators of overall educational quality—and not only because grade inflation has rendered GPAs so suspect that some corporate recruiters ask interviewees for their SAT scores instead. Grades are a matter of individual judgment, which varies wildly from class to class and school to school; they tend to reduce learning to what can be scored in short-answer form; and an A on a final exam or a term paper tells us nothing about how well a student will retain the knowledge and tools gained in coursework or apply them in novel situations.

In the printed literature for the Collegiate Learning Assessment, which the author is hawking, we're assured that
The CLA measures were designed by nationally recognized experts in psychometrics and assessment, and field tested in order to ensure the highest levels of validity and reliability.

That's apparently all the public information there is about the validity of their instrument.
In contrast to these standardized tests, there is a serious set of standards for writing assessment published by the National Council of Teachers of English, some of which the CLA obviously violates. If you take a look at them, you'll see that they are quite sensible. The only problem (from a particular point of view) is that they can't easily be used to assess legions of students in a short time (sorry, even timed essays are out). I'm embarrassed to say that I had not seen these standards when we developed our own assessment of writing and other skills. We did, however, reach most of the same conclusions independently.
The NCTE standards and our own model both attempt to ensure that context and authenticity are paramount. For us, that means assessment of student learning is based on data taken from coursework, which is saved in a portfolio system for longitudinal assessment.
Authentic data gathered in a meaningful context goes a long way toward validating sensible uses of an assessment. In the end, there are no shortcuts to making the crucial judgment about the results: "what does it really mean?" That's just not something that can be shrink-wrapped.