A recent
InsideHigherEd article "
The Faculty Role in Assessment" has stayed on the site's Most Popular list for several days. In "
Fixing Assessment" I posted my solution: differentiating clearly between strategic and tactical goals and pursuing those at an appropriate level. It is evident from the comments on the
IHE article that there is deep interest in the issue, and in this post I will take a stab at categorizing those comments.
There's no proof assessment does any good is the argument made by
RSP. This turns the tables on the psychometric-speak employed by the academic arm of the assessment profession (up to and including standardized test vendors). The argument is natural:
if you can measure learning, show me how much students have learned and how much assessment matters, quantitatively and rigorously. The community of assessment professionals is aware of this contradiction but doesn't seem to take it seriously. The obvious solution is to be more modest about the claims.
Assessment is big business is the
ad hominem comment by
skeptical_provost. It is ironically book-ended by John B. later on, who is selling the product. Now an
ad hominem may not be a bad argument. If I learned you had invested with Mr.
Madoff, I might have reason to worry about your finances without further information. In this case, the industrialization of testing has produced undesirable side effects. Marketing
overstates what the tests can do, and this, together with politicization, seems to have cultivated the idea, even at the Dept of Ed, that we can measure minds the way we weigh potatoes. This underlines the previous point of
don't believe the advertising. Do some critical thinking, right?
Oversimplification of assessments is a problem first raised by
Jonathan. I've argued this point
many times in this blog: the complexity of an assessment has to match the complexity of what you're assessing, or else you should be very suspicious of the results. This relates to the previous comments in at least two ways: the industrialization of assessment requires convenience (for economic reasons) and reliability (in order to make the argument that measurement is being done), and both of these push assessments toward standardized instruments and standardized scoring. As a prosaic example, imagine hiring a babysitter. Would you rely solely on some standardized test to tell you how good a candidate is? Or would you want to interview the person, to suss out how good their judgment is, how well they communicate, how bright they seem? You might want to do both to cover your bases, but the second (high-complexity) step tells you things the low-complexity test cannot. This problem raises a particular question for the more aggressive claims of assessment:
how do you account for complexity? Here's another example.
A limerick is a well-defined form, which you can read about here. We can fairly easily create a rubric and standardized assessment for limericks. Imagine you've done that, and then assess the following one (borrowed from John D. Barrow's Impossibility):
There was a young man of Milan
Whose rhymes they never would scan;
When asked why it was,
He said, 'It's because
I always try to cram as many words into the last line as ever I possibly can.'
Rated honestly, you'd have to flunk it, because it's not technically a limerick at all. It fails the most basic requirements. However, it could be considered a meta-limerick, which is arguably more interesting. It's unlikely your rubric can handle this. It might explode.
It does make a difference, responds
Jeremy Penn (replying to
RSP's comment). But there is a subtle shift here. Jeremy references what happens in the classroom and instructional techniques, NOT assessment writ large. This mismatch of meaning and vocabulary is at the heart of much misunderstanding. Let me name these two points of view:
Assessment as Pedagogy (AP): use of subjective and some standardized techniques to improve the delivery of instruction at the course and program level, tied very closely to the faculty who teach the classes and the material that's being taught. It doesn't need heavy bureaucracy or assumptions about reductionism.
Assessment as Science (AS): the idea that we can actually measure learning precisely enough to determine things like the "value added" by a general education curriculum. The hallmarks of AS are large claims tied to narrow standardized instruments, theoretical models that pin parameter estimates on test statistics, and a positivist/reductionist approach.
So Jeremy is talking about AP (if I understand the comment), and
RSP's comment is probably addressed at AS.
Accountability and efficiency are mentioned by
Luisa, who points out that bureaucracies are not known for either. One fragment:
[I] become better at what I do through meaningful dialogue and narrative, rather than through inaccurate/invented data and endless bullet lists.
[...] What goes on in my classroom on a daily basis does not 'count.' What 'counts' is 'documented' learning, ie, the product-as-educational-widget.
This contrasts AP to AS. Luisa also makes the distinction between outcome and process, which I wrote about
here.
Grades are assessment, says
Cal. It is by now canon in the assessment world that grades are
not assessments. This does not stop researchers and test-makers from using GPA correlations as evidence of validity, but never mind. If we look at grades through the lens of the AP/AS definitions, it's clear that grades aren't really either one.
Allow me to pull together an observation combining Luisa's and Cal's comments:
Assessment as AP can allow us to overcome bureaucratic limitations natural to higher education.
Good assessment (as AP) can allow us to do things that wouldn't otherwise be thunk of. Grades are
statistical goo that doesn't tell us enough about what a student actually did. There are
easy ways to get richer information (I mean
really easy, and free too). I recently
helped our nascent performing arts programs work through their assessment plans. One of their goals was to develop creativity in students. This is something that takes time to mature, and can best be observed outside the bureaucracy of class blocks. It was a great discussion, which led them to create a feedback form for students, to be used across the curriculum. A draft is shown below.
The idea is that students get feedback in an organized way throughout the whole curriculum. This can't be done with grades. It also doesn't require a big bureaucracy or expensive software to pull off. Note also that this form is very much AP. Students will get feedback in the form of written comments telling them why their presentation (for example) is at the Freshman/Sophomore level of expectation rather than Graduate (meaning ready to graduate). It is not reductionist (as AS would be), and does not attempt to measure anything--it's a pure, subjective, professional assessment by faculty. There is a rubric (not shown) that communicates basic expectations for each level, but it remains a very flexible tool for tracking progress. I've used this technique in math too--it's a good exercise to have students rate themselves and then compare to your own ratings. In the arts, critiques are important, so you can do peer ratings also.
By way of contrast, the AS approach is to reduce learning into narrow bins (to increase reliability), assign ratings, and then aggregate to create
dimension-challenged statistical goo. This quickly becomes so disconnected from what's happening in the classroom that it's a head-scratcher to figure out what actions might be implied by the numbers (see
this, for example).
Emphasis on input vs output is one of several points made by commenter
G. Starkey. This may be another way to refer to process versus outcomes. Note that AP is (or can be) deeply involved with process, and its messy subjectivity. AS tries to stand aloof from that by looking only at outcomes. Starkey mentions "the hierarchical nature of assessment," which sounds like a reference to AS. This is underlined by subsequent comments, so the criticism seems to me to be "AS is imposed from the top for accountability," which is an "inherently ill-defined problem." Although AS produces simple, standardized results, the inputs (students, processes) are anything but standardized.
A charge of incompetence is leveled by
Charles Bittner, who sees failure in public schools and links that to "failed methods and techniques" of education departments. I presume this to refer to AS-type activities, since massive low-complexity standardized tests are in place in public education.
Fuzziness of rubrics and metrics is not to be taken seriously, says
Gratefully uninvolved, in a gratuitously elitist comment (referring to those who have never taught at "real universities").
Assessment makes a difference argues the aptly named
Ms. Assessment, who supports this with anecdotes about curriculum maps and rubrics and feedback. Although there are not enough details to be sure, it seems to me that she is referring to AP approaches that have direct influence in the classroom and across classes, not a grand AS approach with standard deviations to be moved. But I could be wrong.
AP and AS are different, is the point of
BDL, when translated into my definitions above.
BDL identifies the "meaning gap" better than I did, talking about the question
"what are our students learning, and how do we know this?":
This is no simple bean-counting question for summative accountability. Instead it is an ongoing question of self-reflection that ought to be integral in everything from our curriculum planning, to our classroom practice, and whatever occurs between.
BDL also points at bureaucracy as being unhelpful:
And we continue to measure "student success" in opaque grades, with transcripts that like so many 19th century factories account, mostly, for how many hours students have "spent" in class, over how many years. What they've learned, we can only assert as a matter of faith and belief.
If this idea resonates with you, you might be interested in "
Getting Rid of Grades."
AS is symbolic, AP is useful, is my translation of
Charles McClintock:
While assessment clearly has a symbolic use to external audiences to signal that resources are being used wisely, it can have its greatest instrumental value for students.
He gives examples of AP being successful. For example: "The greatest value of these learning outcomes, in my view, is that they communicate to students what is expected of them in their graduate work."
AS is difficult or impossible, says
Raoul Ohio.
Taking issue with Ms. Assessment's comment is
Peter C. Herman, who says anonymous anecdotes are not evidence. Here he assumes that she is talking about AS, whereas I assume she's talking about AP, in which case the question of evidence is moot. He underlines this by talking about the problems with an AS program. For example: "'Assessment' assumes that students are inert objects, to be made or marred by their professors, and that is just not a valid assumption." Commenter Bear says "me too" to this idea. Both seem to assume that Assessment = AS, and ignore AP. This is perhaps not a conscious decision; the context at their respective institutions may be a solely AS-oriented approach.
The next comment, by
midwest prof, also criticizes AS:
Standardized measures throughout became the benchmark of effective assessment, with resistance depicted as evidence that faculty either did not know how to teach or simply did not want to be held accountable.
midwest prof advises to "find value in what faculty are already doing and find a way to organize it in a way that those who feel assessment is important can utilize." I don't know if this is cynical (tell the administration some good news and get on with things) or constructive use of crowd-sourcing (see
Evolutionary Thinking). Another thread in this comment is the lack of preparation for assessment in graduate school, along with a recommendation for mentoring.
It takes money, says
Unfunded Mandate, including faculty time. It's not clear what kind of assessment program is the subject here, but something time-consuming and expensive is a good guess. This applies more to AS than AP, I think, although both require investment in staff positions and professional development. The next comment, by
Wendell Motter, puts some numbers to the time spent doing classwork-based (probably AP) assessment, and makes the case that the time required is proportional to the number of students, so there is a limit beyond which it is not reasonable to expect good results.
Decoupling is the result of bureaucratization, argues
sk: that (presumably) institutional top-down AS isolates process from result. I interpolated a lot there.
Administration detracts from teaching, asserts Professor C, who argues against "endless rubrics and data formats." This seems like a criticism from the Assessment = AS perspective.
Rubrics are not a panacea, is the theme from an
Alfie Kohn article linked by Professor D. The crux of the criticism is perhaps in this paragraph:
Consistent and uniform standards are admirable, and maybe even workable, when we’re talking about, say, the manufacture of DVD players. The process of trying to gauge children’s understanding of ideas is a very different matter, however. It necessarily entails the exercise of human judgment, which is an imprecise, subjective affair.
Again, this is AS contrasted with AP. Rubrics can be used for either--it depends on the approach and the use of the results. Rubrics as scientific measuring instruments are AS; rubrics as communication tools and guides are AP. The whole article is worth reading--it contrasts the two approaches to assessment in different language than I have used here, but it's the same issue.
AS isn't proven to work, is how I interpret
John W. Powell's comment. He writes:
Rubrics, simplistic embedded assessments constructed with an eye to efficient end-of-term review, and the resulting reports which overwhelm chairs and assessment coordinators, all betray a severely stunted vision of the multifarious answers to the question, what is education for?
John goes further, challenging the quality of debate: "[...Outcomes assessment] looks more and more like a fundamentalist religious faith." Given the nature of the argument (e.g. "its justifications abjure the science we would ordinarily require"), the critique is best addressed to an AS approach.
Inappropriate measures are a problem, says
humanities prof, comparing actual learning outcomes:
I try to teach my students to systematically break down and evaluate complex, abstract and subjective questions.
to the official assessment:
A four to five question multiple choice pre- and post-test consisting of basic terms and concepts.
Humanities prof blames this on the need to quantify and standardize, which is a charge that would only apply to AS.
Crosstalk is noted by
Betsy Vane, who says that advocates are arguing about the reasons for doing assessment and opponents are criticizing the means. Betsy suggests a rapprochement, but I don't see that as possible without deciding first to what extent assessment is going to be AS or AP. I don't see faculty ever buying in to the pure form of AS, for example. Administrations out there spending kilo-dollars to implement data-tracking systems based on the AS idea might want to think hard about this.
Stating the AS agenda is
Brian Buerke, who advises that "The key for assessing higher-order skills is to break them down into fundamental, well-defined elements and then assessing each element separately." So that: "[S]
tudent progress on specific elements can be charted. Aggregating the results over multiple elements allows student learning to be assessed in the more complex areas." I put this in the AS category. Depending on how seriously you took the numbers generated, it's possible this could be AP. But I doubt it. One of the problems I have with the sort of aggregation I imagine is meant here is that you end up averaging different dimensions and then calling the result some other (new) dimension. For example, if you sum up the rubric scores for "correctness", "style", and "audience" and call the result "writing", it's not much different from adding up a person's height, weight, and hat size and calling it "bigness." Maybe you can abstract something useful out of bigness once in a while, but it's become statistical goo and suspect as information.
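To make the "bigness" point concrete, here is a toy sketch (my own, with invented scores, not anything taken from the comments) of what happens when separate rubric dimensions get summed into a single number:

    # Toy illustration with made-up rubric scores on a 1-4 scale.
    # Summing the dimensions discards the very profile the rubric
    # was designed to capture.
    student_a = {"correctness": 4, "style": 1, "audience": 1}
    student_b = {"correctness": 2, "style": 2, "audience": 2}

    # The AS-style aggregate: one "writing" number per student.
    writing_a = sum(student_a.values())  # 6
    writing_b = sum(student_b.values())  # 6

    # Identical totals, very different writers: the profile cannot be
    # recovered from the sum, which is what makes it "statistical goo."
    print(writing_a, writing_b)  # prints: 6 6

Two very different writers become indistinguishable once the dimensions are collapsed, which is exactly the information loss the height-weight-hat-size analogy is pointing at.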
The idea that you can take something complex like "critical thinking" and create linear combinations of other stuff to add up to it is suspect to me. I've written a lot about that
elsewhere. I think part of the problem is that we get the idea of complexity backwards sometimes because of
Moravec's Paradox. For example, solving a linear system of differential equations is probably far lower complexity than judging
someone's emotion by looking at their face, even though the first may be hard for us and the second easy.
Arguing against the validity of AS is
FK, who writes
Many of the processes that permeate our teaching, in the humanities and in the natural and social sciences, cannot be encapsulated in how we measure students’ learning of facts or methods of analysis through tests or papers, exams or presentations, etc.
I interpret this as a complexity argument, which is a theme here.
FK describes the practical problem this poses.
If part of the “assessment” is describing what our learning outcomes are, and there are no definitive ways of measuring these outcomes after a course or within the 4-years of the students’ academic career, do we de-value these outcomes and only focus on what is measurable—even when certain courses, especially in the humanities, involve a lot more intangible outcomes than tangible ones?
Note: So you know what my stake is in this, I'll describe my own bias. I'm responsible for making sure that we do assessment and the rest of institutional effectiveness at a small liberal arts school. All my direct experience is with such schools. I've tried and failed to make AS work both as a faculty member and now as admin. I can see the usefulness of an AS approach in very limited circumstances, but I find the claims (for example) of the usefulness of standardized tests of critical thinking to be dubious. On the other hand, I've had good success with AP methods in my own classroom and in working with other programs. So I have a bias in favor of AP because of this. Given the comments I've tried to synthesize, it seems that I'm not alone.