Thursday, June 03, 2010

Assessment Crosstalk

A recent InsideHigherEd article "The Faculty Role in Assessment" has stayed on the site's Most Popular list for several days.  In "Fixing Assessment" I posted my solution: differentiating clearly between strategic and tactical goals and pursuing those at an appropriate level.  It is evident from the comments on the IHE article that there is deep interest in the issue, and in this post I will take a stab at categorizing those comments.

There's no proof assessment does any good is the argument made by RSP.  This turns the tables on the psychometric-speak employed by the academic arm of the assessment profession (up to and including standardized test vendors).  The argument is natural: if you can measure learning, show me how much students have learned and how much assessment matters, quantitatively and rigorously. The community of assessment professionals is aware of this contradiction, but doesn't seem to take it seriously.  The obvious solution is to be more modest about claims.

Assessment is big business is the ad hominem comment by skeptical_provost.  It is ironically book-ended by John B. later on, who is selling the product. Now an ad hominem may not be a bad argument.  If I learned you invested with Mr. Madoff, I may have reason to be worried for your finances without further information.  In this case, the industrialization of testing has produced undesirable side effects.  Marketing overstates what the tests can do.  This and politicization seem to have cultivated the idea, even at the Dept of Ed, that we can measure minds like weighing potatoes.  This underlines the previous point: don't believe the advertising.  Do some critical thinking, right?

Oversimplification of assessments  is a problem first raised by Jonathan.  I've argued this point many times in this blog: the complexity of an assessment has to match the complexity of what you're assessing.  Or else you should be very suspicious of the results.  This relates to the previous comments in at least two ways.  First, the industrialization of assessment requires convenience (for economic reasons) and reliability (in order to make the argument that measurement is being done).  Both of these serve to reduce the complexity of assessments to standardized instruments and standardized scoring.  As a prosaic example, imagine hiring a babysitter.  Would you rely solely on some standardized test to tell you how good a candidate is?  Or would you want to interview the person, see if you can suss out how good his/her judgment is, how well the person communicates, how bright they seem?  You might want to do both to cover your bases, but the second (high complexity) step tells you things the low-complexity test cannot.  This problem raises a particular question for the more aggressive claims of assessment: how do you account for complexity?  Here's another example.
A limerick is a well-defined form, which you can read about here.  We can fairly easily create a rubric and standardized assessment for limericks.  Imagine you've done that, and then assess the following one (borrowed from John D. Barrow's Impossibility):

There was a young man of Milan
Whose rhymes they never would scan;
When asked why it was,
He said, 'It's because
I always try to cram as many words into the last line as ever I possibly can.'
Rated honestly, you'd have to flunk it, because it's not technically a limerick at all.  It fails the most basic requirements.  However, it could be considered a meta-limerick, which is arguably more interesting.  It's unlikely your rubric can handle this.  It might explode.

It does make a difference, responds Jeremy Penn (in reply to RSP's comment).  But there is a subtle shift here.  Jeremy references what happens in the classroom and instruction techniques, NOT assessment writ large.  This mismatch of meaning and vocabulary is at the heart of much misunderstanding.  Let me call these points of view:
Assessment as Pedagogy (AP): use of subjective and some standardized techniques to improve the delivery of instruction at the course and program level, tied very closely to the faculty who teach the classes and the material that's being taught.  It doesn't need heavy bureaucracy or assumptions about reductionism.

Assessment as Science (AS): the idea that we can actually measure learning precisely enough to determine things like the "value added" by a general education curriculum.  The hallmarks of AS are large claims tied to narrow standardized instruments, theoretical models that pin parameter estimates on test statistics, and a positivist/reductionist approach.
So Jeremy is talking about AP (if I understand the comment), and RSP's comment is probably addressed at AS. 

Accountability and efficiency are mentioned by Luisa, who points out that bureaucracies are not known for either.  One fragment:
[I] become better at what I do through meaningful dialogue and narrative, rather than through inaccurate/invented data and endless bullet lists.

[...] What goes on in my classroom on a daily basis does not 'count.' What 'counts' is 'documented' learning, ie, the product-as-educational-widget.
This contrasts AP with AS.  Luisa also makes the distinction between outcome and process, which I wrote about here.

Grades are assessment says Cal.  It is by now canon in the assessment world that grades are not assessments.  This does not stop researchers and test-makers from using GPA correlations as evidence of validity, but never mind.  But if we look at grades through the lens of the AP/AS definitions, it's clear that grades aren't really either one.

Allow me to pull together an observation combining Luisa's and Cal's comments:
Assessment as AP can allow us to overcome bureaucratic limitations natural to higher education.
Good assessment (as AP) can allow us to do things that wouldn't otherwise be thunk of.  Grades are statistical goo that doesn't tell us enough about what a student actually did. There are easy ways to get richer information (I mean really easy, and free too).  I recently helped our nascent performing arts programs work through their assessment plans. One of their goals was to develop creativity in students.  This is something that takes time to mature, and can best be observed outside the bureaucracy of class blocks.  It was a great discussion, which led them to create a feedback form for students, to be used across the curriculum.  A draft is shown below.

The idea is that students get feedback in an organized way throughout the whole curriculum.  This can't be done with grades.  It also doesn't require a big bureaucracy or expensive software to pull off.  Note also, that this form is very much AP.  Students will get feedback in the form of written comments telling them why their presentation (for example) is at the Freshman/Sophomore level of expectation rather than Graduate (meaning ready to graduate).  It is not reductionist (as AS would be), and does not attempt to measure anything--it's a pure, subjective, professional assessment by faculty.  There is a rubric (not shown) that communicates basic expectations for each level, but it remains a very flexible tool for tracking progress.   I've used this technique in math too--it's a good exercise to have students rate themselves and then compare to your own ratings.  In the arts, critiques are important, so you can do peer ratings also.

By way of contrast, the AS approach is to reduce learning into narrow bins (to increase reliability), assign ratings, and then aggregate to create dimension-challenged statistical goo.  This quickly becomes so disconnected from what's happening in the classroom that it's a head-scratcher to figure out what actions might be implied by the numbers (see this, for example).

Emphasis on input vs output is one of several points made by commenter G. Starkey.  This may be another way to refer to process versus outcomes.  Note that AP is (or can be) deeply involved with process, and its messy subjectivity.  AS tries to stand aloof from that by looking only at outcomes.  Starkey mentions "the hierarchical nature of assessment," which sounds like a reference to AS.  This is underlined by subsequent comments, so the criticism seems to me to be "AS is imposed from the top for accountability," which is an "inherently ill-defined problem."  Although AS produces simple, standardized results, the inputs (students, processes) are anything but standardized.

A charge of incompetence is leveled by Charles Bittner, who sees failure in public schools and links that to "failed methods and techniques" of education departments.  I presume this to refer to AS-type activities, since massive low-complexity standardized tests are in place in public education.

Fuzziness of rubrics and metrics is not to be taken seriously, says Gratefully uninvolved in a gratuitously elitist comment (referring to those who have never taught at "real universities").

Assessment makes a difference argues the aptly named Ms. Assessment, who supports this with anecdotes about curriculum maps and rubrics and feedback.  Although there are not enough details to be sure, it seems to me that she is referring to AP approaches that have direct influence in the classroom and across classes, not a grand AS approach with standard deviations to be moved.  But I could be wrong.

AP and AS are different, is the point of BDL, when translated into my definitions above.  BDL identifies the "meaning gap" better than I did, talking about the question "what are our students learning, and how do we know this?":
This is no simple bean-counting question for summative accountability. Instead it is an ongoing question of self-reflection that ought to be integral in everything from our curriculum planning, to our classroom practice, and whatever occurs between.
BDL also points at bureaucracy as being unhelpful:
And we continue to measure "student success" in opaque grades, with transcripts that like so many 19th century factories account, mostly, for how many hours students have "spent" in class, over how many years. What they've learned, we can only assert as a matter of faith and belief.
If this idea resonates with you, you might be interested in "Getting Rid of Grades."

AS is symbolic, AP is useful, is my translation of Charles McClintock:
While assessment clearly has a symbolic use to external audiences to signal that resources are being used wisely, it can have its greatest instrumental value for students.
He gives examples of AP being successful.  For example "The greatest value of these learning outcomes, in my view, is that they communicate to students what is expected of them in their graduate work."

AS is difficult or impossible, says Raoul Ohio.

Taking issue with Ms. Assessment's comment is Peter C. Herman, who says anonymous anecdotes are not evidence.  Here he assumes that she is talking about AS, whereas I assume she's talking about AP, in which case the question of evidence is moot.  He underlines this by talking about the problems with an AS program.  For example: "'Assessment' assumes that students are inert objects, to be made or marred by their professors, and that is just not a valid assumption."  Commenter Bear says "me too" to this idea.  Both seem to assume that Assessment = AS, and ignore AP.  This is perhaps not a conscious decision, because the context at their respective institutions may be a solely AS-oriented approach.

The next comment, by midwest prof, also criticizes AS:
Standardized measures throughout became the benchmark of effective assessment, with resistance depicted as evidence that faculty either did not know how to teach or simply did not want to be held accountable.
midwest prof advises faculty to "find value in what faculty are already doing and find a way to organize it in a way that those who feel assessment is important can utilize."  I don't know if this is cynical (tell the administration some good news and get on with things) or a constructive use of crowd-sourcing  (see Evolutionary Thinking).  Another thread in this comment is the lack of preparation for assessment in graduate school, for which midwest prof recommends mentoring.

It takes money, says Unfunded Mandate, including faculty time.  It's not clear what kind of assessment program is the subject here, but something time-consuming and expensive is a good guess.  This applies more to AS than AP, I think, although both require investment in staff positions and professional development. The next comment, by Wendell Motter, puts some numbers to time spent doing classwork-based (probably AP) assessment, and makes the case that time is proportional to the number of students, and therefore there is a limit beyond which it is not reasonable to expect good results.

Decoupling is the result of bureaucratization argues sk: that (presumably) institutional top-down AS isolates process from result.  I interpolated a lot there.

Administration detracts from teaching, asserts Professor C, and argues against "endless rubrics and data formats."  This seems like a criticism from the Assessment = AS perspective.

Rubrics are not a panacea, is the theme from an Alfie Kohn article linked by Professor D.  The crux of the criticism is perhaps in this paragraph:
Consistent and uniform standards are admirable, and maybe even workable, when we’re talking about, say, the manufacture of DVD players.  The process of trying to gauge children’s understanding of ideas is a very different matter, however. It necessarily entails the exercise of human judgment, which is an imprecise, subjective affair.   
Again, this is AS contrasted to AP.  Rubrics can be used for either--it depends on the approach and use of results.  Rubrics as scientific data-measuring are AS, rubrics as communications tools and guides are AP. The whole article is worth reading--it contrasts the two approaches to assessment in different language than I have used here, but it's the same issue.

AS isn't proven to work, is how I interpret the comment by John W. Powell, who writes:
Rubrics, simplistic embedded assessments constructed with an eye to efficient end-of-term review, and the resulting reports which overwhelm chairs and assessment coordinators, all betray a severely stunted vision of the multifarious answers to the question, what is education for?
John goes further, challenging the quality of debate: "[...Outcomes assessment] looks more and more like a fundamentalist religious faith." Given the nature of the argument (e.g. "its justifications abjure the science we would ordinarily require"), the critique is best addressed to an AS approach.

Inappropriate measures are a problem, says humanities prof, comparing actual learning outcomes:
I try to teach my students to systematically break down and evaluate complex, abstract and subjective questions.
to the official assessment:
 A four to five question multiple choice pre- and post-test consisting of basic terms and concepts.
Humanities prof blames this on the need to quantify and standardize, which is a charge that would only apply to AS.

Crosstalk is noted by Betsy Vane, who says that advocates are arguing about the reasons for doing assessment and opponents are criticizing the means.  Betsy suggests a rapprochement, but I don't see that as possible without deciding first to what extent assessment is going to be AS or AP.  I don't see faculty ever buying in to the pure form of AS, for example.  Administrations out there spending kilo-dollars to implement data-tracking systems based on the AS idea might want to think hard about this.

Stating the AS agenda is Brian Buerke, who advises that "The key for assessing higher-order skills is to break them down into fundamental, well-defined elements and then assessing each element separately."  So that: "[S]tudent progress on specific elements can be charted. Aggregating the results over multiple elements allows student learning to be assessed in the more complex areas."  I put this in the AS category.  Depending on how seriously you took the numbers generated, it's possible this could be AP.  But I doubt it. One of the problems I have with the sort of aggregation I imagine is meant here is that you end up averaging different dimensions and then calling the result some other (new) dimension.  For example, if you sum up the rubric scores for "correctness", "style",  and "audience" and call the result "writing", it's not much different from adding up height, weight, and hat size of a person and calling it "bigness."  Maybe you can abstract something useful out of bigness once in a while, but it's become statistical goo and suspect as information.
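To make the "bigness" point concrete, here is a minimal sketch in Python.  The two students, the 1-4 scale, and the function name writing_score are all invented for illustration; nothing here comes from a real rubric or from Brian's comment.  It only shows that summing separate dimensions can give two very different profiles the same total.

```python
# Hypothetical rubric scores on an assumed 1-4 scale.
# Every name and number here is invented for illustration only.
students = {
    "Student A": {"correctness": 4, "style": 1, "audience": 1},
    "Student B": {"correctness": 2, "style": 2, "audience": 2},
}


def writing_score(rubric):
    """Collapse a rubric into one number by summing its dimensions."""
    return sum(rubric.values())


# Aggregate each student's rubric into a single "writing" number.
totals = {name: writing_score(rubric) for name, rubric in students.items()}

for name, rubric in students.items():
    print(name, rubric, "->", totals[name])
```

Both students end up with the same "writing" total even though one is accurate but tone-deaf and the other is uniformly middling.  The single number cannot tell them apart, and that lost distinction is exactly what an instructor would need in order to act.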

The idea that you can take something complex like "critical thinking" and create linear combinations of other stuff to add up to it is suspect to me.  I've written a lot about that elsewhere.  I think part of the problem is that we get the idea of complexity backwards sometimes because of Moravec's Paradox.  For example, solving a linear system of differential equations is probably far lower complexity than judging someone's emotion by looking at their face, even though the first may be hard for us and the second easy.

Arguing against the validity of AS is FK, who writes
Many of the processes that permeate our teaching, in the humanities and in the natural and social sciences, cannot be encapsulated in how we measure students’ learning of facts or methods of analysis through tests or papers, exams or presentations, etc.
I interpret this as a complexity argument, which is a theme here.  FK describes the practical problem this poses.
If part of the “assessment” is describing what our learning outcomes are, and there are no definitive ways of measuring these outcomes after a course or within the 4-years of the students’ academic career, do we de-value these outcomes and only focus on what is measurable—even when certain courses, especially in the humanities, involve a lot more intangible outcomes than tangible ones?

Note:  So you know what my stake is in this, I'll describe my own bias.  I'm responsible for making sure that we do assessment and the rest of institutional effectiveness at a small liberal arts school.  All my direct experience is with such schools.  I've tried and failed to make AS work both as a faculty member and now as admin. I can see the usefulness of an AS approach in very limited circumstances, but I find the claims (for example) of the usefulness of standardized tests of critical thinking to be dubious.  On the other hand, I've had good success with AP methods in my own classroom and in working with other programs.  So I have a bias in favor of AP because of this.  Given the comments I've tried to synthesize, it seems that I'm not alone.


  1. Unfunded Mandate 10:32 PM

    You wonder what kind of assessment is meant here. In your notation, it is AS, but any externally mandated assessment regime requires serious time if it is to be serious.

    Your phrase "something time-consuming and expensive" sounds dismissive to me, as if you don't really believe these assessment regimes are. In any case, it is up to those who create and mandate them to come up with a financial accounting for them (or a rewards structure that weighs teaching and student learning higher than research), but they never do.

    Follow the money, baby.

  2. AP can be an external mandate (see my prior post on the possible role of accreditors), it just can't be used for accountability. "time-consuming and expensive" was meant sincerely: any AS project will have to be. An AP project still costs money and time, but nowhere near as much. As for cost/benefit, AS is dubious, AP is probably reasonable.

  3. This is a very problematic analysis that deflects criticism of dominant and real assessment approaches by creating a red herring, or rather an imaginary construct called "AP." In a very Orwellian style, all critiques of assessment are brushed aside as addressing AS but nothing really touches AP!

    Arguing that "pedagogy" and/or integrated practices of pedagogy could serve, function, or stand in for "assessment" is wordplay. Self-reflective practices and intuitive approaches to how learning is taking place as an integral part of education are not what is at issue in assessment, and it is irresponsible to deflect serious concerns about reductive and controlling processes in academia, especially in the name of a delusion!

  4. @FK AP may be poorly named or inadequately defined above, but it's real--I've been using it for years. The fact that the criticisms in the comments aren't directed at it probably means that it's not perceived as a threat the way AS is. Pointing out that there are useful alternatives to AS would seem to be a good thing. Are you saying that a reasonable alternative to AS deflects criticism from assessment in general, and that's therefore bad?

  5. The comment by RSP to the effect that there are no clear examples of improved learning due to assessment regimes (AS, in your terms) seems to me to be devastating, if it is so. As you say in the comment above, faculty will tend to believe "AS is dubious, AP is probably reasonable;" but if there are not clear examples of improved educational quality due to AS, faculty -- it seems to me -- should resist it as much as ever they can. It would indeed be educational Lysenkoism.

    So where are the examples? Must there not be examples?

  6. @Brendan I agree. If there are no examples, it doesn't live up to the billing of "science." There are some who would say that there are such examples, such as Richard Hake, who advocates inquiry-based learning, which has some grounding in research. Of course, this just changes the questions. Doctors aren't medical researchers. Are teachers supposed to be researchers of teaching effectiveness?

  7. In regard to the proposition that grades are not assessment, it could be objected that grades are not useful for student outcome assessment because they are blended. But it seems like an effective counter to this would be a careful system of discrimination within courses and record keeping to match. If my program posits six student learning outcomes as the basis for a major and every course in that major notes which learning outcomes it addresses and then students are given, in effect, a grade for each of these outcomes as well as the blended grade for each course, then I think it would be perfectly valid to use student grades (the pre-blended grades for the various learning outcomes) to determine the students' progress in meeting those learning outcomes. If we make the grading fit the outcomes, then they should be useful, right?

  8. @Brendan, I think that under the conditions you described, a grade would be a 'summative' assessment. I think most assessment directors would leap for joy if faculty created something like that. Personally, I can't see much justification for mashing together an average of a bunch of qualitatively different things in order to reach an A-F grade, simply for the sake of data compression. But I could be wrong about that. Perhaps there is some part of the brain's functionality that can be actually measured (through factor analysis on some big data set perhaps) AND corresponds to a very low-dimensional representation in the way 'Q' is supposed to represent intelligence. In this case, however, we ought to have scientifically determined weights to know how much to assign to each learning objective in order to reach the total. This seems far beyond anything a course instructor ought to be expected to do.

  9. But when I decide to weigh 60% of a course grade for exams (of which about one-third might be based on so-called objective questions, one third on analysis-critique, and one third narrative explanation), 30% on analysis writing (e.g., a research paper), and 10% based on informed participation in discussion, what I am doing is not so far from what you describe, n'est-ce pas? All I need to do is define the graded elements according to the learning outcomes for the program.

  10. That's pretty much how a rubric-based approach works. AS purists would insist on standardization of elements, weights, and reliable scoring. Then it all gets saved in some repository for further analysis. For example, factoring out gen ed type skills across the curriculum. Throw in a few standardized tests, and this is pretty much the AS program. What you've described could also be used for AP, however: establishing a clear vocabulary about expectations seems to really help. For example, a good description of what comprises excellent analysis-critique sets the bar and establishes usable vocabulary (with both students and faculty). I find this very helpful.

    So, I agree that a grade could be an assessment of the sort usually obtained with rubrics and whatnot, but in most cases I would guess this isn't what happens. Also, since grades are political in various ways, they get torqued away from outcomes assessments sometimes.

  11. Do you think this focus on student outcomes could have the result of encouraging faculty to use single-outcome course evaluation? For the learning data outcome, we could have a course that relies on one or a few long, multiple-choice, fact-based examinations for determining grades. If analysis of data according to certain criteria is a learning outcome, the instruction would be based on showing how these criteria are applied in various cases and student assessment would be entirely based on writing up such analyses of various data sets. And on we go. If there are 6 learning outcomes, there would be six types of courses each geared to one specific outcome and evaluated according to that outcome and no other.

  12. There undoubtedly is a simplification process that results in one or a few assessments in cases I've seen. Part of the motivation is probably just to get something done to satisfy the assessment requirement.

    I'm not sure how well it would work to do one learning outcome per course, but it's an interesting idea I've not heard before. I think the learning outcomes might end up being big fuzzy ones, which is okay as long as the assessment is big and fuzzy too.