Wednesday, April 15, 2009

Response on Rubrics

Gary Brown responds below to my post on rubrics, in which I refer to the Washington State University's critical thinking project. This is taken from his comment to my previous post here. His detailed and thoughtful response deserves better visibility and formatting than the comment section provides, so I've reproduced it in full below.

David makes some interesting points and opens a critical discussion (pun intended), using WSU’s critical thinking rubric as a foil for his admittedly geeky folly. I see his geek, and raise him my own folly.

First, thanks for the kind words about our rubric. Having looked at innumerable rubrics, none including ours being perfect, we do think ours is pretty good, especially as it has been vetted by friends (and foes) from around the globe and across many, many cultures, all of whom have added to and helped to refine the work. Still, as good as the rubric gets, it is never any better than the use to which it is applied.

To clarify, let me tackle some of David’s specific points. He says critical thinking is “too fuzzy.” In our view, however, the extent of that fuzziness is wholly a reflection of the context to which it is put. As William Carlos Williams, the doctor-poet, said, “no ideas but in things.” In practice this means we always assess not only the work students share, but also the assignments that evince that work. (More on this in a bit.)

And the purposeful dialogue David extols is also, in our view, the definition of assessment we work hard to promote in our use of the rubric with individual faculty and, more often, with faculty in a common program. Let me put it this way: David says the rubric is not best used in assessment, but I believe he is referring to a narrow definition of assessment. We prefer assessment as dialogue, as in the root of the word, assay, or sitting next to. That is distinct from definitions and visions of assessment in which the assessor is placed above the one (or the work of the one) assessed, a point to which I think David refers when he admonishes against the use of the critical thinking rubric for assessment.

But pedagogy, in my view, should never be separated from assessment of the assay kind, from the perspective of one mindful of the scholarship of learning, from learning from (rather than just grading) students, and from peers and professional partners as well.

I do dispute the assertion that complex rubrics are problematic, though what we mean by “massively complex” may reveal in the end less disagreement than this response reflects. Our own research suggests that the sophistication of the rubric and the facilitation of the teaching with the rubric will evince commensurate results. Sophisticated rubrics, in other words, yield sophisticated thinking when the instructor is skilled. Alternatively, as George Kuh has pointed out in summarizing his own work with the NSSE, we in education tend too often to place the bar so low learners trip over it…

David is absolutely correct on the mistake of using a data point in a line chart to represent the average. In our formal presentations we try to change the presentation so that the average is shown separately from the constituent data points that make up that average. But his argument about weighting is opaque, and is confused by the suggestion that the “average is somehow related to the actual critical thinking ability of the student.” Our dispute, I suspect, comes from confusing “student” with the work of a student in the context in which the assessment takes place. We hope not to judge students but to judge their work and the context of that work. That’s an important distinction with lots of ramifications. We don’t advocate rubric-based assessment in high-stakes evaluation, or in tests that sort or usher students onto more advanced levels of study and opportunity. Rather, we advocate using the rubric to reflect upon the outcomes of a program, which is different in many ways from evaluating students for ranking and sorting. Further, we almost always try to encourage our partners to turn the rubric on the assignments that evinced the outcomes; the assignment, not surprisingly, is the best predictor of student performance we’ve isolated to date. Students are pretty good at doing what we ask them to do. Much better, it seems, than we are at asking them to do meaningful things that require sophisticated thinking. The fact that the vast majority of assignments we have assessed tend to ask students to report and recapitulate this source or that authority says much about the paucity of evidence for deep and wide practice of critical thinking, however one defines it.

On another point, David uses an analogy to suggest the dimensions in the rubric are not related. Analogies, we know from classical rhetoric, are terrific for clarifying points but weak when making arguments. The dimensions of the rubric are best presented with line charts precisely because they are related; they reflect the evidence of one’s performance in a single piece of writing, a poster, a speech, or some other artifact. Again, if one dimension or performance criterion is not aligned with the others, then the assessment has worked powerfully precisely because it shines a light on something that either matters (what can I do, for instance, to help my students surface the assumptions that underpin EVERY issue?) or doesn’t. Maybe I don’t care that they don’t draw conclusions because I’ve already told them the right answer: write a two-page blog post on what David Eubanks says about dimensionality…

David also asserts that “each dimension is assessed subjectively.” Of course there is probably too much written already about the issue of subjectivity and objectivity in assessment, but our approach emphasizes building consensus or expert inter-rater reliability among participants, and we try to expand the participation to include faculty in the discipline, their students, and professionals and partners in the community. It’s that dialogue point that, like David, we value most. Technically, however, it also turns out that when a group of faculty establishes standards of agreement on what they value, and makes that value manifest statistically by calibrating their expertise until they agree about 80% of the time, the validity of that approach to assessment trumps most faculty’s ability to design and use the purportedly OBJECTIVE measures of their tests. That’s a good thing, in my view. I’m a big fan of faculty expertise, especially as it moves beyond the idiosyncratic to promote a coherent curriculum that is further aligned with what experts outside the academy expect. (Not capitulating to employers, but expanding the dialogue again.) If the technical implications of this discussion are annoying, imagine the amazed experience of a student who comes to discover that what is expected of her is clear and consistent among many of the faculty in the program she pursues as well as by the employer who may hire her. And at the same time, consider the implications of that disembodied multiple-choice alternative usually used to assess her. Can we really claim those tests are more useful in any way than a shared and well-implemented rubric?
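For readers who want to see what "agreeing about 80% of the time" might look like in practice, here is a minimal sketch of a percent-agreement check between raters. It is not WSU's actual calibration procedure: the function names, the 1 to 6 scale, the scores, and the choice between exact and adjacent (within one point) agreement are all assumptions made for illustration.

```python
from itertools import combinations


def percent_agreement(scores_a, scores_b, tolerance=0):
    """Share of artifacts on which two raters differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same artifacts")
    if not scores_a:
        raise ValueError("no scores to compare")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def mean_pairwise_agreement(ratings, tolerance=0):
    """Average percent agreement over every pair of raters in `ratings`."""
    pairs = list(combinations(ratings.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(percent_agreement(a, b, tolerance) for a, b in pairs) / len(pairs)


# Hypothetical scores (assumed 1-6 scale) from three raters on the same ten papers,
# all for a single rubric dimension.
ratings = {
    "rater_1": [4, 3, 5, 2, 4, 6, 3, 4, 5, 2],
    "rater_2": [4, 3, 4, 2, 4, 6, 3, 5, 5, 2],
    "rater_3": [4, 2, 5, 2, 4, 5, 3, 4, 5, 2],
}

exact = mean_pairwise_agreement(ratings)                 # identical scores only
adjacent = mean_pairwise_agreement(ratings, tolerance=1)  # within one point
print(f"exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
```

Raw percent agreement does not correct for agreement that would occur by chance, so a group that wants a stricter check might also look at a chance-corrected statistic such as Cohen's kappa.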

As for creativity, it is our long-held view that good critical thinking is creative, and as David illustrates, the rubric is always dependent upon the associated tasks. We advocate authentic learning experiences, which are almost always ill-structured problems. Ill-structured problems call for creativity in many ways. For instance, problem identification (the first dimension of the rubric) calls for some measure of creativity to even understand the problem that one might address.

Much of this discussion, especially when one comes to talking about various formal logical fallacies, reflects one’s epistemology. It’s the tendency to regard the subjective pejoratively that sits at the heart of much of the debate. Again, suppose I ask you to identify the right answer on a multiple-choice test, for instance: what does 1 + 1 equal? You will probably get it right. Now I ask, if you have one apple and one orange, how many fruit do you have? You will probably identify the right answer. But in the first I tested your knowledge of addition. In the second I tested your knowledge of addition AND your knowledge of fruit. Do you now have two correct answers in your basket, or do you have three? Are the two questions equal in weight? Should they be? That “objectivity” produces a result that is easy to count does not mean we should privilege it. But that is another debate. When David, finally, calls for relating critical thinking to something meaningful, I agree and suggest again that half the assessment should point to those assignments we ask of our students. But David also says we should relate it to something “meaningful through predictive validity,” presumably like the regression procedures used in tests like the SAT and GRE. It is only slightly ironic that regression analysis has no solid mathematical proof that underpins the common statistical procedure. Yet another debate for another day.


  1. I think we don't disagree on much. Anyone who has followed my philosophical and applied trek through assessment over the last few years knows that I'm not a fan of using objective (i.e. standardized) tests for anything except the simplest assessment tasks. My point of departure from the mainstream is that I think that 'real' critical thinking (or substitute another complex skill) is what would be assessed by an employer or supervisor after graduation. That is, in the sense of commonly used language.

    My statements about the complexity of critical thinking are meant in the computational sense. I prefer analytic/deductive versus creative/inductive as distinctions, since they can be made much more easily.

    But all this aside, I think we would likely agree that a rubric or other device is only as good as the use to which it is put. That is, the validity of any assessment depends on implementation and interpretation. The simple faith that "I've applied remedy R to solve my assessment problem" strays more easily into 'theology' than science...

  2. Gary did not leave links to some of the more recent work in CTLT with the CT rubric.

    He points to assessing the assignment, which we have done here, but could have also pointed to community conversation about the utility of the constructs.

    We have also been exploring this radar graph to display multi-dimensional data, which in some cases seems better for comparisons.

    And these explorations of assessment have been coming together in Gary's notion of a "harvesting gradebook," which is a transformative view of the traditional gradebook that makes "grading" useful for course and program assessment as well as offering another perspective in the CLA debate.

    Finally, the most recent of this series is an exploration at AAC&U of the impacts of this thinking (community-based learning assessed by the community of practice) on shifting faculty roles.

  3. Thanks for the links. I like the radar graph, and particularly the idea of showing employer and faculty statistics alongside student assessments (n is quite small in this one, of course). We used surveys of internship supervisors to gather similar information. I'm assuming the plot shows average ratings. It would be interesting to see histogram-type information in the same format. Might be a trick to cram all that information into one graph, though.

    I'll be interested to follow the conversation about the harvesting gradebooks.