Wednesday, April 15, 2009

Rubrics and Dimensionality

Maybe I'm just a geek, but I thought it was really cool the first time I ran into the idea of dimensional analysis in a physics class. Wikipedia has a nice page on it here, but the basic idea is simple. In common parlance you can't compare apples and oranges. If you derive some cool new equation, but then discover that the left side is a number and the right side is a vector, something went wrong. A related problem, which we'll consider together, is that of mismatched units. You can't legitimately ask "If three apples together weigh 2 pounds, and a stick is six inches long, which is more?" My dad claims that in basic training the sergeant used to yell "Ahright yous guys! Line up in alphabetical order according to height!"
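
For the programmers in the audience, here is a minimal sketch of the mismatched-units idea in Python. The Quantity class is my own illustration, not a real library, and it compares unit labels only (it makes no attempt at conversions such as inches to feet). The point is just that adding or comparing pounds and inches should be refused rather than quietly producing a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit label (hypothetical helper for illustration)."""
    value: float
    unit: str

    def _check(self, other):
        # Refuse to combine quantities whose unit labels differ.
        if not isinstance(other, Quantity):
            raise TypeError("can only combine a Quantity with another Quantity")
        if self.unit != other.unit:
            raise TypeError(f"mismatched units: {self.unit} vs {other.unit}")

    def __add__(self, other):
        self._check(other)
        return Quantity(self.value + other.value, self.unit)

    def __gt__(self, other):
        self._check(other)
        return self.value > other.value


apples = Quantity(2, "pounds")   # three apples together weigh 2 pounds
stick = Quantity(6, "inches")    # the stick is six inches long

try:
    apples > stick               # "which is more?" has no answer
    mismatch_caught = False
except TypeError as err:
    mismatch_caught = True
    print(err)

# Same units are fine: 2 pounds + 3 pounds is 5 pounds.
total = Quantity(2, "pounds") + Quantity(3, "pounds")
```

The sergeant's alphabetical-by-height ordering would fail the same check.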

The picture of a town sign I used a few days ago illustrated the comedy of mismatched units nicely, summing up population, feet above sea level, and the year founded to get a total.

I was prompted to think about dimensions because of the rubric I came across for Washington State's critical thinking project (CT). The rubric has seven dimensions, with levels of accomplishment ranging from emerging to developing to mastering. The whole thing seems polished, well thought out, and generally nicely done. If you want a critical thinking rubric, this is probably as good as it gets. Personally, I think the idea of critical thinking itself is too fuzzy to be as useful as it seems, and have written about that previously here and here.

In reading through the CT material, it seems that one of the main effects comes from classes that simply use the rubric in class. Quoting from their findings:
"In the four courses where the rubric was used variously for instruction and evaluation, the papers received significantly higher ratings than in the four courses in which the rubric was not used."
Dialogue is almost always better than monologue in a teaching environment. See here for more on that idea. A rubric used to spark conversation and generate content is better used than one that simply waits in the drawer until rating time. I guess this should be obvious, but I didn't realize how powerful that technique is until I started using it in the classroom myself. Of course, you can't have a massively complex rubric or you'll just get the 1040 effect (tax form reference--it's April 15, after all!).

Because the WSU example is a good, well-developed rubric, we can use it to explore some problems with rubrics in general, particularly as regards dimensionality. For example, the WSU graph of results and accompanying chart make the usual mistake of averaging the dimensions to get a "CT average score." The resulting numerical goo may actually correlate with something (that is, have some predictive validity), although I didn't see that claim made. But philosophically such operations are always suspect for the same reasons grades are--all dimensionality is surrendered. The implicit assumption is that all dimensions are equally weighted, that the average is somehow related to the actual critical thinking ability of the student. It's similar to saying something like:
"I've measured a multidimensional box, of which I only saw seven dimensions. There may be more I don't know about. But the average dimension of the box was 3.4 inches."
Actually, the situation is much worse if we assume that the dimensions have different units--that would be like averaging pounds and inches. However, as I said, this is a common practice, and this kind of graph shows up a lot in reports.
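
To make the "all dimensionality is surrendered" point concrete, here is a small Python sketch. The two score profiles are invented for illustration (they are not WSU data), using seven dimensions on the rubric's 1 to 6 scale. The two papers look nothing alike dimension by dimension, yet the equally weighted average cannot tell them apart.

```python
def ct_average(scores):
    """Equally weighted mean of the rubric dimensions (the 'CT average score')."""
    return sum(scores) / len(scores)


# Hypothetical ratings on seven dimensions, each scored 1 to 6.
steady_paper = [4, 4, 4, 4, 4, 4, 4]   # middling on every dimension
uneven_paper = [6, 6, 6, 6, 2, 1, 1]   # excellent on four, weak on three

steady_avg = ct_average(steady_paper)
uneven_avg = ct_average(uneven_paper)

print(steady_avg, uneven_avg)            # the two averages are identical
print(steady_paper == uneven_paper)      # the profiles are not
```

Any other choice of weights would be just as arbitrary unless it is tied to something outside the rubric, which is where predictive validity would have to come in.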

Each dimension is assessed subjectively, for example to what extent does a student integrate issues using OTHER (disciplinary) perspectives and positions (dimension 5). Ultimately the rater chooses a numerical score from 1 to 6, representing a subjective judgment such as this one for "developing":
"Ideas are investigated, if in a limited way, and integrated, if unevenly."
This is in the middle of the rubric, for a score of 3 or 4. Not all dimensions apply equally to all problems, however. Taking an actual example, look at the puzzle How many triangles?
It shows a geometric figure and asks you to count how many triangles there are. Is this a critical thinking problem? If so, which dimensions apply to it, and which do not? Are there dimensions that are not found in the rubric that are important? For this particular problem, there is a needed insight--a flash of creativity--that is required to notice that small triangles can overlap to make big triangles. I don't see any dimension in the rubric that corresponds directly to creativity. Is creativity part of critical thinking?

Most of the problem is with the idea of critical thinking, not the rubric. But we should be cautioned not to put too much faith in numerical goo that comes out of such grading schemes. The real utility is in the classroom, to spark discussion about techniques of thinking and to focus in on particular ones for particular assignments. But there is another problem that is philosophically worse: the assumption that we can obtain macro-level assessments from micro-level ones by adding up components. This is a mereological fallacy.

What is the real critical thinking score? What if we simply asked instructors, supervisors, or other educated adults to observe the students and assess directly (subjectively) the students' critical thinking ability? This is no more suspect than the subjective assessments of the dimensions given in the rubric, after all. If subjectivity is a problem, then we have to throw out most rubrics altogether. We're already wired with the ability to make such complex judgments. But they are almost certainly not simple linear relationships of components we observe. We understand intuitively that some demonstrations of thought (critical thinking, analytical thinking, creative thinking, effective writing, etc.) will heavily use some techniques and not others. Critical thinking takes so many forms that it's impossible to say from one instance to the next what will be the most important element or mode of analysis.

In summary: don't average rubric scores unless you can relate the average to something meaningful through predictive validity, look instead for direct subjective measures, don't take the idea of dimension and measurement too seriously, and realize that the rubrics probably leave out all kinds of important stuff. The best use of the rubric is for pedagogy, not assessment. In fact, it would be a good exercise to have students critique the rubric to see what's missing. That would be a real exercise in critical thinking.


  1. David makes some interesting points and opens a critical discussion (pun intended), using WSU’s critical thinking rubric as a foil for his admittedly geeky folly. I see his geek, and raise him my own folly.

    First, thanks for the kind words about our rubric. Having looked at innumerable rubrics, none, including ours, being perfect, we do think ours is pretty good, especially as it has been vetted by friends (and foes) from around the globe and across many many cultures, all of whom have added to and helped to refine the work. Still, as good as the rubric gets, it is never any better than the use to which it is applied.

    To clarify, let me tackle some of David’s specific points. He says critical thinking is “too fuzzy.” In our view, however, the extent of that fuzziness is wholly a reflection of the context to which it is put. As William Carlos Williams the doctor poet said, “no ideas but in things.” In practice this means we always assess not only the work students share, but the assignments that evince that work. (More on this in a bit.)
    And the purposeful dialogue David extols is also, in our view, the definition of assessment we work hard to promote in our use of the rubric with individual faculty and, more often, with faculty in a common program. Let me put it this way: David says the rubric is not best used in assessment, but I believe he is referring to a narrow definition of assessment. We prefer assessment as dialogue as in the root of the word, assay, or sitting next to. That is distinct from definitions and visions of assessment in which the assessor is placed above the one (or the work of the one) assessed, a point to which I think David refers when he cautions against the use of the critical thinking rubric for assessment.
    But pedagogy, in my view, should never be separated from assessment of the assay kind, from the perspective of one mindful of the scholarship of learning, from learning from (rather than just grading) students, and from peers, and professional partners as well.
    I do dispute the assertion that complex rubrics are problematic, though what we mean by “massively complex” may reveal in the end less disagreement than this response reflects. Our own research is that the sophistication of the rubric and the facilitation of the teaching with the rubric will evince commensurate results. Sophisticated rubrics, in other words, yield sophisticated thinking when the instructor is skilled. Alternately, as George Kuh has pointed out summarizing his own work with the NSSE, we in education tend too often to place the bar so low learners trip over it...
    David is absolutely correct on the mistake of using a data point in a line chart to represent the average. In our formal presentations we try to substitute the presentation so the average is represented as separate from the constituent data points that make up that average. But his argument about weighting is opaque, confused by suggesting the “average is somehow related to the actual critical thinking ability of the student.” Our dispute, I suspect, is confusing “student” with the work of a student in the context in which the assessment takes place. We hope not to judge students but to judge their work and the context of that work. That’s an important distinction with lots of ramifications. We don’t advocate rubric based assessment in high stakes evaluation, or tests that sort or usher students onto more advanced levels of study and opportunity. Rather, we advocate using the rubric to reflect upon the outcomes of a program, which is different in many ways from evaluating students for ranking and sorting. Further, we almost always try to encourage our partners to turn the rubric on the assignments that evinced the outcomes, which, not surprisingly, is the best predictor of student performance we’ve isolated to date. Students are pretty good at doing what we ask them to do. Much better, it seems, than we are at asking them to do meaningful things that require sophisticated thinking. The fact that the vast majority of assignments we have assessed tend to ask students to report and recapitulate this source or that authority says much about the paucity of evidence for deep and wide practice of critical thinking, however one defines it.

    On another point, David uses an analogy to suggest the dimensions in the rubric are not related. Analogies, we know from classical rhetoric, are terrific for clarifying points but weak when making arguments. The dimensions of the rubric are best presented with line charts precisely because they are related; they reflect the evidence of one’s performance in a single piece of writing, a poster, a speech, or some other artifact. Again, if one dimension or performance criterion is not aligned with the others, then the assessment has worked powerfully precisely because it shines a light on something that either matters (what can I do, for instance, to help my students surface the assumptions that underpin EVERY issue?) or doesn’t. Maybe I don’t care that they don’t draw conclusions because I’ve already told them the right answer: write a two page blog post on what David Eubanks says about dimensionality...
    David also asserts that “each dimension is assessed subjectively.” Of course there is probably too much written already about the issue of subjectivity and objectivity in assessment, but our approach emphasizes building consensus or expert inter-rater reliability among participants, and we try to expand the participation to include faculty in the discipline, their students, and professionals and partners in the community. It’s that dialogue point that, like David, we value most. Technically, however, it also turns out that when a group of faculty establishes standards of agreement on what they value, and makes that value manifest statistically by calibrating their expertise to agree about 80% of the time, the validity of that approach to assessment trumps most faculty’s ability to design and use what are purportedly the OBJECTIVE measures of their tests. That’s a good thing, in my view. I’m a big fan of faculty expertise, especially as it moves beyond the idiosyncratic to promote a coherent curriculum that is further aligned with what experts outside the academy expect. (Not capitulating to employers, but expanding the dialogue again.) If the technical implications of this discussion are annoying, imagine the amazed experience of a student who comes to discover that what is expected of her is clear and consistent among many of the faculty in the program she pursues as well as by the employer who may hire her. And at the same time, consider the implications of that disembodied multiple choice alternative usually used to assess her. Can we really claim those tests are more useful in any way than a shared and well implemented rubric?
    As for creativity, it is our long held view that good critical thinking is creative, and as David illustrates, the rubric is always dependent upon the associated tasks. We advocate authentic learning experiences, which are almost always ill-structured problems. Ill-structured problems call for creativity in many ways. For instance, problem identification, the first dimension of the rubric, calls for some measure of creativity to even understand the problem that one might address.
    Much of this discussion, especially when one comes to talking about various formal logical fallacies, reflects one's epistemology. It’s that pejorative tendency to regard the subjective that sits at the heart of much debate. Again, if I ask you to identify the right answer on a multiple choice test, for instance, what does 1 + 1 = ? You will probably get it right. Now I ask, if you have one apple and one orange, how many fruit do you have? You will probably identify the right answer. But in the first I tested your knowledge of addition. In the second I tested your knowledge of addition AND your knowledge of fruit. Do you now have two correct answers in your basket, or do you have three? Are the two questions equal in weight? Should they be? The privilege we give to “objectivity” because it produces a result that is easy to count does not mean that we should. But that is another debate. When David, finally, calls for relating critical thinking to something meaningful, I agree and suggest again that half the assessment should point to those assignments we ask of our students. But David also says we should relate it to something “meaningful through predictive validity,” presumably like the regression procedures used in tests like SATs and GREs. It is only slightly ironic that regression analysis has no solid mathematical proof that underpins the common statistical procedure. Yet another debate for another day.

  2. Thanks for the very thoughtful response! I turned it into a post here for better visibility.