I also had rather unsatisfactory results using writing rubrics to assess student portfolios. We went to great effort, and had decent inter-rater reliability, but the scores weren't convincing. Of course, we created problems for ourselves by admitting far too many types of writing into the mix and by omitting the assignments themselves--the database wasn't set up to include them. So in fairness, it may not have been the rubric-based approach that was the problem there.
Competing with this are success stories (see Gary Brown's comments [here]). Gary quotes good reliability numbers for his institution's widely used critical thinking rubric, and makes a convincing case that the right approach can yield good results. My new institution uses the CT rubric, but I think our results are not as good (I am in the process of gathering data now). Anecdotally, the instructors tend to over-rate students. This may be fixable with a determined effort, but I'm too new to have a sense of how difficult that might be.
The lesson is that success or failure doesn't lie solely with the choice of a technique. I suppose that's obvious, but there does seem to be a degree of superficiality to the idea of using assessment outcomes in higher education. By this I mean that there's a lot of emphasis (for example from outside reviewers) on having an acceptable form or model. Inevitably this model includes a lot of paper-pushing, which is supposed to guarantee documented success. I think it's a lot harder than that, and a lot more subtle. But I want to stay concrete here, so let me present my ideas for some interesting tests I'm considering.
The research question is "what do we know about critical thinking?" I mean this in a very narrow context--at my university, relying mostly on data that we already have. Because critical thinking appears in our mission statement as well as our Quality Enhancement Plan (a SACS accreditation requirement), it's important. Here are the lines of inquiry:
- What's our inter-rater reliability using the rubric?
- If we do a survey of faculty to ask what they think holistically of student abilities to think critically, how do those ratings compare to the ones on the rubric? (This requires generating a survey quickly, but it can be done.)
- How many actual dimensions are represented in the rubric-derived data? For example, using factor analysis we should be able to see how many components are needed to capture a large percentage of the variance. My first impression is that the dimensions are highly correlated.
- How do the rubric data and the general survey data compare to the standardized test? We have MAPP data for comparison, which includes a critical thinking sub-score. How much correlation is there between these?
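For the first question, inter-rater reliability between two raters on a categorical rubric scale is commonly summarized with Cohen's kappa, which discounts the agreement we'd expect by chance. Here's a minimal sketch; the portfolio scores are made up for illustration, not real data from our rubric.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same set of artifacts."""
    n = len(rater_a)
    # Observed agreement: fraction of artifacts rated identically.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, computed from each rater's marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    pe = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (po - pe) / (1 - pe)

# Hypothetical 1-4 rubric scores from two raters on ten portfolios.
a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 3]
print(round(cohens_kappa(a, b), 3))  # → 0.714
```

A kappa in the 0.6–0.8 range is conventionally read as "substantial" agreement, which is roughly the territory a usable rubric needs to reach.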
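For the dimensionality question, one quick check is to look at the eigenvalues of the correlation matrix of the rubric dimensions (the principal-component route): each eigenvalue is the variance captured by one component. The sketch below simulates the situation I suspect we have--four dimensions driven by one latent ability--so the numbers are synthetic, not our actual rubric data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulate four rubric dimensions for 50 students, all driven by one
# latent "ability" plus modest noise, so the dimensions correlate highly.
ability = rng.normal(0, 1, 50)
scores = np.column_stack([ability + rng.normal(0, 0.3, 50) for _ in range(4)])

# Eigenvalues of the correlation matrix, largest first; dividing by their
# sum gives the proportion of variance each component explains.
eigvals = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[::-1]
explained = eigvals / eigvals.sum()
print(np.round(explained, 2))
```

If the first component soaks up most of the variance, as it does here by construction, the rubric's four dimensions are effectively measuring one thing--which is what my first impression suggests we'll find.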
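The last two questions both come down to pairwise Pearson correlations--rubric vs. survey, rubric vs. MAPP sub-score, and so on. A small sketch, again with invented scores standing in for the real data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired scores: rubric ratings (1-4) and MAPP critical
# thinking sub-scores for the same eight students.
rubric = [2, 3, 3, 4, 1, 2, 4, 3]
mapp = [105, 112, 110, 118, 100, 104, 116, 111]
print(round(pearson_r(rubric, mapp), 2))  # → 0.99
```

The fabricated example correlates almost perfectly; with real data, even a moderate correlation between the rubric and an external measure like MAPP would be evidence that the rubric is tracking something real.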