Saturday, January 28, 2012

Assessing a QEP

On Wednesday, Guilford College hosted an NCICU meeting about SACSCOC accreditation. I had volunteered to do a very short introduction to my experience with the Quality Enhancement Plan (QEP) at Coker College, since I had seen the thing from inception to impact report. I got permission from Coker to release the report publicly, so here it is:

The whole fifth year report passed with no recommendations, and the letter said nice things about the QEP, so it's reasonable to assume that it's an acceptable exemplar to use in guiding your own report.

Assessment of the QEP program is an important part of the impact report, and this is a good place to record how that worked. The QEP at Coker was about improving writing effectiveness in students, and we tried several ways of assessing success. Only one of these really worked, so I will describe them in enough detail so you don't repeat my mistakes. Unless you just feel compelled to.

Portfolio Review.
I hand-built a web-based document repository (see "The Dropbox Idea" for details) to capture student writing. After enough samples had accumulated, I spent a whole day randomly sampling students in four categories: first year versus fourth year, crossed with day versus evening. There were 30 in each category, for 120 students. Then I drew three writing samples from each to create a student portfolio. There was some back and forth because some students didn't have three samples at that point. I used a box cutter to redact student names, just like I imagine the CIA does. Each portfolio got an ID number that would allow me to look up whose it was. The Composition coordinator created a rubric for rating the samples, and one Saturday we brought in faculty, administrators, adjuncts, and a high school English teacher to rate the portfolios. We spent a good part of the day applying the rubric to the papers; many were rated three times, and all were rated at least twice by different raters.
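The sampling and rating design above can be sketched in code. This is a hypothetical reconstruction, not the actual Coker system: the pool sizes, rater names, and stratum labels are my assumptions; only the structure (30 students per stratum, at least two distinct raters per portfolio) comes from the description.

```python
import random

random.seed(42)  # reproducible illustration

# Hypothetical student pools for the four strata described above:
# (first/fourth year) x (day/evening). Pool size is made up.
strata = {
    (year, schedule): [f"{year}-{schedule}-{i}" for i in range(200)]
    for year in ("first", "fourth")
    for schedule in ("day", "evening")
}

# Draw 30 students from each stratum, for 120 total.
sampled = [s for pool in strata.values() for s in random.sample(pool, 30)]

# Each redacted portfolio gets an ID and is assigned to two
# distinct raters (hypothetical rater names).
raters = [f"rater{i}" for i in range(10)]
assignments = {pid: random.sample(raters, 2) for pid, _ in enumerate(sampled)}
```

The key design point is that `random.sample` draws without replacement, so no stratum is over-drawn and no portfolio is rated twice by the same person.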

The results were disappointing. There were faint indications of trends, but mostly it was noise, not useful for steering a writing program. In retrospect, there were two conceptual problems. First, the papers we were looking at were not standardized: it's hard to compare a business plan to a short story. Second, the rubrics were not used in the assignments but conjured later, when we wanted to assess. For rubrics to be effective, they must be integrated as closely as possible into the construction of the assignment.

So this was a lot of work for a dud of a report, most of which is probably my fault.

Pre-post Test
One of the administrators decided we should use a writing placement test, for which we already had data, as a measure of writing gain by giving it again as a post-test after students took ENG 101. The task was to find and correct errors in sample sentences. The English instructors told us it wouldn't work, and it didn't. More noise.

Discipline-Specific Rubrics
We did, in fact, learn something from the rubric fiasco. We allowed programs to create their own rubrics, which could be applied to assignments in the repository. So an instructor could look at a work, pull up the custom rubric, and rate it right there and then. Since the prof knew the assignment, this seemed like a way to get more meaningful results. I think this would have worked, but by the time we got all the footwork done, the QEP program was a couple of years under way. I left the college before it was possible to do a large-scale analysis of the results that were in the database. In summary: good idea, executed too late.
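A minimal sketch of the data shape behind a program-specific rubric applied to a repository document, as described above. The field names and the example rubric are assumptions for illustration; the actual repository's schema isn't given in the post.

```python
from dataclasses import dataclass

@dataclass
class RubricCriterion:
    # One row of a program's custom rubric (names are hypothetical)
    name: str
    scale_max: int

@dataclass
class Rating:
    # An instructor's score on one criterion for one repository document
    document_id: int
    criterion: str
    score: int

# A hypothetical discipline-specific rubric, defined by the program
business_rubric = [
    RubricCriterion("thesis clarity", 4),
    RubricCriterion("use of evidence", 4),
]

# The instructor rates the document against the custom rubric on the spot
ratings = [
    Rating(document_id=101, criterion="thesis clarity", score=3),
    Rating(document_id=101, criterion="use of evidence", score=4),
]
avg_score = sum(r.score for r in ratings) / len(ratings)
```

The design advantage the post identifies is that the rater is the instructor who wrote the assignment, so the rubric and the work being rated are already aligned.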

Direct Observation by Faculty
Back in 2001, when I got the job of being SACSCOC liaison, I got a copy of the brand-new Principles and started reading. The more I read, the more terrified I became. And nothing frightened me more than CS 3.5.1, the standard on general education. I didn't know at the time that the standard said one thing, but everyone interpreted it a completely different way (it was written as a minimum-standard requirement, but everyone looked for continuous improvement). So I was one of those people you see at the annual meeting who look like they are on potent narcotics, drifting around with a dazed look at the enormousness of the challenge. (Note: I think they should hand out mood rings at the annual meeting so you can see how stressed someone is before you talk to them.)

In an act of desperation, I led an effort to create what we now call the Faculty Assessment of Core Skills (FACS), which is nothing more than subjective faculty ratings of liberal arts skills demonstrated by students in their classes. The skills included writing effectiveness. At the end of the semester, each instructor was supposed to give a subjective rating to each student taught for observed skills on the list. You can read all about this in the Assessing the Elephant manuscript, or on this blog, or in one of the three books I wrote chapters for on the subject.

Because we had started the FACS before the QEP, we had baseline data, plus data for every semester during the project's life. Thousands and thousands of data points about student writing abilities. When we started the FACS I didn't have much hope for it--it was a "Hail Mary" pass at CS 3.5.1. But as it turns out, it was exactly what we needed. We were able to show that FACS scores improved faster for students who had used the writing lab than for those who didn't. Moreover, this effect was sensitive to the overall ability of the student, as judged by high school grades. See "Assessing Writing" for the details.
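The comparison described above can be illustrated with a toy calculation. Everything here is fabricated for demonstration: the records, the scale, and the variable names are assumptions, not Coker's data or the actual analysis in "Assessing Writing" — the point is only the shape of the gain comparison.

```python
# Each record: (used_writing_lab, hs_gpa, facs_score_start, facs_score_end)
# All values are invented for illustration.
students = [
    (True,  3.6, 2.1, 3.0),
    (True,  2.8, 1.8, 2.6),
    (False, 3.5, 2.2, 2.5),
    (False, 2.9, 1.9, 2.1),
]

def mean_gain(rows):
    # Average change in FACS writing score, end minus start
    gains = [end - start for _, _, start, end in rows]
    return sum(gains) / len(gains)

lab_gain = mean_gain([r for r in students if r[0]])
no_lab_gain = mean_gain([r for r in students if not r[0]])
```

With these made-up numbers the writing-lab group shows the larger mean gain; the real analysis also conditioned on high school grades, which a fuller sketch would add as a second grouping variable.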

I have given many talks about the FACS over the years, and get interesting reactions. One pair of psychologists seemed amazed that anything so blatantly subjective could be useful for anything at all, but they were very nice about it. When I post FACS results on the ASSESS-L listserv, you can hear the crickets chirping afterwards. I guess it doesn't seem dignified because it doesn't have a reductionist pedigree.

So I was shocked at the NCICU meeting, when SACSCOC Vice President Steve Sheeley said things like (my notes, probably not his exact words) "Professors' opinions as professionals are more important than standardized tests," and "Professors know what students are good at and what they are not good at."

The reason for my reaction is that when one hears official statements about assessment, it's almost always emphasized that it has to be suitably scientific. "Proven valid and reliable" is a standard formula, and certainly "measurable" figures (see "Measurement Smesurement" for my opinion on that). However it is stated, there isn't much room for something as touchy-feely as subjective opinions of course instructors. I do give good arguments for both validity and reliability in Assessing the Elephant, but FACS is never going to look like a psychometrician's version of assessment. So it was a shock and a very pleasant surprise to hear a note of common sense in the assessment symphony. I think when Steve made that remark, he assumed that this special knowledge professors acquire after working with students was simply inaccessible as assessment data. But it's not, and by now Coker has many thousands of data points over more than a decade to prove it. And it turned out to be the key to showing the QEP actually worked.

I have implemented the FACS at JCSU, and created a cool dashboard for it. I showed this off at the meeting, and you can download a sample of it here if you want. The real one is interactive so you can disaggregate the data down to the level you want to look at, even generating individual student reports for advisors. Setting up and running the FACS is trivial. It costs no money, takes no time, and you get rich data back that can be used for all kinds of things. Everyone should do this as a first, most basic, method of assessment.

1 comment:

  1. Dave, my interpretation of Steve's comment was that he was talking about the interpretation of data. I wrote down 'professional judgment is essential to process'. So, even where one has a reliable/valid test, interpretation of the results should be made by the faculty. What does the data tell us about our students?