I'm scheduled to give a talk on grade statistics on Monday 10/26, reviewing the work in the lead article of JAIE's special edition on grades. The editors not only accepted my dive into the statistics of course grades and their relationship to other measures of learning, but also recruited others in the assessment community to respond. I got to have the final word in a synthesis.
That effort turned into a panel discussion at the Institute, which will follow my research presentation. The forum is hosted by AAC&U and JAIE and comprises many of the authors who commented in the special edition. One reason for this post is to provide a stable hyperlink to the presentation slides, which are heavily annotated. The file is saved on researchgate.net; if you want to cite the slides in APA format, you can use:
Eubanks, D. (2023, October 26). Grades and Learning. ResearchGate. https://www.researchgate.net/publication/374977797_Grades_and_Learning
The panel discussion will also have a few slides, but you'll have to get those from the conference website.
Ancient Wisdom
As I've tracked down references in preparation for the conference, I've gone further back in time and would like to call out two books by Benjamin Bloom, whose name you'll recognize from taxonomy fame.
The first of these was published more than sixty years ago.
Assessment Philosophies
1. Product Control by Management
2. Educational Research
Managing hundreds of SLO reports for accreditors is soul-destroying, and the work that many of us prefer to do is (1) engage with faculty members on a more human level and lean on their professional judgment in combination with ideas from published research, or (2) do original research on student learning and other outcomes.
The research strand has a rich history and a lot of progress to show for it. The early days of the assessment movement, before it was co-opted by accreditation, led to the Scholarship of Teaching and Learning and a number of pedagogical advances that are still in use.
Research often starts with preconceived ideas, theories, and models of reality that frame what we expect to find. But crucially, all of those ideas have to eventually bend to whatever we discover through empirical investigation. There are many questions that can't be answered at all, per Feynman. It's messy and difficult and time-consuming to build theories from data, test them, and slowly accumulate knowledge. It's bottom-up discovery, not top-down control.
What's new in the last few years is the rapid development of data science: the ease with which we can quickly analyze large data sets with sophisticated methods.
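As a purely illustrative example (not from the talk or the journal issue), here's a minimal sketch of the kind of quick analysis I have in mind; the file and column names are hypothetical stand-ins for institutional data:

```python
# A minimal sketch of the kind of analysis that modern data tools make easy.
# Assumed (made-up) layout: one row per student-course record, with the grade
# earned and a later outcome such as subsequent GPA.
import pandas as pd

grades = pd.read_csv("course_grades.csv")

# For each course, how strongly do its grades correlate with the later outcome?
predictive = (
    grades.groupby("course")
          .apply(lambda g: g["grade_points"].corr(g["later_gpa"]))
          .sort_values(ascending=False)
)

print(predictive.head(10))  # courses whose grades best track later performance
```

A few lines like this, run over an entire transcript database, is the sort of bottom-up empirical work that was impractical not long ago.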
The Gap
There's a culture gap between top-down quality control and bottom-up research. I think that explains why assessment practice seems so divorced from applicable literature and methods. I also don't think there's much of a future in SLO report-writing, since it's obvious by now that it's not controlling quality (see the slides for references besides Denning).
This is a long way around to explaining why all those cautions about using grades evolved. Grades are mostly out of the control of authorities, so it was necessary to create a new kind of grades that were under direct supervision (SLOs and assessments). It's telling that one of the objections to grades comes with the concession that they might be okay if the work in the course is aligned to learning outcomes. In other words, if we put an approval process between grades and their use in assessment reports, it's okay--it's now subject to management control.
Older Posts on Grades
You might also be interested in these scintillating thoughts on grades:
- Are you calculating GPA wrong?
- Remaking grades (adjusting for course difficulty)
- Using grade data for improvement
- Grade transition maps
Update 1
The day after I posted this, IHE ran an article on a study of placement: when should a student be placed in a developmental class instead of the introductory college class? You can find the working paper here. From the IHE review, here's the main result:
Meanwhile, students who otherwise would have been in developmental courses but were “bumped up” to college-level courses by the multiple-measures assessment had notably better outcomes than their peers, while those who otherwise would have been in college-level courses but were “bumped down” by the assessment fared worse.
Students put in college-level courses because of multiple measures were about nine percentage points more likely than similar students placed using the standard process to complete college-level math or English courses by the ninth term. Students bumped up to college-level English specifically were two percentage points more likely than their peers to earn a credential or transfer to a four-year university in that time period. In contrast, students bumped down to developmental courses were five to six percentage points less likely than their peers to complete college-level math or English.
Note that the outcome here is just course completion. No SLOs were mentioned. Yet the research seems to lead to plausible and significant benefits for student learning. I point this out because, at least in my region, this effort would not check the boxes for an accreditation report on the SLO standard. It fails the first checkbox: there are no learning outcomes!
If you want to read my thoughts on why defining learning outcomes is over-rated, see Learning Assessment: Choosing Goals.
Update 2
This forum was the culmination of several years of work on my part and, perhaps more importantly, of the vision and execution of the editors of J. Assessment and IE. Mark was there to MC the panel, which largely comprised authors who responded to my lead article on grades with their own essays, found in the same issue. Kate couldn't make it to the panel, but she deserves thanks for making space for us on the AAC&U track at the conference. We didn't get a chance to ask her what she meant by "third rail" in the title. Certainly, if you have tried to use course grades as learning assessments in your accreditation reports, you run the risk of being shocked, so maybe that's it.
Peter Ewell was there! He's been very involved with the assessment movement since the beginning and has documented its developments in many articles. I got a chance to sit down with him the day before the panel and discuss that history. My particular interest was a 1984 NIE report that recommended that accreditors adopt the define-measure-improve formula that's still the basis for most SLO standards, nearly forty years later. My short analysis of the NIE report and its consequences will appear in the December 2023 edition of Assessment Update. Peter led that report's creation, so it was great to hear that I hadn't missed the mark in my exegesis. We also chatted about the grades issue, how the prejudice got started (an accident of history), and the ongoing obtuseness (my words) of the accrediting agencies.
One of my three slides of introductory remarks for the panel showed the overt SACSCOC "no grades" admonition found in official materials. I could have added another example, taken from a Q&A with one of the large accreditors (lightly edited):
Q: [A program report] listed all of the ways they were gathering data. There were 14 different ways in this table. All over the place, but course grades weren't in the list anywhere, and I know from working in this for a long time that there has been historically among the accreditors a kind of allergy to course grades. Do you think that was just an omission, or is there active discouragement to use course grades?
A: We spend a considerable amount of time training institutions around assessment, and looking at both direct and indirect measures, and grades would be an indirect measure. And so we encourage them to look at more direct ways of assessing student learning.
It's not hard to debunk the "indirect" idea--there's no intellectual content there, e.g. no statistical test for "directness." What it really means is "I don't like it" and "we say so."
I should note, as I did in the sessions, that this criticism of accreditor group-think is not intended to undermine the legitimacy of accreditation. I'm very much in favor of peer accreditation, and have served on eleven review teams myself. It works, but it could be better, and the easiest thing to improve is the SLO standard. Ironically, it means just taking a hard look at the evidence.
I won't try to summarize the panelists. There's a lot of overlap with the essays they wrote in the journal, so go read those. I'll just pull out one thread of conversation that emerged, which is the general relationship between regulators and reality.
A large part of a regulator's portfolio (including accreditors, although they aren't technically regulators) is to exert power by creating reality. I think of this as "nominal" reality, which is an oxymoron, but an appropriate one. For example, a finding that an instructor is unqualified to teach a class effectively means that the instructor won't be able to teach that class anymore, and might lose his or her job. An institution that falls too short may see its accreditation in peril, which can lead to a confidence drop and loss of enrollment. These are real outcomes that stem from paperwork reviews.
The problem is that regulators can't regulate actual reality, like passing a law to "prove" a math theorem, or reducing the gravitational constant to reduce the fraction of overweight people in the population. This no-fly zone includes statistical realities, which is where the trouble starts for SLO standards. Here's an illustration. A common SLO is "quantitative reasoning," which sounds good, so it gets put on the wish list for graduates. Who could be against more numerate students? The phrase has some claim to existence as a recognizable concept: using numbers to solve problems or answer questions. But that's far too general to be reliably measured, which is reality's way of indicating a problem. Any assessment has to be narrow in scope, and the generalization of a collection of narrow items to some larger skill set has to be validated. In this case, the statement is too broad to ever be valid, since many types of quantitative reasoning depend on domain knowledge that students can't be assumed to have.
In short, just because we write down an SLO statement doesn't mean it corresponds to anything real. Empiricism usually works the other way around, by naming patterns we reliably find in nature. As a reductio ad absurdum, consider the SLO "students will do a good job." If they do a good job on everything they do, we don't really need any other SLOs, so we can focus on just this one--what an advance that would be for assessment practice! As a test, we could give students the task of sorting a random set of numbers, like 3, 7, 15, -4, 7, 25, 0, 1/2, and 4, and score them on that. Does that generalize to doing a good job on, say, conjugating German verbs? Probably not. If we called the SLO "sorting a small list of numbers" we would be okay, but "doing a good job" is a phantasm.
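To make the narrowness point concrete, here's a small sketch (mine, not anything from the journal issue) of how the sorting task could be scored; the all-or-nothing rubric is an assumption for illustration:

```python
from fractions import Fraction

# The narrow task: sort this small list of numbers.
PROMPT = [3, 7, 15, -4, 7, 25, 0, Fraction(1, 2), 4]

def score_sorting(response):
    """Return 1 if the response is the prompt correctly sorted, else 0.

    Easy to score precisely because the 'SLO' is narrow. Nothing about the
    score tells us whether the student can conjugate German verbs, let alone
    whether they 'do a good job' in general.
    """
    return 1 if list(response) == sorted(PROMPT) else 0

print(score_sorting([-4, 0, Fraction(1, 2), 3, 4, 7, 7, 15, 25]))  # 1: correct
print(score_sorting(PROMPT))                                       # 0: unsorted
```

The narrow task is trivially measurable; the broad aspiration is not, and no amount of regulatory language changes that.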
So you can see the problem. When regulators require wish lists that probably don't correspond to anything real, we get some kind of checkbox compliance, and worse: a culture devoted to the study of generalized nonsense.
Two Paths to Relevance
You might object that faculty are working with us now because of the SLO report requirements, and if those go away they may stop taking our calls. This is true, but it has administrative and regulatory implications. If accreditation standards take a horizontal approach and ask for evidence of teaching quality, it's a natural fit for a more faculty-development style of assessment office. And it will likely produce better results, since an institution can currently place no value on undergraduate education and still pass the SLO requirement with the right checkboxes.
But I don't claim to have all the answers. It's encouraging that the conversation has begun, and I think my final words resonated with the group, viz. that librarians and accountants set their own professional standards. After more than three decades of practice, isn't it time that assessment practitioners stopped deferring to accreditors to tell them what to do and set their own standards?