Friday, August 16, 2024

A Canticle for Bloom*

Introduction

Stephen Jay Gould promoted the idea of non-overlapping magisteria, or ways of knowing the world that can be separated into mutually exclusive domains, where each "holds the appropriate tools for meaningful discourse and resolution." The tension Gould was trying to resolve was between religion and science:

Science tries to document the factual character of the natural world, and to develop theories that coordinate and explain these facts. Religion, on the other hand, operates in the equally important, but utterly different, realm of human purposes, meanings, and values—subjects that the factual domain of science might illuminate, but can never resolve. -- Stephen Jay Gould from Rock of Ages

I'll call these "ways of knowing" or WOKs, which seems more down to earth than "magisteria." Each WOK contains cross-checks on knowledge that are particular to the domain.  Scientific questions are judged by scientific standards. Personal choices are based on experience and usually don't have "correct" answers but degrees of validation.

Should you give a friend a loan? That's a lived experience question. Are your car's spark plugs failing to ignite? That's a science question. Should you take a recommended drug despite severe side effects? That's somewhere in between. 

There's a third WOK we need to talk about: ideology. Steven Mintz recently provided a crisp definition in his blog on InsideHigherEd:

Ideologies simplify, clean up and package reality into something easily consumable, palatable and appealing to a mass audience. In doing so, ideologues discard the messy, complex and often unpleasant aspects of reality, presenting only what fits neatly within their framework. Ideology, thus, distorts reality by filtering out anything that does not conform to its narrative.

Ideologies are particularly powerful when associated with a utopia. Marxism, for example, evolved into an intellectual justification for Stalin's USSR, where Lysenkoism enforced ideology on biological research.

Here's a diagram of our three WOKs with what we might find in the overlaps.

Galileo was interested in the physical reality of the cosmos (among other things), at the center of the diagram, creating new methods for WOK 2. But the correct way of speaking about the universe needed to adhere to church doctrine (ideology, WOK 3). This is the "correctness" overlap, only some of which corresponded to reality (geocentrism did not). See Steven Shapin's The Scientific Revolution for a nuanced narrative; it's not as simple as the usual telling of Galileo vs the Church. Jennifer Michael Hecht would rightfully insist on putting "ritual" in the overlap between ideology and lived experience, and argue that the ritual can fulfill a social and personal need. Ideology isn't bad; we need it. It can just overlap in odd ways with other WOKs.

SLO Assessment 

Yesterday I participated in a conversation with a small group of experienced assessment directors as part of an ongoing project to fix accreditation standards. The discussion echoed a theme I heard last year when I interviewed a dozen peer reviewers from various accreditors, that there's value in the formal kind of assessment that gathers data and does statistics on it, but it's more common to see success by just getting faculty together to talk about learning goals. We also talked about accreditation standards. I suggest that there are three important WOKs in assessment:
  1. Professional judgment and collaboration
  2. Educational measurement and inferential statistics
  3. Adjudication of accreditation policy
If we find that first-generation students have abysmal pass rates in math (WOK 2), this will affect conversations about pedagogy, support services, course prerequisites, and so forth that would happen in a department meeting (WOK 1). Conversely, if the math faculty all agree that Calculus 1 isn't adequately preparing students for Calculus 2 based on classroom experiences (WOK 1), it might prompt a more formal analysis of grades and test scores (WOK 2).

Perhaps in a perfect world, assessment offices would operate within these two WOKs, with a large faculty support role to facilitate conversations and share knowledge (WOK 1), with a separate function to gather and warehouse data, do research, and connect with the wider research community to bring ideas back (WOK 2). It's not clear that universities would fund such an outfit, however. Assessment offices are expensive and only exist because of accreditation requirements, to get the reports done (WOK 3).

 

The Third Circle

Any bureaucracy draws from a kind of ideology, at least implicitly. Paperwork and procedure serve to "simplify, clean up and package reality into something easily consumable," in Mintz's formulation. Behind the paperwork is a purpose: the DMV's goal is safer roads. The EPA's is a clean environment. The validation of WOK 3 uses a formalized classification of the world (e.g. a driver's test) to assign cases to policy distinctions (driver's license granted or denied).
 
Accreditation requirements provide the motivation to run assessment operations, but they also impose a particular ideology. I described its origins and effects in "Assessment standards are broken." In short, the ideology can be abbreviated as "define-measure-improve" and is a version of "scientific management," descendants of Taylor and Drucker and others, of which Six Sigma is a variation. There is an intended overlap with WOK 2, almost subsuming the scientific WOK within the ideology, with the goal "we're going to require you to use science to improve the state of education."
 
Robert Birnbaum catalogs variations of this idea, calling the phenomenon 
 
a paradox of complexity and simplicity. Its central ideas may appear brilliantly original. Yet at the same time they are so commonsensical as to make us wonder why we had not thought of them ourselves, and so obviously reasonable as to defy disagreement. 
-- Management Fads in Higher Education, pg 5.
 
The result is that the internal validation of knowledge in WOK 3, which is done by trained peer review teams, assumes the preeminence of WOK 2 (scientific knowledge). 
 
Any new idea has to compete with existing ones, and faculty tradition was the natural enemy of the define-measure-improve protocol, despite the obvious overlap with what teachers do on the job, and what they want to accomplish. In the Sturm und Drang of 1983's A Nation at Risk, educators took a lot of heat (a theme in US politics). In higher education, teaching work had to be mapped from existing practices, seen as inefficient, to ones that aligned with the scientific management principles.
  • Defining learning

    • Old: choose textbooks, write syllabus, approve curriculum, create tests or other assessments

    • New: write statements of student learning objectives, often in a hierarchy of course, program, institution

  • Measurement

    • Old: grade tests, writing samples, performances, etc. Build a shared sense of acceptability via faculty consensus and constant exposure to students, assign summative course grades

    • New: use only a few approved methods, including specific assignments, papers associated with rubrics (no grades!). Learning is seen as distinct from more general student success. Emphasis on outputs instead of inputs.

  • Improvement

    • Old: Professional growth in teaching practice, department or institution level consensus on change (WOK 1), using data summaries like grade or test averages or pass rates to identify needs for improvement (WOK 2).

    • New: Averages or frequencies of approved data sources to find deficiencies, then imagine a way to remedy them
The requirements to write reports using the new methods lobotomized WOK 1 for faculty. To comply with the reporting requirements they had to start over using approved replacement methods in order to be allowed to "know" how their students were doing. Naturally they resented this. Hated it, even, and have produced a genre of articles complaining about assessment. We still deal with the effects.
 
It's worth repeating that what assessment directors say works best is WOK 1--what the scientific approach was intended to replace--and because that's what actually works, the accreditation reviews have relaxed over time to allow more room for WOK 1. Your situation depends on your accreditor, but it's still an awkward fit because of the need to perform the other rituals (defining and gathering data in the approved way) in order to validate WOK 1, which doesn't really need that extra work to function.
 
As I described in "Assessing for Student Success," the science project falls apart immediately, because what students learn in a college curriculum is a lot of detailed topics with interconnections. I estimated several hundred topics (SLOs if you like) in a math curriculum. These can't be described, let alone measured, within the parallel framework the accreditors created.
 
It's worth noting that in the industrial setting, where these ideas were formed, it is possible to define and measure everything important, and have real-time data from instruments on an assembly line. That doesn't translate well to education, where there's little standardization.

The accreditation requirements strengthen the main thing that works (WOK 1) by creating more opportunities for faculty to talk about student learning, especially when facilitated by a good assessment director. But the requirements also diminish the effectiveness of faculty work by heaping on artificial requirements in the name of science. This poor attempt to mandate WOK 2 fails the validity checks within WOK 2: sample sizes are too small, the data are too noisy, and the causal models used are too simple.
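To make the sample-size point concrete, here is a minimal Python sketch of the margin of error on a rubric pass rate. The 75% pass rate and the cohort sizes are hypothetical numbers chosen for illustration, and the normal approximation used here is itself rough at small n, so treat the output as a ballpark rather than a proper interval.

```python
import math

def margin_of_error(pass_rate, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(pass_rate * (1 - pass_rate) / n)

# Hypothetical scenario: 75% of students meet the rubric standard.
for n in (12, 120, 1200):
    moe = margin_of_error(0.75, n)
    print(f"n={n:5d}  pass rate 75% +/- {100 * moe:.1f} percentage points")
```

With a dozen students, the margin is roughly 25 percentage points either way, which is wider than most year-to-year changes a department would hope to detect. The margin shrinks only with the square root of the sample size, so it takes 100 times as many students to cut it by a factor of ten.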

This collision between science and policy gets resolved ad baculum: you will be beaten with the stick of non-compliance until you at least pretend to believe that rubrics are always valid measures. In short, the SLO accreditation requirements co-opt the authority of science, but replace scientific standards with ideologically-correct ones. This has created a self-sustaining culture of compliance, abetted by consultants, vendors, and peer review training that maintains this closed garden of bureaucracy as science.

 

A Litmus Test

A few years ago, I concluded that the best way to illustrate how accreditation standards drove us into an epistemological ditch was to spotlight their allergy to course grades. In the age of big data, the idea that we'd arbitrarily throw out millions of data points covering the whole history of students at our institution is absurd. It only makes sense within the accreditation bubble, and so it puts a spotlight on the difference between the scientific claims of accreditors versus the reality. To heighten the contradictions, so to speak. You can find my summary of research on grades and learning here. Starting on page 23 you'll find a list of standard objections to using course grades as data about student learning. 
 
I won't rehash that material here. It suffices to quote an article that cites Bloom (the taxonomy guy) from 1976, eight years before define-measure-improve kicked off:
 
Perhaps the most productive use of GPA is as a covariate. GPA has the potential to explain nearly half the variance in education research models (Bloom, 1976), thus shedding light on the variance explained by other variables of interest, such as changes in course or curriculum design.  
 
-- Bacon, D. R., & Bean, B. (2006). GPA in Research Studies: An Invaluable but Neglected Opportunity. Journal of Marketing Education, 28(1), 35–42.
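As a rough illustration of what "GPA as a covariate" buys you, here is a small simulation sketch in Python. Everything in it is an assumption made for the example: the data are simulated, the variable names (gpa, redesign, score) are hypothetical, and the effect sizes are chosen so that GPA explains about half the variance in the outcome, in the spirit of the figure quoted above. It also takes a shortcut by comparing residuals after regressing on GPA, which is a simplification of a full regression with both predictors and is reasonable here only because the simulated course redesign is assigned independently of GPA.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def group_difference(values, groups):
    """Difference in group means (group 1 minus group 0) and its standard error."""
    g1 = [v for v, g in zip(values, groups) if g == 1]
    g0 = [v for v, g in zip(values, groups) if g == 0]
    diff = statistics.fmean(g1) - statistics.fmean(g0)
    se = (statistics.variance(g1) / len(g1) + statistics.variance(g0) / len(g0)) ** 0.5
    return diff, se

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
gpa = [rng.gauss(3.0, 0.5) for _ in range(n)]  # simulated GPAs, not clamped to 0-4
redesign = [i % 2 for i in range(n)]           # 1 = took the hypothetical redesigned course
# Simulated outcome: GPA signal + a 2-point redesign effect + noise
score = [10 * g + 2 * t + rng.gauss(0, 5) for g, t in zip(gpa, redesign)]

# Share of outcome variance explained by GPA alone
a, b = fit_line(gpa, score)
residual = [s - (a + b * g) for s, g in zip(score, gpa)]
r2 = 1 - statistics.pvariance(residual) / statistics.pvariance(score)

# Estimated redesign effect without and with the GPA adjustment
diff_raw, se_raw = group_difference(score, redesign)
diff_adj, se_adj = group_difference(residual, redesign)

print(f"R^2 from GPA alone: {r2:.2f}")
print(f"unadjusted effect:  {diff_raw:.2f} (SE {se_raw:.2f})")
print(f"GPA-adjusted:       {diff_adj:.2f} (SE {se_adj:.2f})")
```

The adjusted estimate has a smaller standard error because GPA has already absorbed much of the variation between students, leaving less noise around the variable of interest. That is the sense in which throwing grades away discards useful information even when grades are not the outcome being studied.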
 
 
When I have the opportunity, I ask accreditors what they think of course grades as data. This is a litmus test for self-reflection. So far they've all failed it. The wise ones don't want to come out too strongly against grades, because it implies they don't think transcripts are meaningful. They generally hedge, calling grades "indirect measures." But there's no test for directness in the vocabulary of WOK 2. You won't find a way to create p-values on directness in the manual on educational measurement, because the idea isn't statistical. We already have a robust vocabulary on reliability and validity, and there's no need to confuse the issue with "directness." 

When pushed on that point, one accreditor representative cited an anecdote about how a student didn't feel like the grade reflected learning. This would be ironic if the scientific claims of the define-measure-improve protocol were serious about the science. After dismissing faculty consensus as opinion, reciting "the plural of anecdote isn't data," and distributing buttons at conferences that read "show me the data," a high priest of the order can simply use an anecdote to dismiss the whole research record on course grades. The remarkable thing is that this doesn't seem to cause any cognitive dissonance. 

The point of this illustration is that the overlapping circles in the three WOKs don't presently withstand the light of public exposure. The accreditors will look ridiculous. We need to fix it before that happens, to create a more sensible overlap of the WOKs.

There's More

This article is long enough, and I'm going to stop here. I have not discussed the effective use of WOK 2 (educational measurement and inferential statistics) as a successful assessment tool. That may be addressed in a future post. Suffice it to say that the requirements of a good research project (e.g. large data sample, tests of reliability) aren't feasible in 99% of department-level accreditation reports. From a practical point of view, it's a lot of extra work to do a real research project to get a tiny amount of credit, if any at all (none at all if you research retention instead of learning). The irony is that WOK 3 assumes that it contains WOK 2, when in fact the intersection is nearly empty.