Friday, October 02, 2009

Friday Potpourri

This morning my cognitive cogs are engaged in so many projects it seems appropriate to dish up a small buffet of assessment-related items:

1. SES gets you a seat: An article in InsideHigherEd this morning reports that as access to higher ed boomed in the US, so did demand, and competition naturally favored those best able to negotiate the barriers to admission. Guess who that was? This quote is attributed to the researcher, Sigal Alon:
[T]esting became a more important factor in admissions at more institutions, and that wealthier families are much speedier to adapt to changes in admissions rules.
This sounds like the "economic value of error" stuff I was prattling on about a few weeks back. Although I can't find the source article online, a kind of appendix is here. She reiterates what everyone paying attention knows:
Test scores thus seem largely redundant with high school grades (Crouse and Trusheim 1988). Test scores do have some predictive value for college success, but is this improvement worth the negative impacts of testing?
She goes on to advocate a more aggressive program than simply using noncognitives to better identify talent in lower SES groups. She's probably right--if only because the high SES groups will learn how to negotiate the new system of merit just as well as they did the previous one.

2. Rubrics are bad for your brain: Well, maybe I exaggerate, but two articles posted to the ASSESS listserv provide cautionary advice for those constructing rubrics. On another level, there is (I find) a tendency to put too much faith in the things to begin with. Maybe that's because a rubric is tangible evidence of a plan. It looks like science, with its dissection of some learning outcome into component dimensions of quality. I won't reiterate my own annoyances with this quasi-religious belief here. You can find better and more constructive comments in the papers themselves:
3. NSSE Dimensions: At a recent IR meeting locally, I saw a very nice analysis of one institution's NSSE data from several years. The question being considered was whether or not the scorecard-type ratings that NSSE reports out are valid. You can see what I'm talking about in the 2008 annual report here. I clipped out a bit from page 8 to show some of the "dimensions" in question:

Here you can see that Level of Academic Challenge, Active and Collaborative Learning, and Student-Faculty Interaction are indices consolidated, summarized, and reported out as being meaningful. There is a great power to words, which gets abused routinely by the testing and assessment profession (like misuse of the word "measurement"). To be fair, it happens anywhere there's room for doubt--opportunity to advertise something more than what's there without being caught. We humans seem to prefer certainty over doubt, even if it guarantees we're wrong. But I digress.

The putative dimensions listed (Enriching Educational Experiences and Supportive Campus Environment are the other two, not shown above) are drawn from the items on the NSSE. In order to validate such things, one would test the construct validity using factor analysis. You can find lots of reports at NSSE referencing validity here. I will have to go through them later, to see what I can find in this vein.

The presentation at the IR meeting used confirmatory factor analysis (CFA), which specifies the factor structure in advance (here, the benchmark dimensions as NSSE defines them) and then tests how well the data at hand fit that structure. Ideally, the items assigned to Academic Challenge would all load together on a single factor, showing up as one dominant vector in the singular value decomposition of the correlation matrix. The actual results were not convincing.
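For a rough sense of how such a check works, here is a minimal sketch in Python. The data are simulated with made-up item loadings (these are not real NSSE items or responses): if the items assigned to a benchmark really share a dimension, the leading components of the inter-item correlation matrix should soak up most of the common variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # simulated respondents

# Two latent factors standing in for two benchmarks; the groupings and
# loadings below are invented for illustration only.
challenge = rng.normal(size=n)
collab = rng.normal(size=n)

items = np.column_stack([
    0.8 * challenge + 0.6 * rng.normal(size=n),  # items meant to tap "Academic Challenge"
    0.7 * challenge + 0.7 * rng.normal(size=n),
    0.8 * challenge + 0.6 * rng.normal(size=n),
    0.8 * collab + 0.6 * rng.normal(size=n),     # items meant to tap "Collaborative Learning"
    0.7 * collab + 0.7 * rng.normal(size=n),
    0.8 * collab + 0.6 * rng.normal(size=n),
])

R = np.corrcoef(items, rowvar=False)  # 6x6 inter-item correlation matrix
U, s, _ = np.linalg.svd(R)            # for a correlation matrix this yields the eigenstructure

# The trace of R is 6 (one unit of variance per item). If the two-dimension
# story holds, the first two components should account for most of it.
print(np.round(s, 2))
print("share explained by two components:", round(float(s[0] + s[1]) / float(s.sum()), 2))
```

A real CFA would instead fix which items load on which factor and report fit statistics from a structural equation model; the decomposition above is closer to an exploratory sanity check than to the method used in the presentation.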

I poked around to see if there were other studies out there, and found this one. Here, researchers at James Madison University and Eastern Mennonite University used similar methods and found similar results. They summarize their findings:
[A] CFA of the benchmarks as specified by NSSE literature produced poor model fit. For the sample of students examined for this study, the five-factor benchmark model supported by NSSE was not upheld, thus a five-factor solution is uninterpretable for the sample. A comparison of benchmark scores from this sample to scores from a sample at another university should not be made, as the benchmark scores from this sample were not empirically supported. Until further studies can provide evidence for an interpretable set of benchmarks for this and similar samples, one should not make policy or programmatic decisions based on the benchmark scores.
The whole article is worth reading. It's well-written, and there are interesting details about particular loadings.


  1. I think that two other articles of central relevance to the NSSE and its benchmarks are:

    LaNasa et al. 2007: The Construct Validity of Student Engagement, a CFA Approach


    Gordon et al. 2008: Validating NSSE Against Student Outcomes

    Also of interest is the "Driving Data Down" presentation, found on this page:

    On the fourth and fifth slides, the presenters report the within- and between-institution variance in the NSSE benchmarks. The difference between the University of Phoenix and Harvard accounts for about five percent of the variance in student engagement.

    I'm very wary of anything produced by IUB researchers on the NSSE - after all, many of them have made their careers on it, and its revenues support a stable of researchers. I wish more institutional users would report on what they've found with their NSSE results, but I doubt that many IR/assessment offices have the people, skill set, and administrative blessing to do so. I'm definitely lacking the first and the last, so all I can do is stew... and post comments on blogs.

  2. Matt, thanks for the links. I haven't delved into the NSSE validity as much as SAT and CLA. I think there is a unifying thread, though: people in charge want simple solutions to complicated, perhaps ill-defined or unsolvable problems. Face validity counts more than anything else in that circle, it seems to me.

  3. I agree completely (and with the content of your post from Tuesday - sorry, not sure how best to continue the thread). I think it's especially interesting in academia, where you see hard-nosed scientists-turned-administrators flipping out over statistically insignificant changes in fundamentally invalid measures.

    As I'm contemplating grad school, I'm mostly looking at statistics programs - but I'm also considering UCLA's Brain, Mind and Society interdisciplinary program, because I think we still have so much to learn about how humans' neurological limits affect the structure of our organizations (and induce absurd information-compression behaviors like you've discussed here).

  4. I guess one of my rules of thumb kicks in here: people do things that look crazy, but it makes sense to THEM. There's some kind of continuum from the real meaning of learning at the classroom level (as real as it gets), increasingly abstracted and compressed as it goes down the university digestive chain to the epistemological vanishing point. At each bureaucratic stop along the way, there's pressure to make sense of it all so the agencies we report to are happy (as happy as they get). It's easy to rationalize away everything you know about statistics if you know the audience doesn't care about such niceties. The cynical view of what happens at the upper levels is a cargo-cult sort of thing (see R. Feynman's article on that): it has the appearance of meaning but could very easily have little to do with the reality in the classroom. I don't think anyone sets out to do that, though. I imagine this sort of thing happens in any big bureaucracy.

    It would be great if brain science caught up to this ambitious program. Good luck with the grad school choice!