Monday, February 08, 2010

Validity for Gen Ed Tests

I came across a relatively new page on validity studies on the Voluntary System of Accountability (VSA) site. The studies focus on the CAAP, CLA, and MAPP, which are all aimed at critical thinking and writing to one degree or another. There are several reports to be found there, including an executive summary. In Interpretation of "Findings of the Test Validity Study Conducted for the Voluntary System of Accountability," we find that the correlation between the tests in question, when compared at the school level, ranged from .75 to .87. Here, "School-level reliability refers to score consistency (i.e., a school receiving a similar mean score regardless of the sample of students taking the test)."
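To make the quoted definition of school-level reliability concrete, here is a minimal sketch in Python. It is not the report's method, which isn't described here; it assumes a simple split-sample approach where each school's students are randomly divided in two and the two sets of school means are correlated. The function names, the simulated data, and the standard deviations are all hypothetical.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_sample_reliability(schools, rng):
    """Assumed method: split each school's students at random into two
    halves, then correlate the half-sample means across schools."""
    first, second = [], []
    for scores in schools:
        shuffled = scores[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first.append(mean(shuffled[:half]))
        second.append(mean(shuffled[half:]))
    return pearson(first, second)

def simulate_schools(n_schools, n_students, rng, school_sd=1.0, student_sd=3.0):
    """Hypothetical data: each school has a true mean, students vary around it."""
    schools = []
    for _ in range(n_schools):
        school_mean = rng.gauss(0.0, school_sd)
        schools.append([rng.gauss(school_mean, student_sd) for _ in range(n_students)])
    return schools

rng = random.Random(0)
small = split_sample_reliability(simulate_schools(40, 20, rng), rng)
large = split_sample_reliability(simulate_schools(40, 400, rng), rng)
print(f"20 students per school:  {small:.2f}")
print(f"400 students per school: {large:.2f}")
```

Under these assumptions, the estimate depends heavily on how many students are sampled per school, which is worth keeping in mind when reading any single school-level reliability figure.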

Correlations between test sections are shown in the table below, from page 6 of the report.

This is presented as evidence of construct validity. I don't see any evidence that they did a confirmatory factor analysis to check the dimensions.

The researchers tried to tease out effect size by looking at the difference between freshmen and seniors, but had to jiggle the statistics to account for the fact that the seniors and freshmen were different people, with different characteristics (there was a difference in SAT average, for example). This is not surprising because of survivorship. The actual method for this renorming is not given in this report (maybe it's in one of the others), but there's a graph to illustrate that after fiddling, the seniors look better than freshmen. Except in math.

This finding is interesting to me as a math teacher. I find it plausible, since college freshmen would generally have just come out of several continuous years of math instruction, while college seniors would not.

There's more in the report, including some cautions on page 11, that's worth reading.


  1. I find some of the correlations in the table on page 6 to be implausibly high. Should not the upper limit on the correlation between different items be less than the correlation of the item with itself? How can the MAPP critical thinking portion be more correlated to the MAPP mathematics test than to itself (0.93 vs 0.95)? If this were a pattern that happened only once, I might be inclined to think it an aberration due strictly to random chance, but it happens quite a lot in the table on page 6.

    The correlation between the CLA PT and the MAPP writing is especially bizarre (#3 and #11). Here, the difference is more than 0.15! Perhaps there is something about the report that I am missing, but this is very concerning to me.

  2. The ones on the diagonal are reliability scores calculated by choosing different samples (randomly, I assume). If N was smallish to begin with, I suppose we'd see more variability in these. At any rate, it's a different calculation from the correlations.

  3. This is essentially a multitrait-multimethod matrix. The fact that the reliabilities are not consistently higher than the correlations with other measures is very worrisome, at least if one subscribes to Campbell and Fiske's definitions of construct validity.
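
The concern raised in the comments above can be turned into a quick check. The sketch below assumes classical test theory, under which the correlation between two observed scores should not exceed the square root of the product of their reliabilities. It flags any pair whose correlation is higher than the smaller of the two reliabilities or higher than that bound. The labels and numbers are hypothetical placeholders, not values from the report's table, and `flag_pairs` is an illustrative name.

```python
from math import sqrt

def flag_pairs(reliabilities, correlations):
    """Return pairs whose correlation exceeds the smaller reliability
    or the classical attenuation bound sqrt(r_ii * r_jj)."""
    flagged = []
    for (i, j), r in correlations.items():
        bound = sqrt(reliabilities[i] * reliabilities[j])
        exceeds_diagonal = r > min(reliabilities[i], reliabilities[j])
        exceeds_bound = r > bound
        if exceeds_diagonal or exceeds_bound:
            flagged.append((i, j, r, round(bound, 3), exceeds_bound))
    return flagged

# Hypothetical reliabilities (the diagonal) and correlations (off-diagonal).
reliabilities = {"A": 0.93, "B": 0.95, "C": 0.80}
correlations = {("A", "B"): 0.95, ("A", "C"): 0.70, ("B", "C"): 0.90}

flagged = flag_pairs(reliabilities, correlations)
for i, j, r, bound, over in flagged:
    print(f"{i}-{j}: r={r}, bound={bound}, exceeds bound: {over}")
```

A flagged pair is not proof of an error. As comment 2 points out, the reliabilities and the correlations come from different calculations, and sample estimates can cross the bound by chance when N is small. A table where it happens often still deserves a closer look.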