I came across a relatively new page on validity studies on the Voluntary System of Accountability (VSA) site. The studies focus on the CAAP, CLA, and MAPP, which are all aimed at critical thinking and writing to one degree or another. There are several reports to be found there, including an executive summary. In "Interpretation of Findings of the Test Validity Study Conducted for the Voluntary System of Accountability," we find that the correlations between the tests in question, when compared at the school level, ranged from .75 to .87. Here, "School-level reliability refers to score consistency (i.e., a school receiving a similar mean score regardless of the sample of students taking the test)."
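To make that definition concrete, here's a minimal sketch of the idea, with everything fabricated for illustration (school counts, sample sizes, and score distributions are mine, not the study's): sample students within each school twice, compute each school's mean for each sample, and correlate the two sets of means across schools.

```python
# Sketch of school-level reliability as sample-to-sample consistency of
# school means. All numbers are invented; this is not the report's procedure.
import numpy as np

rng = np.random.default_rng(0)
n_schools, n_students, sample_size = 50, 400, 100

# Each school has a true mean; individual students vary around it.
true_means = rng.normal(500, 40, n_schools)
scores = [rng.normal(m, 80, n_students) for m in true_means]

def school_means(scores, k, rng):
    """Mean score per school from a random sample of k students each."""
    return np.array([rng.choice(s, k, replace=False).mean() for s in scores])

a = school_means(scores, sample_size, rng)
b = school_means(scores, sample_size, rng)
print(f"school-level reliability across two samples: r = {np.corrcoef(a, b)[0, 1]:.2f}")
```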
Correlations between test sections are shown in the table below, from page 6 of the report.
This is presented as evidence of construct validity, but I don't see any evidence that they did a confirmatory factor analysis to check the dimensionality.
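For what it's worth, here's the kind of rough check I have in mind, using exploratory factor analysis as a stand-in for the confirmatory model I'd like to see (the data below is fabricated; with only the published school-level correlation table, you'd need a structural modeling tool like lavaan in R instead):

```python
# Rough dimensionality check via exploratory factor analysis. X is assumed to
# be an (n_students x n_subtests) score matrix; the fabricated data below has
# two underlying traits feeding two subtests each.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def rough_dimension_check(X, n_factors=2):
    """Fit an exploratory factor model and return the loadings."""
    fa = FactorAnalysis(n_components=n_factors)
    fa.fit(X)
    return fa.components_.T  # rows = subtests, columns = factors

rng = np.random.default_rng(1)
thinking = rng.normal(size=(200, 1))  # latent critical-thinking trait
writing = rng.normal(size=(200, 1))   # latent writing trait
X = np.hstack([thinking + 0.3 * rng.normal(size=(200, 2)),
               writing + 0.3 * rng.normal(size=(200, 2))])
print(rough_dimension_check(X))  # loadings should separate into two blocks
```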
The researchers tried to tease out an effect size by looking at the difference between freshmen and seniors, but had to jiggle the statistics to account for the fact that the seniors and freshmen were different people with different characteristics (there was a difference in average SAT scores, for example). This is not surprising, because of survivorship bias. The actual method for this renorming is not given in this report (maybe it's in one of the others), but there's a graph to illustrate that after the fiddling, the seniors look better than the freshmen. Except in math.
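Since the method isn't given, here's one standard possibility, and only my guess at the sort of adjustment involved: an ANCOVA-style regression that estimates the senior/freshman gap net of SAT. The numbers are invented to show how survivorship inflates the raw gap.

```python
# Covariate adjustment sketch: regress score on class standing plus SAT, so
# the senior "effect" is net of the entering-ability difference. Fabricated
# data, not the study's; the report may have used something else entirely.
import numpy as np

rng = np.random.default_rng(2)
n = 500
senior = np.repeat([0, 1], n)              # 0 = freshman, 1 = senior
sat = rng.normal(1000 + 50 * senior, 100)  # survivorship: seniors have higher SAT
score = 0.3 * sat + 20 * senior + rng.normal(0, 30, 2 * n)

# Raw difference confounds class standing with SAT.
raw_gap = score[senior == 1].mean() - score[senior == 0].mean()

# ANCOVA-style adjustment via ordinary least squares.
X = np.column_stack([np.ones(2 * n), senior, sat])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"raw senior-freshman gap: {raw_gap:.1f}")
print(f"SAT-adjusted gap:        {beta[1]:.1f}")
```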
This finding is interesting to me as a math teacher. I find it plausible, since college freshmen would generally have just come out of several continuous years of math instruction, while college seniors would not.
There's more worth reading in the report, including some cautions on page 11.
I find some of the correlations in table 6 to be implausibly high. Shouldn't the upper limit on the correlation between different tests be the correlation of a test with itself? How can the MAPP critical thinking portion be more correlated with the MAPP mathematics test than with itself (0.95 vs 0.93)? If this were a pattern that happened only once, I might be inclined to think it an aberration due strictly to random chance, but it happens quite a lot in the table on page 6.
The correlation between the CLA PT and the MAPP writing is especially bizarre (#3 and #11). Here, the difference is more than 0.15! Perhaps there is something about the report that I am missing, but this is very concerning to me.
The ones on the diagonal are reliability scores calculated by choosing different samples (randomly, I assume). If N was smallish to begin with, I suppose we'd see more variability in these. At any rate, it's a different calculation from the correlations.
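A quick simulation of that point, with made-up parameters of my own choosing: with only a few schools, a sample-split reliability estimate bounces around enough that it can fall below a cross-test correlation just by chance.

```python
# How noisy is a sample-split reliability estimate when the number of schools
# is small? Parameters are invented, matching nothing in the report.
import numpy as np

rng = np.random.default_rng(3)

def split_reliability(n_schools, n_students, k, rng):
    """Correlation of school means across two random student samples."""
    means = rng.normal(500, 40, n_schools)
    scores = [rng.normal(m, 80, n_students) for m in means]
    a = [rng.choice(s, k, replace=False).mean() for s in scores]
    b = [rng.choice(s, k, replace=False).mean() for s in scores]
    return np.corrcoef(a, b)[0, 1]

for n_schools in (10, 40, 160):
    reps = [split_reliability(n_schools, 400, 100, rng) for _ in range(200)]
    print(f"{n_schools:4d} schools: mean r = {np.mean(reps):.2f}, sd = {np.std(reps):.3f}")
```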
This is essentially a multi-trait multi-method matrix. The fact that the reliabilities are not consistently higher than the correlations with other measures is very worrisome, at least if one subscribes to Fiske and Campbell's definitions of construct validity.
http://www.socialresearchmethods.net/kb/mtmmmat.php