Monday, June 08, 2009

Noncognitives as Indicators of Success

As Institutional Research Director at Coker College some years ago, I was involved in creating a predicted GPA matrix, which was used to set merit awards in our financial aid leveraging process. I noticed that the amount of variance explained by the traditional measures of high school GPA and SAT (or ACT equivalent) was not great--perhaps 30% in the linear model using both variables. I tried using other data at hand, and found that these two variables were about the best usable ones at our disposal. Unusable ones included sex and income.

This last was troubling because it was clear that students from a low socio-economic group got lower grades and standardized test scores on average, and hence received less merit aid. I began to wonder if it could really be true that these students were less able to do college work. Clearly some of them could do quite well--even at the very bottom of our acceptance pool, rated by predicted GPA, about half the students would do quite well academically. This was the genesis of the idea to try to find ways to identify the actual potential of these poorly-rated students.

There are two advantages to this idea. The first is that it's fairer to students--rather than the usual (implicit) practice of adding merit aid on top of high family incomes, a closer inspection can even out the aid (and accepted applications, for that matter). The second is that it's beneficial to the institution. High-GPA, high-SAT students are competed for by many institutions--they are visibly "high potential" applicants, despite the 70% of variance in actual performance that is unexplained by these predictors. By developing better predictors, it would be possible to partially avoid the bidding war that drives up discount rates and puts the cost of running a university disproportionately on low-predicted students (who, as we've noted, are on average of lower socio-economic status to begin with).

It's somewhat curious that, given this win-win advantage of better predictors, there is not wider acknowledgment that the usual cognitive predictors (GPA, SAT, class rank, etc.) are not sufficient. It's doubly odd since the College Board itself makes no secret of the limitations of the SAT, for example. This applies in general as well as to particular demographics. From the 2008 College Board Research Report Differential Validity and Prediction of the SAT: "The results for race/ethnicity show that for the individual SAT sections, the SAT is most predictive for white students, with correlations ranging from 0.46 to 0.51. The SAT appears less predictive for underrepresented groups, in general, with correlations ranging from 0.40 to 0.46." Those correlations are really only useful as an adjunct to high school grades (and perhaps rank), so one has to take them with a grain of salt. Even so, squaring these to get r^2 shows that the SAT explains only 20% or so of the variance in first-year college grades. This is hardly enough to justify the confidence that the general public and even higher education administrators seem to have in the test. I imagine that college ratings systems like U.S. News accentuate this distortion.
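To make the arithmetic concrete, here is a minimal sketch in Python. The first part simply squares the correlations quoted from the College Board report. The second part uses the standard formula for R^2 with two correlated predictors to illustrate why the SAT adds only modestly on top of high school GPA. The three correlations in that second part (0.50, 0.46, 0.50) are hypothetical values chosen for illustration only; they are not taken from the report or from the Coker data.

```python
def r_squared(r):
    """Share of variance explained by one predictor with correlation r."""
    return r * r


def two_predictor_r_squared(r_y1, r_y2, r_12):
    """R^2 of a linear model with two predictors.

    r_y1, r_y2: correlation of each predictor with the outcome
    r_12:       correlation between the two predictors
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


# Correlations quoted in the College Board report (SAT vs. first-year grades).
for r in (0.40, 0.46, 0.51):
    print(f"r = {r:.2f} -> r^2 = {r_squared(r):.1%}")

# HYPOTHETICAL correlations, for illustration only (not from any report):
r_hs_gpa = 0.50   # high school GPA vs. college GPA
r_sat = 0.46      # SAT vs. college GPA
r_between = 0.50  # high school GPA vs. SAT

gpa_only = r_squared(r_hs_gpa)
combined = two_predictor_r_squared(r_hs_gpa, r_sat, r_between)
print(f"HS GPA alone:  {gpa_only:.1%}")
print(f"HS GPA + SAT:  {combined:.1%}")
print(f"Gain from SAT: {combined - gpa_only:.1%}")
```

The quoted correlations work out to roughly 16% to 26% of variance explained. With the hypothetical inputs, the two predictors together explain about 31%, only a few points more than high school GPA alone, because the predictors overlap with each other.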

We have therefore established that there is great potential for indicators other than the usual cognitive ones to chip away at the 70% or so of variance in academic performance that remains unexplained. Why noncognitives?

Recently, results from an unscientific but interesting faculty poll were announced in a staff meeting at my university. The faculty members had been asked what qualities they would like to see in students. As the deans read their lists, I categorized them for my own amusement and found that two thirds of the traits listed were things like "works hard," "is interested in the subject," and "takes studies seriously." In other words, they were descriptions of noncognitives--attitudes and behaviors other than grades and test scores. It had long been clear to me by that point that we needed better insight into those qualities of our students, but I was surprised to see how much research had been done on it already.

Even the big standardized test companies are becoming interested in noncognitives. In 2008 ETS announced that it would begin testing the use of noncognitives for augmenting the GRE with a "Personal Potential Index". In this case, this means using standardized letters of recommendation from advisors and professors. The College Board has funded a project (GRANT) at Michigan State University to study noncognitives.

Below are a few selected sources from the literature on noncognitives.

The University of Chicago Chronicle, Jan. 8, 2004, Vol. 23 No 7:

Heckman, the Henry Schultz Distinguished Service Professor in Economics and one of the world’s leading figures in the study of human capital policy, has found that programs that encourage non-cognitive skills effectively promote long-term success for participants.

Although current policies, particularly those related to school reform, put heavy emphasis on test scores, practical experience and academic research show that non-cognitive skills also lead to achievement.

"Numerous instances can be cited of people with high IQs who fail to achieve success in life because they lacked self-discipline and of people with low IQs who succeeded by virtue of persistence, reliability and self-discipline," Heckman writes in the forthcoming book, Inequality in America: What Role for Human Capital Policies? (Massachusetts Institute of Technology Press), which he co-authored with Alan Krueger.

Noncognitive predictors of academic performance: Going beyond the traditional measures, by Susan DeAngelis in Journal of Allied Health, Spring 2003:

The purpose of this study was to provide an initial investigation into the potential use of the PSI as a noncognitive predictive measure of academic success. As programs continue to experience demands by allied health professions for their graduates, admissions committees should employ highly predictive and valid criteria to select the most qualified applicants. Although it is impossible to select only candidates who are ultimately successful, admissions decisions must be based on the best data available to reduce the risk of attrition and to increase the number of newly licensed graduates entering the profession. The preliminary findings indicate that the PSI moderately enhanced the predictive capacity of the traditional cognitive measures of entering GPA and ACT score.

Probably the best source for the practical use of noncognitives in higher education, and a rich source of literature on the topic, is William E. Sedlacek's Beyond the Big Test: Noncognitive Assessment in Higher Education. The book describes itself thus:

William E. Sedlacek--one of the nation's leading authorities on the topic of noncognitive assessment--challenges the use of the SAT and other standardized tests as the sole assessment tool for college and university admissions. In Beyond the Big Test, Sedlacek presents a noncognitive assessment method that can be used in concert with the standardized tests. This assessment measures what students know by evaluating what they can do and how they deal with a wide range of problems in different contexts. Beyond the Big Test is filled with examples of assessment tools and illustrative case studies that clearly show how educators have used this innovative method to:

• Select a class diverse on dimensions of race, gender, and culture in a practical, legal, and ethical way

• Teach a diverse class employing techniques that reach all students

• Counsel and advise students in ways that consider their culture, race, and gender

• Award financial aid to students with potential who do not necessarily have the highest grades and test scores

• Assess the readiness of an institution to educate and provide services for a diverse student body

Sedlacek identifies eight dimensions of interest:

1. Positive self-concept
2. Realistic self-appraisal
3. Successfully handling the system
4. Preference for long-term goals
5. Availability of strong support person
6. Leadership experience
7. Community involvement
8. Knowledge acquired in a field

He also gives research findings and describes processes, such as application reviews, for identifying and rating these traits.

Research like the sources cited above shows promise in the use of noncognitive predictors. And there is clearly a need for better predictors of student success. Nevertheless, we must be careful not to underestimate the difficulties or imagine that "noncognitives" are a panacea. The work of Dr. Sedlacek and others points to modest gains in predictive power based on surveys and other methods of gathering information about attitudes and behaviors. In his paper with Julie R. Ancis, "Predicting the Academic Achievement of Female Students Using the SAT and Noncognitive Variables," the noncognitive instrument explained only a handful of percentage points of the variance in GPA, a result that has been reported by others as well.

Additional complications are introduced if the methods of assessing noncognitives are standardized and published. Even tests of cognitive skills are gamed, as with SAT preparation tests that teach not just content, but also test-taking strategies. It's easy to imagine how much easier it would be for an applicant to a competitive institution to "game" a formalized noncognitive survey. This imposes constraints on how we can approach the problem.

Within these limitations, there is still opportunity to develop the use of noncognitives in useful ways. Obviously, applying such indicators in an admissions setting must be done carefully. But the usefulness of knowledge about what traits lead to success is not limited to admissions. Student intervention, orientation, and even the curriculum itself can benefit from consideration of skills that are not traditionally academic. The AAC&U's LEAP initiative, for example, lists noncognitives such as teamwork and civic engagement in its recommendations for general education.

In summary: there is motivation to find new predictors of academic performance, and there are indications that noncognitives are promising. There are, however, obstacles that can be best met with a collaborative, wide-ranging effort that focuses on practical effect.


  1. Indeed. Every time the Chronicle posts a note about some institution permitting students to submit SAT subject tests instead of the SAT I, the first comment is invariably someone spewing snark about how the college is letting go of its academic standards to get more paying students. But I've also tried to predict GPA with the SAT, and it is frankly pitiful.

    Of course, I also had a slew of noncognitives available to me - NSSE and BCSSE items. They were of very little help.

    Part of the problem with predicting GPA is that GPA is a horrible measure. The distribution of grades at the course level varies madly from course to course, yet we still try to treat GPA like an unweighted mean. Not a new idea - Goldman and Slaughter 1976, in the Journal of Educational Psychology.

    Furthermore, the distribution of GPAs is censored at 4, which potentially distorts the coefficients of the predictors.

    I'm increasingly convinced that the correct way to estimate GPA is to get grade distributions at the course level, estimate a student's grade on that distribution, and then extract GPAs from various possible course schedules. Then you could even get major-specific: students might not do well in many majors, but we really only care how they'll do in the one they choose.

  2. @Matt -- Your observation about GPA being a horrible measure is probably the heart of the matter (I call it statistical 'goo'). I've had better success with predicting retention (with logistic regression) using some of the variables on the CIRP in conjunction with, say, out-of-pocket costs.

    On the topic of assessment it's amazing to me that the established position is that grades are statistical goo (okay, so far), but averaged ratings from rubrics are somehow better.

  3. That's a very good point, although at least that can be remedied by high inter-rater reliability. I feel like that (experts agreeing on an evaluation) is about as good as assessment will get. Rather depressing.

  4. I think the usefulness of the rubric gets lost in the shuffle somehow. The fact that the desired outcome is stated and agreed upon all by itself is a big step. Then if you get decent inter-rater reliability, that's super. The part I don't get is why then (many times) these ratings are mashed together in some kind of unholy matrimony (usually by averaging) to try to say something about more general things like "effective writing" or "critical thinking." That's where the logic breaks down for me.