## Monday, February 05, 2007

### Predicting GPA

Every year we get an IR request to find better predictors for student success. This comes at me from three angles. One is the Standards Committee, which periodically reviews the cut-off values for admission and provisional admission. The other two interested parties are Admissions and the Budget Committee. We could improve the student body's entering statistics quite rapidly by eliminating the bottom half of the entering class, but that would cause heartburn in the Budget Committee when the calculations for revenue are done. Balancing these two isn't easy.

A large part of the problem is figuring out how to identify the students who are likely to fail. We can then either not admit them, or give them the help they need to succeed. Traditionally we have used composite SAT and high school GPA for this purpose, feeding both into a linear regression model that predicts GPA at the College. As an experiment I tried something new this year. In the Fall we gave 'Assessment Day' surveys to over a third of the traditional students. There were 100 questions, many of which only make sense for students already in attendance. Some of the questions could be answered by applicants, though, if we chose to go that route. (For example: Has either of your parents graduated from college?) I identified these and ran a step-wise regression to see whether any of them added information to SAT and HSGPA in predicting grades. I found two, and both are surprising.
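For readers curious about the mechanics, the step-wise check can be sketched roughly as follows. Everything here is invented for illustration -- the numbers, the variable names, and the two candidate items are synthetic stand-ins, not the actual Assessment Day data -- but it shows the idea: fit the SAT + HSGPA baseline, then test each survey item with a partial F-test to see whether it adds information.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins -- none of this is the College's actual data.
sat = rng.normal(1000, 150, n)
hsgpa = rng.normal(3.0, 0.5, n)
# Candidate Likert items (1-5); item_b is built to carry extra signal.
item_a = rng.integers(1, 6, n).astype(float)
item_b = rng.integers(1, 6, n).astype(float)
gpa = 0.001 * sat + 0.5 * hsgpa + 0.08 * item_b + rng.normal(0, 0.4, n)

def rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def forward_step(base, candidates, y, alpha=0.05):
    """One forward step of step-wise selection: add each candidate to
    the base model in turn, run a partial F-test, and return the
    candidates that are significant at the given alpha."""
    kept = []
    for name, col in candidates.items():
        X_big = np.column_stack([base, col])
        m, p = X_big.shape
        df2 = m - p - 1
        f_stat = (rss(base, y) - rss(X_big, y)) / (rss(X_big, y) / df2)
        p_val = 1 - f_dist.cdf(f_stat, 1, df2)
        if p_val < alpha:
            kept.append((name, p_val))
    return kept

base = np.column_stack([sat, hsgpa])
survey = {"item_a": item_a, "item_b": item_b}
kept = forward_step(base, survey, gpa)
print(kept)
```

A full step-wise procedure would repeat this, adding the best survey item to the base model and testing again until nothing else clears the threshold; one pass is enough to show the logic.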

The two items are:

- I work harder on homework than most students.
- Family commitments often make studying difficult.

Responses were on a 1-5 Likert scale from Strongly Agree to Strongly Disagree. What's really interesting about these items is that both were significant predictors of college GPA (p < .03), with effect sizes as large as that of the SAT. Strangely, both coefficients had the opposite sign from what I would have guessed.

The students who reported working harder on homework tended to have lower grades, and those who cited family commitments tended to have higher grades.

This is all experimental at present, but we're exploring the idea of using questions like these on an application form, so that we can better identify matches for the institution's goals. Without such additional information, our models can explain only about 44% of the variance in college grades. This is not bad--we can eliminate 56% of the 'bad' students at a cost of losing only 5% of the 'good' students. There are ways to improve this using demographic characteristics, but that has always seemed unethical to me, given the biases already built into measures like the SAT.
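The screening tradeoff above can be illustrated with a quick simulation: generate (predicted, actual) GPA pairs with the correlation implied by R-squared of 0.44, pick a cutoff on the predicted GPA, and count how many weak students are screened out versus how many strong ones are lost. The 2.0 success threshold, the 2.3 cutoff, and the GPA mean and spread are my own made-up values, so the resulting percentages are illustrative only and won't match the figures quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Simulated (predicted, actual) GPA pairs with the correlation implied
# by R^2 = 0.44; the mean 2.7 and sd 0.6 are made-up values.
r = np.sqrt(0.44)
z_pred = rng.normal(size=n)
z_true = r * z_pred + np.sqrt(1 - r**2) * rng.normal(size=n)
pred_gpa = 2.7 + 0.6 * z_pred
true_gpa = 2.7 + 0.6 * z_true

good = true_gpa >= 2.0      # hypothetical definition of a 'good' outcome
admitted = pred_gpa >= 2.3  # hypothetical cutoff on predicted GPA

bad_screened = np.mean(~admitted[~good])  # 'bad' students kept out
good_lost = np.mean(~admitted[good])      # 'good' students lost
print(f"screened out {bad_screened:.0%} of 'bad', lost {good_lost:.0%} of 'good'")
```

Sliding the cutoff up or down trades one error for the other, which is exactly the tension between the Standards Committee and the Budget Committee.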