Tuesday, November 19, 2013

Creating a Competition Analysis Chart from IPEDS and Clearinghouse Data

The National Student Clearinghouse provides information about applicants to your college that ended up enrolling somewhere else. By combining this with admissions and financial aid data that you have (e.g. high school grades, standardized test score, home state), and with public IPEDS data on institutional characteristics, you can graph the average characteristics of competitor institutions. It's not perfect--the IPEDS data is a couple of years out of date. Nevertheless, it's insightful.

There are many types of reports you can generate using this combination of data, but I'll focus on one that resembles a supply/demand relationship between the quality of student and the quality of the the attended institution. An example is shown below.

Here, the students are split into three categories: those who attended your institution (horizontal black lines), those who chose a private college other than yours (green), and those who went to a public institution. The levels under the circles indicate the academic preparedness as determined by the admissions process (usually a logistic regression model on HSGPA and standardized test scores). I've left the scale off of the graphs so as not to give away all our secrets, but the arrangement of dots in the scatterplot is real.  It shows that students generally sort themselves out pretty well, matching their academic ability (proxied by HSGPA) to the quality of the institution (proxied by graduation rate). Moreover, there is a pretty consistent gap in institutional quality that at least partially justifies the price difference (you can also create a scatterplot like this using graduation rate versus tuition). 

In my actual plot, I have a horizontal line that shows my college in relationship to these. Instead of showing you that, I've drawn three possible ones called case 1 through case 3. If your college's line looks like Case 1, you're not competing on quality because you're losing students to lower quality institutions. In Case 3, the opposite is true, where quality may be an important factor--definitely something to follow up with, e.g. the Admitted Student Questionnaire from the College Board. The middle line (Case 2) leads to a guess that your college is competing on quality with other privates and on something else (like price) for the mid-to-lower tier student, but that for the best qualified students, quality is still an issue. 

None of these conclusions is demonstrated here to be causal, of course. This is a starting point, not an ending point for analysis. But one of the advantages of this type of graph is that it's easily communicated to stakeholder, and generates the right kinds of questions about admissions requirements, marketing, and financial aid.

If you want to try this at home, I've already downloaded and combined some of the data you need. Here's a list:
  • [download] Selected IPEDS categories from the most recent years (2010,11), with the OPE institutional ID included, so you can easily connect to your Clearinghouse data
  • [download]A rudimentary data dictionary. I renamed columns to be more intuititve than the IPEDS ones.
  • [download] A Perl script for joining tables without using a database. You'll need a little experience with Perl (you have to install a module or two), but it's a real time-saver.
Tips:
  1. Remove cross-enrolled high school/college credit by filtering on the enroll date, so you don't accidentally pick up a community college a student earned credit in while still in high school.
  2. Remove duplicates and generally give the data a careful look as you would for any such project.
  3. I recommend Tableau as a reporting tool. It's expensive but worth it.
Feel free to contact me with questions, or leave a note below.

Edit: As a bonus, here's a scatterplot of just the IPEDS data, comparing the 75th percentile of SATR and SATM combined versus graduation rate (click to see a bigger version).