Tuesday, November 11, 2008

A Nifty Trick for Mining Either/Or Data

It's often the case that I am interested in generating some data about an either/or condition like student retention. Either the student re-enrolled or did not. You might want to use a pivot table to compare retention rates for male/female, athlete/non-athlete, or by GPA or financial aid award amount. I find myself commonly with two tables of student IDs: those who were here last year, and those who are here now. I should be able to compare those to see who didn't come back, right? We can do that with an appropriate join, and a little magic.

Notice that the join properties are set so that all students from Fall 2007 are included. Data for FAll 2008 will show up as blank for those students.

Now here's the magic: create a field in the query like

Retain: sgn(nz([ID_Nbr],0))


Here, the student ID [ID_Nbr] is from the Fall 2008 set--it may be a null. But the nz() function is set to convert null values to zero. Then it gets passed to the sgn() function, which returns a numerical sign (-1,0,1) of the number. In effect, this takes students who were retained and assigns a one, and students who attrited and assigns a zero. This is exactly what we want. I have included a few other fields of interest in the query, like student gender.

Now run the query as a pivot table. Include our new Retain field in the data section, and set its aggregation property to average. This will average in a one for each retained student in the category, and a zero for the others--exactly the same as a percentage of retained students!

So with a single query, viewed as a pivot table, we're able to compute percentages of retained students based on whatever variables we have at hand.

You can create either/or variables quite easily. For example, suppose you want to know the percentage of students with GPA >2.5 who took ENG 101. Create a field based on overall GPA, and use something like:

GPAQuality: int([overallGPA]/2.5)


This will round down anything less than 2.5 to a zero, and anything bigger (well, up to 5) to a 1. Setting this field to average in the data part of a pivot table will compute the percentages for you.

No comments:

Post a Comment