
Outsourcing Prediction Models

Higher education can benefit from many sorts of predictors. One of the most common is modeling attrition: trying to figure out how to identify students who are going to leave before they actually do. Because you might be able to do something about it if you understand the problem, right?

Over the years, I've been underwhelmed by consultants who cater to higher education, and it seems to me that we get second-rate services. I have seen only a few consultants who are worth the money, and I'll just mention one: SEM Works out of Greenville, NC. Jim Black seems to know everything there is to know about admissions, and I've seen direct benefits from his expertise.

There are various attractions to going outside for expertise. Maybe you want to make cuts, and need a hatchet man. Or maybe the decision makers are convinced that a "secret ingredient" is required, and the consulting firm has enough mystique to sell it. I don't believe much in secret ingredients.


What's particularly galling is that the advanced methods of building predictive models come from universities. Then they get used at companies like Netflix and what have you, while our administrations still chug along on 50-year-old technology. So we go out and buy consultants. Who also use 50-year-old technology.

You may not be expecting this twist, but I actually came across a solution to this problem. That's the whole reason I'm writing this post. I'll just let the free market do the talking now.

So instead of contracting up front for who-knows-what sort of analysis, you can actually bid out the work and get a solution before you pay! Of course, you have to think hard about what data to submit, and spend some time cleaning and prepping it for such a submission. But judging from the rates on the site, it would be a lot cheaper, with an almost guaranteed better result, than buying a big-name consultant. I'll leave you with this quote from their About page:
The motivation behind Kaggle is simple: most organizations don't have access to the advanced machine learning and statistical techniques that would allow them to extract maximum value from their data. Meanwhile, data scientists crave real-world data to develop and refine their techniques. Kaggle corrects this mismatch by offering companies a cost-effective way to harness the 'cognitive surplus' of the world's best data scientists.
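
As for the cleaning and prepping mentioned above, here is a rough sketch of what that might look like, not a recipe from the site itself. It assumes the usual competition arrangement: you hand over a training file with outcomes, a holdout file without them, and keep the answer key to score entries. The field names (student_id, hs_gpa, retained), the salt, and the made-up records are all hypothetical, and hashing IDs is only a first step toward protecting student privacy.

```python
import hashlib
import random


def pseudonymize(student_id, salt):
    """Replace a real student ID with a salted hash (hypothetical scheme)."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return digest[:12]


def prepare_submission(rows, salt, holdout_fraction=0.3, seed=0):
    """Split records into train (outcome kept), test (outcome withheld),
    and an answer key that stays with the institution."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)

    train, test, answer_key = [], [], []
    for i, row in enumerate(shuffled):
        clean = dict(row)
        # Swap the real ID for a pseudonym before anything leaves campus.
        clean["id"] = pseudonymize(clean.pop("student_id"), salt)
        if i < n_holdout:
            # Withhold the outcome from the holdout set; keep it in the key.
            answer_key.append({"id": clean["id"], "retained": clean.pop("retained")})
            test.append(clean)
        else:
            train.append(clean)
    return train, test, answer_key


# Made-up records purely for illustration; not real student data.
records = [
    {"student_id": f"S{i:03d}", "hs_gpa": round(2.0 + (i % 20) / 10, 1),
     "retained": int(i % 4 != 0)}
    for i in range(20)
]

train, test, answer_key = prepare_submission(records, salt="keep-this-secret")
print(len(train), "training rows,", len(test), "holdout rows")
```

The point of the split is that bidders can build models on the training rows, but only you can score their predictions on the holdout rows, which is what lets you see a solution before you pay.
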
Update: Also see "Not so Expert" in The Economist.




