Wednesday, December 23, 2015

Inter-Rater Facets

[Note: if the online app doesn't work, it's because my hourly limit for the month has been reached.]

The question of inter-rater agreement comes up when we have multiple assessments of the same subject using the same scale--for example, rubric scores of student work with multiple raters. The point of using more than one judge of the material is to find out the extent to which they agree. Unfortunately, the existing statistics for doing this aren't of much help in understanding what's going on with the rubric ratings. There are a variety of measures, including Cohen's Kappa, Fleiss's Kappa, and several others. These are designed to assess agreement between raters by comparing the nominal agreement with what we would find if the raters were just making assignments randomly. This is called "chance correction." I find that a single index is not very helpful, and although there are conditional versions out there, they are not easy to find or use.

So I created one that I like, and have found very useful. The technical notes are in this paper, which is a pretty good draft at this point. I created an online app to do all the hard work, and put the code on GitHub. (Note: on 12/30 I renamed the repo to reflect the actual name.)

Here's an example of the output. It shows a three-point scale, from 0 to 2, with comparisons between each response and each other response. The lower right graph is 1 versus 2, which shows how much agreement there was between those ratings. If a subject was rated with 1s and/or 2s, he or she would show up as part of this graph. The thin line is what we would expect from random assignments (given certain assumptions), and the darker line is actually one dot per observation. The vertical and horizontal components are perfect agreement--the subject got either all 1s or all 2s from all raters. The diagonal part is where the ratings are mixed, e.g. a 1 and a 2 or two of each.

The p-value is for a hypothesis test of the data against the null hypothesis of random assignments. Kappa is a computation very similar to Fleiss's--it represents the fraction of the available agreement beyond chance that we actually observe. So for the lower right graph, 45% of the possible 'extra' agreement is observed. Perfect agreement, with K=1, would look like an inverted L.
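For readers who want to see the conventional calculation that the conditional version generalizes, here is a minimal Python sketch of Fleiss's kappa. The ratings matrix below is invented for illustration; the real app does considerably more.

```python
# A minimal sketch of Fleiss's kappa, the chance-corrected agreement index
# discussed above. Rows are subjects, columns are rating categories, and
# each entry counts how many raters chose that category for that subject.
def fleiss_kappa(counts):
    N = len(counts)        # number of subjects
    n = sum(counts[0])     # raters per subject (assumed constant)
    k = len(counts[0])     # number of rating categories

    # Observed agreement: pairwise rater agreement per subject, averaged.
    p_obs = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N

    # Chance agreement from the marginal category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_chance = sum(p * p for p in p_j)

    # Fraction of the agreement-beyond-chance that is actually observed.
    return (p_obs - p_chance) / (1 - p_chance)

# Invented example: three raters scoring four subjects on a 0-2 rubric.
ratings = [[3, 0, 0],   # all three raters chose 0: perfect agreement
           [0, 2, 1],
           [0, 1, 2],
           [0, 0, 3]]
print(round(fleiss_kappa(ratings), 3))  # prints 0.467
```

A kappa of 1 means the inverted-L picture of perfect agreement; 0 means no more agreement than random assignment would produce.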

The program is designed to work with three types of data; follow the on-screen guides for loading. You'll need the data in CSV format, which is easy to export from Excel.

I started a wiki on the GitHub page.

Tuesday, July 29, 2014

OK Trends

If you're not familiar with the OKCupid blog, check it out here. Christian Rudder slices and dices data from the dating site to try to reveal human nature. I find it great fun to follow along with his train of thought, which he presents engagingly and illustrates well with graphics. The articles could serve as examples to students of what 'critical thinking' might be.

The report linked above is particularly interesting because it addresses ethical issues. If you look at the comments, you'll see a range of reactions from "that's cool" to "how dare you!", including a couple by putative psychology researchers who mention IRB processes. This comes on the heels of Facebook's research on manipulating attitudes, and the resulting media fiasco.

This is a looming problem for us in higher education, too. As an example, imagine a software application that tracks students on campus by using the wi-fi network's access points and connections to cellphones. This could be used to identify student behaviors that are predictive of academic performance and retention (e.g. class attendance, social activity). Whereas manual roll-taking in class is an accepted method of monitoring student behavior, cellphone tracking crosses a line into creepy. The only way to proceed with such a project, in my opinion, is transparently, which could be done with an opt-in program. In such a program, students would be given a description and an opportunity to sign up. In return, they would receive information back, probably both the detailed data being gathered and summary reports on the project. I have been looking for examples of colleges taking this approach. If you know of one, please let me know!

See also: "okcupid is the new facebook? more on the politics of algorithmic manipulation" at

Sunday, July 27, 2014

Survey Prospector

(last updated 5/30/2015)

Survey Prospector is a web-based interface for quickly exploring discrete data. It is intended to support a "want-know-do" cycle of intelligent action. It allows you to quickly execute a predictor-finding workflow to try to find potential cause/effect relationships you care about.

  1. Normalize scalar or nominal data into bins or small numbers of categories if necessary. 
  2. Apply any filters of interest (e.g. just males or just females).
  3. Identify a target (dependent) variable and create a binary classification.
  4. List the independent variables in decreasing order of predictive power over the dependent variable, with graphs and suitable statistics automatically generated.
  5. Browse these top predictors to get a sense of what is important, including linear and non-linear relationships between pairs of them.
  6. Visually inspect correlational maps between the most important independent variables.
  7. Create multivariate predictors using combinations of the best individual predictors.
  8. Cross-validate the model by keeping some data back to test the predictor on.
  9. Assign modeled probabilities to cases, e.g. to predict attrition. 
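As a rough illustration of step 4, the ranking of predictors can be sketched in a few lines of Python. This is not Survey Prospector's actual code; the variable names and toy data below are invented.

```python
# A sketch of the ranking step: score each candidate variable by the area
# under its ROC curve (AUC) against a binary target, then sort.
def auc(scores, labels):
    """Probability that a random positive case outranks a random negative
    one (ties count half); 0.5 means no predictive power."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rank_predictors(table, target):
    """Return (name, AUC) pairs, strongest predictor first."""
    ranked = []
    for name, column in table.items():
        a = auc(column, target)
        ranked.append((name, max(a, 1 - a)))  # direction doesn't matter
    return sorted(ranked, key=lambda pair: -pair[1])

# Toy question: does GPA quartile or off-campus work predict retention?
retained = [1, 1, 0, 1, 0, 0, 1, 0]
data = {"gpa_quartile": [4, 3, 2, 4, 1, 2, 3, 1],
        "work_hours":   [0, 5, 10, 0, 20, 5, 10, 15]}
for name, a in rank_predictors(data, retained):
    print(name, round(a, 3))
```

The AUC-as-rank-probability definition is the same quantity the app's ROC curves display, which is why it works as a one-number summary of predictive power.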

This all happens in real time, so it can be used in meetings to answer questions if you like. A common malady of IR offices is that it's much easier to ask questions than to answer them. This tool can be used to effectively prioritize research. It lends itself to data warehousing, where you might build a longitudinal history of student data across a spectrum of types. Then it becomes trivial to ask and answer questions on the fly like "what student non-cognitives predict good grades their first semester?" or "what's the effect of work-study on first-year attrition?"

Here is the application: Survey Prospector v2-1-15 [Note: if the online app doesn't work, it's because my hourly limit for the month has been reached.]. A video demo can be found here, using this data set. If you need a primer on predictors and ROC curves, try this.

Here's a video tour using a 45M CIRP national data set. You can do something similar with a small sample by downloading these files:
  • CIRP1999sample.csv a random sample of 401 rows taken from the 38,844 in HERI's 1999 CIRP survey data set
  • CIRPvars.csv, an index to the items and responses

Technical details. If you want to try this on your own data, please don't upload anything with student identifiers or other sensitive information. The data should be in this format:
  • Data files need a header row with variable names.
  • Index files do not have a header. They are just a variable name, a comma, and the description without commas. Only one comma per line unless you want to put the description in quotes. Index files are optional, but they help decipher results later on. Download the example CIRPvars.csv listed above to see one.
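A minimal reader for the two formats, assuming the layout described above, might look like this in Python (the commented file names are the examples from this post; this is a sketch, not the app's own loader):

```python
# Read the two CSV formats: data files have a header row; index files are
# headerless 'varname,description' pairs, with quotes allowed around
# descriptions that contain commas.
import csv

def load_data(path):
    """Data file: header row of variable names, then one row per case."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def load_index(path):
    """Index file: no header; each line maps a variable name to its
    description. The csv module handles quoted, comma-bearing text."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if row}

# rows = load_data("CIRP1999sample.csv")
# labels = load_index("CIRPvars.csv")
```

Using `csv` rather than splitting on commas is what makes the quoted-description case in the index files work.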
I'm in the process of creating a real website for this project, but it's not finished yet.


This is the main tab, where the predictors are sorted and displayed in order of importance.

The graphs show the data distribution (with case types in blue or pink), a ROC curve for assessing predictor power, ratios with confidence intervals to assess statistical significance, and a table with case numbers and ratios in order to identify useful thresholds. 

The screen capture above shows a dynamic exploration of the correlations between best predictors (red lines are negative correlations). The sliders at the top allow for coarse- or fine-grain inspection of variables. This allows you to see how predictors cluster visually. In researching attrition, this technique easily allowed identification of major categories of risk evident in our data: academic, financial, social engagement, and psychological.  

Please leave feedback below or email me at

Monday, July 14, 2014

Finding and Using Predictors of Student Attrition

A while back, I wrote about finding meaning in data, and this has turned into a productive project. In this article I'll describe some findings on causes of student attrition, a conceptual framework of actions to prevent attrition, and an outline of the methods we use.

In order to find predictors of attrition, we need information that was gathered before the student left. A different approach is to ask the student why he or she is leaving during the withdrawal process, but I won't talk about that here. We use HERI's The Freshman Survey each fall and get a high response rate (the new students are all in a room together). Combining this with the kinds of information gathered during the admissions process gives several hundred individual pieces of information. These data rows are 'labeled' per student with binary variables for attrition (first semester, second semester, and so on). In several years of data from two different liberal arts colleges, we see the same kinds of predictors of attrition:

  • Low social engagement
  • High financial need
  • Poor academics
  • Psychology that leads to attrition: initial intent to transfer, extreme homesickness, and so on.
These can occur in various combinations. A student who is financially stressed and working two jobs off campus may find it hard to keep up her grades. At the AIR Forum (an institutional research conference) this June, we saw a poster from another liberal arts college that identified the same four categories. They used different methods, and we have a meeting set with them to compare notes.
For purposes of forming actions, we assume that these predictive conditions are causal. There's no way to prove that without randomized experiments, which are impossible, and doing nothing is not an option. In order to match up putative causes with actions, we relied on Vincent Tinto's latest book Completing College: Rethinking Institutional Action, taking his categories of action and cross-indexing them with our causes. Then we annotated it with the existing and proposed actions we were considering. The table below shows that conceptual framework for action.
Each letter designates some action. The ones at the bottom are actions related to getting better information. An example of how this approach generates thoughtful action is given next.

Social Engagement may happen through students attending club meetings, having work-study, playing sports, taking a class at the fitness center, and so on. This can be hard to track. This led us to consider adopting a software product that would do two things: (1) help students more easily find social activities to engage with, and (2) help us better track participation. As it turned out, there was a company in town that does exactly this, called Check I'm Here. We had them over for a demo, and then I went to their place to chat with Reuben Pressman, the CEO and founder. I was very impressed with the vision and passion of Reuben and his team. You can click through the link to their web site for a full rundown of features, but here's a quote from Reuben:

The philosophy is based around a continuous process to Manage, Track, Assess, & Engage students & organizations. It flows a lot like the MVP idea behind the book "The Lean Startup" that talks about a process of trying something, seeing how it goes, getting feedback, and making it better, then starting over again. We think of our engagement philosophy the same way:
  • Manage -- Organize and structure your organizations, events, and access to the platform.
  • Track -- Collect data in real-time and verify students live with mobile devices
  • Assess -- Integrate newly collected data and structurally combine it with existing data to give real-time assessment of what works and doesn't and what kinds of students are involved
  • Engage -- Use your new information to make educated decisions and use our tools for web and mobile to attract students in new ways
  • Rinse, and Repeat for more success!
A blog post talking more about our tracking directly is here. We take a focus on Assessing Involvement, Increasing Engagement, Retaining Students, and Successfully Allocating Funding.
Currently, we can get card-swipe counts for our fitness center, because access is controlled for security reasons. An analysis of the data gives some indication (this is not definitive) that retention is higher for students who use the fitness center than for those who don't. The effect manifests after about a year, as a bonus of three percentage points in retention. The ability to capture student attendance at club events, academic lectures, and so on with an easy portable card-swipe system like Check I'm Here is very attractive. It also helps these things happen--students can check an app on their phones to see what's coming up and register their interest in participating.


I put this last, because not everyone wants to know about the statistics. At the AIR forum, I gave a talk on this general topic, which was recorded and is available through the organization. I think you might have to pay for access, though.

The problem of finding which variables matter among hundreds of potential ones is sometimes solved with step-wise linear regression, but in my experience this is problematic. For one thing, it assumes that relationships are linear, when they might well not be. Suppose the students who leave are those with the lowest and the highest grades. That wouldn't show up in a linear model. I suppose you could cross-multiply all the variables to get non-linear ones, but now you've got tens of thousands of variables instead of hundreds.
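To make the point concrete, here is a toy example with invented data where attrition is concentrated at the lowest and highest grades: the linear correlation is essentially zero, but a simple binned view finds the pattern immediately.

```python
# Invented toy data: attrition ('left') hits the lowest and highest GPAs.
gpa  = [1.0, 1.2, 1.5, 2.2, 2.5, 2.8, 3.4, 3.7, 4.0]
left = [1,   1,   1,   0,   0,   0,   1,   1,   1]

# The 'linear' view: Pearson correlation between GPA and leaving.
n = len(gpa)
mx, my = sum(gpa) / n, sum(left) / n
cov = sum((x - mx) * (y - my) for x, y in zip(gpa, left))
sx = sum((x - mx) ** 2 for x in gpa) ** 0.5
sy = sum((y - my) ** 2 for y in left) ** 0.5
r = cov / (sx * sy)

# The 'binned' view: attrition rate within each GPA band.
def rate(lo, hi):
    group = [y for x, y in zip(gpa, left) if lo <= x < hi]
    return sum(group) / len(group)

print(f"correlation: {r:.2f}")  # essentially zero
print(f"attrition by band: {rate(0, 2)}, {rate(2, 3)}, {rate(3, 5)}")
# attrition by band: 1.0, 0.0, 1.0
```

The U-shaped relationship is invisible to the correlation but obvious once the scalar variable is normalized into bins, which is the motivation for step 3 of the workflow below.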

There are more sophisticated methods available now, like lasso, but they aren't attractive for what I want. As far as I can tell, they assume linearity too. Anyway, there's a very simple solution that doesn't assume anything. I began developing the software to quickly implement it two years ago, and you can see an early version here.

I've expanded that software to create a nice workflow that looks like this:
  1. Identify what you care about (e.g. retention, grades) and create a binary variable out of it per student
  2. Accumulate all the other variables we have at our disposal that might be predictors of the one we care about. I put these in a large CSV file that may have 500 columns
  3. Normalize scalar data to (usually) quartiles, and truncate nominal data (e.g. state abbreviations) by keeping only the most frequent ones and calling the rest 'other'. NAs can be included or not.
  4. Look at a map of correlates within these variables, to see if there is structure we'd expect (SAT should correlate with grades, for example)
  5. Run the univariate predictor algorithm against the one we care about, and rank these best to worst. This usually takes less than a minute to set up and run.
  6. Choose a few (1-6 or so) of the best predictors and see how they perform pairwise. This means taking them two at a time to see how much the predictive power improves when both are considered. 
  7. Take the best ones that seem independent and combine them in a linear model if that seems appropriate (the variables need to act like linear relationships). Cross-validate the model by generating it on half the data and testing against the other half; do this 100 times and plot the distribution of predictive power.
Once the data set is compiled, all of this takes no more than four or five minutes. A couple of sample ROC curves and AUC histograms are shown below.
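For the curious, the cross-validation in step 7 can be sketched in a few lines of Python. This is not the actual code behind the plots; the data are synthetic and the 'model fit' is deliberately crude.

```python
# Sketch of repeated split-half cross-validation: fit a linear score on a
# random half, measure AUC on the held-out half, repeat 100 times, and
# look at the distribution of held-out AUCs.
import random

random.seed(1)

# Synthetic cases: two standardized predictors plus noise drive retention.
def make_case():
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y = 1 if x1 + 0.5 * x2 + random.gauss(0, 1) > 0 else 0
    return (x1, x2, y)

cases = [make_case() for _ in range(400)]

def auc(scored):
    """AUC of (score, label) pairs: chance a positive outranks a negative."""
    pos = [s for s, y in scored if y == 1]
    neg = [s for s, y in scored if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

aucs = []
for _ in range(100):
    random.shuffle(cases)
    train, held_out = cases[:200], cases[200:]
    # 'Fit' the crudest linear combination: weight each predictor by its
    # mean difference between retained and non-retained training cases.
    w = []
    for j in (0, 1):
        pos = [c[j] for c in train if c[2] == 1]
        neg = [c[j] for c in train if c[2] == 0]
        w.append(sum(pos) / len(pos) - sum(neg) / len(neg))
    scored = [(w[0] * c[0] + w[1] * c[1], c[2]) for c in held_out]
    aucs.append(auc(scored))

aucs.sort()
print(f"median held-out AUC: {aucs[50]:.2f}")
```

The spread of the 100 held-out AUCs is exactly what the AUC histograms below are plotting: a narrow distribution well above 0.5 means the predictor is resilient rather than accidental.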

This predictor is a four-variable model for a largish liberal arts college I worked with. It predicts student retention based on grades, finances, social engagement, and intent to transfer (as asked by the entering freshman CIRP survey).

At the end of this process, we can have some confidence in what predictors are resilient (not merely accidental), and how they work in combination with each other. The idea for me is not to try to predict individual students, but to understand the causes in such a way that we can treat them systematically. The example with social engagement is one such instance. Ideally, the actions taken are natural ones that make sense and have a good chance of improving the campus. 

Saturday, July 12, 2014

A Cynical Argument for the Liberal Arts: Part Sixteen

Previously: Part Zero ... Part Fifteen

The "Cynical Business Award" goes to ReservationHop, which I read about in CBC News here. Quote:
A San Francisco startup that sells restaurant reservations it has made under assumed names is raising the ire of Silicon Valley critics.  
ReservationHop, as it's called, is a web app with a searchable database of reservations made at "top SF restaurants." The business model is summed up in the website's tagline: "We make reservations at the hottest restaurants in advance so you don't have to."  
Users can buy the reservations, starting at $5 apiece, and assume the fake identity ReservationHop used to book the table. "After payment, we'll give you the name to use when you arrive at the restaurant," the website says. 
 The "coin of the realm" in this case is the trust between diner and restaurant that is engaged when a reservation is placed. This is an informal social contract that assures the diner a table and assures the restaurant a customer. Sometimes these agreements are broken in either direction, but the system is valuable and ubiquitous. The ReservationHop model is to take advantage of the anonymity of phone calls to falsely gain the trust of the restaurant and essentially sell it at $5 a pop. This erodes trust, debasing the aforementioned coin of the realm. Maybe the long term goal of the company is to become the Ticketmaster of restaurant reservations.

One can imagine monetizing all the informal trust systems in society in this way. Here's a business model, free of charge: you know all those commercial parking lots that give you a time stamped ticket when you drive in? It's easy to subvert that with a little forethought. Imagine an app that you use when you are ready to leave. With it, you meet up with someone who is just entering the parking lot and exchange tickets with them. You get to leave by paying for almost no time in the lot. If they do the same, so can they, ad infinitum. Call it ParkingHop.

One can argue that these disruptive innovations lead to improvements. In both of the cases above, the debasement of the trust-coin is due to anonymity, which nowadays can be easily fixed. The restaurant can just ask for a cell-phone number instead of a name, for example, and verify the customer by calling the phone. This isn't perfect, but generally the fixing of personal identity to actions creates more responsible acts. The widely-quoted faking of book reviews, for example, is greatly facilitated by paid-for "sock puppet" reviewers taking on many identities. So anonymity can be a power multiplier, the way money is in politics. The natural "improvement," if we want to call it that, is better record keeping and personal registration of transactions. This is what the perennial struggle to get an intrusive "cybersecurity" law passed is all about (so kids can't download movies without paying for them), and the NSA's vacuuming up of all the data it can. We move from "trust," to "trust but verify," to "verify."

These are liberal artsy ideas about what it is to be human and what it is to be a society. The humanities are dangerous. How many millions have died because of religion or ideology? I've been wondering lately how we put that tension in the classroom. Imagine a history class with a real trigger warning: Don't take this class if you have a weak disposition. If you aren't genuinely terrified by the end of a class, I haven't done my job.

Thursday, June 19, 2014

A Cynical Argument for the Liberal Arts, Part Fifteen

Previously: Part Zero ... Part Fourteen

After being derailed last time by a quotation, I'll take up the question of constructive deconstruction: how can the corrosive truth-destroying effect of Cynical "debasing the coin of the realm" lead to improvements within a system? Note that a system (loosely defined: society, a company, a government) is the required  'realm' in which to have a 'coin.' My modern interpretation is official or conventional signalling within a group. Waving hello and smiling are social coins of the realm. Within organizations, a common type of signal is a measure of goal attainment, like quarterly sales numbers. It's this latter, more formal type of signal, that is our focus today.

A formalization of organizational decision-making might look like this (from a slide at my AIR Forum talk):

Along the top, we observe what’s going on in the world and encode it from reality-stuff into language. That’s the R → L arrow. This creates a description like “year over year, enrollment is up 5%,” which captures something we care about via a huge reduction in data. R → L is data compression with intent.

As we get smarter, we build up models of how the world behaves, and we reason inside our language domain about what future observations might look like with or without our intervention. We learn how to recruit students better under certain conditions. When we lay plans and then act on them, it is a translation from language back into the stuff of reality—we DO something. By acting, we contribute to the causes that influence the world. We don’t directly control the unfolding of time (red arrow), and all pertinent influences on enrollment are taken into account by the magical computer we call reality. Maybe our plans come to fruition, maybe not.

Underlying this diagram are the motivations for taking actions. Our intelligence, whether as individuals or as organizations, is at the service of what we want. It's an interesting question whether one can even define motivation without a modicum of intelligence--just enough to transform the observable universe (including internal states) into degrees of satisfaction with the world. As animals, we depend on nerve signals to know when we are hungry or in pain. These are purely informational in form, since the signals can be interrupted. That is, there is no metaphysical "hunger" that is a fundamental property of the universe--it's simply an informational state that we observe and encode in a particular way. In a sense, it's arbitrary.

For organizations, analogs to hunger include many varieties of signals that are presumed to affect the health of the system. Financial and operational figures, for example. These are often bureaucratized, resulting in "best salesman of the quarter" awards and so forth. An essential part of the bureaucracy is the GOAL, which is an agreed-upon level of motivation achievement, measured by well-defined signals. An example from higher education is "Our goal for first year student enrollment is an increase of 6% over three years."

A new paper from the Harvard Business School, "Goals Gone Wild," by Lisa D. Ordóñez, Maurice E. Schweitzer, Adam D. Galinsky, and Max H. Bazerman, provocatively challenges the notion that formal goals are always good for an organization. This is itself a cynical act (challenging established research by writing about it), and the paper is a great source of examples of how Cynical employees can react to a goal bureaucracy in two ways:

  • Publicly or privately subverting goals through actions that lead to real improvements in the organization, or 
  • Privately debasing the motivational signals that define goals for personal gain.
The first case is desirable when the goals of an organization, when taken too seriously, are harmful to it. Too much emphasis on short-term gains at the expense of long-term gains is an example. This positive reaction to bad goals is not the topic of the paper, but proceeds from our discussion here. The second case is ubiquitous and unremarkable: any form of cheating-like behavior that inflates one's nominal rank, reaping goals-rewards while not really helping the organization (or actually harming it).

Earlier, I referenced an AAC&U survey of employers that claimed they wanted more "critical thinkers." Employees who find clever ways to inflate their numbers are probably not what they have in mind. Before the Internet came around, I used to read a lot of programming magazines, including C Journal and Dr. Dobb's. One of those had a (possibly apocryphal) story about a team manager who decided to set high bug-fixing goals for the programmers. The more bugs they found and fixed, the higher they were ranked. Of course, being programmers, they were perfectly placed to create bugs too, and the new goals gave them an incentive to do just that: create a mistake, "find" it, fix it, get credit, repeat. This is an example of organizational "wire-heading."

From the Harvard paper's executive summary, we can see problems that an ethical Cynic can fix:
The use of goal setting can degrade employee performance, shift focus away from important but non-specified goals, harm interpersonal relationships, corrode organizational culture, and motivate risky and unethical behaviors.
Armed with the liberal-artsy knowledge of signals and their interception, use, and abuse, AND prepared with an ethical foundation that despises cheating, an employee has some immunity to the maladies described above. Of course, this depends on his or her position within the organization, but generally, Cynical sophistication allows the employee to see goals as what they really are--conventions that poorly approximate reality. This is the "big-picture" perspective CEO-types are always going on about. Moreover, an ethical Cynic who is in a position of power is less likely to misuse formal goal-setting as a naive management tool. 

The paper itself is engagingly written, and is worth reading in its entirety. With the introduction above, I think you'll see the potential positive (and negative) applications of Cynical acts. Informally, the advice is "don't take these signals and goals too seriously," which is a point the original Cynics repeatedly and dramatically made. More formally, we can think of a narrow focus on a small set of goals as a reduction in computational complexity, in which our intelligent decision making has to make do with a drastic simplification of the world. Sometimes that doesn't work very well, and the Cynics are the ones throwing the plucked chickens to prove it.

Next: Part Sixteen

Friday, June 06, 2014

A Cynical Argument for the Liberal Arts, Part Fourteen

Previously: Part Zero ... Part Thirteen

Because Cynical "debasing the coin of the realm" corrupts the way in which individuals or organizations are aware of the world, it may be hard to see how one gets beyond this destruction. Now I would like to speculate on how we can think of Cynicism as a constructive method.

Cynical attacks on categorical ways of knowing (I called them signals earlier) challenge the categories by turning something into its negation. A counterfeit is, and is not, a coin. When I started thinking about using a destructive method of constructing, the Sherlock Holmes quote came to mind. This led me down a rabbit hole that ironically illustrates the point. When I searched for the quote, I found the following attributed to Sir Arthur Conan Doyle:
Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth.
I wanted to be sure, and it took some time to track down a reference. One site put these words in the mouth of Doyle's Sherlock Holmes in "The Adventure of the Beryl Coronet". However, this doesn't seem to be the case. After more work I found the complete works of Doyle in a single text file here. The word 'eliminate' is only used twice in the whole body of work. Here's the one we want, in the first chapter of The Sign of Four,  with Watson speaking first:
"How, then, did you deduce the telegram?"
"Why, of course I knew that you had not written a letter, since I sat opposite to you all morning. I see also in your open desk there that you have a sheet of stamps and a thick bundle of postcards.  What could you go into the post-office for, then, but to send a wire? Eliminate all other factors, and the one which remains must be the truth."
[Update: I subsequently found the actual quote by searching for 'improbable'. Leaving the rest of the post unchanged.]

Now it's possible that Doyle used the first quote outside of a novel, but the story becomes odder when we look at the record. Using Google's ngram server, I searched for "eliminate the impossible" and "eliminate all other factors." Here's the result:

Red line is "eliminate all other factors", blue is "eliminate the impossible"
Tantalizingly, this has the common quote coinciding with the publication date of The Sign of Four, according to this site. The oldest edition I could find was a book (not the original magazine, in other words) scanned from University of Michigan's library:

Its publication date is "MDCCCCI," or 1901. I tried adding the "you" in front of the first quote, and got no hits on the ngram server, but it may have a limit on how long phrases can be. After more searching I found a scanned copy of the original 1890 Lippincott's Monthly Magazine. It has the same text as the book shown above. This would seem to rule out a change in the language between the original magazine publication in 1890 and subsequent compilation in 1901.

Google Books results show that there are 19th century instances of "eliminate the impossible," which probably explain the eruption of the blue line in the graph before the (book) publication of The Sign of Four in 1901. I looked for other partial phrases too, like "no matter how improbable." These didn't turn up Doyle, but other treasures. From Doing Business in Paris:

 and this one:
The sentiment of both of these is that untruths can propagate wildly, given the right conditions, which may be what we're seeing here--and a good reason for Cynical weeding out.

Moving forward in time, I scanned 1900-1920 with Google Books, and found a magazine called Woman Citizen with this text from April 26, 1919.

The offhand rephrasing replaces "Eliminate all other factors" with "Eliminate the impossible." Six years later, we find:

This has the "however improbable" drama that Doyle didn't put in the mouth of Holmes. The attributed quote finally appears in full form in 1949:

Not only is the hyperbolic "however improbable" in appearance, it's in italics for extra effect. The book seems to be an edited volume by various authors of Doyle/Holmes-related biography, anecdotes, and lit crit, although all I have to go by is the random samples that the stingy owner of copyright allows Google to produce. Worldcat coughed up only five copies, none of which are online. Let us allow the matter to come to rest here.

By eliminating possibilities, we come to the tentative conclusion that Doyle's famous quote is actually due to fan fiction, which can easily be seen as a Cynical debasement of the original. The switch replaces Doyle's concise "Eliminate all other factors, and the one which remains must be the truth" with the nearly breathless "Eliminate the impossible, and whatever remains, however improbable, must be the truth." That the latter is much better known, and attributed to Doyle, is a clear debasement of his words, and a minor disturbance in the reality of anyone who believes the wrong version. The famous quote is, and is not, by Sherlock Holmes.

It takes a lot of work to separate signal from noise, which makes Cynics dangerous. Now what about that construction? That will have to wait until after the rabbit soup.

Next: Part Fifteen