Tuesday, November 19, 2013

Creating a Competition Analysis Chart from IPEDS and Clearinghouse Data

The National Student Clearinghouse provides information about applicants to your college who ended up enrolling somewhere else. By combining this with admissions and financial aid data that you already have (e.g. high school grades, standardized test scores, home state), and with public IPEDS data on institutional characteristics, you can graph the average characteristics of competitor institutions. It's not perfect--the IPEDS data is a couple of years out of date. Nevertheless, it's insightful.

There are many types of reports you can generate using this combination of data, but I'll focus on one that resembles a supply/demand relationship between the quality of the student and the quality of the attended institution. An example is shown below.

Here, the students are split into three categories: those who attended your institution (horizontal black lines), those who chose a private college other than yours (green), and those who went to a public institution. The levels under the circles indicate academic preparedness as determined by the admissions process (usually a logistic regression model on HSGPA and standardized test scores). I've left the scale off the graphs so as not to give away all our secrets, but the arrangement of dots in the scatterplot is real. It shows that students generally sort themselves out pretty well, matching their academic ability (proxied by HSGPA) to the quality of the institution (proxied by graduation rate). Moreover, there is a pretty consistent gap in institutional quality that at least partially justifies the price difference (you can also create a scatterplot like this using graduation rate versus tuition).

In my actual plot, I have a horizontal line that shows my college in relationship to these. Instead of showing you that, I've drawn three possible ones, called Case 1 through Case 3. If your college's line looks like Case 1, you're not competing on quality, because you're losing students to lower-quality institutions. In Case 3, the opposite is true, and quality may be an important factor--definitely something to follow up on, e.g. with the Admitted Student Questionnaire from the College Board. The middle line (Case 2) suggests that your college is competing on quality with other privates and on something else (like price) for the mid-to-lower tier student, but that for the best-qualified students, quality is still an issue.

None of these conclusions is demonstrated here to be causal, of course. This is a starting point, not an ending point, for analysis. But one of the advantages of this type of graph is that it's easily communicated to stakeholders, and it generates the right kinds of questions about admissions requirements, marketing, and financial aid.

If you want to try this at home, I've already downloaded and combined some of the data you need. Here's a list:
  • [download] Selected IPEDS categories from the most recent years (2010 and 2011), with the OPE institutional ID included, so you can easily connect to your Clearinghouse data
  • [download] A rudimentary data dictionary. I renamed columns to be more intuitive than the IPEDS ones.
  • [download] A Perl script for joining tables without using a database. You'll need a little experience with Perl (you have to install a module or two), but it's a real time-saver. (A pandas alternative is sketched just after this list.)
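
If you'd rather skip Perl, here's a rough pandas equivalent of the join, sketched under the assumption of made-up file and column names (opeid, grad_rate, and so on) that you'd replace with whatever is in your own extracts:

```python
import pandas as pd

# Hypothetical file names: your Clearinghouse detail file and the IPEDS
# extract linked above.  Read the OPE ID as text so leading zeros survive.
clearinghouse = pd.read_csv("clearinghouse_detail.csv", dtype={"opeid": str})
ipeds = pd.read_csv("ipeds_selected.csv", dtype={"opeid": str})

# OPE IDs are often zero-padded to eight digits; normalize both sides.
clearinghouse["opeid"] = clearinghouse["opeid"].str.zfill(8)
ipeds["opeid"] = ipeds["opeid"].str.zfill(8)

merged = clearinghouse.merge(ipeds, on="opeid", how="left")

# Rows with no IPEDS match (grad_rate is a stand-in column name) are worth
# inspecting by hand before you build anything on top of this.
unmatched = merged[merged["grad_rate"].isna()]
print(f"{len(unmatched)} Clearinghouse rows had no IPEDS match")

merged.to_csv("clearinghouse_ipeds_merged.csv", index=False)
```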
Tips:
  1. Remove cross-enrolled high school/college credit by filtering on the enroll date, so you don't accidentally pick up a community college a student earned credit in while still in high school.
  2. Remove duplicates and generally give the data a careful look as you would for any such project.
  3. I recommend Tableau as a reporting tool. It's expensive but worth it.
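If Tableau isn't in the budget, here is a minimal matplotlib sketch of the kind of chart described above. The column names (admit_index, sector, grad_rate) and the reference-line value are placeholders, not anything from my actual data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Merged Clearinghouse/IPEDS file from the join sketch above (hypothetical).
df = pd.read_csv("clearinghouse_ipeds_merged.csv")

# Average graduation rate of the attended institutions at each
# academic-preparedness level, split by sector (e.g. private vs. public).
summary = (df.groupby(["admit_index", "sector"], as_index=False)["grad_rate"]
             .mean())

fig, ax = plt.subplots()
for sector, grp in summary.groupby("sector"):
    ax.scatter(grp["admit_index"], grp["grad_rate"], label=sector)

# Horizontal reference line for your own institution's graduation rate
# (0.62 is just a placeholder value).
ax.axhline(0.62, color="black", label="your institution")

ax.set_xlabel("Academic preparedness level")
ax.set_ylabel("Mean graduation rate of attended institution")
ax.legend()
plt.show()
```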
Feel free to contact me with questions, or leave a note below.

Edit: As a bonus, here's a scatterplot of just the IPEDS data, comparing the 75th percentile of SATR and SATM combined versus graduation rate.

Sunday, October 27, 2013

Assessment Institute 2013

I'm on the schedule for two presentations this week in Indianapolis at the 2013 Assessment Institute. I've included titles and descriptions below, with links to the slides. The slides will probably change some before it's over.

[Link to slides]

[Link to slides]

Saturday, May 11, 2013

Curiosity is the Engine of Achievement

The title is a quote from a Ken Robinson Education TED talk. Another is "Teaching is not a delivery system." It's worth a listen:

One quibble. He says that the purpose of education is learning. I know that's obvious, but it's also easily misplaced, because it leads to the business of measuring learning. The truth is that nobody really knows what "learning" is, and it's probably not one thing at all. The simple-minded view, that a teacher shows you how to perform a task and then you can do it yourself, is ubiquitous but dangerously incomplete. Going back to the title of the post, the real purpose of education is to produce achievement. But schools almost uniformly get a pass on this--students aren't expected to achieve anything real, just pass tests. Ironically perhaps, this is absolutely not true in athletics, where it's not enough to play a good practice game. Nobody really cares about that.

The effect of this shift in perspective is subtle but powerful. If we want achievement, we should cultivate curiosity and intrinsic motivation, and this is a completely different pedagogy than lecture-test-certify. The thesis of an ongoing research project at my college is that the Internet allows the kind of direct engagement with the world that makes this achievement possible in all areas of study. Others are moving in that direction too. You can see an example in Auburn University's Quality Enhancement Plan. They don't take the idea as far as we are trying to, but it's a very nice project, and I love the tag line: Learn it. Live it. Share it.

Sunday, February 24, 2013

Interfaces and Education

In my last article, I used a cartoon model of intelligence to examine different aspects of whatever that thing is we call critical thinking. The usefulness of the schematic goes well beyond that exercise, however. Specifically, there's the fascinating idea of a "unit of usefulness" often called an interface. It's worthwhile examining how it works in the context of education.

An interface allows a trip to be made all the way around the diagram, which I'll reproduce below.

An elevator is a good example. We start with a motivation:
  1. We want to be on a different floor of the building.
  2. We observe the elevator controls, which are carefully constructed so as to be unambiguous, and clearly map reality to a specialized language.
  3. Our internal model (i.e. ontology) of the physical arrangements allows for a simple calculation to map what we want into the language of the elevator's button scheme, so that we can properly predict what's about to happen, and
  4. Act accordingly by pushing a button.
  5. Finally, we observe that our motivation is satisfied by arriving on the right floor, designated with language that lets us know that has happened.
This process is the main function of intelligence (predicting what actions will satisfy motivations), and we do this all the time with and without interfaces. The difference is that an interface makes it transparently easy, by means of a carefully designed language and apparatus for physical change that are congruent. 

Technology Produces Interfaces at an astonishing rate these days. The miraculous devices we carry around are obvious examples, but there's another influence that may not be as apparent: the evolution of societies into machine-like systems creates interfaces too. The Department of Motor Vehicles is a sometimes-reviled interface with government bureaucracy. Its unpleasantness isn't merely from the long wait, but from being treated like a pile of documents rather than a human being. But this sort of interface is ubiquitous. The person working the check-out line at the grocery store is an interface too, and we can choose to limit our interaction to that mode of operation entirely, rather than acknowledging that this is a human being. The opening sequence of Shaun of the Dead portrays this zombie-like element of our lives, even before the inevitable infections begin. It is typified by the automatic behavior that characterizes a problem already solved.

This is the effect of any interface, too--no matter how complex an airliner is, to the passenger it's primarily a way to get from one place to another. Everything is standardized. It may be stressful, but that's not because you don't know what the plan is. Much of modern life is spent developing facility with standard interfaces, like driving a car or using a phone, and the pace of technology creates stress because we can't keep up with all of the new ones. Perhaps that's why Apple products are so popular--they make the interfaces easy to use.

So much of our internal ontology--the way we understand the world--is now tied up in technological and social interfaces that would be very foreign to our forebears. 

Interfaces in Education abound. The whole bureaucracy is designed to partition reality into neat categories of "nominal reality." Unfortunately, humans are not very good subjects for this endeavor, and it leads to a lot of mischief. Take, for example, the "inoculation" folk theory of education that crops up continually in the academy. Students take a class on composition, and then are assumed to be able to write. Any subsequent problems with writing point fingers back to that class, which did not fulfill its role. The assumption is that Comp 101 is a reliable, factory-like interface that takes raw material and produces good writers. This is a poor reflection of reality, because writing is a tremendously complex endeavor that is more akin to kindling a flame than filling a vessel, to employ the ancient metaphor.

We parse learning into boxes called courses and majors and learning outcomes, and institutions certify these with their stamps of approval--theoretically providing an interface for consumers of the product. Of late, the advertised quality of that product (and the cost of producing it) has come into question, perhaps most infamously in Academically Adrift.

We could spend a lot of time dissecting why the interface model fails. Many of the articles I've written in this blog concern how assessments (particularly standardized tests) can create nominal realities that apparently create interfaces, but fail to reflect reality. The result is optimizing only the appearance of satisfying motivations, which is the central idea behind self-limiting intelligence.

Rather than rehashing the problems with the Reality-to-Language (i.e. measurement) part of educational assessment, however, I'd like to comment on the expectations that students have. Those who come from a test-heavy public education, in particular, seem (anecdotally) to expect a college course to be a clearly delineated interface, similar to the check-out counter. For example, they don't seem to take to open-ended problems naturally. An interface-centric attitude results in the following expectation: "show me exactly what I need to do to get an A," as if education were an algorithm. It's very easy to teach math courses that way (at least until creativity is required in later courses), but I think it does students a disservice. For example, they can learn how to take derivatives of functions without having any real intuition about what that means. This is well documented in an ongoing research project based on the Calculus Concept Inventory, which seeks to assess conceptual understanding. A quote from one paper (source):
the first term we did TEAL on term, the over all course evaluation was terrible, the lowest of any course I have been associated with at MIT, but I can plausibly argue that that term the students learned twice as much as under the lecture system, using assessment based on Hake normalized gains.
TEAL stands for Technologically Enhanced Active Learning, which de-emphasizes lectures in favor of more active approaches. The comment implies that it worked, but students didn't like it. I presume the reason is that an emphasis on active learning of concepts is open-ended and less interface-like than what others have called the 'rent model': if I sit in class long enough, you pass me.

There's a reason why it took our species a couple million years to come up with calculus, and it's not because it's complex. It's NOT complex--anyone can learn the power rule in a few minutes. It's the conceptual subtlety of thought and the precise language that expresses it that is the real value of the subject. And it's not just calculus, of course.

The parts of our environment that we can control with interfaces are the easy part--that's what interfaces do. Higher education (especially the liberal arts) should not just be a catalog of new interfaces to learn, but should cultivate the general ability to wrestle with problems that don't have interfaces. Questions of politics and ethics, and the expression of creativity, cannot be reduced to a pre-packaged I/O device (despite the strident voices of ideologues who argue for just such a thing).

If higher education is going to fulfill this role, it has to do as much work in unmaking minds as building them up, because many of our students are well-trained to expect (and demand) A-B-C-degree. The automated satisfaction of motivations by itself is a wonderful thing, but it can also make us dull with expectations that everything is push-button easy.

(The image is from Staples, where you can buy one of these. Their advertising gimmick is a sign of the times.) 


Monday, February 18, 2013

Teaching Critical Thinking

I just came across a 2007 article by Daniel T. Willingham "Critical Thinking: Why is it so hard to teach?" Critical thinking is very commonly found in lists of learning outcomes for general education or even at the institution level. In practice, it's very difficult to even define, let alone teach or assess. The article is a nice survey of the problem.

In the approach I've taken in the past (with the FACS assessment), I simplified 'critical thinking' into two types of reasoning that are easy to identify: deductive and inductive. Interestingly, this shows up in the article too, where the author describes the difference (in his mind) between critical and non-critical thinking:
For example, solving a complex but familiar physics problem by applying a multi-step algorithm isn’t critical thinking because you are really drawing on memory to solve the problem. But devising a new algorithm is critical thinking.
Applying a multi-step algorithm is deductive "follow-the-rules" thinking. He's excluding that from critical thinking per se. To my mind this is splitting hairs: one cannot find a clever chess move unless one knows the rules. We would probably agree that deductive thinking is absolutely prerequisite to critical thinking, and this point is made throughout the article, where it's included in "domain knowledge."

In the quote above, the creation of a new algorithm exemplifies critical thinking--this is precisely inductive thinking, a kind of inference.

Now I don't really believe that even the combination of deductive and inductive reasoning covers all of what people call 'critical thinking,' because it's too amorphous. It's interesting to consider how one might create a curriculum that focuses on 'critical' rather than 'thinking.' It could be a course on all the ways that people are commonly fooled, either by themselves or others. It would be easy enough to come up with a reading list.

Another alternative is to focus on the 'thinking' part first. This seems like a very worthy goal, and in retrospect it's striking that we don't seem to have a model of intelligence that we apply to teaching and learning. We have domain-specific tricks and rules, conventions and received wisdom, but we generally don't try to fit all those into a common framework, which we might call "general intelligence" as easily as "critical thinking." Usually it's the other way around--how do I embed some critical thinking object into my calculus class? This latter method doesn't work very well because the assessment results (despite our desires) don't transfer easily from one subject to the next. This is the main point of the article linked at the top--domain-specific knowledge is very important to whatever "critical thinking" may be.

A Model for Thinking

I don't presume to have discovered the way thinking works, but it's reasonable to try to organize a framework for the purposes of approaching 'critical thinking' as an educational goal. The following one comes from a series of articles I wrote for the Institute for Ethics and Emerging Technologies (first, second, third), which all began with this article. The theme is how to address threats to the survival of intelligent systems, and it's informed by artificial intelligence research.

A schematic of the model is shown below.


We might think of this as a cycle of awareness, comprising perception, prediction,  motivation, and action. If these correspond to the whims of external reality, then we can reasonably be said to function intelligently.

The part we usually think of as intelligence is the top left box, but it has no usefulness on its own. It's a general-purpose predictor that I'll refer to as an informational ontology. It works with language exclusively, just as a computer's CPU does, or the neurons in our brains do (the "language" of transmitted nerve impulses). Languages have internal organization by some convention (syntax), and associations with the real world (semantics). The latter cannot exist solely as a cognitive element--it has to be hooked up to an input/output system. These are represented by the lower left and right blue boxes. The left one converts reality into language (usually very approximately), and the right one attempts to affect external reality by taking some action described in language.

All of these parts are goal-oriented, as driven by some preset motivation. All of this perfectly models the typical view of institutional effectiveness, by the way, except that the role of the ontology is minimized--which is why IE looks easy until you try to actually do it.

Each of these components is a useful point of analysis for teaching and learning.  Going around the figure from bottom left:

Measurement/Description When we encode physical reality into language, we do so selectively, depending on the bandwidth and motivation, and our ability to use the result in our ontology. At the beach, we could spend the entire day counting grains of sand, so as to get a better idea of how many there are, but we generally don't, because we don't care to that level of precision. We do care that there's sand (the point of going to the beach), but there are limits to how accurately we want to know.

Some language is precise (as in the sciences), and other sorts not (everyday speech, usually). What makes it usefully precise is not the expression of the language itself (e.g. I drank 13.439594859 oz of coffee this morning), but how reliably that information can be used to make predictions that we care about. This involves the whole cycle of awareness.

Example 1: According to Wikipedia, the mass of a proton is 1.672621777×10^-27 kg. This is a very precise bit of language that means something to physicists who work with protons. That is, they have an ontology within which to use this information in ways they care about. Most of us lack this understanding, and so come away with merely "protons weigh a very tiny amount."

Example 2: Your friend says to you, "Whatever you do, don't ride in the car with Stanislav driving--he's a maniac!" Assuming you know the person in question, this might be information that you perceive as important enough to act on. The summary and implication in your friend's declaration constitutes the translation from physical reality into language in a way that is instantly usable in the predictive apparatus of the ontology. Assuming you care about life and limb, you may feel disinclined to carpool with Stanislav. On the other hand, if the speaker is someone whom you think exaggerates (this is part of your ontology), then you may discount this observation as not useful information.

The point of these examples is that description is closely tied with the other elements of awareness. This is why our ways of forming information through perception are very approximate. They're good enough for us to get what we want, but no better. (This is called Interface Theory.)

Here are some questions for our nascent critical thinkers:
  1. Where did the information come from? 
  2. Can it be reliably reproduced?
  3. What self-motivations are involved?
  4. What motivations does the information's source have?
  5. What is the ontology that the information is intended to be used in?
  6. How does using the information affect physical reality (as perceived by subsequent observations)?
Notice that these questions are also very applicable to any IE loop.

Question five is a very rich one because it asks us to compare what the provider of the information believes versus what we believe. Every one of us has our own unique ontology, comprising our uses of language, beliefs, and domain-specific language. If I say that "your horoscope predicts bad luck for you tomorrow," then you are being invited to adopt my ontology as your own. You essentially have to if you want to use the information provided. This represents a dilemma that we face constantly as social animals--which bits of ontology do we internalize as our own, and which do we reject? Which brings us to the 'critical' part of 'critical thinking.'

It's interesting that the discussion around critical thinking as an academic object focuses on the cognitive at the expense of the non-cognitive. But in fact, it's purely a question of motivation. I will believe in astrology if I want to, or I will not believe in it because I don't want to. The question is much more complicated than that, of course, because every part of the ontology is linked to every other part. I can't just take my whole system of beliefs and plop astrology down in the middle and then hook up all the pipes so it works again. For me personally, it would require significant rewiring of what I believe about cause and effect, so I'd have to subtract (stop believing some things) part of the ontology. But this, in turn, is only because I like my ontology to be logical. There's no a priori reason why we can't believe two incompatible ideas, other than we may prefer not to. In fact, there are inevitably countless contradictions in what we believe, owing to the fact that we have a jumble of motivations hacked together and presented to us by our evolutionary history.

Intelligence

The usefulness of intelligence lies in being able to predict the future (with or without our active involvement) in order to satisfy motivations. The way we maintain these informational ontologies is a dark mystery. We seem to be able to absorb facts and implications reasonably easily (Moravec's Paradox notwithstanding); we can't deduce nearly as quickly as a computer can, but we manage well enough. It's the inductive/creative process that's the real mystery, and there is a lot of theoretical work on that, trying to reproduce in machines what humans can do. Within this block are several rich topics to teach and assess:
  1. Domain-specific knowledge. This is what a lot of course content is about: facts and deductive rules and conventions of various disciplines, ways of thinking about particular subjects, so that we can predict specific kinds of events. This connects to epistemology when one adds doubt as an element of knowledge, which then leads to...
  2. Inference. How do we get from the specific to the general? At what point do we believe something? This links to philosophy, the scientific method, math and logic, computer science, neuroscience, and so on. Another connection is the role of creativity or random exploration in the process of discovering patterns. We might sum up the situation as "assumptions: you can't live with them, and you can't live without them." Because inference is a fancy word for guessing, it's particularly susceptible to influence from motivation. Superstition, for example, is an application of inference (if I break a mirror, then I will have bad luck), and one's bias toward or away from this sort of belief comes from a motivational pay-off (e.g. a good feeling that comes from understanding and hence controlling the world).
  3. Meta-cognition. This is the business of improving our ontologies by weeding out things we don't like, or by making things work better by pruning or introducing better methods of (for example) inference. This is what Daniel Kahneman's book Thinking, Fast and Slow, is about. That book alone could be a semester-length course. Any educational treatment of critical thinking is about meta-cognition.
  4. Nominal versus real. Because we live in complex information-laden societies, we deal not just with physical reality but also with system reality. For more on these, refer to my IEET articles. One example will suffice: a system pronouncement of "guilt" in a trial may or  may not correspond to events in physical reality. At the point the verdict is announced, it becomes a system reality (what I call a nominal reality). The ontology of the system becomes a big part of our own personal version, and one could spend a long time sorting out what's real and what's nominal. For more on that topic, see this paper I wrote for a lit conference.
Motivation
Humans and the systems we build are very selective about what we want to know, and what we do with that knowledge. Understanding our own motivations and those of others (theory of mind), and the ways these influence the cycle of perceive-predict-act, is essential in order to make accurate predictions. That is, intelligence has to take motivation into consideration. This invites a conversation about game theory, for example. The interpretation of critical thinking as the kind of thing that investigative reporters do, for example, must take the motivations of sources into consideration as a matter of course.

In economics, motivation looks like a utility function to be optimized. Part of what makes humans so interesting is that we are laden with a hodge-podge of motivations courtesy of our genes and culture, and they are often contradictory (we can be afraid of a car crash, yet fall asleep at the wheel). The search for an 'ultimate' motivation has occupied our race for a long time, with no end in sight.
Here's a critical thinking problem: If motivations are like utility functions, they must be acted on in the context of some particular ontology, which goes out of date as we get smarter. How then are we to update motivations? A specific example would be physical pain--it's an old hardwired system that helped our ancestors survive, but it's a crude instrument, and leads to a lot of senseless suffering. The invention of pain-killers gives us a crude hack to cut the signal, but they have their own drawbacks. Wouldn't it be better to re-engineer the whole system? But we have to be motivated to do that. Now apply that principle generally. Do you see the problem?
Taking Action
This isn't usually thought of in connection with intelligence or critical thinking, but it's integral to the whole project. This is generally not the approach we take in formal education, where we implicitly assume that lectures and tests suffice to increase student abilities. Come to think of it, we don't even have a word for "active use of intelligence." Maybe 'street smarts' comes close, because of its association with 'real-world' rather than academic, but that's an unproductive distinction. I've heard military people call it the X-factor, which I take to mean a seamless connection between perception, prediction, and action (all tied to some underlying motivation, of course).

But of course the point of all this intelligence apparatus is to allow us to act for some purpose. There are great illustrations of this in Michael Lewis's book The Big Short, which show the struggle between hope and fear (motivations) in the analysis of the looming mortgage disaster, and the actions that resulted.

I've argued before (in "The End of Preparation," which is becoming a book) that technological and societal changes allow us to introduce meaningful action as pedagogy. It's the actual proof that someone has learned to think critically--if they act on it.

Being Critical
If some framework like the one described above can be used to examine intelligence in a curriculum, where exactly does the modifier come in? What's critical about critical thinking? Perhaps the simplest interpretation is that critique allows us to distinguish between two important cases (which may vary, but correspond to motivations). For example, in a jury trial, the question is whether or not to convict, based on the perceptions and analysis of the proceedings. It's these sorts of dichotomies--the aggravating fact that we can't take both paths in the wood--that make intelligence necessary in the first place.

This general task is big business these days, in the form of machine learning, where distinguishing between a movie you will like and one you won't is called a classification problem. Netflix paid a million dollars to the winner of a contest to find a better classifier for assigning movie ratings.

It also makes a nice framework for teaching, and it's a common technique to set up an A vs. B problem and ask students to defend a position (and there's a whole library of resources set up to provide support for this kind of thing). In the abstract, these undoubtedly have some value in honing research and debate skills, but it seems to me that they would be more valuable when connected to real actions that a student might take. Is it worth my while to go to Washington to protest against X? Or go door-to-door to raise money for Y? Or invest my efforts in raising awareness about Z with my project? Maybe we need a new name for this: active critical thinking, perhaps.

So as educators, we are then left with the meta-question: is this worth doing?

Next: "Interfaces and Education" continues this line of thought.

Tuesday, February 05, 2013

Networking 2.0 for Assessment Professionals

That assessment has grown as a profession is obvious from the size and number of conferences devoted to the topic, and there is a thriving email list at ASSESS-L where practitioners and theoreticians can hold asynchronous public conversations. There are, however, limitations to this approach, and the purpose of this post is to speculate on more modern professional social networking that might benefit the profession.

I just turned 50, so my first response to any new idea is "Why is this important? I don't have much time left, you know."  So let's start with...

Why?

  1. To find out what other people think about something related to assessment.
  2. To connect with others who have similar assessment interests.
  3. To disseminate information, such as job listings, conference announcements, or research findings.
  4. To help establish a portfolio of professional activity.
One of the things on my personal wish list is a repository for learning outcomes plans and reports that could be seen and commented on by others. I think this transparency would reduce the variability in (e.g.) accreditation reviews of same.

This leads to...

How?

Below I'll describe some of the models that I have come across. There are surely others.

Email Lists 
This is currently done. There's a searchable archive, but it's not tagged with meta-data to make browsing and searching easier. My purely subjective ratings by motivations 1-4 listed above are:
  1. Email is great for finding out what others think, but the relative merit of any one response to a question is not easy to ascertain from the responses; there's a silent majority. Conversations are only threaded by subject line.
  2. Connecting with others is easy enough, but searching their post history to look at the subjects is not.
  3. Disseminating information is a strength of the email list, until it becomes spam-like.
  4. Participation on email lists is probably not something you can put on your resume.
Reddit-Style Discussion Board
Reddit.com is a segmented combination of news aggregator and discussion board, with threaded comments and a voting system to allow a consensus to emerge. It's easy to create a 'sub-reddit' on the site itself, or one can use the open-source platform to start from scratch.

Comments related to the motivations:

1. One can write "self-posts" that are like public text messages of reasonable length, to invite others' opinions, OR post a hyperlink to something interesting on the internet. It's very flexible as a general-purpose way to share ideas and create threaded conversations. Voting is a low threshold to involvement, and so there's more participation.

2. One can easily see someone else's post history, but these are not tagged with meta-data. There is a 'friend' feature to follow people of interest, and private messaging within the system is possible.

3. Reddit is a 'pull' rather than 'push' communication, meaning you have to actually go look at the site to see posts, as compared to emails, which arrive in your in-box whether you want them to or not. For the purpose at hand, this is probably preferred by some and not by others. Many assessment professionals are probably too busy to go surf the internet during the day. There are RSS feeds, however.

4. Reddit has a reputation system built in, and active (and popular) users accumulate 'karma'. But the site is not set up to be a meeting place for professionals, and it would have to be hosted off-site and re-themed to change the perception of it as a place for teens to post memes.

The StackOverflow Model

Stackoverflow.com, mathoverflow.net, and many other similar sites now exist to serve as a meeting place for professionals of different stripes. Comments:

1. The 'overflow' model excels at the Q&A give and take; this is its strong suit. Users post questions with meta-data to categorize them. Others can comment on the question or post a solution. All of these (question posts, comments, and solutions) can be voted up or down, and the original poster (or OP) can select a solution as the best, at which point it gets a check mark as 'best answer.'

2. User profiles are quite detailed, with graphs of activity and reputation scores. These are easily associated with meta-data tags, so it's nearly ideal for finding others with similar interests.

3. Each site has a culture and stated rules about what should and should not be posted. For example, 'soft questions' like "how much caffeine do you consume while writing code?" are generally frowned on, as job ads would be at most sites. But this is all adjustable. Like Reddit (and email for that matter), it requires some moderation, but up and down votes from users provide most of the filtering.

4. The reputation system built into the overflow model is plausibly usable as an indicator of professional activity. For example, see the page for Joel David Hamkins at MathOverflow.net--a site for working mathematicians.

Other possibilities include using Facebook or LinkedIn or Google+ or Academia.edu or ResearchGate.net as a base platform. These all have the vulnerability of being beholden to a corporate interest, however.

More Connections

In addition to posting learning outcomes ideas/plans/reports/findings for public review, a well-designed professional networking site could seamlessly overlap with conference presentations, so that individual sessions could have a backchannel on the site as well. Twitter can accomplish this through hash tags, but these are limited and not combined into an easily findable place.

There are also possibilities for crowd-sourcing problems using collaboration sites, but this goes beyond the present scope.

Sunday, January 27, 2013

Finding Meaning in Data, Part II

In Part I, we took a look at a large data set from the perspective of trying to predict something. The example was artificial--there's no need to predict first-generation status because it's easy to determine directly--but the survey items that link to that status tell us something about those students. So I'm using 'prediction' as a generic term to mean connections within a data set, not necessarily chronological ones.

But often we do want to make predictions of the future based on past information (inductive reasoning). I'll give an example below that expands on the first article by directly involving more than one predictor.

Here's a mock data set small enough to understand in full:

We want to predict graduation from the data we have at hand. Using the single-dimension predictors, the best one is Athlete status:


At this point, we could research this by interviewing students or others, or looking for more data. Or we can try to mine the available data further by involving more than one variable. In Part I, I used RapidMiner to do that for me, and that's a good option. Decision trees are particularly easy to understand and implement.

One of the key ideas in some of the standard models is conditional probability, meaning that if we restrict our attention to a subset of the whole data set, we may get better results for that subset. More discussion on the philosophy behind this will come in the next installment. For now, I'll use our sample data as an example.

Let's go 'meta' and ask about the predictors themselves, in particular the Athlete flag, which currently predicts 12 out of 16 cases correctly. Can we predict when the Athlete flag works and when it doesn't? This is the same kind of problem we're already working on, just asked at a higher level. Instead of asking which variables predict graduation, we ask which variables predict how well the Athlete flag works.

To do that, we can create an indicator column that has a 1 in it if the Athlete column accurately predicts graduation, and a zero if it doesn't. Here's what that looks like:

I've highlighted the cases where the Athlete flag == Graduation flag, and those cases have a 1 appearing in the new AthInd indicator column. Now we run the predictor algorithm to see what predicts success, and come up with:


We find that Gender is a good indicator of the success of Athlete as a predictor, and in particular, when Gender==0, it's a perfect predictor. So now we know that in this data set we can perfectly predict Graduation by using Athlete for all cases where Gender==0.
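
In code, the indicator-column trick might look like the minimal pandas sketch below; the little frame and its values are made up for illustration, not the 16-case table above:

```python
import pandas as pd

# Mock frame in the spirit of the example; the values are made up.
df = pd.DataFrame({
    "Athlete":    [1, 1, 0, 0, 1, 0, 1, 0],
    "Gender":     [0, 0, 0, 1, 1, 1, 0, 1],
    "Graduation": [1, 1, 0, 0, 0, 1, 1, 0],
})

# AthInd is 1 exactly when the Athlete flag agrees with the Graduation flag,
# i.e. when Athlete "works" as a predictor for that case.
df["AthInd"] = (df["Athlete"] == df["Graduation"]).astype(int)

# Now ask which other variables predict AthInd.  A group whose mean is 1.0
# is a subset on which Athlete predicts Graduation perfectly.
print(df.groupby("Gender")["AthInd"].mean())
```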

A simpler approach is to turn the problem around and imagine that the best 1-D predictor can become a conditional for another of the variables. In this case, we'd run our prediction process with the filter Athlete==1, and we find that Gender conditioned on athlete works just as well as the other way around: we can predict 100% of the graduates for Gender==0. This business of conditionals may seem murky, given this brief description. I will address it more fully in the next installment.
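
The turned-around, conditional version is just a filter followed by the same kind of check, using the same made-up frame as in the previous sketch:

```python
import pandas as pd

# Same made-up frame as in the previous sketch.
df = pd.DataFrame({
    "Athlete":    [1, 1, 0, 0, 1, 0, 1, 0],
    "Gender":     [0, 0, 0, 1, 1, 1, 0, 1],
    "Graduation": [1, 1, 0, 0, 0, 1, 1, 0],
})

# Condition on the best one-dimensional predictor ...
athletes = df[df["Athlete"] == 1]

# ... and see how well each remaining variable predicts Graduation there.
# A group rate of 1.0 (or 0.0) means perfect prediction on that subset.
print(athletes.groupby("Gender")["Graduation"].mean())
```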

Real data is not as clean as the example. Picking up from the update in the first post, one of the indicators of high GPA as a college senior (in 1999) is RATE01_TFS--academic self-confidence as a freshman. If a student ranked himself/herself in the highest 10%, there's a 51% chance he/she will finish with (self-reported) A average grades. Using the easy method described above, we can condition on this case (RATE01_TFS==5) and see what the best predictors of that set are. Within this restricted set of cases, we find that the items below predict A students to the level shown:

  • Took honors course: 74%
  • Goal: Being very well off financially: not important: 70% (answered as a freshman)
  • Goal: Being very well off financially: not important: 70% (answered as a senior)
  • Never overslept to miss class or appointment: 69%
  • Never failed to complete homework on time: 69%
  • Very satisfied with amount of contact with faculty: 68%
  • Less than an hour a week partying: 66%
  • Self Rating: Drive to achieve (highest 10%): 65%
  • Faculty Provide: Encouragement to pursue graduate/professional school: 64%
  • Future Act: Be elected to an academic honor society: 64% (answered as a freshman)
  • Goal: Being successful in a business of my own (not important): 63%

All of these improve on the initial accuracy of 51%, but at the cost of reducing the applicable pool of cases into smaller chunks.

With three questions from the freshman survey, we have found a way to correctly classify 79% of a subset of students into A/not A (lots of fine print here, including the assumption that they stay at the school long enough to become a senior, etc.). Here's the performance:


This is great. However, we have narrowed the scope from the original 5500 cases of A-students to about 800 by conditioning on the two items above (only one of the two had to be true: being well off financially being not important OR anticipating being elected to an academic honor society). However, this is not a bad thing--it gets us away from the notion that all A students are alike, and starts us on the path of discriminating different types. Note that to have confidence that we haven't "overfit" the data, the results need to be validated by testing the model against another year's data.
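
For the real HERI file, the conditioning step might look roughly like the sketch below. Only RATE01_TFS comes from the survey; the other column names (and the A-student flag you'd construct from the grade item) are hypothetical stand-ins:

```python
import pandas as pd

df = pd.read_csv("heri_senior_1999.csv")  # hypothetical file name

# Condition on top-10% academic self-confidence as a freshman.
confident = df[df["RATE01_TFS"] == 5]

# Keep cases where at least one of the two follow-up items holds:
# wealth "not important" OR expects election to an academic honor society.
# (GOAL_WEALTH_TFS and FUTURE_HONOR_SOC are made-up names.)
subset = confident[(confident["GOAL_WEALTH_TFS"] == 1) |
                   (confident["FUTURE_HONOR_SOC"] == 1)]

# IS_A_STUDENT is a flag you would build from the self-reported grade item.
print(f"{len(subset)} cases, {subset['IS_A_STUDENT'].mean():.0%} A students")
```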

Saturday, January 26, 2013

Finding Meaning in Data, Part I

Large data sets can be heartbreaking. You suspect that there's something in there--something really useful--maybe even a "knock it out of the park" bit of wisdom, but finding it is often not trivial. The Scylla and Charybdis in this classical tale are (1) wasting an inordinate amount of time to find nothing, or (2) trying too hard and bestowing confidence on something meaningless. The perils are everywhere. At one point, I had myself convinced I'd found a powerful predictor of attrition: students who didn't have financial aid packages. After going down that rabbit hole for too long, I discovered that the connection was real, but in the wrong direction: students who left had their packages scrubbed from the system (an argument for data warehousing).

The only solution, I think, is to make it easier to find interesting connections, and easier to know if they are real or spurious. To that end, I've strung together a Rube Goldberg contraption that looks for interestingness and reports it in rank order, interestingest on top. There's quite a bit to say about the hows and whys, but let me start with an example.

Yesterday I happened across an article about the disadvantages first-generation students face in college:
Given ... the ways in which students’ social class backgrounds shape their motives for attending college, we questioned whether universities provide students from these different backgrounds with an equal chance of success. — Assistant Professor Nicole Stephens
And if you're interested in that, there's a recent article in the Atlantic "Why Smart Poor Students Don't Apply to Selective Colleges [...]"
[T]he majority of these smart poor students don't apply to any selective college or university, according to a new paper by Caroline M. Hoxby and Christopher Avery -- even though the most selective schools would actually cost them less, after counting financial aid. Poor students with practically the same grades as their richer classmates are 75 percent less likely to apply to selective colleges.
Now 'poor' and 'first generation' are not the same thing, but they overlap substantially. We can test that by looking at some data (albeit old data).

The nice folks at HERI allow access to old survey data without much fuss, and I downloaded the Senior Survey results from their data archive to use for a demonstration. The survey is high quality and there are lots of samples. I limited myself to results from 1999, about 40K cases, of which somewhat more than half are flagged as not being in the 2-year college comparison group.

This is an item on the survey:

FIRSTGEN_TFS First generation status based on parent(s) with less than 'some college'
  • (1) No
  • (2) Yes
Wouldn't it be interesting to know how the other items on the survey relate to this one? My old approach would have been to load it up in SPSS and create a massive correlation table. Later I figured out how to graph and filter the results, but there are problems with this approach that I'll cover later.

Now, within about a minute I have a full report of what significant links there are, answering a specific question: if I had the whole survey except for this FIRSTGEN_TFS column, how well could I predict who was first generation? Here's the answer.
  1. (.95) Father's education level is lower
  2. (.94) Mother's educational level is lower
  3. (.63) Distance from school to home is lower
  4. (.63) Get more PELL grant money
  5. (.61) More concerns about financing college education
  6. (.60) Less likely to communicate via email with family
  7. (.60) Financial assistance was more important in choice to attend
  8. (.59) Don't use computers as often
  9. (.59) Lower self-evaluation of academic ability as a Freshman (less true as a senior)
  10. (.58) Wanted to stay near home
  11. (.57) Are more likely to work full time while attending
  12. (.57) Are less likely to discuss politics
  13. (.57) Are less likely to have been a guest at a professor's home
  14. (.57) Spend more time commuting
  15. (.57) Are more likely to speak a language other than English at home.
  16. (.57) Evaluate their own writing skills lower
  17. (.57) Low tuition was more important in choice to attend
  18. (.56) Have a goal of being very well off financially.
  19. (.56) Had lower (self-reported) college grade average
This list goes on for pages. The numbers in parentheses are a measure of predictive power, the AUC, which will be described in a bit. Obviously, mother's and father's educational levels are linked to the definition of a first-generation student, but the others are not.

We can see in this list a clear concern, even back in 1999, about finances. There's also a hint that these students are not as savvy consumers as their higher-SES peers: the desire to stay close to home, for example. We could follow up by explicitly chasing ideas. What's the profile of high-GPA first generation students? Does gender make a difference? And so on. These are easily done until the point where we don't have much data left (you can only disaggregate so far).

Rather than doing that, let me show the details of this automated survey analysis. Each of these items on the list above comes with a full report, one of which is reproduced below.


This is a lot to take in. For the moment, focus on the table near the bottom called "Predictor Performance." If we just used the response to this item about the likelihood of having to get a job to pay for college, the predictor works by classifying anyone who responds with (4) Very good chance or (3) Some chance as a likely first-generation student. This would result in correct classification of 86% of the first-generation students (true positives), diluted by 82% (from 1 - .18) of the non-first-generation students (false positives). This is not a great predictor by itself, but it's useful information to know when recruiting and advising these students.

The value of a classifier like this is sometimes rated by the area under the curve (AUC) when drawing true positive rate versus false positive rate--a so-called ROC curve. That's the left-most graph above the table we were just looking at. The next one to the right shows where the two rates meet (slightly to the left of 3), which defines the cut-off for the classifier. The histogram to the right of that shows the frequency of responses, with blue being the first-gen-students and red the others. The final graph in that series gives the fraction of first-gen students who responded with 4,3,2,1 for this item, so if we chose a student who responded (4) to this item, there's a 19% chance that they were first generation, as opposed to a 9% chance if they responded (1).
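
If you want to reproduce this kind of report yourself, scikit-learn computes the ROC curve and AUC for a single ordinal item directly. The sketch below uses simulated stand-in data rather than the HERI file, with responses already ordered so that higher values mean "more likely first generation":

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy stand-in data: y is the first-generation flag (1 = yes) and x is a
# 4-point survey response (e.g. chance of needing a job to pay for college).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
x = np.clip(np.round(rng.normal(2.0 + y, 1.0)), 1, 4)

print("AUC:", round(roc_auc_score(y, x), 2))

# One row per possible cut-off: (cutoff, true positive rate, false positive rate),
# i.e. the points of the ROC curve itself.
fpr, tpr, cutoffs = roc_curve(y, x)
print(np.column_stack([cutoffs, tpr, fpr]))
```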

Note that even though this is not a great predictor, we can have some confidence that it's meaningful because of the way the responses line up: 4,3,2,1. These are not required to be in order by the program that produces the report. Rather, the order is determined by the usefulness of the response as a predictor, sorted from highest to lowest. As a result, the In-Class fraction graph always shows higher values on the left.

The table at the very top shows the true positive and false positive rates (here, In-class means the group we have selected to contrast: students who checked the first-gen box in this case). The number of samples in each case is given underneath, and a one-tailed p-value for the difference of proportions is shown at the bottom, color coded for easy identification of anything less than .05.

You can download the whole report here.

We can take the whole list of good predictors and generate a table of predictions from each. These are each one-dimensional, like the question above about the likelihood of needing a job while in college. We can see how these overlap by mapping out their correlations. That's what the graph below shows (down to correlations of +/-.40).



The good news is that there are no negative correlations (predictors that contradict one another). The items cluster in fairly obvious ways and suggest how we might pare down the list for further consideration.
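
The correlation map is easy to approximate in code once you have the table of one-dimensional predictions: take the correlation matrix and keep the pairs beyond the cutoff. The file and column names here are placeholders:

```python
import pandas as pd

# One 0/1 prediction column per single-item predictor (placeholder file).
preds = pd.read_csv("one_dim_predictions.csv")

corr = preds.corr()

# Keep only pairs whose correlation is at least 0.40 in magnitude,
# dropping the diagonal and mirror-image duplicates.
pairs = corr[corr.abs() >= 0.40].stack().reset_index()
pairs.columns = ["item_a", "item_b", "r"]
pairs = pairs[pairs["item_a"] < pairs["item_b"]]
print(pairs.sort_values("r", ascending=False))
```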

In order to create multi-dimensional models, we can try standard machine learning algorithms in RapidMiner or Weka, which are free. An example using the former is shown below, employing a Bayesian learning algorithm to try to sort out the difference between first-generation students and the others, using the one-dimensional predictions already generated.


The result is a classifier that has usable accuracy. In the table below, first generation students (as self-reported on the survey) are coded as 1.


The model correctly classifies 15,219 non-first-gen students and 2,300 first gen students, and incorrectly classifies a total of 6,259 cases. This is better performance than any of the single predictors we found (leaving out parental education level, which is hardly fair to use). The AUC for this model is .76.

Note that this is only part of the work required. Ideally we create the model using one set of data, and test it with another. For example, we could use 1998 data to predict 1999 data.
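
If you'd rather script the model than use RapidMiner, scikit-learn's Bernoulli naive Bayes does roughly the same job, and the year-based validation is a one-line filter. Everything below (file name, column names, the year values) is illustrative only:

```python
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical input: one 0/1 column per one-dimensional prediction, plus
# the survey year and the FIRSTGEN_TFS flag we are trying to recover.
df = pd.read_csv("one_dim_predictions_with_flag.csv")
features = [c for c in df.columns if c not in ("FIRSTGEN_TFS", "year")]

# Fit on one year and score the next, to guard against overfitting.
train, test = df[df["year"] == 1998], df[df["year"] == 1999]
model = BernoulliNB().fit(train[features], train["FIRSTGEN_TFS"])

pred = model.predict(test[features])
prob = model.predict_proba(test[features])[:, 1]

print(confusion_matrix(test["FIRSTGEN_TFS"], pred))
print("AUC:", round(roc_auc_score(test["FIRSTGEN_TFS"], prob), 2))
```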

This general approach can be used to predict enrollment, retention, academic success, and anything else you have good data on. There's much more to this, which is why the title has "Part 1" in it. Stay tuned...

Update: As another example, I contrasted students who report having an A average in their senior year to those who don't (for 4-year programs), and eliminated those in the B+/A- range to make the distinction less fuzzy. You can see the whole report on one-dimensional predictors here.

You'll see things you expect, like high academic confidence and good study habits (not missing class, turning homework in on time). But there's also a clear link to less drinking and partying, to intellectual curiosity (e.g. taking interdisciplinary courses), to the role of professors as mentors and role models (e.g. being a guest in a professor's home), and to a trend toward helping others through tutoring and service learning. The one that surprised me the most, though, is shown below. It's an item that is asked on the freshman survey and repeated on this senior survey. They give virtually identical results:


This is matched by a question about having a successful business, with similar results.  Both show a clear tendency of the top grade earners to not be as motivated by money as students who get Bs or below (remember, this is 1999).

Continued in Part II