Monday, August 31, 2009

Money, Genes, and College

The New York Times has a recent article here relating SAT scores to family income. It shows that income and scores have a strong positive correlation for 2009. I've reproduced the graph from the article below.
The article doesn't claim that income causes higher scores (as I recently speculated), but it is nevertheless criticized by an economics professor here. In his blog post, Prof. Mankiw suggests that a significant part of the slope is due to genes, with an argument along the following lines, which I've made more explicit here:
  1. IQ correlates positively with income
  2. The ability to perform well on an IQ test is influenced by heredity
  3. IQ correlates positively with SAT
Therefore, students of wealthy parents should have higher SAT scores merely because they are smarter. This is offered as an explanation for part of the slope of the curve evident above (which, note, is greatly exaggerated by the scale used). How much of the SAT bonus is due to IQ is not spelled out, other than the concluding note in Dr. Mankiw's article:
It would be interesting to see the above graph reproduced for adopted children only. I bet that the curve would be a lot flatter.
I interpret "a lot flatter" to mean that the IQ contribution accounts for a significant part of the slope. This is all reminiscent of The Bell Curve and the controversy of genes vs. environment in the creation of intelligence.

Some analysis is in order. The inheritability of intelligence is a very political topic. The left would like to assume that all people really are created equal, and that the "blank slate" is there to be written upon. This legitimizes interventions that affect socio-economic status (SES). The right would like to believe that interventions are counter-productive because intelligence is fixed at birth. Both argue from the conclusions back to reasons for believing them, which is probably some kind of tragedy of the commons in the public realm: in order to stay in power, parties have to act sometimes in a way that is contrary to the common good. Since I don't have to be elected I can freely wish a pox on both their houses. Where is the science on the matter?

1. Income and IQ

There seems to be good evidence that IQ and income correlate positively. It's important to remember that IQ and intelligence are not the same thing. The first is an operational definition based on a particular kind of cognitive test, and the second is a vocabulary item in common usage. In The Bell Curve and subsequent articles, the authors try to make the case that intelligence stratifies the employment landscape with the Very Dull at the bottom, hardly educated and hardly employable, and the Very Bright at the top. (Those words annoy me because dull is not the opposite of bright. Dim is.) Reading those arguments conjures up for me a kind of implied overarching social history--something on the order of what Marx sparked. In any case, it's a very strong conclusion they try to reach. You can spend a lot of time reading the debate about that book and related concepts.

It's noteworthy that physical attractiveness seems to also be linked to higher income.

2. Genes and Intelligence

The problem with an approach like The Bell Curve is that it's the wrong discipline. If you want to make conclusions about genetics, you need to actually look at some genes. This is especially true if you want to make big conclusions. The authors try very hard to control for environmental factors, but that doesn't substitute for identifying DNA that is causally linked to intelligence. That kind of work is being done by biologists, however. Here are some examples you can find on
Of course, there is science pointing to environmental influence over intelligence as well:
Largely ignored in the grand debate over environment vs. genes are new findings about epigenetics, which you can survey here and here.

I think a reasonable person has to conclude that genes, epigenetics, and environment all play a role. Because genes are discrete, it ought to be possible to identify their consequent effects with more precision than for the other categories. The third article linked in the list above pins a particular gene to about 3% of the variance in IQ (R^2 = 0.03).

One of the puzzling things about IQ is that if it really describes a primarily genetic effect, how can we explain the dramatic rise in scores over recent decades? This is the so-called Flynn Effect, summarized in Wikipedia as the change over a 30-year period where:
  1. the mean IQ had increased by 9.7 points (the Flynn effect),
  2. the gains were concentrated in the lower half of the distribution and negligible in the top half, and
  3. the gains gradually decreased from low to high IQ.[reference]
What would a genetic-based explanation look like? I think you'd have to assume that the genes for Dullness became expressed less often, perhaps because they are going extinct. This would suggest a harsher survival environment for said genes. It seems to me, however, that it's becoming easier to survive without developing intelligence, but that's just my take on it. Others have commented on this at great length. See a rather critical piece here.

I think it's also fair to conclude from the evidence we have that smarter parents on average will produce smarter offspring, all other things being equal, but that this is not fully deterministic. The question of how much intelligence is inheritable is still open.

3. IQ and SAT

The link between IQ and SAT is also controversial, at least from the test-maker's point of view. See an overview here. There seems to be a strong correlation between the two, which you can read about in this research article.

SAT isn't very good at predicting first-year college grades, even though that's what it's designed for. It's even less good at predicting success beyond the first year. This perhaps isn't surprising: the content of the test resembles high school and college freshman academic work, and there are many other factors that influence success. I was interested to read here that:
Bates College, which dropped all pre-admission testing requirements in 1990, first conducted several studies to determine the most powerful variables for predicting success at the college. One study showed that students' self-evaluation of their "energy and initiative" added more to the ability to predict performance at Bates than did either Math or Verbal SAT scores.
Is IQ similarly limited in predicting college success? I couldn't find anything definitive about that in the time I had at my disposal, so I'll leave the question open.


If we can reach any conclusion, it's a very weak one. Almost certainly income is indirectly linked to SAT scores through the associations advanced. How large that bonus is, however, remains unclear. My blog post last time pointed out that it's not only that higher incomes get higher SATs, but that the year-to-year differential is also higher. If this is not some statistical artifact, it doesn't jibe with a purely genetic explanation, as it would require the gene pool to be evolving at a very high rate, implying extraordinary pressure from a fitness gradient.

I looked for more historical data on this increase as a function of wealth, but unfortunately it's only in the last two years that the SAT included the salary range from zero all the way to $200,000+. The old version only went to $100,000, and the most interesting part of the curve is above that. Having said that, I did not find any evidence for my theory about the economic value of error with the data that is available. We'll have to wait for next year's results.

There is, however, a direct causal explanation that is difficult to imagine away. It contrasts with the somewhat specious illustration Prof. Mankiw gives:
Suppose we were to graph average SAT scores by the number of bathrooms a student has in his or her family home. That curve would also likely slope upward.
Correlation and causation are different things. But consider another scenario. Suppose we were to graph SAT scores by the number of prep-tests a student attended, or the number of times they took the test (guaranteed on average to raise the maximum score), or the quality of the high school they attended. All of those have plausible causal connections to how well a student performs on a test of high-school cognitive material, no? And which students have access to the best high schools? Can afford to take the test multiple times? Can afford prep-tests that are advertised to raise scores 100 points?

The ironic thing is that the economic value of raising the SAT is real because of the usual policies in awarding financial aid. So the wealthiest are in the best position to reap that aid for both reasons of intelligence and the ability to buy error (over-prediction due to coaching or taking multiple tests). This trend has been documented for a long time here.

This line of thought gave me an interesting idea some time ago, which I hope to turn into a project. At my current institution we have embarked on a shift toward recruiting better students (students who have a better chance of success). But the conversation with the board, as well as public perception, includes the median SAT scores of the students we admit. My preference is to use better predictors (including noncognitive ones) to find the best students, but this is largely independent of their SATs, which creates a tension between actual goals and perceptions.

The idea is to create a grant-funded summer camp for low-SAT students who are nevertheless predicted to do well. In this camp they would receive SAT test-taking preparation and then retake the test immediately afterwards. This would raise median SATs without affecting our ability to get the students we want.

Update: here are some references:

Thursday, August 27, 2009

Amplification Amplification

There's something seductive about self-reference--a topic delightfully expounded in Douglas Hofstadter's book Metamagical Themas. Self-reference shook the foundations of math with the Russell Paradox and then Gödel's incompleteness weirdness. But that doesn't stop the fun--by no means! The latest, coolest application I've seen is the RepRap, which is a machine that can build other things, including itself. I suppose that if the conditions are right it will create a whole ecology of replicator replicators. That's what we are too, of course, just squishier and less prone to rust.

This gratuitous introduction is by way of explaining the title. I've spent some time thinking about Zog's Lemma about the amplification of error, and wondering if there's actually anything there. I think there may be, even in the cold light of the fluorescent sun. I will therefore try to amplify on my prior post.

The idea is this:
As economic value increases, for weak predictors the error of prediction can increase over time.
I opened up WinEdit to try a mathematical formulation, but I haven't gotten too far with that yet. The formality may get in the way anyway. We're considering a prediction of some future value. The examples will make it clear, but think Prediction = Actual + Error, where the last term could be negative. Consider a couple of scenarios.
1. Imagine pulling a dollar bill from your wallet or handbag. How much is it worth, in dollars? One, right? Well, if you got a fiver by mistake, it's worth five bucks. You're not likely to be convinced by a huckster that your fiver is only really worth one and fork it over for four quarters.
Here we have a situation where predictive error is very low and economic value is low too. The actual value of the dollar is realized when you buy something with it. Let's suppose that a fast food burger is a dollar. If you get a single, it was a dollar. If you get five, it had Lincoln on it.
2. Now imagine that you can't tell the difference between a one dollar bill and a five, or a twenty, or a hundred... You have to depend on "experts" for this. The problem is real because you don't want to walk to the burger stand with a hundred dollar bill; you're just not that hungry. This is not entirely hypothetical: US currency is all the same size, so if you're blind, this is the way it is. Euros come in different sizes.
Now the predictor has dropped because you can't be sure whom to trust. There's a chance someone will lie to you when you show them the bill and try to take advantage of you.

Both of these scenarios are with relatively low stakes. The competition for your one dollar or five is not as great as if it were a million, right? What happens when we now consider situations where the economic value of the prediction is high?
3. Airplanes are pretty well understood--just watch this. Predicting what will happen under lots of conditions is achievable. Still, things go wrong once in a while. I think that we can agree that this is high stakes--no need to put a dollar figure on a successful landing.
In this situation, what has happened is that the error has gotten smaller over time. That is, by assiduously investigating every prediction failure, we learn how to make better predictions. This is just how science works, of course.
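The setup so far can be sketched numerically. This is a toy model only, with all numbers invented for illustration: a predictor is just the actual value plus a random error term, and a weak predictor is one whose error component is large.

```python
import random

random.seed(0)

def predict(actual, error_sd):
    """Toy model: Prediction = Actual + Error, with the error drawn
    from a normal distribution with the given standard deviation."""
    return actual + random.gauss(0, error_sd)

actual = 100.0
strong = [predict(actual, 1.0) for _ in range(10_000)]   # small E
weak = [predict(actual, 20.0) for _ in range(10_000)]    # large E

def mean_abs_error(preds):
    """Average size of the error term |E| over many predictions."""
    return sum(abs(p - actual) for p in preds) / len(preds)

print(f"mean |E|, strong predictor: {mean_abs_error(strong):.1f}")
print(f"mean |E|, weak predictor:   {mean_abs_error(weak):.1f}")
```

The airplane scenario amounts to the error standard deviation shrinking over time; the scenarios below are about the opposite movement.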

Now for the weird part. Try to imagine situations where the economic value of the predictor is high, but the accuracy isn't so great. What happens now?
4. You want to plant your crops as soon as possible for economic reasons. The shaman says that the gods have decreed that the last freeze will come late because of some indiscretion within the tribe.
This is deadly serious--if your crop gets frosted over, you may not last the next winter through. But the predictor isn't a very good one. What are the dynamics? In this case, the hapless farmer doesn't have many options. Without an anachronistic scientific approach, there is no way to reduce the error. But that's not the end of the story.

The Shaman has a vested interest in maintaining credibility, no? If it emerges that he's full of buffalo cakes, the tribe may put him out on his ear. Since P = A + E, and (despite his best efforts) he has no way to influence the actual outcome A, nor can he improve his predictions P, his only margin for improvement is E. But how can that be? True, he (or she) can't actually improve the error, but he can try to make it appear so with mystical mumbo-jumbo of sufficient impressiveness. So his best bet is to create the best fakery he can--double and redouble the pageantry and behavior that's so far out of norm that it MUST be profound. A successful faker can then get by without paying any attention to the real problem of prediction, and can just get better and better at pretending to.

The sticky point of this equation--what we need actual math for--is whether Herr Shaman could completely abandon accuracy. Maybe he has some residual knowledge passed down about the seasons, the moon, etc. Can the error rate actually increase naturally in such a situation? Presumably only if the effort spent maintaining the better predictor buys less credibility than the same effort spent on new and improved humbug.

But what if we can affect the error ourselves?

5. Odysseus wants to sneak his men past the furious cyclops Polyphemus, whom he has just blinded. He and his followers ride underneath sheep so as to fool the giant as he checks who exits by touch. (Homer I ain't.)
The predictor is the cyclops in this high stakes assessment. Odysseus (not being Circe) can't change the fact that his men are men, so the A is immutable. He can only affect the prediction by increasing the error, which he does in fine style. Notice the contrast between this and the previous example. In the previous one, disguising the size of E was done for economic value. Here, E is actually increased as much as possible for the same reason. Odysseus spends a lot of time increasing E, actually.

In common parlance, this is called cheating or manipulation, and it's common even in low-stakes situations as we all know.
But what about the amplification of error I advertised? Glad you asked.

Just imagine a recursive situation like number five above, where there are repeated rounds. Each time, Odysseus tries to sneak by the mutilated son of Poseidon, and each time the cyclops tries to detect him. There are lots of situations like this: they evolve. In biology it's sometimes called a Red Queen Race, after a bit from Through the Looking-Glass about running as fast as you can just to stay in place. So we have a curious effect where the predictor P is precariously balanced against the error E, like a tug of war: not moving much perhaps, but with great forces involved.

But there's no reason to assume the playing field is fair. What if the conditions are better for creating E than for eliminating it? That might be the case for the SAT test. The chart below was clipped and pasted together from one in InsideHigherEd yesterday here. It lists year over year improvements in SAT score averages by income range.

It shows convincingly that more money gets you higher scores. Exactly why that's true is debatable, but remember this is an increase for one year. One hypothesis is that students of wealthier families have more access to test preparation and can pay to take the test more times, and probably have parents who enable all this as well as pay for it. Does all this effort increase the teen's actual ability to succeed in college (let's call it A)? Or is it responding to the economic value of the predictor (the SAT score) by increasing E instead?

If it's the latter case, then we have a pretty snapshot of an imperfect estimator demonstrating prediction error on the increase because of economic forces. This would imply that the psychometricians who create and score the SAT haven't found a way to counter the increased error that's hypothesized. Zog's moment in the sun?

Tuesday, August 25, 2009

Zog's Lemma: Assessment and the Amplification of Error

The concept of outcomes assessment is like a Swiss Army spatula: it's used in all kinds of ways. As opposed to the proverbial Russian Army hardware, pictured below (ubiquitous on the Internets):
To some, outcomes assessment is a touchy-feely endeavor of encouraging the practitioners of higher education to do the right thing, close the right loop, bring a glowing smile to the visiting team. All that. I'm comfortable with that.

To others, it's a more serious matter, more scientific in approach, rather like making tick-marks on the door's threshold on a child's birthday to signify evidence of growth. We know that is serious business: with shoes or without, and where exactly is the top of the head? Does hair count or not? It grows too, after all.

The scientific approach requires belief in theory: that assessments are valid to the purposes we employ them for. In reality we usually have no real way to know that based on a solid physical theory. The belief has to be defended by what statistics can be summoned to make a case. (In truth, beliefs don't need to be defended at all; they only need to be believed.) What comprises this validity? I'd like to address that question sideways. The discussion of what constitutes validity is a groove cut deeply in the literature of psychometrics, and I'd rather ask a more important question: of what use is it to believe in the validity of an assessment?

It's easy to make hay from the fact that the definition of validity itself isn't settled, but this is unfair, I think. It's a difficult philosophical nut to crack. For my purposes, predictive validity is the most important aspect of assessment. If an assessment doesn't tell us anything about what's going to happen in the future, I don't see much use in it. The Wiki article on predictive validity contains an interesting observation that is the crux of the value of an assessment:
[T]he utility (that is the benefit obtained by making decisions using the test) provided by a test with a correlation of .35 can be quite substantial.
Utility is a concept from economics that weights outcomes so as to distinguish ones that would otherwise be numerically identical. A classic example is that $1000 means more to a poor person than it does to a rich person, even though it won't buy any more goods for either; its relative worth to the penniless is greater. The point is that even imperfect predictive assessments are useful.

An example of this is the college admissions process. Let's suppose that with the data gathered on the application, the institution can estimate the probability of "success" of a student. Success could be retention to the second year, or GPA > 2.5, or graduation, or whatever you deem important. The statistics can be generated with a logistic regression, which will yield a model with a certain amount of predictive power. No model is perfect, and even in retrospect (feeding the original data back into the model) it will not correctly classify applicants as "predict success" or "predict failure" with 100% accuracy. There will be some proportion of false positives and false negatives. This can be visualized on a Receiver Operating Characteristic (ROC) curve; you can see a bunch of them on Google Images. The usefulness of the ROC curve is that it lets you visually explore where to set a threshold for decision-making. If you set it too high, you get too many false negatives (reject too many qualified candidates). Too low, and you admit too many false positives.
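To make the threshold trade-off concrete, here's a small simulation with entirely synthetic data. The logistic relationship and every number in it are invented for illustration; no real admissions model is implied. Sliding the cutoff down trades false negatives for false positives, which is exactly what a point moving along a ROC curve expresses.

```python
import math
import random

random.seed(1)

def success_prob(score):
    """Hypothetical logistic link between an admissions score and
    the probability of student success (invented for illustration)."""
    return 1 / (1 + math.exp(-(score - 50) / 10))

# Simulate an applicant pool as (score, actually succeeded?) pairs.
applicants = []
for _ in range(5_000):
    score = random.uniform(0, 100)
    applicants.append((score, random.random() < success_prob(score)))

# Sweep the admission threshold and count both kinds of error.
errors = {}
for cutoff in (30, 50, 70):
    false_neg = sum(1 for s, ok in applicants if s < cutoff and ok)       # rejected, would succeed
    false_pos = sum(1 for s, ok in applicants if s >= cutoff and not ok)  # admitted, then fails
    errors[cutoff] = (false_neg, false_pos)
    print(f"cutoff {cutoff}: false negatives={false_neg}, false positives={false_pos}")
```

A low cutoff rejects few qualified candidates but admits many who fail; a high cutoff does the reverse.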

For an institution, such a tool is obviously useful to believe in: one can work backwards from the desired size of the entering class to see where the threshold should be set. You could even estimate the number of false positives. Any power to discriminate between successful and unsuccessful students is better than none. Of course there are other factors, such as ability to pay, that make this more complicated. To keep things simple, I won't consider these distractions further.

One can imagine a utopia springing from this arrangement, where the assessments continually get better and the applicants learn to distinguish themselves by giving signals that the assessments can recognize. The first doesn't seem to be happening, and the second has an unfortunate twist. An analogy from biology serves us well here.

The story of the peacock, according to evolutionary biologists, is that a showy mating display is worth the biological cost of making all those pretty feathers because of the payoff in reproduction. The similarity is that when mates choose each other they have limited information to go on--an assessment we could characterize with a ROC curve if we had all the facts. This produces a distortion that favors any apparent advantage in a potential mate, or in the case of the peacock, creates over time a completely artificial means of assessment. The peacock is saying "look, I'm so healthy I can drag around all this useless plumage and still escape predators."

So, if the value of partial assessments is clear from the institutional vantage, it's a different picture altogether from the applicant's point of view. It's interesting that InsideHigherEd has an article this morning on this very topic. According to the article, in response to increasing competitiveness at "top institutions":
[H]igh school students could respond to the pressure by taking more rigorous courses and studying more -- or they could focus their attentions on gaming the system and trying to impress.
A study by John Bound, Brad Hershbein, and Bridget Terry Long shows that while this perhaps motivates students to take a more rigorous curriculum, it also prompts them to spend more time in test preparation, or in games like trying to engineer more time to take the test. Peacock plumage? It's hard to say without seeing actual success rates. It could be that having the willingness to spend all that extra effort is itself a noncognitive predictor of success (that is, not related to the scores themselves, but to the personality traits of the applicant). As Rich Karlgaard put it in Forbes magazine, a degree from an elite institution is valuable because:
The degree simply puts an official stamp on the fact that the student was intelligent, hardworking and competitive enough to get into Harvard or Yale in the first place.
What is clear is that those with the means to game the system are better off than those without them. So we see things like test prep for kindergartners at $450/hr in the upper crust of society, but probably not so much in housing projects. This likely produces more false positives among the select group: it's as simple as money buying better access. [Update: see this InsideHigherEd article for some dramatic numbers to that effect. Year over year, SAT scores increased 8-9 points for $200,000+ families, and 0-1 points for the poorest group.]

The economic demand for false positives has become an industry under No Child Left Behind, and that doesn't seem to be changing. The New York Times has published letters from teachers on this topic. Some quotes:
  • [T]he use of test data for purposes of evaluating and compensating teachers will work against the education of the most vulnerable children. It is a mistake to conceptualize education as a “Race to the Top” (as federal grants to schools are titled) — for children or schools. (Julie Diamond)
  • Linking teacher evaluations to faulty standardized tests ignores the socioeconomic impact on a nation that is both rich and poor. Can a teacher confronting the poverty of some children in Bedford-Stuyvesant be made to compete with a teacher instructing affluent children in Scarsdale? (Maurice R. Berube)
  • My job went from teaching children to teaching test preparation in very little time. Many of our nation’s teachers have left their profession because the focus on testing leaves little room for passion, creativity or intellect. (Darcy Hicks)
  • Standardized tests are, by their nature, predictable. Most administrators and teachers, fearing failure and loss of position and/or bonuses, de-emphasize or delete those parts of the curriculum least likely to be tested. The students sense this and neglect serious studying because they know that they will be prepped for the big exams. (Martin Rudolph)
The economics of false positives is clear: test prep is an industry, and teachers, administrators, and schools are rated by how many they generate. Of course the object is not to create false positives; that's just the result of so much emphasis on an imperfect assessment.

Less attention is paid to false negatives. Research points to noncognitive traits like grit, planning for the future, and self-assessment as being important to actual success, but these are not directly accounted for in standardized assessments. To be sure, college admissions officers look at extra-curricular activities to try to add value to SAT, GPA, and curriculum, but I think it's safe to say that the cognitive assessments are primary. How badly do we underestimate actual performance?

That question comes up when we evaluate or create a predictive model for applicants. The resulting predicted (first year) GPA can be used for admissions decisions and financial aid awards. In the Noel-Levitz leveraging schema applicants are sorted into "low-ability" to "high-ability" bins for individual attention. The question of false negatives is the same as the question "how accurate is the description 'low-ability', based on the predictors?" Not very good, as it turns out.

Every time I ran the statistics I got the same answer: at the very bottom end of our admit pool--those Presidential and provisional admits who were supposed to have the hardest time--about half of them performed well (I usually use GPA > 2.5 for that distinction). Since we reject students below that line, we should assume that about half of those students just below the cutoff would have done as well too. If this is typical, there are a LOT of false negatives.
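A quick simulation shows how easily this happens with a weak predictor. Here I assume the predictor explains about a third of the variance in the outcome (roughly what SAT plus high school GPA manages); the rejection cutoff and the success threshold are invented for illustration, not taken from any real admit pool.

```python
import math
import random

random.seed(2)

r = 0.57  # correlation; r^2 is about 0.33, i.e. a third of the variance explained
students = []
for _ in range(20_000):
    z = random.gauss(0, 1)  # standardized predictor (e.g., an SAT + HSGPA index)
    # Outcome shares correlation r with the predictor; the rest is noise.
    outcome = r * z + math.sqrt(1 - r * r) * random.gauss(0, 1)
    students.append((z, outcome))

cutoff = -1.0         # reject the bottom ~16% of the predictor distribution
success_line = -0.85  # suppose roughly 80% of all students would clear GPA 2.5

rejected = [outcome for z, outcome in students if z < cutoff]
would_succeed = sum(1 for o in rejected if o > success_line) / len(rejected)
print(f"share of rejected students who would have succeeded: {would_succeed:.2f}")
```

Under these made-up but not implausible numbers, nearly half of the rejected students would have cleared the success bar, which matches the pattern I kept seeing in our own data.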

Remember, this line of thought applies to all outcomes assessments to one degree or another. Let's take a concrete example. Suppose we want to assess the vocabulary knowledge of German language students with an exam. Memorizing a single word (including declensions for nouns and conjugations for verbs) is low-complexity. The total complexity is that of one word times, say, 10,000 total items, less any compressibility of this data. All told, this is a lot of complexity (measured in bits) for a human. So it's a suitable subject for assessing, and the low complexity per item means that each item can be assessed with confidence.

Supposing that we do not have the resources to test our learners on all 10,000 items of vocabulary, we'll have to sample randomly and hope that the ratios are representative (or else spend a lot of time checking correlations and such). Maybe our assessment is only on 100 items, chosen at random from the 10,000. If Tatiana actually knows only K of the items, then there is a chance of K/10,000 that she will know any individual word, and her expected score, as a proportion, is K/10,000 on a test of any size. Given this arrangement, what is the trade-off between false positives and false negatives?

We will assume we can use the normal distribution to approximate these Bernoulli trials. This only requires that K not be too large or too small--in those cases the chance of an error diminishes anyway. We already know the mean proportion is p = K/10,000, and the standard error of the observed proportion on an n-item test is SQRT(p(1-p)/n). For n = 100, one standard error is about 3% for large and small values of p, rising to about 5% in the middle, so two standard errors range from roughly 6% to 10%. These errors would shrink for a test larger than 100 items, and grow for a smaller one.

The conclusion is that if we set our cutoff for passing in a usual spot, say 70%, we should expect at least half of the scores in the range 65-75% to be either false positives or false negatives. For example, if Tatiana actually knows 70% of the vocabulary items, she has a 50% chance of passing the test because the distribution of her scores over all possible tests is (almost) symmetrical and centered on 70%.
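That 50% figure is easy to check by simulation (a sketch, assuming test items are drawn uniformly at random from the 10,000 and that knowing an item guarantees a correct answer):

```python
import random

random.seed(3)

VOCAB = 10_000   # total vocabulary items
KNOWN = 7_000    # Tatiana actually knows 70% of them
TEST_SIZE = 100  # items sampled at random for each exam
CUTOFF = 70      # items correct required to pass

trials = 10_000
passes = 0
for _ in range(trials):
    sample = random.sample(range(VOCAB), TEST_SIZE)
    correct = sum(1 for item in sample if item < KNOWN)  # items 0..6999 are "known"
    if correct >= CUTOFF:
        passes += 1

pass_rate = passes / trials
print(f"probability of passing: {pass_rate:.2f}")
```

The pass rate comes out close to a coin flip, as the symmetry argument predicts.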

The actual number of false positives and negatives depends on where the skill ranges of the test-takers lie in relation to the cutoff value. The more there are close to the cutoff, the more errors there will be. I actually witnessed a multiple-choice placement test being graded one time, and inquired about the cutoff to find that the grade to pass was actually less than the average result expected by chance! This certainly reduces false negatives, but I'm not sure about the overall result.

The vocabulary example is a best case, where the items themselves are of low complexity and (we hope) not subject to many other kinds of error. When the predictive validity is actually quite low (as in the example where we explain a small percentage of the variance in the outcome), the proportion of errors in both directions is far worse. What does this mean, then, when we "believe" in a test like the SAT and adopt it as a de facto industry standard? Even together with high school GPA, it can typically explain only about 33% of the variance of first-year college grades.

First, it is of benefit to the institution to be able to imperfectly sort students by desirability. It is costly if the institution has to bid for the apparent best applicants with institutional aid, and there is incentive to look at noncognitives and better predictors, but I can't see a lot of progress in that direction. So the false positives get bid up with the rest. Meanwhile, the false negatives--those applicants who would succeed but don't show it on the predictor--get passed over for admission or merit aid.

A tentative conclusion is that a weak predictor gets amplified in a competitive environment. This is probably true in lots of domains, like the evolutionary biology example of the peacock. Probably some economist has his/her name attached to it: Zog's Lemma or something. I'll have to ask around. (I just googled it--apparently there is no Zog's Lemma.)

For the assessment types, there are two lessons. First, there are going to be errors in classification. Second, those errors get amplified as the assessment gains importance. This is an argument against high-stakes assessments, I suppose. I've always gotten good results by keeping assessments free from political pressures like instructor or program review. "Accountability" creates problems as it tries to solve them. Acknowledging that would be a fine thing.

PS, if you're interested in other kinds of amplifiers in science, read this.

UPDATE: see Amplification Amplification

Friday, August 21, 2009

The Use of Grades

There's been an interesting discussion on the ASSESS listserv about assessments vs. grades, which led me to think about the relative uses of each. Part of the problem in addressing that difference is that grades come in all sorts of flavors that may or may not resemble an outcomes assessment. For example, a carefully designed calculus final exam may pass quite nicely as a summative assessment. By contrast, a student who increases his final grade by attending an extracurricular event (say in an orientation course) has little claim that the grade is a reflection of his performance in some cognitive skill. The summing of individual grades into a cloudy average makes this worse (see: Statistical Goo). The same can be done for assessments, of course.

One fundamental difference between grades and assessments is that the former are used to motivate students. In fact, we've created a whole industry that depends on this kind of motivation, from accreditation downwards to the classroom: a kind of "do this or else" mentality. Surely this can't be ideal. I appreciate that compulsory education has to more or less force kids out of their beds in the morning, onto buses, and into contact with ideas that hurt to absorb. But higher education? Especially in the liberal arts, we talk about general goals like creating life-long learners and such. How does that square with our methods of delivery?

That discussion comes back to the topic of noncognitive traits in learners: how motivated is Tatiana to learn computer programming? For a motivated learner, assessments are better than grades because they get straight to the point of "how well am I doing?" without the coercive baggage that's inherited from elementary school. This is, by the way, an argument for detailed reports in assessments as well. The more gooey they become, the more they resemble grades and the less useful they are. Compare:
Tatiana sees her score of 79% on the C++ test and concludes that she is doing okay, but not excelling.

Tatiana reads her C++ assessment and sees that while basic control structures are second nature to her, she really doesn't understand pointers.
Only the second of these is actionable--Tatiana can increase her skills by practicing with pointers, getting tutoring, reading what others have to say about the subject. Learning is about details, and assessments should be too. With the ubiquity of modern information systems, keeping track of details isn't a problem--we just have bad habits left over from the grade school mentality of reducing a semester's work to a single letter. It's absurd, if you think about it.
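The difference between the two reports is the difference between a record and an aggregate. A toy sketch, with invented topic scores for Tatiana's C++ work:

```python
# A detail-preserving assessment record vs. the single-number "grade" view.
# Topic names and scores are hypothetical.
assessment = {
    "control structures": 0.95,
    "functions":          0.88,
    "classes":            0.80,
    "pointers":           0.35,
}

overall = sum(assessment.values()) / len(assessment)           # statistical goo
weak = [topic for topic, s in assessment.items() if s < 0.70]  # actionable detail

print(f"overall: {overall:.0%}")   # "doing okay, not excelling" -- tells her little
print("practice:", weak)           # tells her exactly what to work on
```

The aggregate costs almost nothing to keep alongside the detail, but the detail cannot be recovered from the aggregate; that asymmetry is the whole argument.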

Can we find another way to motivate students? I don't know, but I don't think we're really trying. I can imagine a culture that fosters a more inquisitive approach to self-improvement, but don't know how that might be engineered. And yet, we don't really want to produce graduates who only perform in order to get the pellet in a Skinner box, do we? I think the first step is to start to assess certain noncognitive traits, and bring them into the curriculum (and not just in orientation courses). I'm not saying that all students lack self-motivation, of course. We have and value the go-getters in class who drive classroom discussions, ask for new things to read, and always come for help when they don't understand. How do we harness that energy to help pull along students who aren't so energetic in their learning practice?

Until such a motivation exists, it's probably best to keep grades to push students along and keep the political pressure off assessments. It's very convenient for the Assessment Director to not have to worry about the kind of scrutiny that the registrar requires. But it is a capitulation of sorts.

Tuesday, August 18, 2009

Assessing Away Grades

It's an Assessment Director's dream or perhaps nightmare: to replace course grades with assessments. Further, to have such buy-in for said instruments that students need not even take a course to get credit. This is radical and commonsensical at the same time. If it's the result we care about, and not the process by which it was reached, then why not check for the result right off the bat? We do that with placement tests, for example. There's also the practice of granting course credit for "life experiences," but this conjures diploma mills, and tends to lift eyebrows among the conservative types who think mill work should be properly done overseas if at all.

The conservative types are going to be interesting to watch. Remember when digital cameras came out? I was a hard core darkroom, wet film rolled into the can, fixer mixed from a bag, stink of stop bath kind of guy. It took me a long time to buy a digital camera and put my 35mm Pentax on the shelf for good. It's tough to give up a whole skill set and have to learn a new one (Photoshop, or the free software The Gimp). Imagine what it would be like to change from grades to assessments as an industry. Minds will be boggled.

It could be bad. Assessments can be what they want to be now, more or less. As long as a program makes a credible effort to close the proverbial loop, there's a pat on the back and a smile from the accreditor due to the hard-working Director. But that all changes if the ultimate certification of a course, program, or institution rides on those assessments. One effect could be to squeeze subjectivity out of the process, because it has to be defensible in court if necessary. (I've argued that subjectivity is not a bad thing when considering complex outcomes.) In order to protect the part of one's body most necessary for meetings, instructors would (I imagine) tend to inflate assessments that can be questioned. I'm certain this happens with grades--basic game theory. In any event, it's a more visible and probably more stressful job for the assessment director.

It would be good, however, to focus on actual outcomes rather than statistical goo resulting from a formalized mélange of marks. This comes back to process vs. result. In my experience, most grades are a combination of all the work in a course. This may have resulted from the need to motivate students to do work continuously throughout the semester. It works differently in other countries, of course, where Das Exam may be the single grade a student earns.

Extending the concept of assessment-as-certification, one might imagine that a university would grant a degree without the student ever participating in a class at all, but rather by passing all of the outcome assessments. If these assessments are similar, I presume that this could occur at multiple institutions, so that a resume could record several degrees from different colleges, all obtained at the same time. This is weird stuff from a registrar's perspective--it changes the way higher education works.

Not far from this idea is the related conception of a central agency that does the actual certification. This solves legal problems for an institution regarding validity of assessments, and creates a standardization that would warm the heart of the erstwhile Dept. of Ed. (I'm not yet sure about the current one), and probably accreditors too. I'm sure ETS or ACT would kill to get into that business. Individual departments and programs wouldn't like it much, I imagine--there's nothing left for them to do but play the tune on the sheet.

If the outcome is the thing, the process can be standardized and probably should be; this is the central philosophical fulcrum upon which this debate must turn. Years ago I remember reading about a commercial chemical lab coming up with a formula for creating Scotch imitations that didn't need to be aged twelve years. If it's chemically the same, is it the same? It's difficult to find a reason to say no.

This debate is not "merely academic." Western Governors University uses outcomes assessment exclusively to offer a full catalog of undergraduate and graduate programs. I mentioned in a recent post that it's accredited by the Northwest Commission. It also has NCATE and CCNE accreditations. The process for earning credit is described on the web site:
Our online degrees are based on real-world competencies as opposed to seat time or credit hours. Our focus is on ensuring you possess the skills and knowledge you need to be successful, not whether you’ve attended class or not. ("Show us what you know, not how long you’ve been there.")
Results are valued over process, in other words. I exchanged emails with Tom Zane, who is the Director of Assessment Quality and Validity. I asked about the assessment process and asked his permission to post his response. Here it is:
Western Governors University is a competency-based institution, so we warrant that our graduates possess certain knowledge, skills, abilities, and dispositions. Therefore, it is essential that we measure all aspects of competence that our faculty places into our domains. I am proud to say that our assessment systems are exemplary. We follow the Joint Standards (AERA, APA, NCME) and I’m confident that we have fair and legally defensible assessments. The assessment system is very large, with hundreds of professionally developed instruments of many types. Our assessment department staff builds, administers, scores, and maintains most of the assessments in-house. We use some standardized national exams when the exams appropriately match the domains (e.g., state mandated teacher licensure exams, IT certifications, etc.).
Tom also shared with me a success story:
A single mother of three had been a teacher for several years, but she didn’t have a degree or a license. She was a great teacher. She understood her students, pedagogical strategies, and the content she was tasked to teach. She was slated to lose her job soon after NCLB began. Her local colleges of education told her she would need to attend classes for anywhere from three to five years. Obviously, this was not a good fit for her situation. Instead, she came to WGU. Because she already knew so much, and more importantly, could teach successfully in a live classroom, she was able to skip classes or other learning opportunities and move directly to the assessments to prove her competence. In her program, that meant passing 59 assessments of various types (e.g., traditional exams, performance assessments, portfolio requirements, reflection pieces, a series of 8 live in-classroom performances during student teaching, and others). She was able to hold onto her job, feed her family, and graduate in 18 months.
Finally, here are some technical references for the assessment leaders out there who want to know more about Tom's work at WGU:
  • Zane, T. W. (2009) Reframing our thinking about university-wide assessment. [PowerPoint Slides]. Presentation at the NASPA Annual Conference, (New Orleans, June 10-13).
  • Zane T. W. (2009, April). Tenets of psychological theory that guide performance assessment design and development. [PowerPoint Slides]. Presentation at the annual meeting of the American Educational Research Association, San Diego, April 12-18, 2009).
  • Zane T. W. & Johnson, L. J. (2009, April). Decision precision: Building low-inference rubrics for scoring essays, projects, and other performance-based artifacts. [PowerPoint Slides]. Workshop presented at the annual meeting of the American Educational Research Association, (San Diego, April 12-18, 2009).
  • Zane, T. W. (2009). Performance assessment design principles gleaned from constructivist learning theory [Part 2]. TechTrends, March/April.
  • Zane, T. W. (2009). Performance assessment design principles gleaned from constructivist learning theory [Part 1]. TechTrends, January/February.
  • Zane, T. W. (2009). Guided reflection: A tool for fostering and measuring deep learning of characteristics such as open-mindedness and critical thinking. Paper presented at the American Association of Colleges and Universities Conference, (Baltimore, February 26-28, 2009). {Note: The paper is unavailable because it is under consideration for publication.}
  • Zane, T. W. (2008). Domain definition: The foundation of competency assessment. Assessment Update: Progress, Trends, and Practices in Higher Education, 20(1), 3-4. {Note that the Assessment Update article was published in a special edition that contains multiple articles about WGU.}
  • Zane, T. W. & Schnitz, J. A. (2008). Linking theory to practice: Using performance assessment and reflection to prepare candidates for student teaching and reflective practice. Paper presented at the 14th Annual Sloan-C International Conference on Online Learning (Orlando, November 5-7, 2008). {Note: The paper is unavailable because it is under consideration for publication.}
  • Zane, T. W. (2008). How online programs can assure students attain crosscutting themes: Diversity, reflection, and more. Paper presented at the 14th Annual Sloan-C International Conference on Online Learning, (Orlando, November 5-7, 2008). {Note: The paper is unavailable because it is under consideration for publication.}
  • Zane, T. W. (2008). Large-scale e-portfolio testing program design and management. [PowerPoint Slides]. Presentation at the Assessment Institute conference, ( Indianapolis, October 26-28, 2008).

Open Ed Conference

OpenEd 2009, a conference on open education, just wrapped up in Vancouver. The program can be found here. There are sessions on methods, philosophy, politicking, and the economics of open education. Just picking through the session description, there's fascinating stuff.

Open Education is international. The Dutch have an "emerging national strategy," which you can see and hear about here. The UK is piloting the idea. Most interesting is that the Chinese are subsidizing their universities to create open content. From the description:
[T]he Chinese Ministry of Education has since 2003 been operating a national OCW program called China Quality OpenCourseWare. Chinese universities submit proposals, and can receive between $7,300 and $14,600 per course that is made freely available online. By 2009, there are over 10,000 courses available online, many of these with extensive resources, and video recordings.
It's interesting to note the contrast between this model and simply subsidizing students to attend college. It's like the old saw about teaching someone to fish rather than just giving them a trout. Providing need-based aid, for example, encourages institutions to raise costs. Paying institutions to create content that will be free potentially reduces costs. These overarching strategies play out in the grinding evolution of societal change, with economics playing the role of the grim reaper. I've read that our own Dept of Ed is interested in the openEd idea, but a quick search on their website for "open education" doesn't turn up much of interest.

On the technological front, there is the idea of a Personal Learning Environment (PLE) replacing or augmenting a Learning Management System (LMS). From the Wikipedia page:

Personal Learning Environments are systems that help learners take control of and manage their own learning. This includes providing support for learners to

  • set their own learning goals
  • manage their learning; managing both content and process
  • communicate with others in the process of learning

and thereby achieve learning goals.

This PLE idea seems to be undergoing rapid evolution. Take a look at this summary page of PLE diagrams at edtechpost. It looks like the Cambrian Explosion, 530 million years ago, when complex animal life exploded onto the scene. You'll see what I mean if you scroll down the long page full of wild diagrams that try to capture what a PLE is.

(photo courtesy of Wikipedia)

In some ways, the PLE resembles a loose portfolio of a student's work combined with links to social networking communities. See this version (roll over the picture for explanations) for a good look at how this multicellular learning platform looks. I don't see much that refers to assessment, however. There is perhaps another "cell" that wants to look like the Center for Teaching, Learning, and Technology's Harvesting Feedback Project. I wrote about it last here.

It all looks terribly distracting, however, this PLE idea. It seems unfocused to me. Look at the one below, courtesy of D'Arcy Norman.
If my PLE means just the places I go on the web to find interesting stuff, it becomes too general to be of much use. But I digress.

Open education within bricks and mortar is the topic of this session:
[W]e propose that it is important to begin looking at how adopting open course models in traditional universities can offer benefits to both the institutions and the open education movement itself.
This is an interesting idea, partly because it would seem to be inevitable. Higher education isn't really about copyrights and restricted access, at least not philosophically. It's not because of a desire to hold knowledge close to their gowns that the hoary old university model is so poor at distributing knowledge efficiently. Rather, the traditional methods of delivery nearly require a physical presence to work, and this dramatically limits access and increases cost. I would venture to say that most professors would love to have their teaching reach a much magnified audience. You might even be able to talk them into it for free, as long as they didn't have to work any harder. The sticky part is the technology and practices that allow this to happen--both the creation of the infrastructure, policies, and content that are necessary AND the culture and acceptance in the general community that grant the payoff for the efforts. This comes back to the question I asked yesterday: what is the real value (and cost) of the standardization embodied in the rigid processes and requirements of most traditional institutions?

Monday, August 17, 2009

Edupunk and the New Ecology

On July 17, 1969, the New York Times printed a retraction to a 1920 editorial that read in part:
That Professor Goddard, with his 'chair' in Clark College and the countenancing of the Smithsonian Institution, does not know the relation of action to reaction, and of the need to have something better than a vacuum against which to react -- to say that would be absurd. Of course he only seems to lack the knowledge ladled out daily in high schools.
Apparently the moon landing changed their minds about the way rockets work.

Failures of imagination are obvious in retrospect, and it is particularly interesting to observe one's own shortcomings in creativity. The virtual ink is hardly dry on my post "Unplanned Obsolescence," where I described an idea for an eBay-like system of matching educational providers and consumers. Now I read that there's an even simpler system, and it already exists!

I remember reading sometime back about an eBay precursor, or mutant variation, perhaps--an evolutionary design that had no chance of surviving the Great Dot Com Crash. The business model for this extinct mercantile species was: buy the nation's junk, store it in great warehouses, mark it up and sell it online. This is like eBay, except that eBay only provides a place to bid. Buyers and sellers exchange money and items sold with no direct intervention from the online auction house. The idea of actually shelling out cash for all that stuff, holding it, and arranging for it to be shipped is obviously inferior. It's a case of simplifying down to the bare necessities.

The article "How Savvy Edupunks are Transforming American Higher Education" raises the question of what is this minimum delivery system in higher ed. Rather than my auction house idea, the article presents the idea of completely open education, where the resources are freely available. In order to have something concrete to talk about, let me assume the best case--that we want to deliver a computer science curriculum online. The students probably have access to the Internet and computers, and are probably comfortable with the technology. This subject, as opposed to something like learning glass-blowing, is very suitable for an open delivery system.

I see the following as the essential ingredients of effective open education. Probably in a month I'll realize how dumb I was and write another post decrying this analysis in favor of even simpler requirements. As Piet Hein put it: to err and err and err again, but less and less and less...

Access to high quality learning resources. By this I mean automated or static materials: books, websites, interactive tutorials, compilers (since we're talking about computer science), editors, debuggers, and other software. This category does not include human resources, which we'll come to in a minute. Information becoming stale is always a problem, so we probably require an altruistic organizer of "best source" learning materials. Something like Wikipedia, perhaps.

A learning community. Even with do-it-yourself materials, there will be difficult patches requiring custom explanation and verification that you are on the right track. This could be provided by other students who are more advanced. If it's to be free, participation in the learning community has to be voluntary, with perhaps "soft" rewards like social networking karma. Here, status is formalized through official kudos of some sort.

Motivation to succeed. We might call this "grit" (see this post). This is the only part that cannot be provided externally, at least until pharmacology catches up. When a traditional college student is packed off to the dorms of his new ivy-bedecked home, there are social pressures that undoubtedly help motivate him to succeed. More so, at least, than if he were sitting at home, having to choose between YouTube and I-Learn; guess which one will win for most. (Take a guess where that link takes you--it's surprising, and reveals something. Is this big web retailer getting into the ed market???)

Motivation is the weak link. I think it might be possible with the right social network apparatus to overcome the inertial tendencies of lone students at their computers. Perhaps something like (shudder) Second Life is the answer.

The new movement for open education is wildly optimistic. Self-described "edupunks" see the Internet transforming higher education for good. From the article:
The architects of education 2.0 predict that traditional universities that cling to the string-quartet model will find themselves on the wrong side of history, alongside newspaper chains and record stores. "If universities can't find the will to innovate and adapt to changes in the world around them," professor David Wiley of Brigham Young University has written, "universities will be irrelevant by 2020."
Some of the Ed 2.0 innovations available right now, cited in the article are (all quotes from the article):

Flat World Knowledge. Advertises itself as "Remixable textbooks by expert authors, free online, and affordable off." Affordable means $30 for hard copy, but downloads are free. This addresses access.

Academic Earth focuses exclusively on scholarly video lectures, and as such also addresses the content access area. Creator Richard Ludow has ambitious plans for the site: "My idea was to first, aggregate this huge critical mass of content disconnected over various sites; second, apply best practices in user interface design and Web standards to do for educational content what Hulu has done [for TV]; and third, build an educational ecosystem around the content."

Peer2Peer University. This social networking site provides an online place where "Would-be students can use the Web site to convene and schedule classes, meet online, and tutor one another; a volunteer facilitator for each course helps the process along." The home page encourages you to think of it as a sort of book club for motivated autodidacts. The seven courses currently on offer are eclectic, ranging from Neuroethics to Cyberpunk Literature.

University of the People. Describes itself as "the world’s first tuition-free, online academic institution dedicated to the global advancement and democratization of higher education." There are two complete programs on offer, in Business Administration and Computer Science. You can see the recommended course sequence for the latter here. I haven't tried to compare it to the ACM standard, but it looks pretty solid to me. Note, however, that UoPeople is not yet accredited and does not confer degrees.

Western Governors University isn't free, but it's cheap: under $3000 for a six-month term, all online. The most interesting attribute to me is that grades and credits are gone. In President Mendenhall's words:
We said, 'Let's create a university that actually measures learning,' We do not have credit hours, we do not have grades. We simply have a series of assessments that measure competencies, and on that basis, award the degree.
And it's accredited by the Northwest Commission. I find this amazing, and will write more about how this works once I've had a chance to research it.

WGU tries to address the motivation problem:
For every 80 students, a PhD faculty member, certified in the discipline, serves as a full-time mentor. "Our faculty are there to guide, direct, counsel, coach, encourage, motivate, keep on track, and that's their whole job," Mendenhall says.
Teaching, assessment, and motivation are separate functions:
What WGU is doing is using the Internet to disaggregate the various functions of teaching: the "sage on the stage" conveyor of information, the cheerleader and helpmate, and the evaluator.
An expanding ecology of evolutionary solutions to the problem of higher education is evident in the list above. This can only grow as more innovations are tried, the successful ones surviving to create new models. Meanwhile, the traditional bricks and ivy institutions lumber along like edusaurus, oblivious to the changing climate. But that criticism is too easy, and perhaps unfair. There is value in the intense social environment of a face-to-face college experience. Moreover, there is the comfort of standardization: despite technological changes, the academy functions pretty much the same way it has for 1000 years. Ultimately the marketplace will sort out what value to place on these bonuses. For the motivated self-learners, for the lifelong learners, there will certainly be lots of options. As the WGU administration would probably agree, it's not what the diploma says that's ultimately important: it's what you can do.

Friday, August 07, 2009

Creating Meeting Discipline

Anybody reading this blog has probably sat in a generous number of meetings. I'm guessing you haven't had a lot of those meetings I read about where everyone stands rather than sitting, in order to abbreviate the proceedings. I've blogged here before about various aspects of meetings, which you can find here.

I like meetings where you walk out with a sense of accomplishment. There's this particular feeling that comes with successful group decision-making that must be a pale shadow of what it's like to be a node in a hive mind, if such a thing really exists. One of the striking realizations I had in reading My Stroke of Insight was that our brain hemispheres are like two very closely cooperating minds. When the right people and the right habits of mind combine, something magical occurs. I think I remember first reading about it in Michael Herr's Dispatches, but I can't be sure that's the book. After searching on Google Books, I couldn't locate the quote. [Edit: it's A Rumor of War, by Philip Caputo, and the quote is here.] Like the rest of my generation, I saw Vietnam as the "last war," and it held a certain fascination that led me to read a lot of books about it. In any event, the scene I remember is a platoon leader recollecting the experience of directing his troops in a sweep, and his description of an exalted feeling of the extension of his own body, almost a proprioception, into the men following his direction. This, of course, isn't exactly the same thing as colleagues deciding general education around the table, but sometimes it seems like it.

Sci-fi has lots to say about the idea of group minds (as opposed to group-think, which may be its opposite), and naturally pushes the envelope. See the excellent novel A Deepness in the Sky by Vernor Vinge, for example. Science itself does too, if you will tolerate one more digression before pulling the chair fully up to the table of "what to do about bad meetings."

The term "complexity" became more confusing at some point because it came to mean, in addition to the extant meanings, the study of how systems emerge out of goop. Of course, this is not the technical definition, which you can find here. I've made a lot of hay in this blog with another kind of complexity, computational complexity, but the two are different. John Holland, who works at the Santa Fe Institute now (think Manhattan Project), developed some cool ideas about intelligence. Well, artificial intelligence, anyway, but who's counting. The idea that sticks in my mind is that of competing algorithms (like voices) that sound an alarm when they think they can contribute something useful. If their input is actually valuable, it's rewarded. Otherwise they may be ignored the next time, like the proverbial boy who cried "lupus!" or some other auto-immune disease, I forget. This is very like the members of a team or committee, who have to individually decide when their input is valuable, and who slowly accrue or leak social capital with their reputation for effective contributions. With humans, of course, there are many other complexities, like how to speak, what words to use, how to interact with others, and so on. There are lots of things that can go wrong. I think mostly they do, in fact, because the meetings that are exceptionally productive seem few and far betwixt. This, I submit, cannot be a limitation of the people involved, but one of method. But what method? By what cryptic scheme can a meeting be set in order?

I don't know. But, practicing what I preach--viz., that complex problems can be approached through an evolutionary method--I herein propose a starting point. I think the genesis for this was something I read long ago in the C User's Journal or Dr. Dobb's Journal. Or not. Anyway, I read about a protocol for conducting a meeting. And I don't mean Robert's Rules of Odor. Nothing is more annoying than the pedantry with which meeting minutes get presented and approved, after which all rules disappear. Nothing against Robert, 'natch. I just don't think the answer is complicated bureaucracy. Douglas Hofstadter described a game built from rules of order (Peter Suber's Nomic) in which one tries to bring the whole system to illogic, a frozen halting state. That's what I think of when I think of rules of order: a computational engine guaranteed to lock up.

No, what I have in mind is more like a game. We may as well call it the Committee Game. Here are the first draft rules. They aren't meant to be comprehensive.
  1. A referee is assigned to loosely enforce the following rules, with a liberal dose of common sense. It would be best if this were the committee chair, at least to start with.
  2. The meeting agenda needs to spell out the level of detail an item will be addressed in. Typically, "tactical" or "strategic" suffice to do this, but you may want "administrative" to talk about office functions or "meta" to talk about the functioning of the committee as a whole. Roll your own.
  3. Someone--probably not a committee member--is tasked to take timings. This consists of watching a second hand and noting how long each speaker talks. Doesn't need to be perfect. Simple statistics are generated from this for feedback later.
  4. At the end of the meeting, if anyone spoke longer than 30 seconds continuously, the longest speaker is fined a buck.
  5. If someone seems to go off topic, the referee should note this immediately by holding up some symbolic object, like a stuffed animal. Once the referee has the floor, he or she asks the group if they really want to take up that topic. If not, the offending member gets the off-topic symbol to 'own' until the next offense.
  6. A particular type of "off topic" is when someone begins to talk about tactical considerations during a strategic discussion or vice-versa. The procedure in #5 should be applied, and the scope (tactical, strategic, meta, whatever) explicitly noted, so that there will be a greater awareness of the level of discussion the committee is engaged in.
  7. Responsibility for being referee should rotate, so as to build a culture where time and effectiveness are valued.
  8. Defer to the chair of the committee when extraordinary measures are required, such as changing the agenda during the meeting, or suspending the rules. The chair is ultimately responsible for setting and executing the agenda, but defers actual meeting discipline to the referee.
  9. Periodically, the committee reviews its performance, considering how well agendas have been executed, the timing statistics, and other general considerations that apply. Improvements to the rules are made as deemed reasonable. Sample questions are:
  • Rate the overall effectiveness of the committee.
  • Are contributors getting to the point quickly enough?
  • Do you feel that decisions are being reached quickly, but with due consideration?
  • Do committee members feel fairly treated?
  • Are deadlines being met?
  • What could be improved and how?
  • Do you enjoy coming to meetings?
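The bookkeeping in rules #3 and #4 is simple enough to sketch in a few lines of code. Here's a minimal, hypothetical version in Python: the turn list, the speaker names, and the exact fine rule are my own assumptions for illustration, not part of any real meeting software.

```python
from collections import defaultdict

def meeting_stats(turns):
    """Summarize speaking turns for the Committee Game.

    turns is a list of (speaker, seconds) pairs, one per continuous
    stretch of talking, as jotted down by the timekeeper (rule #3).
    Returns per-speaker totals, the longest single turn, and who (if
    anyone) owes the one-dollar fine under rule #4.
    """
    totals = defaultdict(float)
    longest_speaker, longest_turn = None, 0.0
    for speaker, seconds in turns:
        totals[speaker] += seconds
        if seconds > longest_turn:
            longest_speaker, longest_turn = speaker, seconds
    # Rule #4: fine the longest speaker only if some single turn
    # ran longer than 30 seconds.
    fined = longest_speaker if longest_turn > 30 else None
    return dict(totals), longest_speaker, longest_turn, fined

# Hypothetical timings from one meeting:
totals, who, secs, fined = meeting_stats(
    [("Ann", 25), ("Bob", 45), ("Ann", 10), ("Cy", 30)])
```

In this made-up example, Bob's 45-second monologue earns the fine, and the per-speaker totals become the "simple statistics" fed back to the committee at review time (rule #9).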
Even this level of formality may not be necessary. In practice, I've noticed a marked improvement in the deliberations of one of my standing committees simply by the group acknowledgment that meeting time is valuable. We agreed not to tolerate digressions, for example. It happened anyway, but I noticed that there was an awareness of it: "I know this is a digression...". At the end of the meeting we informally evaluated our own performance. We had accomplished a rather complex task in record time, leaving an hour for a sub-committee to polish the proposal. I think there was a general good feeling about the effort to make our deliberations more efficient and then seeing the outcome of that effort. It's evolution in action, and a pretty thing to watch.

Be More Machiavellian

I'm heading off to an all-day retreat for the deans today. As the Dean of Academic Support Services, I'm invited as sort of the court jester, which suits my personality. This got me thinking about leadership last night, and I retrieved my copy of Machiavelli's The Prince from my car. It's a different translation from the one found here, but the quotes are similar.

Leadership is a big topic, of course, and I mean to narrow it to the interaction between a leader and his or her direct reports, as we would call them. Machiavelli calls them ministers or servants. The first thing is their selection:
The choice of servants is of no little importance to a prince, and they are good or not according to the discrimination of the prince. And the first opinion which one forms of a prince, and of his understanding, is by observing the men he has around him; and when they are capable and faithful he may always be considered wise, because he has known how to recognize the capable and to keep them faithful. But when they are otherwise one cannot form a good opinion of him, for the prime error which he made was in choosing them.
Second is how to assess their performance:
But to enable a prince to form an opinion of his servant there is one test which never fails; when you see the servant thinking more of his own interests than of yours, and seeking inwardly his own profit in everything, such a man will never make a good servant, nor will you ever be able to trust him; because he who has the state of another in his hands ought never to think of himself, but always of his prince, and never pay any attention to matters in which the prince is not concerned.
And how to motivate them. Note that Machiavelli recommends fear over love when necessity requires it, but not hatred. But there he is speaking about subjects. With ministers it is very different:
[T]o keep his servant honest the prince ought to study him, honouring him, enriching him, doing him kindnesses, sharing with him the honours and cares; and at the same time let him see that he cannot stand alone, so that many honours may not make him desire more, many riches make him wish for more, and that many cares may make him dread chances. When, therefore, servants, and princes towards servants, are thus disposed, they can trust each other, but when it is otherwise, the end will always be disastrous for either one or the other.
But bringing ministers into the leader's circle of trust has its own problems, one of which is flattery. Herein lies a dilemma:
Because there is no other way of guarding oneself from flatterers except letting men understand that to tell you the truth does not offend you; but when everyone may tell you the truth, respect for you abates.
Machiavelli's solution is ingenious. First, don't take unsolicited advice:
Therefore a wise prince ought to hold a third course by choosing the wise men in his state, and giving to them only the liberty of speaking the truth to him, and then only of those things of which he inquires, and of none others;
But solicit advice on everything! And listen carefully.
[B]ut he ought to question them upon everything, and listen to their opinions, and afterwards form his own conclusions. With these councilors, separately and collectively, he ought to carry himself in such a way that each of them should know that, the more freely he shall speak, the more he shall be preferred; outside of these, he should listen to no one, pursue the thing resolved on, and be steadfast in his resolutions.
This solution leaves control with the leader, avoids flattery from the masses by not inviting their opinions, and creates a decision-making engine that doesn't invite contempt. The gloomy alternative is also spelled out:
He who does otherwise is either overthrown by flatterers, or is so often changed by varying opinions that he falls into contempt.
How Machiavellian is your leadership?
  1. Do you choose wise and respected subordinates? (yes)
  2. Is your circle of trusted advisors fewer than eight? (my interpolation here, since eight is the kiss of death for committees) (yes)
  3. Are they committed to your success? (yes)
  4. Do they only give advice when asked? (yes)
  5. Do you ask them about wide-ranging topics? (yes)
  6. Do you get truthful, frank answers? (yes)
  7. Do you take advice from those outside your leadership team? (no)
  8. Do you stick to decisions, once made? (yes)
If you scored a perfect eight, congratulations! You may be qualified to run a small Italian principality. Tryouts for CBS's new reality show "The Prince" are just around the corner...

Thursday, August 06, 2009

Grit, Grades, and Graduation

It's funny how categories affect thinking. Since becoming interested in the noncognitive traits of students, their assessment, and their consequent predictive power, I've begun to see noncognitives everywhere. The August 2nd article "The Truth about Grit" in the Boston Globe is an example.

The point of the article is that intelligence doesn't guarantee success; success also requires perseverance in the face of obstacles, or "grit."
The hope among scientists is that a better understanding of grit will allow educators to teach the skill in schools and lead to a generation of grittier children.

The new focus on grit is part of a larger scientific attempt to study the personality traits that best predict achievement in the real world.
The US Army has supported the research with some interesting findings, and the example of its military academy, West Point, is compelling:
The Army has long searched for the variables that best predict whether or not cadets will graduate, using everything from SAT scores to physical fitness. But none of those variables were particularly useful.
What did work was a survey that assessed perseverance. The article notes that this echoes Francis Galton's 1869 conclusion that a prerequisite for higher-order achievement was "ability combined with zeal and the capacity for hard labour."

The whole article is worth reading. There is a link given (indirectly) to a grit survey developed by A.L. Duckworth at the University of Pennsylvania. The project is applicable to higher education:
Duckworth has recently begun analyzing student resumes submitted during the college application process, as she attempts to measure grit based on the diversity of listed interests. While parents and teachers have long emphasized the importance of being well-rounded - this is why most colleges require students to take courses in all the major disciplines, from history to math - success in the real world may depend more on the development of narrow passions.
This should be terribly interesting for liberal arts schools particularly. If perseverance is tied to singular passions, then how does this interact with the "broadening of the mind" mission of the institution? Is the implicit goal of success after graduation secondary, or should we try to pull off both? This has direct implications to learning outcomes assessment, as noted in the article:
In recent decades, the American educational system has had a single-minded focus on raising student test scores on everything from the IQ to the MCAS. The problem with this approach, researchers say, is that these academic scores are often of limited real world relevance.
Supposing we took the idea of grit seriously. That would start with using instruments like the one Dr. Duckworth has developed to estimate it. Can we then teach it? Should we?

Dr. Dweck at Stanford University is quoted referring to a "growth mindset" versus a "fixed mindset," the difference being what we believe about our abilities. Can we grow them, or are we fixed with them? Fixed-mindset learners are more likely to give up when encountering obstacles, assuming they're just not talented enough. Dweck's research, according to the article, demonstrates that the growth mindset can be taught effectively.

Ironically, praising children for their intelligence may make them less likely to succeed because it reinforces the fixed mindset. A better strategy is to reward effort and hard work. I've blogged about this subject before, and Malcolm Gladwell's Outliers is in much the same vein, I believe (I have only read reviews of it to date).

Should we assess these traits? One of the results of reporting assessment results for learning outcomes is that it becomes evident that students can earn decent grades and graduate without scoring high on assessments. If you've been in the classroom any length of time, you've probably encountered this student, whom we'll call Joe. Joe works very hard in your Finite Math class, forming study groups outside of class, turning in all the homework, coming to office hours. But the material just doesn't click with him, and he has a terrible time coming to grips with the key concepts. Nevertheless, he gives each exam his full effort and manages to pass with a B. Everyone is happy about this achievement, no? After grades are posted, Joe announces he wants to major in math. You consult with your colleagues and worry together for a bit. Although Joe has worked very hard and earned his B in everyone's eyes, he clearly lacks some spark of imagination that makes doing math rewarding. You fear he's setting himself up for failure.

The point is two-fold. One is that we assess outcomes (subjectively and informally in Joe's case) and we assign grades, but they mean different things. Tied up in grading is the notion of perseverance, of meeting each of the many assignments head-on and getting through them. But in the gestalt there may be something missing; the pieces do not always make the whole. The second point is related: we attempt to estimate the cognitive outcomes with our formal assessments, but generally do not assess the noncognitives. Shouldn't we be doing so? If the science quoted is valid, then grit has as much to do with success as intelligence.

If you've browsed my Assessing the Elephant piece, or read enough of this blog (e.g. here), you can guess where I'm going. Why not assess grit along with thinking and communication skills across the curriculum, in a minimally-intrusive survey? It works for the cognitive skills, so there's a chance it will work for the noncognitives. Wouldn't it be fascinating to be able to compare cognitive skills, grit, grades, and graduation rates?

Please note that this gives institutions a way to include students themselves in their assessment reports. My last post was about publicly reporting learning outcomes and results. For example, Capella University's website has a page for Learning Outcomes. This is the future--gaze well upon it, ye assessment directors:
As this practice becomes common, the question will be asked "why doesn't every student get the highest rating?" Is this a fault with our education? At present, we could only shrug our shoulders and perhaps mutter about SAT scores of incoming freshmen, or, if accreditation is imminent, about the glorious plans for improvement we've printed in reports. But if we actually assessed noncognitive traits like grit, we could report out richer details. We might note that the students who failed to graduate were also rated low in perseverance. We could isolate those with the most assessed grit and see how this relates to cognitive development. We could include noncognitives in the curriculum itself and begin to take them seriously, just like effective writing. For liberal arts institutions, the tricky question of balancing broadness with singleness of purpose could be explored with at least some minimal data.

It's an experiment worth doing.