Tuesday, March 31, 2009

A Learning Paradox

Assessing learning isn't just done in the education industry, nor is it even exclusive to biological organisms. Computer scientists have been trying to teach computers to think for decades now. To paraphrase a saying about fusion power, we might say that true artificial intelligence is just 20 years in the future, and always will be. There have been challenges. Knowing a bit about those challenges might inform the kind of wet-brain learning that we try to measure from across our desks in the institutional effectiveness building (What? You don't have a building yet?).

A standard test for machine intelligence is the Turing Test. Alan Turing established some of the fundamental ideas of computer science, helped crack German cyphers in WW II, and independently derived the Central Limit Theorem, among other things. He was also persecuted after the war for being gay and ultimately committed suicide by eating an apple laced with cyanide. I highly recommend Andrew Hodges's The Enigma as a biography.

René Descartes didn't believe in thinking machines. Quoting him from the Stanford Encyclopedia of Philosophy:
If there were machines which bore a resemblance to our bodies and imitated our actions as closely as possible for all practical purposes, we should still have two very certain means of recognizing that they were not real men. The first is that they could never use words, or put together signs, as we do in order to declare our thoughts to others.
This is essentially the test Turing considered. In 1950 he muses about something that he calls the imitation game:
I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. … I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted. [source]
First of all, note that in 1950, computers were massive glowing things with very limited capacity by our standards. For Turing to correctly guess that by 2000, hard drives would be measured in gigabytes is flat-out astounding.

The main point, however, is that Turing poses as the test of intelligence an interview. The question is simple: are we convinced that the other side of the conversation is intelligent?

You can try out Turing's imitation game yourself, no doubt running on computers with gigahertz processor clocks. Just browse to jabberwacky.com and start chatting. Below is an excerpt of an argument I had with the thing about what day of the week it is. Its responses are in bold.


There are practical applications to all this. It's not just that we might look forward to mechanical devices that can understand plain language and converse with us; the Turing test also protects us from the annoyances of the bad kind of machine. For example, 'bots' may automatically try to log into this blog and leave a comment with a link to some site the owner wants to advertise. Google tries to prevent that with CAPTCHA--hard-to-read images that one has to interpret in order to proceed. The one below is taken from the Wikipedia page:
These are easy for humans to decipher, but hard for machines. This brings us to the learning paradox I advertised in the title: Moravec's Paradox. Wikipedia describes it thus:
Moravec's paradox is the discovery by artificial intelligence and robotics researchers that, contrary to traditional assumptions, the uniquely human faculty of reason (conscious, intelligent, rational thought) requires very little computation, but that the unconscious sensorimotor skills and instincts that we share with the animals require enormous computational resources.

What the computer scientists thought would be the hard part was what humans find hard--manipulating symbols and doing logic. What they discovered is that the hard part is actually the kinds of things we learn without much effort--how to talk, walk, see, and judge our surroundings, for example. This is a fascinating result.

An evolutionary explanation is proffered. The idea is that the older the cranial wiring is, the harder the function is to reverse engineer. So sight and visualization, for example, are hard to reproduce with computers because they've been fine-tuned by hundreds of millions of years of evolution, and optimized to the point where it's effortless for most of us to see and understand our surroundings. (Not all--see The Man Who Mistook His Wife for a Hat.)

In terms of higher education learning outcomes, this carries some warnings. Standardization is an algorithmic approach to assessment, for example. This falls into the category of stuff that's easy for computers. Notice that Turing didn't suggest that we give a machine a paper and pencil IQ test. I think few people would be convinced of machine intelligence if all it did was get some vocabulary and logic problems correct (for example). He suggested instead a messy, subjective form of assessment for intelligence. The paradox suggests that some kinds of thinking are very old and likely difficult to assess with discrete methods, while others (the more recently acquired by the species) may be easier because they are more suited to a regular, algorithmic-type approach of judging success.

Here is an example of a situation that requires intelligence--one might say critical thinking. See what solution you come up with. I got it from here.

According to a news report, a certain private school in Victoria, BC recently was faced with a unique problem. A number of year 12 girls were beginning to use lipstick and would put it on in the bathroom. That was fine, but after they put on their lipstick they would press their lips to the mirror leaving dozens of little lip prints. Every night, the maintenance man would remove them and the next day the girls would put them back. Finally the principal decided that something had to be done.

First think of your own solution, then read the 'official one.'

She called all the girls to the bathroom and met them there with the maintenance man. She explained that all these lip prints were causing a major problem for the custodian who had to clean the mirrors every night. To demonstrate how difficult it had been to clean the mirrors, she asked the maintenance man to show the girls how much effort was required. He took out a long handled squeegee, dipped it in the toilet, and cleaned the mirror with it. Since then, there have been no lip prints on the mirror. There are teachers, and then there are educators.

This story is funny because of the creative solution. It requires a deep knowledge of social norms, human emotion, and game theory to really grok it. This is the kind of thing that computers may be able to come up with eventually, and when they do and can prove it through conversation, we'll perhaps name them intelligent. It's also the kind of thinking that's very hard to assess with an algorithmic approach, for the same reason it's hard to program--it taps ancient and deep resources within our minds.

See Also: More Thoughts on Moravec's Paradox

Friday, March 27, 2009

Reporting Learning Outcomes

If you hang around folks responsible for institutional effectiveness, sooner or later the conversation turns to reporting learning outcomes. This is a topic rich with conflict and therefore could profitably be turned into a reality show or Broadway musical (chorus: assessment begins with youuuuu!). I tried to catalog a few of the conflicts in Assessing the Elephant and on this blog now and again: things like monologue vs. dialogue. But a big part of it is simply the inertial field that surrounds higher education.

The prescription is to assess and improve the delivery of teaching with the goal of improving learning. Sounds simple enough, and indeed it goes on all the time. Getting it documented is the hard part. Having a simple-as-possible system helps, but only partially.

You can demonstrate to yourself that good things are going on by doing the following exercise. Choose a small department or discipline--certainly not the deadly size of eight around the table. Ask them what changes they've made in the last year to improve their program. You'll probably be surprised. Then start asking why they made the changes. Out will pop assessment information (probably informal observations) and goals (although likely implicit). You can have a good discussion teasing out the objectives implied by their course of action--a discovery process for building a plan. One way or another, the program is supposed to have a list of goals with objectives in a structure similar to this:
  1. Objective: General statement of what you want to accomplish
  2. Outcomes: Specific statement of how you'll notice if it changes
  3. Assessment: Details of how you make observations
This comprises a plan. Then, in the fullness of time, one should get results of the form:
  1. Results: The actual assessment results and what you think of them
  2. Actions: What, if any, actions were taken
The results and actions are somewhat hard to find on the internet tubes, probably because no one's is perfect. I got permission to post a draft of completed learning objectives reports from programs at Coker College (my erstwhile employer, where I was SACS liaison). You can find this (rather large) file here. Names have been replaced with question marks. Where you see numbers in brackets, these would have been hyperlinked to documents in the archive--in this case based on openIGOR. You can find a manual with more information about the process here.

The FACS reports mentioned under core skills outcomes refer to the Faculty Assessment of Core Skills, a liberal arts assessment applied across the whole curriculum. The whole story of that is in Assessing the Elephant.

Note that no one is advertising these as perfect examples of learning outcomes and how they should be reported on. But we should tip our hats to the nice folks at Coker College for the willingness to show us what actual reports may look like.

Thursday, March 26, 2009

Disasters of Entitlement

A title is a dangerous thing to have. I had an apocryphal friend once who was talked into being promoted to Director of Learning. The poor guy never had a moment's rest after that--always taking calls from students who didn't feel like they were getting their learning per buck, professors who thought the students weren't learning, and on one occasion, the Secretary of Education demanding a graph showing where the two intersected.

His sin, you see, was to accept a title that was too general. People glom onto titles like lottery tickets and come to the poorly-titled slob to cash them in.

Riiinnnnng

Yes?

Is this the Director of International Stuff?

Well, er--

--so here's my problem: my cousin's kid's passport expired while he was out of the country. Oh, and where can I get one of those cool European stickers that will make people think I'm from Estonia?
You get the idea.

The worst possible title is Director of Stuff. No matter how much of a pay increase they offer, you want to politely decline. Since everything consists of "stuff" in one way or another, you'll be responsible for all that ails the institution without a budget or personal trainer to get you through the rough patches. Of course, very occasionally something useful arrives on your desk as a result of a bad title, but this is the exception rather than the rule.

When my title was Assistant Professor of Mathematics, I got a detailed letter from an inventor showing how his formula to generate the digits of pi could be used as a data compression algorithm, and thereby produce a dialing system so that everyone on the planet could have a one-digit phone number. You can see how useful that would be.

But most attention generated from a too-general title is just annoying, and wastes time that could be more productively spent sipping a latte.

The solution is to ask for an obscure and specific title. I've been stuck with Dean of Academic Support Services for the last few months (a title I've learned that I shouldn't abbreviate), but when I can get away with it I announce myself as Dean of Exotic Plumbing. I think you can see why. Aside from a leaky water fountain request, I get little extra attention from the switch. On the other hand, it has completely cleared up a whole range of other phone calls I used to get about the library catching fire and other such distractions.

So, for your use, I have included a selection of suggestions for new titles to replace a couple of common ones that are too general.

Instead of Library Director, use Director of Dust Mite Retention. Rather than Chief Academic Officer, Vice President for Rodentia is a big improvement. And of course, President should become something like Senior Assistant to the President for Undetermined Tasks.

This last one is particularly clever because the President can claim only to take tasks from himself, and then only if they are undetermined. As an exercise you can see how to apply that self-referential concept to your own title, and forever insulate yourself from the mission creep that happens from being in charge of stuff.

[Also: The Secret Life of Committees]

Wednesday, March 25, 2009

Where are the error bars?

False certainty is a devil or an angel, depending on whether or not you're in for a surprise. If you assume the world won't end today, life is a lot easier--no messy running to the store to stock up on staples and digging a cellar for the winter. On the other hand, that certainty is almost always misplaced to some extent (solar explosions, calderas, accidentally dropped nukes,...). Still, we'd all be nervous wrecks if we hedged our bets on every conceivable contingency. Much better to be certain about most things and accept a few surprises. You can think of this as a kind of low-pass filter that removes the noise of everyday probabilities.

Sometimes, however, we go too far. We apply categories like 'provisional student' that carry expectations and assumptions. These may or may not be valid, and it's in our best interest to let some of those conversations be more complex than our artificial certainty allows.

I've argued before that intervention, for example, is better than prediction. If a student is having trouble, we can see that in performance. But predicting whether or not that will happen is fraught with uncertainties. Nevertheless, many institutions categorize students as 'risky' and put them in special programs, ignoring the fact that they're probably right only about half the time (in my experience), and that there are many other students who should be in those programs, but don't get the same category applied.

Even more subtle is the effect of false certainty on learning outcomes. A casual reader of advertisements for products like the Collegiate Learning Assessment (CLA) will be impressed by the certainty implied*.
[A]fter testing of seniors in the spring, you receive a full Institutional Report that evaluates your school's value-added on a comparative basis. Testing every year allows you to measure for effects of changes in curriculum or pedagogy.
and
Performance-based assessments are anchored in a number of psychometric assumptions different from those employed by common multiple-choice exams. As such, initiatives like the Collegiate Learning Assessment (CLA) represent a paradigm shift, the technical underpinnings of which remain unfamiliar to many faculty and institutional researchers.
There's not much evidence of modesty in these claims. To be fair, this is pretty typical even for conference presentations. We may put literal error bars on some statistical data (although even that seems to be rare), but there's little discussion about the actual uncertainty inherent to the process of assessing learning. The most powerful kind of validity information would be predictive--what do your test results say about future performance in some realm that we care about?

In practice, the value of learning assessment is that it generates a useful dialogue that potentially produces improvement and innovation. Unfortunately, that's not what the grand overseers of higher education want--rather they desire the artificial certainty of a real metric: an educational gas gauge.

So accreditors and administrators fuss around trying to produce certainty. Now that there's a huge economic and status incentive to produce the stuff, certainty in learning outcomes measurement is cranked out like sausages. There's also little defense against the "show me" argument, so as a profession we run around in circles manufacturing certainty where it doesn't really exist. We employ a positivist-type philosophy of definition/observation/classification, but only in the sense that a cargo cult builds an airplane. That's a bold claim, so allow me to elaborate.

An example of this latter assertion is the way rubrics are constructed and used. In the interests of creating a kind of measurement of learning, standard divide-and-conquer reductionism is employed. To assess writing samples, we might use a rubric that contains several elements including correctness and style. These would be defined in levels of increasing accomplishment, perhaps from 'underperforms' to 'overperforms' on an ordinal scale. There seems to be an overwhelming certainty that this approach is a good one. As I mentioned, the results could indeed be wonderful for sparking a dialogue between professor and student or among professional colleagues as part of an ongoing program review. But as real measurement it fails miserably. The complete catalog of reasons why will have to wait for another day, but one suffices.

Not everything in the world comprises linear combinations of other things. To create a 'composite' or 'holistic' score from a rubric, the individual dimensions must be combined somehow. Sometimes the scores for correctness, style, and so forth are simply summed up to give a total. This total is the 'measured' quality of writing. Implicit in this construction is the idea that if we get the details right, then a picture of the whole will emerge naturally. This is a nasty little enthymeme; in this case false certainty is the devil in the details.
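To make the implicit model concrete, here's a minimal sketch (in Python) of how such a composite is typically computed. The dimensions and weights are hypothetical, not from any particular rubric:

  # Minimal sketch of the linear-combination model implicit in composite
  # rubric scores. Dimensions and weights here are hypothetical.
  scores = {"correctness": 3, "style": 2, "organization": 4}   # ordinal, 1-5
  weights = {"correctness": 1.0, "style": 1.0, "organization": 1.0}

  # On this model, the 'measured' quality of writing is just a weighted sum.
  composite = sum(weights[d] * scores[d] for d in scores)
  print(composite)   # 9.0

Whatever weights you choose, the model assumes that the whole is a weighted sum of the parts--which is precisely the assumption in question.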

For some odd reason, subjective assessments of learning are discouraged in the big picture, but accepted without comment at the reduced level. That is, the assessment of the elements of the rubric is almost always subjective, but subjective assessments at a larger scale of resolution are met with raised eyebrows. I know this because when I present the Assessing the Elephant model at conferences, I get amazed looks. You can get away with assessing a student's performance as a whole??? This contradiction goes largely undiscussed.

What is the predictive validity of rubric use? If we're going to use linear combinations of subjectively rated sub-skills as a 'measurement' for some whole, how do we know we're getting the right answer? If we are, there should be a natural test. I have my own ideas about what this natural test is, but there ought to be a debate on the question. The misplaced certainty that current 'measurement' practices have a sound epistemology is damaging because it can easily create a new phrenology. It already has.

The reality is that whatever philosophical mish-mash is employed to produce numbers, these numbers are often taken seriously, and become a monological definition for learning. This applied definition may have little to do with the common (dialogical) definition. This is a recipe for a very expensive irrelevance. We need less certainty. I'm sure of it.


*I was amazed to see that a Google search has a hard time even finding an official CLA website. I ended up using a cached one.

Tuesday, March 24, 2009

Testing and the Monologue Problem

Assessment of learning is usually done as a monologue. I mentioned this idea a couple of days ago ("Generalized Silliness"), and it's explored more fully in Assessing the Elephant. A monologue in this context is any standard or definition that is rigorously applied to a subject, like holding up a ruler to a banana to see how long it is. This practice is monological in the sense that the banana doesn't get to negotiate or appeal the decision, nor is any factor considered that isn't contained in the ruler's definition of length. For example, the banana cannot argue thus:
Dear ruler, you have underestimated me. For if I could just uncurl a bit, you'd see that I am a good two inches longer than you have measured.
That would begin a dialogue, which is not permitted in the example given. More generally, monologues have two basic limitations: they do not permit perspectives outside the original definitions, nor are they adaptive--they cannot take new information into account and change their definition. Examples are all around us.

In sports like basketball, time is carefully meted out in a monological fashion. Time on the scoreboard has some relation to a fan's subjective experience of the temporal flow, but is an artificial 'official' version that the teams had better adhere to if they want to win.

Most tests of education outcomes are like this. Not all, of course--things like eportfolios can be wonderfully complex and are good at producing dialogue (if one wishes to do that). But if you think of the typical testing setting, it consists of students interacting with a pencil and a piece of paper. This is a monologue. Talking back to the test does little good, although I've had adult students write prayers on math finals before. But celestial appeals for the power to solve quadratics aside, a math test is pretty much a one-way conversation.

Note that this is unnatural. In most endeavors with humans, we talk or otherwise correspond to find out things.
* Know any good restaurants around here?
+ Sure--there's an Indian place on Central, about three blocks.
* Anything closer? I'm in a hurry.
+ Oh, well--there's a sub joint just around the corner.
Compare that to the prompt:
Q1) List all eating establishments within three blocks and identify type of food and approximate price for each.
The first example is a dialogue; the second is a Google maps search.

Monologue is good for testing things like 2+2=4. (The proof of which is very complex, by the way!) Low-complexity tasks and completely algorithmic ones are good candidates for assessment by monologue. Solving differential equations, reciting facts about the Civil War (an oxymoron if there ever was one), and so forth may as well be done in a monologue. But for more complex types of learning, it becomes a limitation. Consider the following example.
At the campus of Assess U., seniors are expected to get up early one Saturday in order to go take a standardized test of their thinking skills. This will be a three hour session with pencil and paper. They will never see the results of the test, but they are advertised as being important to judging the effectiveness of the general education curriculum or something. Stanislav's alarm goes off at 7am and he considers his options. Having had a course in Game Theory, he quickly realizes the optimal solution for him is to simply sleep in. What are they going to do--not let him graduate?
In the example, Stanislav does some serious critical thinking while denting his pillow, and decides that taking a test on critical thinking is not in his best interests. His professors, who all know he's great at solving problems and would recommend him to any employer or graduate school, have no voice in the assessment either--by exercising the very skill the test is supposed to measure, Stanislav vanishes from the sample entirely.

The example is tongue-in-cheek, but it does illustrate the point that only 'official' kinds of thinking skills are going to be seen by the test. The list of certified ways of demonstrating skill or knowledge is much more limited in a monologue than it is in a dialogue. Think of the game of twenty questions, where you try to guess an animal based on yes/no questions. With 20 questions, one can distinguish up to 2^20, or about a million possibilities (if the questioning is perfectly binary). With a monologue, you can only explore 20. Think about that--with only 20 questions, a dialogue is about fifty thousand times more discerning. That's because a monologue cannot adapt from one question to the next.
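Here's a toy illustration in Python, under the charitable assumption that a monologue amounts to twenty predetermined 'is it X?' checks, while a dialogue can halve the candidate pool with each answer:

  # Toy contrast between adaptive (dialogue) and fixed (monologue) questioning.
  # The candidate pool and the secret answer are invented for illustration.
  candidates = list(range(2 ** 20))   # about a million possibilities
  secret = 314159

  # Dialogue: each yes/no answer halves the remaining pool (binary search).
  lo, hi = 0, len(candidates) - 1
  questions = 0
  while lo < hi:
      mid = (lo + hi) // 2
      questions += 1
      if secret <= candidates[mid]:   # "is it in the first half?"
          hi = mid
      else:
          lo = mid + 1
  print(questions, candidates[lo] == secret)   # 20 True

  # Monologue: twenty predetermined "is it X?" checks cover only 20 candidates.
  print(secret in candidates[:20])             # False--never even considered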

Examples of dialogue-based assessments include interviews, oral exams, conversation, writing with revision, portfolios (potentially), and group work of various types. These are the rich interactions that we use routinely to assess the characteristics of the other humans around us. Formalizing these dialogues (but just barely) is a great way to get authentic assessment data that isn't skewed by the 'official' monologues and that can adapt to new circumstances. In the end, what we assess and how we assess needs to evolve in order to improve. Dialogues are essential to this process, which explains the vapor lock that often occurs when an assessment committee confronts the results of a standardized test and suddenly is faced with having to make something of it.

Monday, March 23, 2009

Assessment Blogs

I created a page just for assessment blogs. It's on the left nav bar of this page, and can be found at zzascape.com/assessment.html.

As part of it, I included a Chatango chat widget. I used to use Gabbly, but it's overrun with advertising.

Saturday, March 21, 2009

Creating Genius, Part II

Last time, I commented on Richard Hamming's talk to Bell Labs in 1986 entitled "You and Your Research." Let's continue.

He has some interesting things to say about creativity--that mysterious source of new ideas. He gives some advice that will seem familiar to any mathematician:
If you are deeply immersed and committed to a topic, day after day after day, your subconscious has nothing to do but work on your problem. And so you wake up one morning, or on some afternoon, and there's the answer. For those who don't get committed to their current problem, the subconscious goofs off on other things and doesn't produce the big result. So the way to manage yourself is that when you have a real important problem you don't let anything else get the center of your attention - you keep your thoughts on the problem. Keep your subconscious starved so it has to work on your problem, so you can sleep peacefully and get the answer in the morning, free.
This seems to be a common phenomenon. For me personally, I've had this experience many times when programming or doing research intensively. When it happens, there is a real sense of breakthrough--of understanding something you didn't before. It isn't sustained every time--sometimes my subconscious gives me the wrong answer. But more often it's correct, though how it got there remains a mystery.

Seeding creativity is important. "You can't always know exactly where to be, but you can keep active in places where something might happen." Hamming means looking out for the important problems.
If you do not work on an important problem, it's unlikely you'll do important work. It's perfectly obvious. Great scientists have thought through, in a careful way, a number of important problems in their field, and they keep an eye on wondering how to attack them.
Part of the seeding comes from having an open door, Hamming says, both as metaphor and literally. Tolerating distractions is the price of staying in the conversation.

Taking the big view is also important.

You should do your job in such a fashion that others can build on top of it, so they will indeed say, "Yes, I've stood on so and so's shoulders and I saw further.'' The essence of science is cumulative. By changing a problem slightly you can often do great work rather than merely good work. Instead of attacking isolated problems, I made the resolution that I would never again solve an isolated problem except as characteristic of a class.

Now if you are much of a mathematician you know that the effort to generalize often means that the solution is simple. Often by stopping and saying, "This is the problem he wants but this is characteristic of so and so. Yes, I can attack the whole class with a far superior method than the particular one because I was earlier embedded in needless detail.'' The business of abstraction frequently makes things simple.
This resonates with me, and that is probably a sign of my own peculiar weakness. I don't like complicated details, doubly so when they seem irrelevant to the problem at hand. My poor brain is always looking for the simplest way to do something. Sometimes that's a good idea, and sometimes not. Remember that generalization is a type of inductive reasoning. Some characteristics of such modes of thinking are that:
  • It is creative--there is no general procedure for producing general procedures
  • There is no guarantee of success. It takes persistence to get anywhere in this trial and error process.
  • It requires a knowledge of the analytical tools necessary to check to see if your solution is correct. If you don't know right from wrong, you won't get far.
For many types of problems, this means adopting or creating a philosophy that makes sense--one that can incorporate the peculiar nature of the data and processes you're working with. For example, Darwin's Big Idea was that there had to be something in the reproductive process that acted like genes. He didn't know about genes, and it gave him fits because the evidence wasn't there (Mendel's work wasn't well known until later). But he arrived at the right philosophy. Leucippus and others figured out a good general way of thinking about matter as atomic.

Think about your curriculum. Think about the big problems you deal with in your job. How often do you go back and think about the big picture--the most general way of thinking about these issues? In the curriculum, we typically immerse students in detail and fail to emphasize philosophical underpinnings (in my experience). As an example, a liberal arts curriculum is typically a list of stuff. Writing, quantitative skills, etc. are assembled in registrar's lists with the assumption that some whole comes from the parts. What is this gestalt? Is it to enable students to have a grounding in the big problems of our time? To give a personal answer to the meaning of life? To have the technical means to begin to think generally (inductively)? To prepare them for a major or for the job market? Or just to fulfill accreditation requirements? The big WHY of gen ed easily gets swamped in details and lost. Even if we pay lip service to a mission statement, it's very difficult to actually integrate that philosophy into course work and classroom activities to instantiate it.

Alter the problem. Instead of asking how much Stanislav learned, ask how much we think Stanislav learned. The first problem is probably impossible to answer; the second is trivial. And yet, the second is the more important question. When Stanislav goes out into the big wide world, his supervisor, teacher, or mentor will form an opinion about Stanislav's capabilities and act on that opinion, not on some standardized test score, however accurate it is advertised to be.

Hamming concludes with an Apollonian "know thyself."
If you really want to be a first-class scientist you need to know yourself, your weaknesses, your strengths, and your bad faults, like my egotism. How can you convert a fault to an asset? How can you convert a situation where you haven't got enough manpower to move into a direction when that's exactly what you need to do? I say again that I have seen, as I studied the history, the successful scientist changed the viewpoint and what was a defect became an asset.
The idea that defects and limited resources produce creative solutions was mentioned in the first part of the article. Perhaps these limitations spawn conditions to generalize and ask WHY, simplifying and redirecting us to the philosophical underpinnings of the problems to be solved.

Friday, March 20, 2009

Creating Genius, Part I

In March of 1986, Richard Hamming gave a colloquium seminar talk to Bell Labs scientists. You can find the text of the talk, which was entitled "You and Your Research" here. I think it holds valuable insights for anyone seeking answers. That's all of us, right?

Hamming is famous for his results in communications (e.g. Hamming Codes), and he had the opportunity to work with and observe some of the greatest scientists of the 20th century. He relates that working at Los Alamos with the likes of Richard Feynman made him envious. In his words, "I saw quite a few very capable people. I became very interested in the difference between those who do and those who might have done."

This is a question that's in the air, of course. The recent book Outliers by Malcolm Gladwell takes up a similar question, and I've blogged about the topic in the past. Hamming's remarks have special weight because he did first-class work himself.

There are several themes that Hamming employs. On confidence and courage:
Our society frowns on people who set out to do really good work. You're not supposed to; luck is supposed to descend on you and you do great things by chance. Well, that's a kind of dumb thing to say. I say, why shouldn't you set out to do something significant. You don't have to tell other people, but shouldn't you say to yourself, "Yes, I would like to do something significant.''
It's not really about luck, because people who do good work often go on to do more good work. Think of Feynman or Einstein or Darwin or Newton... I'm focusing on scientists because that's what Hamming is directly addressing, but you can create your own list in other fields of endeavor.

His second point is interesting because it speaks to the whole idea of assessing learning outcomes, if obliquely. Hamming says there are different ways of measuring brains. As an example, he talks about an employee:
I met him when I was working on a problem with John Pierce's group and I didn't think [the employee] had much. I asked my friends who had been with him at school, "Was he like that in graduate school?'' "Yes,'' they replied. Well I would have fired the fellow, but J. R. Pierce was smart and kept him on. Clogston finally did the Clogston cable. After that there was a steady stream of good ideas. One success brought him confidence and courage.
The ability to do good work is more than the cognitive skills we typically try to measure, in other words. We probably ignore a lot of what's important when we focus on narrow learning outcomes, rather than considering a more holistic approach.

Other factors that Hamming cites as important are age and working conditions--bad ones are good because they spark innovation! Embedded in this rich piece is a theory of learning as an exponential process:
"Knowledge and productivity are like compound interest.'' Given two people of approximately the same ability and one person who works ten percent more than the other, the latter will more than twice outproduce the former. The more you know, the more you learn; the more you learn, the more you can do; the more you can do, the more the opportunity - it is very much like compound interest. I don't want to give you a rate, but it is a very high rate. Given two people with exactly the same ability, the one person who manages day in and day out to get in one more hour of thinking will be tremendously more productive over a lifetime.
This is very similar to the conclusions that others have come to (see Gladwell's book, or my previous blog posts citing other studies). But effort and drive are not enough--it's key that they must be directed. A darker side of all this is that other things have to be neglected in order to accomplish it.
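As a toy illustration of the compounding claim (the growth rate and the career length below are my assumptions, not Hamming's numbers), suppose knowledge grows in proportion to both effort and what you already know:

  # Toy model of learning as compound interest. Rate, effort, and years
  # are illustrative assumptions only.
  def career_knowledge(effort, rate=0.25, years=40):
      k = 1.0
      for _ in range(years):
          k *= 1 + rate * effort   # each year's growth compounds on the last
      return k

  ratio = career_knowledge(1.1) / career_knowledge(1.0)
  print(round(ratio, 2))   # about 2.2: ten percent more effort, double output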

Tolerating ambiguity and looking out for evidence that contradicts your current understanding is another recommendation from Hamming. He quotes Darwin on the topic, saying that "Darwin writes in his autobiography that he found it necessary to write down every piece of evidence which appeared to contradict his beliefs because otherwise they would disappear from his mind." Stephen Jay Gould used this idea as a foil in his books, often referring to the "exception that proves the rule."

[to be continued]

Wednesday, March 18, 2009

FUD

Fear, Uncertainty, and Doubt--the bread and butter of consultants, charlatans, and news organizations. There's plenty to go around in the new Higher Education Opportunity Act (HEOA) of 2008. A colleague got my attention by forwarding an email flier from this group, which sells online seminars on higher ed. From the flier:
The buzz is that the HEOA requires very costly and very complex changes to online programs, and that it’s going to create major challenges for the people who run them.

In this case, the buzz is absolutely correct.

The HEOA is going to wreak havoc in online learning, particularly in the areas of academic integrity and student authentication.
Online courses are not a big part of our business currently at my home institution, but we should probably think about summer classes delivered through distance learning. That project has been in the back of my mind, so I furrowed my brow at the language in the flier. Before shelling out $229 to see what this group has to say, I thought I'd see what the HEOA actually says. Here's the part about student authentication.
[T]he agency or association requires an institution that offers distance education or correspondence education to have processes through which the institution establishes that the student who registers in a distance education or correspondence education course or program is the same student who participates in and completes the program and receives the academic credit. [Section 496]
So what kind of process is required to establish identity? I figured that Educause might be a good source for information on this. Indeed, they've written about it already here. I took the title of this article from their reference to "FUD" surrounding this issue. Apparently, some wild misconceptions were spawned by no less than the Chronicle. Educause quotes their article:
Tucked away in a 1,200-page bill now in Congress is a small paragraph that could lead distance-education institutions to require spy cameras in their students' homes.
This references the paragraph in the HEOA quoted above. As the Educause article explains, however, there are two rather large considerations that obviate the more drastic assumptions being made, which hardly merit beating the drums for. First, the requirement is for accreditors, not schools, meaning that it will likely become part of the accreditation list of to-dos next time accreditation rolls around. More importantly, a conference report on the HEOA actually spells out what authentication means: currently, it means having a unique ID and password. That's it. The North Central Association (an accrediting agency) endorses that line here (pp. 8-9):
The Joint Conference Committee of Congress and the U.S. Department of Education have confirmed that initially institutions may use simple efforts already in use at most institutions to verify the identity of their students. Such efforts may include the use of IDs and passwords. This amended policy reflects this current understanding; however, as time progresses, better processes for verifying the identify of students come into existence, and final regulations develop, this policy may need to be updated. The Commission hopes that the Department will follow closely to the language of the statute in its final regulation thus presumably allowing agencies and institutions some latitude in determining what method of verification best suits an institution’s mission and purposes. In the meantime, institutions should be examining more sophisticated approaches to verifying the identity of their students and making plans to incorporate such approaches in their distance and correspondence education.
You can find more analysis and opinion on this blog.

The larger lesson, I think, is to beware of the FUD. It's like stress for an individual. A little can loosen the bonds of established practice and prompt higher productivity. Too much can lock the system up and send you out shopping for snake oil.

Update: See this more recent post.

Tuesday, March 17, 2009

Generalized Silliness

I often use the Indian story made famous by John Godfrey Saxe about blind men from Indostan who encounter an elephant and try to describe it. The intent is to show how assessing anything complex can depend heavily on one's point of view. The moral of Saxe's poem is:
So oft in theologic wars,
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean,
And prate about an Elephant
Not one of them has seen!
This was my first impression of the debate about student learning outcomes of a complex nature, like 'critical thinking.' A March 16, 2009 article in The Chronicle nicely illustrates an "elephant problem," that of assessing writing. The article's introduction is thus:
Writing instructors need to do more to prevent colleges from adopting tests that only narrowly measure students' writing abilities, several experts said last week at the annual meeting of the Conference on College Composition and Communication.
I've used such simple measures myself. An administrator was desperate for a quick solution to assessing writing quality, and had the English faculty come up with a simple exercise to test all of our students. Each was given a paragraph with errors to correct. The errors ranged from their/there sorts of things to incorrect parallel structure. Several faculty members scored the results, myself included. One notable phenomenon was that students responded in ways that showed they were trying to answer the 'correct' way and not the natural way they spoke and wrote. A good example in the "real world" is The Clash song "Should I Stay or Should I Go?", which includes the line:
The underlining is courtesy of MS Word's grammar checker. There's actually a website praising the band for their correct use of grammar. I've scratched my head trying to figure out who's right in this case, but what's clear is that no one would actually talk that way unless they were trying to be 'correct.' In Assessing the Elephant I write about what I call monological definitions--those imposed by an outside system and not subject to debate. Grammar rules are like that. So people act differently when they think they will be judged on the 'right' answer rather than the one they'd normally give. This is bound to bias the results of any simple test like this one--just one tiny example of how an artificial test leaves much to be desired when assessing writing. Even simply measuring a student's ability to write correctly--the simplest possible assessment--is fraught with problems.

As an aside, I have to relate my favorite response on the exercise I described. The paragraph the students were asked to correct was on the topic of life at the south pole. One of the sentences was something like "The fish living under the water of the arctic are adapted to the near-freezing conditions." This student circled "under the water" and penciled in "where else would the fish live?" Indeed.

According to the quoted article, finding more subtle forms of assessing writing is a problem needing a solution:

Charles Bazerman, the organization's chair, said in an address to members that the move toward formal writing assessment is "inevitable," and that writing instructors need to furnish the evidence and the expertise to help improve how students are judged.

"It is up to us to provide more subtle forms of assessment," Mr. Bazerman said.

If "subtle" doesn't include authentic, I think this will be a very difficult proposition.

The article goes on to describe how writing assessment was implemented at one university, in an ad hoc manner (emphasis added):
Initial designs for the new assessment, put together by administrators at the last minute, were met with "confusion, ignorance, even outrage" among the writing faculty
This illustrates a central conceit about assessment in general, viz., that it's easy and can be done in straightforward and obvious ways. Heck, even an administrator without training or knowledge of the field can do it (my extrapolation). Contrast this with the faculty's stated opinion, taken from the article:
[G]rowth in writing is complex, slow, often nonlinear, intrinsically contextual, and not necessarily immediately visible.
Context is key to authenticity. Writing a lab report is different from writing a poem, which is different from writing a mathematics article. Experts in those areas can judge writing samples and rate them under the right conditions, but just because Stanislav can write a decent math paper does not mean that he can write a good review of a lit crit article. This is so obvious that it hardly bears mentioning, except that this kind of mistake gets made over and over.

I've used the following example before, but it serves well to illustrate the point. A supervisor is generally capable of judging whether employee Stanislav does a 'good job' at executing his responsibilities. Can we therefore make the leap to assume that there is a general test we can construct for someone doing a 'good job'? This test would have to work equally well for, say, a nuclear engineer, an airline pilot, a stock trader, and a science teacher. Obviously not. I had a math professor who called the subject of mathematical Topology "generalized silliness." That's a good description of what a general test of 'doing a good job' would be. Ditto for a general test of 'good writing' or 'critical thinking.'

Induction (generalizing from examples to create categories and rules) is powerful and easy to misuse. Just because there is 'writing' and there are 'tests' and there is 'excellence' does not mean that there are general tests of writing excellence. To assume that such a thing exists and proceed on that basis is, well, silliness of a general kind.

Friday, March 13, 2009

Curricular Networks

I wrote the following report years ago, and stumbled upon it. Most of us are probably used to the idea of curriculum mapping as a way to find out what gets taught where, but there is another way to make a map of the curriculum. I have edited it somewhat.


1. Introduction

Course prerequisites ideally keep students from taking classes they are unprepared for. As a logistical side effect, they theoretically limit enrollment in classes with prerequisites. Taken together, prerequisites form a network of courses that theoretically describes the curriculum in terms of logistical flow. We will look at this network as it exists in theory and compare it to some actual data.

There is a wide variety of types of prerequisites, but they fall into two categories that may be combined with a conjunction.

AND Requirements

These simply list a group of courses that must be taken prior to the target course. Each listed course restricts further the potential enrollment for the target.

OR Requirements

In the most general description, these are requirements to take a fixed number of courses from a list. In the simplest case, it’s of the form Take ABC-101 or ABC-102 or ABC-103. In the more complicated form, they may require more than one course from a list, as in Take 2 courses from ABC-101, ABC-102, or ABC-103.

All of the prerequisites that currently exist are described in our registration database as being one or the other of the above types, or a combination of the two conjoined with AND.
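As a sketch of how these two types combine, here's one way to represent them as predicates in Python. The course codes and the "take 2 of 3" rule are hypothetical:

  # Sketch of prerequisite trees built from AND and OR requirements and
  # evaluated against a transcript. All course codes are made up.
  def AND(*reqs):
      return lambda taken: all(r(taken) for r in reqs)

  def OR(n, *reqs):   # take at least n courses from the list
      return lambda taken: sum(r(taken) for r in reqs) >= n

  def course(code):
      return lambda taken: code in taken

  # Take ABC-101, and take 2 courses from ABC-102, ABC-103, or ABC-104.
  prereq = AND(course("ABC-101"),
               OR(2, course("ABC-102"), course("ABC-103"), course("ABC-104")))

  print(prereq({"ABC-101", "ABC-102", "ABC-104"}))   # True
  print(prereq({"ABC-101", "ABC-102"}))              # False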

2. Method of Analysis

Course prerequisites are described in a systematic way in the administrative database. These were obtained in electronic form and parsed. Student transcripts were then used to count how many students satisfied the requirements for a class and how many actually took it. This process is shown in the schematic below.

In the first step, a symbolic logic representation of each prerequisite tree is generated. Some of these get quite complicated, and in fact there were a few that were infinitely long because of ‘looping’ prerequisites in the Dance curriculum. A few such prerequisites were artificially truncated in order to let the computation finish within the lifetime of the universe.
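For the curious, here's one sketch of how such loops can be caught before an expansion runs forever: a depth-first search for cycles in the prerequisite graph. The toy catalog below, dance loop included, is invented:

  # Detect 'looping' prerequisites by depth-first search for cycles.
  # The catalog is a hypothetical example, not real data.
  def find_cycle(prereqs, start, path=()):
      if start in path:
          return path + (start,)
      for p in prereqs.get(start, []):
          cycle = find_cycle(prereqs, p, path + (start,))
          if cycle:
              return cycle
      return None

  catalog = {
      "DAN-201": ["DAN-102"],
      "DAN-102": ["DAN-201"],   # loops back: expansion never terminates
      "BIO-211": ["BIO-110", "MAT-101"],
  }
  print(find_cycle(catalog, "DAN-201"))   # ('DAN-201', 'DAN-102', 'DAN-201')
  print(find_cycle(catalog, "BIO-211"))   # None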

As an illustration, here’s the recursive logic for BIO 211:
BIO-211=(MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-101+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111*MAT-100*MAT-101*BIO-102)*(MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-101+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111*MAT-100*MAT-101*BIO-102)*(MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-101+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111*MAT-100*MAT-101*BIO-102)*(MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*BIO-101+MAT-100*MAT-101*BIO-110*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111+MAT-00*MAT-101*BIO-110*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*BIO-101*MAT-100*MAT-101*BIO-102+MAT-100*MAT-101*MAT-100*MAT-101*BIO-110*BIO-111*MAT-100*MAT-101*BIO-102)
This is by no means the simplest way to express the proper relationship, but it has the advantage of being completely machine generated and can be processed by existing algorithms normally used for teaching computer architecture courses.

Once the logic is generated, a ten-year range of transcripts was processed to compute how many students during a semester had satisfied the prerequisites for courses offered that semester. It was a simple matter to count up how many actually had taken the course, so then comparisons could be made. Both recursive and non-recursive methods were used, and ultimately I decided that the non-recursive ones were the most useful. Basically that means that a student is counted as having satisfied the requirements for a course if he or she meets the overt requirements, but not necessarily the sub-requirements. The long process of working through decisions like this one boiled down to finding an appropriate metaphor for the curricular network. Ideas that seemed reasonable but were ultimately discarded included modeling the requirement tree on a digestive system (yuck), a river (too simple), an electrical network (no way to include the complicated 'or' prerequisites), a quantum mechanical system (no one would understand it), pure probability (leads to the computation of huge correlation matrices), and multi-valued logic (problems with commutativity). I finally settled on using the model of a rather simple computer chip. You could, in fact, burn the course requirements onto a piece of silicon (excepting the Dance department, with their infinite loops).
The diagram above shows an imaginary set of course requirements for ABC-300, implemented with logic gates. The one for the biology example above could be simplified to fit on one page, but would be more work than it’s worth.

3. Historical Data

It would be natural to expect that the more severe the prerequisite restrictions, as reflected by the number of students who qualify to take a given target course, the fewer students would actually take it. In other words, one would expect to see the emergence of a straightforward linear relationship with nice correlation. The truth is a lot stranger. The scatter plot below summarizes five years of data.

Each dot is an individual course type taught during a semester. So all HIS-210’s for a semester are lumped together, for example. The vertical scale shows how many students during that semester met the requirements for that course, according to their transcripts (transfer courses were NOT included, which must be taken into account when drawing conclusions). The lines at the top are actually accumulations of dots that represent courses with no prerequisites, so all students qualified. The horizontal scale shows how many students actually took a course type (all sections within a semester being aggregated).

The data has a correlation coefficient of .2, which means it isn’t particularly linear, an obvious conclusion one can draw from the picture. The conclusion that does force itself upon us, just from inspection, is that there are essentially two types of courses. Those with no prerequisites (top of the chart) and those with prerequisites (bottom of the chart). If we only consider the group at the bottom, we get a correlation coefficient of .4, which is a stronger indication of linearity than before, but still not very compelling (it would need to be near 1).

Since many prerequisites are of the AND type, one would expect demand to drop exponentially with each additional requirement, which is a strong hint that we should look at the graph with a log scale on the vertical axis.
Now we can clearly see the 'funnel effect' of restrictions on class entry. The relationship isn't a simple linear one, where fewer restrictions equal more enrollment; rather, the number of eligible students sets a statistical upper limit on class enrollment. It's easy to see this boundary where the dots stop toward the right of the graph at each level.
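A toy calculation shows why an exponential model is plausible. If each AND-type prerequisite were satisfied independently by the same fraction of students (an oversimplification, and the numbers below are illustrative rather than taken from the data), the eligible pool shrinks geometrically:

  # Illustrative only: each AND prerequisite independently filters the pool.
  students, p = 1000, 0.4
  for k in range(5):
      print(k, "prerequisites ->", round(students * p ** k), "eligible")
  # 0 -> 1000, 1 -> 400, 2 -> 160, 3 -> 64, 4 -> 26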

Of course, there are many factors that determine why a student takes one course rather than another. One would guess that a main consideration is the student’s major. This is borne out in an analysis of class correlations. The chart below shows a matrix of correlations between classes taught over a ten year period. Courses are in alphabetical order, which has the advantage of grouping together ones from the same discipline.
A perfectly distributed curriculum, where every student was equally likely to take any given class, would give a perfectly uniform matrix. The ‘clumpiness’ visible above is evidence of a high degree of selectivity. The dot size, ranging from specks to balloons, is indicative of how strong the correlation is. The fact that most of the larger ones are distributed around the diagonal shows that students have a tendency to choose classes within the same discipline. General education classes or other popular classes will appear as horizontal and vertical streaks (correlation matrices are symmetrical). The main diagonal of the matrix has been omitted, because all it would show is relative class size for each class, and would obscure the relationships we’re looking for.
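For concreteness, here's a sketch of how such a matrix can be computed, using random stand-in enrollment data rather than real transcripts:

  # Class-correlation matrix sketch: rows are students, columns are courses,
  # entries are 0/1 took-it flags. The enrollment data here is random.
  import numpy as np

  rng = np.random.default_rng(0)
  enrollment = rng.integers(0, 2, size=(500, 40))   # 500 students, 40 courses

  corr = np.corrcoef(enrollment, rowvar=False)      # 40 x 40, symmetric
  np.fill_diagonal(corr, 0)                         # omit the main diagonal
  print(corr.shape)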

4. Conclusions

Prerequisites for classes create barriers to entry. The example in Exhibit 6 below shows a diagram of the prerequisites in Math and History.


The average number of prerequisites per course is 2.09 for Math and .95 for History.

Given the same number of majors, you could get by with fewer History classes per semester than Math classes. This depends also on actual requirements for the degree, of course, and not just on the information presented above.

Obviously if most students can’t take a particular course, it’s going to have lower enrollment. More than half of courses taught probably fall into that category. In many cases we can’t do anything about that unless we radically change the way we think about majors and disciplines.

In some cases we have introductory level courses segregated by the type of student. For example, Biology has separate courses for majors and non-majors, while English has a separate course for transfer students. This is another kind of prerequisite.

It would be a good thing for programs to review their curriculum from this "barrier to entry" approach periodically in order to reduce unnecessary complications and to perhaps unwind an infinite loop or two.

Thursday, March 12, 2009

Mashups

A colleague sent me a pointer to a cool recruiting mashup from Davidson College that geo-tags comments from applicants. Click on the image below for the actual site.



[Image by screencast.com] This is a clever idea for creating authentic marketing. I've been thinking a lot lately about two items from this excellent piece from the Lawlor Group on trends for the 2009 recruiting cycle (quoted in these pages previously). The two items are
  1. Addressing the value proposition (proving that your product does what it advertises)
  2. Using authentic voices in advertising
This second item is the point of using blogs and other customer-generated content to create a sense of authenticity. Yes, you give up some control, but the payoff is that you don't depend solely on slick advertising copy that is likely to be viewed somewhat cynically.

I wondered what else is out there like this, so I used delicious.com to search for the keywords "highered" and "mashup". Here are a couple of the more interesting ones, quoted directly from "40+ Most Useful Mashups for College Students."

  1. lazylibrary: The motto of lazylibrary is practically marketed just for college students: "read less. Get more." This mashup filters out books with page counts of over 200 pages, so you can find cheaper books, get quick reference guides and more.
  2. Campus Explorer: Campus Explorer is a great mashup for high school students, students looking to transfer, and even individuals thinking about going back to school. You can search schools by location or major, view photos and video clips of different schools, and set up your own profile to save your favorite schools.
Another find is slideshare.com, which allows presentations to be shared online. And of course there's Zotero.com for researchers--a very useful Firefox plugin.

An Educause article linked from my search page caught my eye. It draws a circle around a brewing conflict with social/Web 2.0/mashup services offered by colleges:
At one point, the phrase "Net Gen meets FERPA" was coined to address the quandary faced between the "old-fashioned" way services are delivered (a way intended to protect student privacy and maintain standards, as required by the Family Educational Rights and Privacy Act) and the "new" expectations of Net Gen students for flexible, seemingly less private and open-ended e-services.
This is a problem I'm wrestling with directly now. Faculty demand better access to, for example, student records, so that advising can be more flexible. It's easy to deliver such services--even as RSS feeds or other dynamic data ripe for mashing up. How far is too far to go down this route? I don't want to find a public geo-tagged map created by some enterprising professor that lists names and GPAs, for example. That would be a clear FERPA violation.

Creating the mashups is fun. There's an article from OUseful.Info, the blog... showing some pretty sophisticated tricks in the higher education setting. I've found some of the tools to be a bit flaky and unreliable (Google Docs, in particular, is laggy and buggy about pushing out to the web). A dedicated development environment, for example in Perl, would be nice.

Wednesday, March 11, 2009

Chartering a Course for Higher Ed

A colleague just attended the negotiated rulemaking committee's deliberations, which will ultimately implement the Higher Education Opportunity Act (2008). The meeting was reported on in InsideHigherEd (IHE) here and is summarized by a CHEA letter here, which she forwarded to me.

The current process is apparently not at the DefCon 4 status that the Spellings Commission created with its force-feeding of ideology and intrusive monitoring agenda--an atmosphere the IHE article terms "poisoned." They also describe the outcome of this overreaching:
The perceived overstepping by the Education Department led Congress to step in to block the education secretary from promulgating rules on accreditation, and in renewing the Higher Education Act last summer, lawmakers barred the Education Department from issuing regulations going forward that are designed to ensure that colleges are measuring student learning outcomes.
This intervention was in large part due to the efforts of Senator Lamar Alexander. At the same time, Sen. Alexander has made it clear that innovation in higher education is called for. In a Feb. 10 article in IHE he is quoted as suggesting a three-year bachelor's degree. This is an interesting idea that deserves some comment.

On the actual merits of what three-year bachelor's-type programs could add to the educational mix, I cannot speak. Speaking as a past SACS liaison, however, it curdles my blood to think of the effort involved in making such a thing a reality. Last month an email from Belle Wheelan, President of SACS, was circulated. The subject was the integrity of so-called 3+1 programs, which the accreditor describes as a combination of 90 credits at a two-year institution plus 30 at the finishing (four-year) school. There are apparently five standards from The Principles of Accreditation in question here, ranging from transfer credit to the efficacy of the general education curriculum. Pres. Wheelan also notes that programmatic accreditors (for Music, Education, Engineering, etc.) will need to be consulted for their approval. My perception of the tone of the email is "well, we can't stop you from doing this, but be prepared for a long hard slog." Herein lies the rub.

Academic Administration is not an engine of innovation. Given the stack of regulations to be followed (Sen. Alexander apparently shows off a stack of five boxes of them) and the accreditation processes required to make any substantive change--a term that leaves little wiggle room for unapproved changes--the amount of energy that would be required of any administration to create a three-year bachelor's program is mind-boggling. Real innovation requires the ability to play, in order to see what works and what does not. It requires the ability to fail. But accreditors demand success, and usually assume that one can plan one's way to ever higher goals if only one has enough assessment data. Peer review committees are naturally suspicious of anything new, and despite efforts in the last decade to change the tone of accreditation visits from "gotcha" events to something more helpful, little has really changed. It's a process guaranteed to be conservative and laden with inertia.

This is not to say that it can't be done. But an institution has to be particularly well led, well funded, and connected. In essence, it needs a 'bye' from the usual processes in order to do anything truly innovative. This is similar to the concept of a charter school.

So here's my proposal. Select a group of volunteer institutions whose charge is to innovate. Release them from all but the most important reporting requirements, from energy-sapping regulation, and most importantly from pre-approval by the accreditors. Just like charter schools, let them go out and create new things. Five or ten years later, take a look at the results; in the meantime, leave them alone. This would do two things. First, it would allow much freer innovation, which is potentially valuable for its own sake. Second, it would create some perspective on the rules and processes that regulate higher education. If these 'charter schools' succeed despite the lessened requirements, then perhaps we didn't really need all those expensive layers of review, reporting, and approval that currently exist. Wouldn't that be interesting...

Saturday, March 07, 2009

SAT Validity

The College Board has a nice page on Data and Reports, with a report on SAT Validity showing correlations between SAT and high school grades (HSGPA) and first year college grades (FYGPA). The table below from the report shows these statistics.

There is also a report on demographic breakdowns called Differential Validity and Prediction of SAT, which shows differences between race and gender groups. On the whole, predictive validity doesn't vary much, especially considering that what we're usually interested in is R-squared.

Since I've been interested in using noncognitive indicators to predict achievement, I looked for some statistics to guide me. All I've found so far is in the paper "Predicting the Academic Achievement of Female Students Using the SAT and Noncognitive Variables" by Julie R. Ancis and William E. Sedlacek. In the study used in the paper, two of the noncognitive dimensions seemed to be the best predictors (my interpretation): realistic self-appraisal and community service. There wasn't information about the residual when GPA and SAT are also taken into account, but putting these together, one can create an approximate "best case" scenario based on the statistics. This is shown in the chart below.



The contribution of the noncognitives was, to me, disappointingly low, and it is likely even smaller in practice because of the correlation with SAT and GPA, which is unknown here. That is, the green slice might actually overlap with the red or blue ones.
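One way to bound the green slice, if you have the raw data, is incremental R-squared: fit FYGPA on SAT and HSGPA alone, then again with the noncognitive scores added, and take the difference. A minimal sketch, assuming hypothetical column names and using ordinary least squares from statsmodels:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical first-year cohort file; all column names are assumptions.
df = pd.read_csv("cohort.csv")  # columns: fygpa, sat, hsgpa, noncog

def r_squared(data, predictors):
    """R-squared from an OLS fit of FYGPA on the given predictors."""
    X = sm.add_constant(data[predictors])
    return sm.OLS(data["fygpa"], X).fit().rsquared

base = r_squared(df, ["sat", "hsgpa"])
full = r_squared(df, ["sat", "hsgpa", "noncog"])

# The increment is the most the noncognitive slice can claim for itself.
print(f"incremental R-squared from noncognitives: {full - base:.3f}")
```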

It's interesting to note that "real-life" correlations between SAT and FYGPA, like the one in the Ancis and Sedlacek study, are lower than those in the College Board's research reports. This has been my experience too, and I can't explain the difference. I came across a rather impassioned argument for dropping the SAT as a predictor in a 1992 proposal from Jonathan Baron at the University of Pennsylvania. His correlations between SAT and FYGPA are even lower than mine, and he argues that there's not enough value added by the SAT (after taking into account the other predictive variables the university uses) to justify its continued use. He makes an interesting point about the emphasis on the SAT in the admission process:
College admissions criteria have major effects on high-school education. College-bound high-school students do what they think will help them get into a good college. Students now spend a considerable amount of time preparing for the SAT. (When my son was taught how to take multiple-choice exams in Kindergarten, when he was in a group of children who could already read, I complained that this was an inappropriate activity, and I was told that it's never too early to start preparing for the SAT!) If they were told that the SAT was not important but their grades and their achievement test scores WERE important, they might spend more time trying to learn something and less time trying to learn how to appear to be intelligent on a test. This might be reason enough to drop the SAT, even if it were somewhat useful for prediction.
The trend toward standardized testing as a measure of minds is not limited to SAT. Now we have the whole No Child Left Behind apparatus and an increasing appetite at the Department of Education to infect higher education with this philosophy using the likes of the CLA. I hope that instruments using noncognitive assessment can get more attention and be developed into something useful.

Friday, March 06, 2009

Assessing the Elephant

There's a strongly felt opinion presented in InsideHigherEd's March 2 piece "Who really pays for assessment?" The subtext is: is the time we're spending on this activity worth the cost? It's not a great sign for our industry that the author felt the need to remain anonymous. I recently compared accreditation visitors' judgment of your assessment plans to the Spanish Inquisition, so "Unfunded Mandate," I feel your pain.

In fact, a very basic kind of assessment need not be expensive in either dollars or time. The methods we developed at my former institution (Coker College) would serve in any small-class setting, though they probably won't work for class sizes over about 25. A second chapter about the approach is appearing in a 'best practices' book this summer, but you can read my extended thoughts on it now in Assessing the Elephant. I have plans to turn this into a real book, maybe this summer.

Key design elements are:
  • Assess based on existing class work. Some prefer this to be at the assignment level, but we used the whole course as the basis for judgment.
  • Assessments are subjective judgments based on general rubrics made by course instructors.
  • The scale of assessments is based on traditional sequencing of undergrads: remedial, fresh/soph, jr/sr, and graduate.
  • Assessment results are not used by the administration as a stick (to avoid tainting results).
  • Assess skills that are demonstrated in a class, not just where they are taught.
  • Use a dead-simple method of gathering data. It shouldn't take more than 5 min per class per semester.
You can read some of the statistics on validity and reliability in the document linked above.
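For a flavor of what a reliability check can look like, here's a sketch of weighted kappa for two hypothetical raters scoring the same student work on the four-level scale. This isn't the calculation from the linked document, just one standard way to measure agreement on an ordinal scale:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings: two raters independently scoring the same student
# work on the scale 0 = remedial, 1 = fresh/soph, 2 = jr/sr, 3 = graduate.
rater_a = [1, 2, 2, 0, 3, 1, 2, 1]
rater_b = [1, 2, 1, 0, 3, 1, 2, 2]

# Quadratic weights penalize disagreements more the farther apart the
# two ratings sit on the ordinal scale.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"weighted kappa: {kappa:.2f}")
```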

Thursday, March 05, 2009

The Backchannel

A while back I tried an experiment at the NCSU Assessment Symposium, and then later at the IUPUI Assessment Institute. The problem I saw was this: conferences are ephemeral events, and individual sessions tend to vanish (for me at least) into a handful of crumpled powerpoint note pages with my hastily scrawled observations on them. Not only is it hard to remember later what the salient parts were, it's hard to find the time to go back and categorize and sort out the stuff. I thought perhaps a 'sticky' conference experience would be a good idea. I described it on my other blog here:
[Conference] conversations tend to get cut off just when they get interesting. Wouldn't it be great if these moments could be tagged so that you could pick them up later? I imagine a site that you log in to, identify the conference, session particulars, etc., and then are allowed to add comments or documents under that rubric.
My solution wasn't a very good one--I just created a blog post for people to add comments to. This has the same problem as sorting out notes later: no one has time to do it. I had asked a member of the audience at my presentation to jot down her own questions as they arose during the talk. These and others I tried to answer on the blog, to spark a later asynchronous conversation. You can find the post here. It didn't work. I don't know if anyone read what I wrote, but no one commented. Of course, it didn't help that I advertised it in a very ad hoc way. I don't like wasting paper with handouts, so finding the site depended on attendees jotting down the URL and browsing to it later. I have suggested to conference organizers that such a thing be included in the conference web site (links to discussion boards or something), but they're too busy, too.

So I was pleasantly surprised to come across an article in The Chronicle called "Switch-Tasking and Twittering Into the Future at Library and Museum Meeting." The unfortunate title hides a very cool idea, viz., a partial solution to the problem I just described. Instead of a post-meeting discussion, the idea is to discuss content during the meeting, a key piece that is missing from presentations, which are usually monologues. The service is called Today's Meet. The interface is dead simple: you browse to the page, create a meeting (no sign-up required), and are presented with a Twitter-style interface. 'Tweets' can be tagged with meta-data.

The so-called backchannel comprises a conversation in the background, via these tweets, while the presentation is going on, or presumably afterwards. In order for it to be successful, the participants have to have access to appropriate technology. This will be a barrier to immediate adoption at most conferences, although a Blackberry interface might go a long way to solving that problem.

The idea that people are chattering away electronically while you're speaking may be horrifying to some presenters. Not me. In order to work, the attendees would have to have the URL for the meeting communicated to them anyway, so it almost requires the consent of the presenter. Ideally, you could have an LCD projector showing the tweets in one panel while another is used for the official meeting presentation. It would take some practice to figure out how much attention to pay to this backchannel, but it's a worthwhile experiment.

This doesn't solve my problem completely, because I still wish for official conference message boards as well as todaysmeet-like tweet boards that would be listed in the program. This way everyone could find them before, during, or after the presentations. Conversations could begin before the conference even started.

But you can use it in the classroom too. Ira Socol (whose son developed Today's Meet) describes a classroom experience in the edu-blog SpeEdChange:
The first few posts were simple. "Hi" "Hello" "What's This?" But within fifteen minutes it had accelerated wildly. There were "tweets" (if you will) about the stuff in the class, and questions, and doubts, and worries. There were "procedurals" - "where's the sign in sheet?" "Is the due date still...?" There were requests, "I wish he'd talk about..." There were concerns, "This is distracting me" "More than Facebook?" "About the same" By the time the class session had ended, over 200 comments in all.

Every few minutes I looked up at the screen and checked the conversation, and typically I adjusted the discussion, or picked up on a question being asked there, or commented on an answer or a comment. In a big class it gave me real access to far more students than I can possibly get by watching for raised hands. And it let me - and the class - hear from many who never raise their hands. Honestly, I could even judge, much more clearly than usual, what was connecting and what was missing. As an instructor - I loved it
This makes sense. I've always found it useful to have students fill out a 3x5 card with questions at the end of class and rate their understanding 1-10, but instant feedback is much better. In online classes, I've also noticed that some students who are reticent to speak in class become more lively in a chat room.

There will be detractors, and this may not work for everyone or in every setting. But the reality is that if an audience has the technology, odds are they're already in their own private backchannel with chat, Facebook, email, and so on. Why not tap into that and make it a conversation?

Wednesday, March 04, 2009

Data Compressing the Curriculum

I've written about vodcasting lectures for out-of-class consumption and about the idea of information theory in the past. A recent article in the Chronicle puts these together. The piece highlights the practice of condensing a lecture or topic into a one- to three-minute video that encapsulates the essence of the material to be taught, followed by an exercise to reinforce it.

You can see one here:




I can see that this technique would work better with some ideas than others. In particular, this is the kind of delivery that has the best chance of succeeding with low-complexity learning: facts, simple pattern recognition, or simple examples of deductive reasoning. In fact, it occurs to me that one could study the complexity of a task scientifically by reducing the length of the mini-lecture until students no longer understand; the shortest length that still works gives an upper bound on the complexity. So perhaps solving a 2x2 system of linear differential equations has 25-minute complexity, whereas learning the basic facts about the Norman Invasion has 2.4-minute complexity. If this worked, you could actually create a taxonomy of knowledge and skills for a discipline that had some (minimal) scientific grounding.
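The "shorten it until they stop understanding" procedure is really just a threshold search, so the bookkeeping could in principle be automated with a bisection. A toy sketch, with a made-up comprehension function standing in for an actual classroom trial:

```python
import math

def minimal_length(comprehension, lo=0.5, hi=30.0, threshold=0.8, tol=0.5):
    """Bisect for the shortest mini-lecture length (in minutes) at which
    the measured comprehension rate still clears the threshold.
    Assumes comprehension rises monotonically with lecture length."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if comprehension(mid) >= threshold:
            hi = mid  # still understood; try an even shorter version
        else:
            lo = mid  # too compressed; it needs to be longer
    return hi

# Made-up comprehension curve standing in for real classroom measurements.
fake_rate = lambda minutes: 1 - math.exp(-minutes / 5)
print(f"about {minimal_length(fake_rate):.1f}-minute complexity")
```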

Sunday, March 01, 2009

The Idea Pile

I've been musing lately about using the hive mind of an organization to crowdsource problem-solving. In plain English, encourage sharing ideas at all levels of an organization with a democratic way of gathering, evaluating, and publishing them in a useful way. I mentioned EtherPad as an interesting tool for collaborative editing. Online forums can be made at forumsland.com, or probably dozens of other places. I looked for a diagram creator to make org charts and the like, and found AutoDesk's online tool for doing that. I created the chart below in a few minutes, and published it to the web.

You can also create mind maps online at mindmeister.com. It's so easy to use that it's fun. You can create maps and share them as read-only, like the one below, or open them to wiki-style editing by other users.



For synchronous and asynchronous collaboration, there are a couple of nice tools. One is for real-time (synchronous) meetings at vyew.com. Yesterday, my friend Jon told me about another site called Nexo that lets you create a variety of web sites for the asynchronous part. Nexo's power comes from its flexibility. You can create multiple pages devoted to things like calendars, forums, and resources, and then customize those pages with sections like RSS feeds, books on Amazon, forums, tasks, comments, and so forth. It looks promising. Everything I've cited in this post is free for individual accounts.

There are lots of ways to collaborate online. The trick is to figure out how to pull the best of these tools together into one pile. Ideally, any user could highlight and tag one of these objects and have it naturally integrated. This is too much to hope for currently, but I'm hoping to converge on some way to usefully assemble and connect rich content. Everything interesting is due to combinatorics, after all. It's not what it is; stuff is ubiquitous. The uniqueness comes from how these bits are arranged in relation to those bits.