Saturday, May 29, 2010

Probability Dilation

A few days ago, I posted a probability puzzler based on conditional probabilities.  Here's another one.  The idea is that when we operate without full knowledge, we may nevertheless be able to give a lower and upper bound on the probability of some event.  Intuitively, if we discover new information, that should increase our certainty, reducing the difference between upper and lower bounds. 

As an example, if we flip a fair coin three times independently, the probability of three heads (HHH) appearing is 1/8.  After the first flip comes up heads, however, the probability changes to reflect the new information.  We write
\[Pr[ HHH | H] = 1/4\] where the vertical bar means "given that". 
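Here is a quick simulation sketch (my own check, not part of the original example; the trial count is arbitrary) that confirms both numbers:

```python
# Estimate Pr[HHH] and Pr[HHH | first flip is H] for a fair coin by simulation.
import random

random.seed(1)
trials = 100_000
hhh = first_h = hhh_and_first_h = 0

for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(3)]  # True = heads
    if flips[0]:
        first_h += 1
    if all(flips):
        hhh += 1
        hhh_and_first_h += 1  # HHH implies the first flip was heads

print("Pr[HHH]           ~", hhh / trials)               # about 1/8
print("Pr[HHH | first H] ~", hhh_and_first_h / first_h)  # about 1/4
```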

Sometimes we don't know the conditional probabilities, however, and have to use a worst case/best case approach to get a range of possible probabilities.  Probability dilation is when the range gets larger as we add information.  That is, knowing more information leads to greater uncertainty.  I read about dilation in this paper (pdf).  Here is a paraphrased version of the example in the paper.  (I also posted this on reddit.)
Doctor: Your screening test for Krankheit came back positive. That means you have a 50% chance of having the condition. We want to do a more accurate test that will tell us for sure.
Patient: 50%? Wow. I guess I need to know. Let's do the test.
------ 30 min later ----------
Doctor: Well, I'm sorry to say I have bad news and more bad news.
Patient: Oh no. The test was positive?
Doctor: Yes, it came back positive. But the other bad news is that they gave you the wrong blood test: one for an unrelated condition. We have no idea what the conditional probabilities are for having tested positive on this test and actually having Krankheit.
Patient: What! Why is that BAD news?
Doctor: Well, we used to be sure that Pr[Krankheit] = .5. Now we're dealing with Pr[Krankheit | Test+], which could be anywhere from zero to 100%. We know LESS than we did before.
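To see the dilation concretely, here is a minimal sketch (my own illustration, not from the paper) that sweeps over the unknown test characteristics and records the possible values of Pr[Krankheit | Test+]:

```python
# Sweep over unknown test characteristics: s = Pr[Test+ | K], f = Pr[Test+ | not K].
# The prior Pr[K] = 0.5 is the only thing we actually know.
from itertools import product

prior = 0.5
grid = [i / 100 for i in range(1, 100)]  # avoid the degenerate 0/0 case

posteriors = []
for s, f in product(grid, grid):
    p_positive = prior * s + (1 - prior) * f      # Pr[Test+]
    posteriors.append(prior * s / p_positive)     # Pr[K | Test+], by Bayes

print(min(posteriors), max(posteriors))  # bounds approach 0 and 1 as the grid refines
```

Essentially every value in (0, 1) shows up, so the tidy prior of 0.5 has dilated into almost the whole interval.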
Apparently this sort of worst case/best case analysis is used in machine learning algorithms, making this a practical problem.  

Friday, May 28, 2010

Fixing Assessment

An article this morning in Inside Higher Ed takes on the issue of faculty willingness to engage in meaningful assessment, citing an essay by Pat Hutchings.  I wrote about that essay, "Opening Doors to Faculty Involvement in Assessment," last month here.  My summary was:
Despite the odd misstep into virulent bureaucracy and too much enthusiasm for top-down assessments not tied to objective top-level goals, the article gives excellent advice for building the culture of assessment our accreditors are always going on about.  The recommendations are practical and useful, and can address the assessment problem at the troop level.  The big issue of getting the top-down approach fixed is not the topic of the article, and given where we are, is probably too much to hope for. 
The Inside Higher Ed article complements Hutchings' findings with comments by Molly Corbett Broad, president of the American Council on Education.  She makes a point that is obvious but underappreciated:
[I]f faculty do not own outcomes assessment, there will be minimal impact on teaching and learning and, therefore, on student achievement
If the chain of logic [behind learning outcomes assessment] begins from an accountability perspective, the focus is on the institution, and if it is primarily an institutional measure, it is potentially disconnected from how individual faculty members teach
Blackburn College's experience in trying, failing, and trying again to find that formula is described.  In a linked document, we find this telling statement.
Assessment came to the college as an ill-defined external mandate that was perceived to be an adversarial process with connotations that threatened these core faculty concerns. Many faculty members raised questions about how such a mandate could be incorporated into college policy and practice without violating the fundamental values of the institution. 
The politics of assessment from the Dept of Ed on down have not been very conducive to positive change.  This is actually easy to fix, I believe.

Fixing Assessment.  There needs to be a clear and achievable expectation at each level of administration.
  1. The Dept of Education should not be concerned with classroom learning outcomes.  This is not a strategic goal, and can't be measured in a comparable way.  Rather, state clearly in an updated strategic plan what the vital interests of the USA are with regard to higher education.  Is it more high-paying jobs for graduates?  Then give universities feedback (using IRS data) about how well we do in that regard.  Is it technology?  Then emphasize science and math like we did in the space race.  Is it energy?  Appropriate measures are numbers of graduates in disciplines, number who attend graduate school, etc.  Do you want them to be better citizens?  Track voting records and give the universities feedback.  Engagement with the world?  Use carrots and sticks to foster languages and intercultural studies (or whatever is important).  Give us goals that are very easy to understand and turn into actions.
  2. Accreditors have done a good job of underlining the importance of assessment, but have been caught in two traps: the fuzziness of the goals from the top, and the perception on the part of faculty that this is all a top-down exercise.  This is the point made above, and in my experience it's completely valid.  Once the DoE defines clear and actionable goals, regional accreditors can focus on fostering faculty-supported assessment that will improve classroom teaching, and forget about "accountability."  They can perform a tremendous service this way.
  3. Institutions would have good information from the DoE about how they fit into national priorities.  Some institutions won't fit that profile, and others will, but either way the goals will inform and guide institutional strategic planning on some level.  (Do we want to start a computer science program and try to get federal grants? etc.).  If there is a DoE goal related to employment of graduates, this is probably the most general sort, and any institution should be able to engage with that.  On a lower level, the administration has to make sure that the assessment programs are working effectively.  That doesn't mean heavy bureaucracy, expensive online systems, or a raft of standardized tests.  Institutional data on learning outcomes is easy to get.  As evidence of how easy it is to get useful assessment data from faculty for free, see "FACS Redux."  With three emails and a few department meetings, we got (at last count) 2731 learning and non-cognitive outcomes ratings on a base of fewer than 1500 students.  It tells us what the faculty really think about students in an institutional format.
  4. Academic programs are where assessment and improvement mostly happens.  In my experience, once faculty are guided in a way that makes sense to them (see "Culturing Assessment and Scoring Cows"), a light comes on, and most can see the importance of it.  Not all faculty or all programs will do a perfect job of it, but I believe with cultivation the right attitudes and practices can be developed institutionally.  It means staying away from language like "accountability" and "measurement," and focusing on what specifically faculty want students to learn in individual classes.  It's not hard.
Instead of this, what we have is a confusion of priorities, methods that don't make sense, and alienation of the people we're trying to engage. It's political, messy, and largely irrational: rather like any faculty senate.  We can do better.

Update: see Assessment Crosstalk for analysis of the comments to the IHE article.

Thursday, May 27, 2010

Testing and Choice and Ruin

In a May 25th article in Education Week, Diane Ravitch, author of The Death and Life of the Great American School System: How Testing and Choice Are Undermining Education, summarizes what she thinks is wrong with Race to the Top, the Obama administration's version of NCLB.  Among her points are two game theory arguments:
The NCLB-induced obsession with testing and test-prep activities will intensify under Race to the Top because teachers will know that their future, their reputation, and their livelihood depend on getting the scores higher, by any means necessary.

By raising the stakes for tests even higher, Race to the Top will predictably produce more teaching to bad tests, more narrowing of the curriculum, more cheating, and more gaming the system. If scores rise, it will be the illusion of progress, rather than better education. By ratcheting up the consequences of test scores, education will be corrupted and cheapened. There will be even less time for history, geography, civics, foreign languages, literature, and other important subjects.

I wrote about this sort of effect in "The Irony of Good Intentions," and it's also related to the over-valuing of SAT scores in "Amplification Amplification."  There must be some psychological flaw in our species that leads us to accept easy answers over ones that work, often taking the form of some sort of credential.  For example:
  • Bond ratings substitute for real understanding of a company
  • Investment firm stock ratings do the same
  • Standardized test scores of complex behavior substitute for actual performance
  • Degrees and other credentials substitute for demonstrated performance
  • Paper currency stands in for something (what?) of value
  • Loan applications substitute for actual ability to repay
All of these are undoubtedly useful, but are also subject to bubbles.  We're in a credential bubble now, with online for-profits cranking out degrees, racking up massive loan debts (for students), and milking the government of aid dollars. 

Someone has probably done this already, but this inflation-of-value idea has an archeology we can discover in word origins. For example, the word "really" used to mean "actually," according to this etymology site.
really early 15c., originally in reference to the presence of Christ in the Eucharist. Sense of "actually" is from early 15c. Purely emphatic use dates from c.1600; interrogative use (oh, really?) is first recorded 1815.
Now, really means something like "very."  The word "literal" has a precise meaning that describes a certain realness.  I have noticed that it's becoming fashionable to use it merely as a false credential: on the radio I heard "and the audience was literally eating out of her hand."  Somehow I doubt that's true.

Similarly, words for "large" seem to have engorged themselves.  Things become Super, and Mega.  It can't be long before the advertising executives discover the untapped vein of other standard scientific prefixes: giga, tera, peta, exa, etc. 
Would you like fries with that?
Sure.
Do you want the Peta-pack or the Exa-plosion?
This arms race of "believe me!" may ultimately inflate so much that we require exponential notation to express how certain we are.  This may be another sign of Ray Kurzweil's Singularity:
Wow, that concert was REALLY10^25 good!
Oh yeah?  Well I saw one last week that was REALLY10^30 good!
And we'll probably be paying those teachers with these:

(image courtesy of Wikipedia)

Wednesday, May 26, 2010

Repressing Science in VA

Something astonishing is going on in Virginia. Michael Mann, a climatologist at the University of Virginia until 2005, is being targeted by the state's attorney general with what is being called a fraud investigation into years-old research and grant funding.  Nature calls it a "witch hunt" in this article; see also this article in The Chronicle.  Related articles are here and here.  The most telling quote is:
"Throughout his career, he has shown that he is a man of deep, deep political commitment," says Jeremy Mayer, an associate professor of public policy at George Mason University, in Fairfax, Va. "He seldom shrinks from applying those views when he has the power to do so."

Mr. Cuccinelli, Mr. Mayer says, "really believes that global warming is a hoax." And, politically, "there's not really a downside for launching an investigation" of a university professor, Mr. Mayer says, because Mr. Cuccinelli probably counts few professors among his supporters. "He doesn't lose any votes from his base; he certainly gains more national credibility" in the Republican Party.
Mann is a main contributor to RealClimate.org, but I found no reference to this there.

It's tempting to draw the parallel to other misguided individuals who let ideology trump scientific inquiry, e.g. (from Wikipedia):
Lysenko was put in charge of the Academy of Agricultural Sciences of the Soviet Union and made responsible for ending the propagation of "harmful" ideas among Soviet scientists. Lysenko served this purpose by causing the expulsion, imprisonment, and death of hundreds of scientists and eliminating all study and research involving Mendelian genetics throughout the Soviet Union. This period is known as Lysenkoism. He bears particular responsibility for the persecution of his predecessor and rival, prominent Soviet biologist Nikolai Vavilov, which ended in 1943 with the imprisoned Vavilov's death by starvation. In 1941 Lysenko was awarded the Stalin Prize.
But this isn't the Soviet Union, and this case seems more like a populist-driven crusade than a top-down enforced conformity to an ideal.  If the voters allow this, it is a real indication that the education system has failed.  The Nature article puts it like this:
Cuccinelli's actions against Mann hark back to an era when tobacco companies smeared researchers as part of a sophisticated public relations strategy to raise doubts over the science showing that tobacco caused cancer, and delayed the introduction of smoking curbs for decades. Researchers found themselves bogged down in responding to subpoenas and legal challenges, which deterred others from the field. Climate-change deniers have adopted similar strategies with alacrity and, unfortunately, considerable success.
As someone said, in a democracy people get the government they deserve.

If you want to see what the blogosphere has to say, here's a link to technorati.

Update: U Va decides to fight back  

Sunday, May 23, 2010

Slice of Heaven

I think, therefore I am, according to Descartes.  I think it's more fundamental than that: I eat, therefore I am.  It doesn't sound as academic, but it's easier to appreciate.  What follows is therefore not about assessment or the future of higher education or some obscure math thing, but how to make and eat bread.  This is my purely idiosyncratic view of it--I didn't consult the Baker's Union or anything.

Start with some stuff that's likely to lead to bread.  Life on Earth didn't just spring out of nothing, after all: there were comets delivering water, heat from the sun, etc.  So if you start in a completely sterile environment, bread is unlikely to happen too.  I suggest something like the "warm little pond" pictured below:
You may have to tilt your monitor to make sense of it, but it contains a plastic tub of King Arthur wheat flour with a scoop inside, some sea salt, olive oil, a measuring cup, bread machine yeast, a mixing bowl, and a mixer with bread hooks attached.  Never mind the oranges and flowers--those are not essential. 

I've found that it speeds things along if you intervene and mix some of these things together.  The amount of bread you get is proportional to the amount of water that starts it, so I put one cup of water in the mixing bowl.   When nothing happened after a while, I added a couple scoops of flour, a little yeast and salt. It's in this way that I found these ingredients to be all essential to good bread.  All very scientific.

The mixer had a natural affinity to the experimental goo, and I found it whirling about making a mess.  So I helped it along, battering the batter into dough with the hooks.

Here's where different species of bread diverge.  I was hoping for long skinny loaves, and for that we need a pretty firm dough.  Not a sticky gooey mess when you touch it.  When in doubt of this, add a little more flour--the dough will simply refuse to accept it if you add too much.  It's easier to get the dough too wet, which will cause problems with its development later.  I was lucky in this instance, and my two generous scoops were exactly enough.  So out it comes, evolving into a land critter.  It's pale and amorphous like any creature that's not fully developed.

Homely as it is, this is the time to name it.  This requires some imagination because there isn't much character to it yet.  That will change, but for now you have to be creative about it.  I called it Panis epsilonis after my daughter Epsilon, who was at a birthday sleepover and not around to witness this miracle of emergence.  She likes to pinch off pieces, put them on a cookie sheet, and cover them with sugar and cinnamon.

Clearly, the nascent creature needed to mature, so I gave it a light coat of olive oil to keep it from drying out, and put it in a larger bowl (to give it plenty of room), and covered it with a plate.  It was early evening by then, so I let it rise for a couple of hours, and then put it in the refrigerator overnight.  In the morning I could see it had progressed in its development.
It's hard to tell from the photo, but the texture was now smooth, a luxury to handle.  The dough resists only if you mishandle it to the point of tearing.  Otherwise it's very well-mannered, if not quite domesticated.  It's fun at this point to develop it further by flattening and folding repeatedly:

After it seemed satisfactorily springy and awake, I began to shape my will upon the face of the dough.  Some conservative cooks claim that such human interference in the natural evolution of a bread will lead to abominations.  In truth, I have witnessed such things, when an experiment goes badly wrong.  Charred, malformed breads are the result, and all one can do is put them out of their pain and, depending on one's beliefs, call an exorcist.  But without experimentation, we would all still be living on a savanna somewhere picking fleas off of each other.  So I placed my faith in the hands of fate and inductive reasoning and began to slice.
The first cut with the bench knife carved off a piece so small it seemed insignificant.  The rest I divided equally, and then shaped these pieces while lightning crashed outside and Igor went in search of egg whites.  I fashioned two synthetic new species out of these: two loaves we will call Panis longis and one called Panis roundis.  This was done by rolling and squeezing the pieces into shape.  I found that the coolness of the dough made this easier. When I was finished, I found the husks of previous maturation shells, and employed them as pictured below:


The round dish will be covered in the oven, allowing steam to envelop the crust as it bakes.  The bottom of the dish is buttered and sprinkled with cornmeal so that it won't stick. 

When Igor returned with the egg whites I instructed him to whip them into a frenzy and then add rosemary and basil from the garden, and a little olive oil, salt, and water.  This we basted the doughs with, and then let them rise for about three hours.  The long loaves we slashed three times each to let them expand gracefully.

The Panis longis looked beautiful.  The round loaf was almost touching the edges of its confinement.  It was time for the oven.  The long loaves went in first, into a cold oven set to 350°F.  I set the timer for 35 minutes.  It took about 15 minutes longer than that, which I checked by fussing over the crust, tapping it, lifting it to see the bottom, and generally being neurotic about it.


They came out looking nice.  After I set them on the counter, I slid in the covered dish with the round loaf inside.

Eating bread. I assume everyone thinks they know how to eat bread, but I think most people are wrong about that.  For some reason, it's accepted practice to let the bread cool before eating.  Some books on baking will extol the wonderful sound of the crust crackling as it cools.  This is nuts.

You wouldn't grill a steak or fish fillet and then let it sit for half an hour before eating it, would you?  Or make a big pot of spaghetti and wait for it to get cold before serving.  Or let your ice cream melt.  It's the same thing with bread--it will never be as good as it is coming right out of the oven.  If the crust is perfect coming out, it will become soggy from the steam within a few minutes.  If you bake it so long that the crust stays hard despite this, it will be a thick tough crust.

The exception to this may be the German Brötchen, which maintains a perfect crust for hours somehow.  I don't know how they do it.  I asked some German bakers in Charleston one time, but they said it was impossible to make here in the South because of the humidity.  I'm not sure I buy that answer.  I think it's a deep dark secret.

So anyway, the conversation around the bread shouldn't be "could you pass me a slice of bread, please?" in order to get a cold limp slab that you try to revive with butter.  It should be more like:
Mom! I asked first!
I get the end piece--you got it last time!
Is there another loaf still in the oven?
Careful! It's right out of the oven.
Ow! Where are the mitts?
Hurry up!
In this case, I tore off a chunk as soon as I could handle it, drizzled it with olive oil, and topped it with a slice of fresh mozzarella and a basil leaf:


The subtle crunch of a just-right Panis longis crust is a singular pleasure.  It's not like a potato chip or cracker.  It's multi-dimensional, crunching through layers of crust, the crumb of the bread adding complexity to the top.  It doesn't always work that way.  Too much water in the dough and the inside will be a shaggy mess while the crust becomes too thick.  Or if you over-bake, the crust becomes rigid and hard.  This time I got it exactly right.  It was without exaggeration the best Panis longis I've ever eaten. 

Normally, the Panis roundis steals the show.  Because it's baked in a container, the steam works the crust differently, and if you get it just right, you have an eggshell crust, hard and crispy, but very, very thin.  It will barely support the bread it contains.  I usually make it as a treat for a meal, and the timing is tricky.  But when it comes out of the oven everything else stops and we eat the bread.  It absolutely has to be eaten right out of the oven.  It took about 40 minutes this time, and I had no trouble getting it out of the dish.


I can't say the basil made any difference, and it looks kind of strange.  The egg white makes it brown faster, so it's easy to overestimate how done it is.  Be sure and tap on the bottom.  It should be hard too.  I like to take it out of the dish and give it a few minutes on the rack bare to harden the bottom crust a bit. 

This loaf you have to slice very carefully with an oven mitt to protect your hand.  It's fragile, and will collapse easily.  But if the crust is right, you can easily saw off the very end of it, which is the best part, maybe the best bread anyone has ever eaten in the history of the world.  It should have a hard delicate crust, not as many layers as the Panis longis, but crispy with a wispy crumb that is mostly air.  After cutting off the end, I usually cut triangular pieces rather than trying to cut all the way across and risk mangling it. 

After it cools a few minutes it wilts.   Trying to slice it then gives you something like this:
It was as good as expected.  If I could make the Panis longis every time like I did this attempt, however, I wouldn't bother with the Panis roundis and its fussiness.  It was that good.

Here's a happier picture to finish with.

Saturday, May 22, 2010

Short Takes

Lecture Tools.  Freeware (for now) to aid with classroom interaction.

Here's an interesting article about the inter-rater reliability for peer review in medicine.  Too bad they didn't study the same for psychometrics journals.  Now that would have been interesting.

You may remember me blogging not long ago about the Dept of Ed's strategic plan, and wishing for goals and data tied to employment.  If this Huffington Post article can be believed, something like that may be in the works:
In January, the Education Department suggested one answer: for a program to be eligible, a majority of its graduates' annual student loan payments under a 10-year repayment plan must be no more than 8 percent of the incomes of those in the lowest quarter of their respective professions. The earnings data would come from the Bureau of Labor Statistics.
This is limited to certain programs.  Don't have a stroke.
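As a rough worked example (my own assumed numbers, not the Department's: a $30,000 income for the lowest quarter of a profession and a 6.8% fixed rate), the 8% cap translates into a maximum 10-year debt load something like this:

```python
# Hypothetical illustration of the 8%-of-income rule; the income and rate are assumptions.
def max_affordable_debt(annual_income, annual_rate=0.068, years=10, cap=0.08):
    r = annual_rate / 12                  # monthly interest rate
    n = years * 12                        # number of monthly payments
    monthly_cap = annual_income * cap / 12
    # invert the standard amortization formula  M = P * r / (1 - (1 + r)**-n)
    return monthly_cap * (1 - (1 + r) ** -n) / r

print(round(max_affordable_debt(30_000)))  # roughly $17,000 of principal
```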

Finally, I just discovered this institute, which has an interesting article on student debt (pdf):
Ten percent of borrowers who graduated from four-year colleges and universities in 2008 owed at least $40,000 in student loans, up from 3% in 1996 (in constant 2008 dollars).
And it's those who can least afford it who borrow most.

Creativity Example

A few posts back, I described a session with some arts faculty to construct assessments for their new programs.  I showed this graphic to represent the trial-and-error process of going from a question to an answer in a creative realm.  This week I worked on a problem that makes a good example of how this works in math.

The problem I wanted to solve is related to this one.  Imagine you're building a space ship, which is composed of many components that all have to work together for the ship to function.  Also assume that you know the probability that any one of these will fail. Those probabilities are fixed, but you can create redundancies to ameliorate them.  If you can only distribute a fixed total number of components among these redundancies, what's the optimal distribution?  An example will make it clear.
Imagine a simple case where you only have two components A and B, with probabilities of failure 1% and 4%, respectively.  You can use a total of 10 components in your design, so you could use five As and five Bs, or one A and nine Bs, or any other distribution that totals ten.  Each redundancy reduces the chance of catastrophic failure.  What's the best solution to minimize this risk?
Intuitively we need more redundancy on the Bs, since they are four times as likely to croak.  For this simple problem, we can see the answer by plugging everything into a spreadsheet and listing all the possibilities.
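Here is a short sketch of that enumeration (mine, in Python rather than a spreadsheet; it assumes a subsystem fails only if every one of its redundant copies fails, independently):

```python
# Enumerate every way to split 10 components between A and B.
p_fail = {"A": 0.01, "B": 0.04}
total = 10

best = None
for n_a in range(1, total):                 # keep at least one of each component
    n_b = total - n_a
    p_a_down = p_fail["A"] ** n_a           # all copies of A fail
    p_b_down = p_fail["B"] ** n_b           # all copies of B fail
    p_ship_lost = 1 - (1 - p_a_down) * (1 - p_b_down)
    print(f"{n_a} As, {n_b} Bs: {p_ship_lost:.3g}")
    if best is None or p_ship_lost < best[-1]:
        best = (n_a, n_b, p_ship_lost)

print("best split:", best)                  # 4 As and 6 Bs under these assumptions
```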



Our intuition is right. The best distribution is four As and six Bs. How do we find this in general?  I worked on this problem over the course of a couple of days in my spare time.  I scanned the notes and created a Prezi presentation out of it, to illustrate the interplay between
  • knowledge of different techniques (analytical)
  • trial and error
  • making connections and exploiting them
  • using examples to simplify the problem
  • using examples to check work for plausibility
  • seeking symmetries that make the problem easier to understand and solve
You can click on it below to see the thing. 


Friday, May 21, 2010

Conditional Probabilities

The topic of conditional probabilities was on my wish list for the well-educated critical thinker.  I have a personal reason for this, actually.  When my daughter Epsilon was being mechanically assembled in the usual manner, it was an unusually stressful time.  Part of the reason is that we decided to do an exchange visit to Shanghai during that time, and Epsilon ended up being born only a month after we got back.  That probably wasn't the smartest thing to do (that's not exactly how the obstetrics doctor said it!), but it worked out okay.  Before we left, however, I got the worst phone call of my life.  "The lab tests show that your child has tested positive for genetic abnormality X," was the gist of the message from the nurse.  Here, X is a big, life-changing abnormality.  Not a snub nose.  This was on a Friday, so we had to wait all weekend to come in and actually see the results.  I can't describe the amount of anxiety that phone call provoked.  By Monday I had my questions ready.  What's the probability that genetic abnormality X exists if the test shows negative, I asked.  In other words, what's the probability of a false negative?  One in 430 (or thereabouts--I don't remember exactly), she told me.  Okay, what's the probability that the condition exists when the test shows positive?  About one in 420.  For this very small, probably statistically insignificant change in odds, we'd gone through a hellish few days and then further tests (to assure my wife).  Epsilon turned out fine, except for inheriting my sense of the absurd.  The whole episode would have been avoided if the nurse had had a lick of understanding of probabilities (or just basic humanity).  The mistake was similar to students who compute (9+6)/3 on their calculators and happily write down 11 as the answer (9 + 6/3).


All of this by way of introducing a nice puzzler on the topic. The source for this is this blog post.  If I told you that I had two children, that the first was male, and asked what you thought the probability of the second being male was, we could write it in conditional probability notation like this:
Pr[2nd child is male | there are two children, and the 1st child is male]
The vertical bar is read as 'given that'.  In real life, this isn't so simple, since gender is determined by the chromosomal contributions of the male, and we can't necessarily assume it's 50-50 (some males have only male children, for example).  Also, slightly more males than females are born because mortality rates aren't the same.  But assume these complications away.  Assume it's 50%.

What if we made the question more general?  Find:
Pr[both children are male | there are two children, and at least one child is male]
 It looks similar, but it's subtly and significantly different.  The probability is not 50% anymore.  Now it gets really strange.  Find:
Pr[both children are male | there are two children, and at least one child is male born on a Tuesday]
The probability is now different from the previous one, which seems to confound all common sense.  How can such "irrelevant" information as the day of the week of birth have any impact on whether the other child is a male or not?  A full discussion of the problem and solution can be found in the post.  And there's an interesting discussion on reddit, where I found the problem.
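If you'd rather not do the algebra, here is a brute-force enumeration sketch (my own, assuming sexes and all seven birth days are equally likely, as the idealized puzzle does):

```python
# Enumerate all two-child families as ordered pairs of (sex, day-of-week) children.
from itertools import product
from fractions import Fraction

children = list(product("MF", range(7)))          # (sex, day of week)
families = list(product(children, children))      # ordered pairs of children

def prob(event, condition):
    kept = [f for f in families if condition(f)]
    return Fraction(sum(event(f) for f in kept), len(kept))

def both_male(f):
    return f[0][0] == "M" and f[1][0] == "M"

# given at least one male
print(prob(both_male, lambda f: any(c[0] == "M" for c in f)))   # 1/3
# given at least one male born on a Tuesday (call Tuesday day 2)
print(prob(both_male, lambda f: ("M", 2) in f))                 # 13/27
```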

Thursday, May 20, 2010

The Use and Assessment of Portfolios

I've written about portfolios before, and I like the idea of freeing them from closed garden systems that reside on university servers.  An article in Campus Technology by Trent Batson echoes this, asking if portfolio evidence is useful.  The author questions the hows and whys of assessing these ungainly, complex, but authentic records of student work, and suggests:
One approach is to make the portfolio (the collection itself, not the technology), the project of the course and then to make a capstone portfolio a requirement for graduation. This would mean that the student herself would continually re-craft and re-comment on the collection. The collection would be winnowed down, the student would write a summary of achievement and link to examples within the portfolio collection itself to support claims about achievement. 
 He's in favor of reducing complexity to facilitate understanding, arguing that:
A large collection of undifferentiated work over 2 years or 4 years or more is not of much use to anyone. It is like the boxes of photos and letters and clippings out of which people make scrapbooks--the scrapbook (ideally) creates some coherence, selects work that is representative, and therefore conveys a message. This is the process for capstone portfolios: constantly building a student’s academic identity over time by re-visiting and re-working the portfolio collection.
 This is an idea worth refining.

Here's another article in Campus Technology, on classroom design, that is worth checking out.

Tuesday, May 18, 2010

Viewpoints

Four articles, one for each Horse of the Apocalypse:

NY Times: A Very Interesting Idea  (about getting two years of college while still in high school)
The school, a fascinating collaboration between Bard College and the city’s Department of Education, was founded in 2001 as a way of dealing, at least in part, with the systemic failures of the education system.
How about the systemic failure of American society to value education?  Maybe we should turn the Cartoon Network into an education channel and let Spongebob take over.

Wall Street Journal: A Lament for the Class of 2010
Economists theorize that this may be that very rarest of things—a generation that does not do as well financially as the generation that spawned it.
Are these the same ones who came up with credit swaps?  Is this prediction based on some new doomsday derivative you guys are cooking up and haven't told us about yet?

NY Times: End of the University as We Know it (from 2009)
The division-of-labor model of separate departments is obsolete and must be replaced with a curriculum structured like a web or complex adaptive network. Responsible teaching and scholarship must become cross-disciplinary and cross-cultural. 
Yay! No more department meetings!  We'll have complex adaptive network...uh...seances?

The American Scholar: The Disadvantages of an Elite Education (from 2008)
The first disadvantage of an elite education [...] is that it makes you incapable of talking to people who aren’t like you.
Sorry, what?

Update: Three more I found this morning, ruining the whole horse metaphor.


CNN Money: Why Universities Should Hate the iPad
If students embrace textbooks on the iPad, college bookstores may lose their shirts. 
Is income from textbooks really the most important consideration? Is that enough reason to "hate" the technology?

NYTimes: The Value of College
As it happens, though, labor economists have spent years trying to answer this exact question, devising careful studies to see whether students actually benefit from college. The answer, in a nutshell, is that they do. This is a nice example of Occam’s Razor: the simplest explanation is often the correct one.
 These are the real learning outcomes.

Chronicle: Spending on Political Influence by For-Profit Colleges (subscription required)
For-profit college groups are significant players among all educational organizations in their use of campaign contributions to curry favor in Washington, a Chronicle analysis shows.
And where does the money come from?  Oh, yeah, Pell grants and government-sponsored student loans.  It's a nice little feedback loop.

The Philosophical Importance of Beans

Pythagoras was against them: you simply couldn't be a Pythagorean and eat beans.  Nor should you pick up anything that has fallen.  Bertrand Russell lists fifteen prohibitions in A History of Western Philosophy.  The one that baffles me the most is the one about not eating from a whole loaf.  Are you supposed to wait around until someone nibbles from it first, and then have at it?  This catch-22 reminds me of teaching in Shanghai. I was teaching English conversation to young college students who'd been weaned on the language.  They had questions I couldn't answer to their satisfaction.  What's the difference between "I went to the store," "I have gone to the store," and "I did go to the store"?  One day they had the left and right panels of the blackboard filled up with vocabulary, leaving me only the middle panel to write on.  They got terribly upset when they thought I'd erase it, so I left it the way it was.  But when I left I made a big box in the middle of the board that looked like this:

The students found this hilarious.  Normally, a class officer would make sure the board was cleaned, but when I came to class the next day the left and right panels were clear, but my bad luck box was still there.  I used it to talk about game theory and Pascal's wager, which they found interesting.  A weekend passed, and the following Monday the whole board was clean. The students had found a janitor who didn't know any English to erase it.  But I digress.

More recently, the subject of beans was a topic for a discussion on the assessment list server about the importance of agreeing on ratings (sometimes called centering or leveling) before assigning rubric-based scores. I was trying to find a topic where assessment was clearly subjective:
Suppose we ask 10 people to taste our green beans and ask them to rate as "good" or "not good."  Assuming we don't tip the scales somehow, we could legitimately report out to the public that of the 10 people who sampled the beans, 8 said they were good. 

But if we have the raters center first, the 2 holdouts are going to have to be convinced that the beans really are good, or else we ignore them.  Either way, we stamp GOOD on the can and give it a diploma, which may be misleading to 20% of the consumers.
A response from Hugh Stoddard:
Your green beans would be far more appealing to consumers if the raters discussed what constitutes a "good bean" and whether their judgement should be based on color, shape, size, sweetness, crunchiness, or nutritional value.
And a response to that from Joan Hawthorne:
Now we're getting into food where I feel a great deal of expertise after oh-so-many years of very dedicated eating.  I know in general that green beans are nutritious and I'm glad about that.  But hypothetical attributes of green bean goodness matter not one whit when it comes to my pleasure (or lack of pleasure) in eating a particular green bean dish.  Ditto for chocolate, wine, coffee, whatever.  If you're doing a shared rating of goodness for shared discussion, criteria may be helpful.  But I don't care if the top chocolate -- according to rater scoring based on criteria definition -- is this one or that one.  I care about which one appeals most to ME.
So who is right?  I originally assumed that green bean tasting would be so subjective that the idea of using rubrics was clearly nonsense.  But it took a while for me to really understand why.  Here's my answer in a nutshell: more doesn't equal better, and life isn't really a linear combination.

More is worse. A little salt is good.  More is bad.  A little crispness is good, more is too much.  Suppose Stanislav and Tatiana determine how salty they prefer their green beans.  The graph might look like this:

Here, the horizontal axis is increasing saltiness going to the right, and the vertical scale is pleasure in eating the beans, as subjectively reported (this is all made up data, in case you missed that point). 

Clearly, it's no good telling Stanislav that he's a wimp for not being able to tolerate more salt, or claiming that the "correct" value for optimum saltiness is somewhere between the two peaks.  Notice that there are two conditions here that we imagine away when we work with rubrics and ratings:
  1. There may be legitimate differences in opinions and tastes that can't be "centered" away
  2. more of an expressed trait isn't necessarily better
Wait--how can more of something be worse on a learning rubric scale?   Let's take something that ought to be straightforward: writing correctness.  Is it possible for writing to be too correct?  Here are some cases where it might be:
  • poetry, where one may bend the rules a little for style or form
  • advertising, where space is at a premium
  • dialogue or other less formal speech, where correctness can sound stilted.
  Suppose we "correct" Lewis Carrol's "The Jabberwocky," to take an extreme example. 
’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
About the only correctness present here is that the words take familiar forms (nouns, verbs, adjectives) despite not making sense.  Is this poem fully correct with respect to use of language?  Clearly not.  Is that bad?  No.  It would lose its whole meaning if you corrected it to something like:
It was brilliant and the swaying boughs,
Did gyrate and gleam in the wind,
etc.
That made me ill.  We could get out of this by defining special kinds of correctness for each type of writing, but that still leaves the awkward question of how to assign a rating to something that is too much of something, AND it invalidates comparing one type of correctness to the other.

Tastes differ. Not everyone likes to read the same authors.  We have different tolerances for style and tone, and even correctness.  If I see a single "it's" where the author means "its," I usually stop reading; I take it as an indication of sloppiness.  In academic writing, style is tricky, and tastes will vary.  Too much informality, and you will not be taken seriously.  Too much formality, and few people will slog through what you've written.  But it's not necessary that we all agree on these things.  Indeed, it's probably impossible. 

I think Hugh Stoddard makes a good point when he suggests rating specific aspects of the green beans.  You could apply that to saltiness, crispness, and so on.  We still won't get agreement, but it would focus the conversation.  There is, however, a snag with this project too:  combinations aren't necessarily linear. 

We see this effect in analyzing student evaluations from classes, where students fill in bubbles on a Likert scale to answer such questions as "Were the goals of the course clearly stated?"  Students who rate the instructor low tend to also rate everything else low--the textbook, for example.  Similarly, someone with a low tolerance for salt may rate the beans poorly on mushiness simply because they're too salty.

In writing this is true also.  I worked with a colleague who would fail any paper with a single spelling error, regardless of any other merits the paper might have.  For him, correctness and style were tightly linked--errors ruin the style.  Or imagine a well-reasoned essay about the Battle of Britain, where the student misspells the names of all the important leaders in Britain, Germany, and the US.  Is that correctness or content or both?

These effects tend to complicate the clean-looking lines of rubrics, and the way people tend to think of them.  They are not precision instruments, but guides.  Their most valuable product is not numbers but ideas from focused conversations.  Rubrics are not measurement tools in any reasonable definition of measurement.

The point is not that rubrics are bad or that centering is bad. Rather, the point of this discussion is to consider the complexities of the situation when designing assessments and interpreting results.  If we "agree to agree" we lose information, and sometimes that's probably okay.  But other times it might be useful to try to understand what the subtleties are.  Imagine if we forced a diverse audience to watch the newest summer movie and then kept them in the room until they all rated it the same way.  If we understand the complexities of what it is we're trying to do, we will report more honestly (more modestly probably), and more usefully.  If you use a writing rubric across different disciplines, blend all the ratings into one average and then advertise that "we have measured the writing ability of our students to be X," it's misleading and not very useful.  Unfortunately, enough of that happens that administrators, accreditors, and the department of education seem to think that such sentences are meaningful. That's a bad thing. 

Aren't beans interesting?

Sunday, May 16, 2010

Higher Ed News

I cleaned up the higher ed news page (link on left, or click here) to include more news items.  I thought it was time to get rid of the Google Trends graph on financial aid.  I replaced two minor sources with reddit's education page and a feed from delicious.com.  There are no ads, of course.  It runs on Google Docs, Yahoo Pipes, and RSS feeds from the sources.

Time Flies Like an Arrow

...and fruit flies like a banana.

Words are tricky things.  Jerome Kagan puts it much more elegantly in his book What is Emotion? He takes issue with loose language in scientific writing like "yeasts cooperate," writing that:
Poets possess the license to use a predicate any way they wish, as long as an aesthetic effect is produced.  Sylvia Plath can write the swans are brazen and pots can bloom, but scientists are denied this creative freedom because the meaning of a predicate for a process, action, or function always depends on the objects participating in the process.
He elaborates on this point, using an example of psychologists attributing "fear" to mice who have been exposed to unpleasant stimuli, and shows convincingly that a physiological response is not the same as an emotion.  The larger point is that processes are more important than outcomes for scientific inquiry.  Assessing an emotional state as "afraid" is not as meaningful or useful as describing the process that caused a physiological reaction that we give a label to.

It's the identification and study of processes, not outcomes, that makes the hard sciences effective.  Anyone can see that the apple fell off the tree.  Counting and sorting the apples doesn't help us understand what happened. 

Both of these subtleties bedevil the business of learning outcomes assessment.  For the most part, we only study the results (the outcomes) of the educational enterprise.  There are some studies that try to identify better pedagogies (like 'active' vs 'passive' learning), but even with these it's not possible to build a procedure for predicting learning.  The only predictions that come out of learning assessment are of the sort "we saw x in the past, therefore we think we'll see x in the future."  If a student aces the calculus final, we believe that she will be able to do integrals in her engineering classes next term.

Even this simple inductive reasoning is subverted by what we call 'feature-creep' in programming.  Tests and other assessments can never actually cover the whole range of the learning we would like to assess, so we assume that we can broaden the scope.  Sometimes this is partially justified with other research, but usually not.  The inductive assumption becomes:
We saw x on our assessments, and we predict that we will see X in the future.
Here, X is some broadened version of x that uses the flexibility of language to paper over the difference.  I made a particular argument about this concerning the CLA in a recent post.  An analogy will make this clear.
A test for flying a plane is developed, with the usual theory-and-practice mix.  As part of the assessment, the test subject Joe has to solo in a single-engine Cessna.  This is what I've called x, the observation.  We now certify that "Joe can fly."  This is X, our generalized claim.  Clearly, this is misleading.  Joe probably can't fly anything except the Cessna with any aptitude.  Would we trust him with a helicopter, a jet fighter, or a 747?  Hardly. 
Now the same thing in the context of writing:
A test for writing compositions is developed and standardized with rubrics and a 'centering' process to improve interrater reliability.  Joe scores high on the assessment and is labeled 'a good writer.'  But can we really claim that Joe can write advertising copy, or a lab report, or a law brief, or a math proof?  Hardly.
Generalizing beyond the scope of what has actually been observed is misleading unless one has good data to build an inductive argument from.  We can predict the orbits of celestial bodies we see for the first time because we have a model for the process.  Is there a model that tells us the relationship between writing an essay and writing a contract for services?

And yet if we look at the goals set out by well-respected organizations like AAC&U, we see (from the LEAP Initiative) these goals for learning:
  • Inquiry and analysis
  • Critical and creative thinking
  • Written and oral communication
  • Quantitative literacy
  • Information literacy
  • Teamwork and problem solving

All of these are general, just like "flying" or "writing."  This is where we begin to get into trouble.  The list is fine--the trouble is when we start to imagine we can assess these and use the results for prediction.  General skills require general assessments.  Judging from his autobiography, I think Chuck Yeager was pretty good at "flying," in a general sense--he spent his whole life doing it.  But in four years (or two years) of college we have neither the time nor the mission to teach a student everything there is to know about even one of the items on the bullet list.  The list above only starts to make sense if we narrow each item to an endeavor that's achievable in our limited time frame.  For example:
  • Inquiry and analysis in world history
  • Critical and creative thinking in studio art
  • Written and oral communication in presidential politics
  • Quantitative literacy in basic statistics
  • Information literacy in using electronic resources
  • Teamwork and problem solving in basic circuit design
And of course, this is the normal way assessment works for improvement of a course or curriculum.  If we get enough data points of this specificity, we can study how generalizable these skills are.  We can compare notes between disciplines and see if critical thinking really means the same thing or not.  This is the Faculty of Core Skills approach.

But big general tests, big general rubrics, and calls for accountability at the institutional level all assume that x implies X.  This creates confusion and tension.  Victor Borden wrote recently about this in "The Accountability/Improvement Paradox":
In the academic literature and public debate about assessment of student learning outcomes, it has been widely argued that tension exists between the two predominant presses for higher education assessment: the academy's internally driven efforts as a community of professional practitioners to improve their programs and practices, and calls for accountability by various policy bodies representing the “consuming public.” 
 I don't see this as a paradox.  If you want a paradox, here's one from last week.  A paradox is a logical contradiction--this situation is an illogical one.  Borden gives a succinct summary of it (perhaps quoting Peter Ewell):
Assessment for improvement entails a granular (bottom-up), faculty-driven, formative approach with multiple, triangulated measures (both quantitative and qualitative) of program-specific activities and outcomes that are geared towards very context-specific actions. Conversely, assessment for accountability requires summative, policy-driven (top-down), standardized and comparable (typically quantitative) measures that are used for public communication across broad contexts.
This is a brilliant observation.  What sorts of big complex enterprises are measurable in this way?  Could you do this with a large business?  Sure--they are called audits.  A swarm of accountants goes through all the books and processes and a bunch of big documents are produced.  Or a bond rating agency like Standard & Poor's does something similar and arrives at a singular summative grade for your company--the bond rating.  We can argue about whether or not this is something accreditors do (or should do), but these processes look nothing at all like what passes for generalized outcomes assessment, for example in the Voluntary System of Accountability (NSSE, CLA...).  That would be the equivalent of giving the employees at Enron a survey on business practices, and extrapolating from a 10% response rate. 

It's processes, not outcomes, that matter.  We only look at outcomes to get a handle on processes.  Audits look directly at processes for a good reason.  A giant fuzzy averaged outcome is worse than useless because it gives the illusion of usefulness thanks to the flexibility of the language.  The problem derives from misplaced expectations at the top:
Nancy Shulock describes an “accountability culture gap” between policy makers, who desire relatively simple, comparable, unambiguous information that provides clear evidence as to whether basic goals are achieved, and members of the academy, who find such bottom line approaches threatening, inappropriate, and demeaning of deeply held values.
As I have mentioned before, if policy makers want simple, comparable, unambiguous assessments of goals, they need to specify goals that can be assessed simply, comparably, and unambiguously.  So I went looking at the Department of Education to see if I could find them.  I came across a strategic plan, which seemed like the best place to look.  In the mission statement:
[W]e will encourage students to attend college and will continue to help families pay college costs. The Department intends that all students have the opportunity to achieve to their full academic potential. The Department will measure success not only by the outcomes of its programs, but also by the nation's ability to prepare students to succeed in a global economy as productive and responsible citizens and leaders.
That last bit is probably the most pertinent to this discussion. We could extrapolate these goals:
  • graduates should be competitive in the global economy
  • graduates should be productive
  • graduates should be good citizens
  • graduates should be leaders
What direct measures of these things might we imagine?  Here are some suggestions:
  • Employment rates
  • Salary histories
  • Voting records, percentage who hold public office, percentage who serve in the military
  • Percent who own their own businesses
I had trouble with the leadership one.  You can probably think of something better than all of these.  But notice that not a single one has anything to do with learning outcomes as we view them in higher education.  You can stretch and say we have citizenship classes, or we teach global awareness, but it's not the same thing.

I find the goals wholly appropriate to a view from the national level.  These are strategic, which is what they ought to be.  They're still too fuzzy to be much good, but that's just my opinion.  I'd like to see goals targeted at science and technology like we had during the space race, but they didn't put me in charge.

Digging into the document, we find actual objectives spelled out.  Let's see how well I did at guessing.  It starts on page 25.
Objective 1: Increase success in and completion of quality postsecondary education.
To ensure that America’s students acquire the knowledge and skills needed to succeed in college and the 21st-century global marketplace, the Department will continue to support college preparatory programs and provide financial aid to make college more affordable. Coordinated efforts with states, institutions, and accrediting agencies will strengthen American higher education and hold institutions accountable for the quality of their educational programs, as well as their students’ academic performance and graduation rates.
The last phrase is hair-raising.  Hold higher education accountable for students' academic performance and graduation rates?  That's like holding the grocery store accountable for my cooking.  Sure, they have something to do with it--if the melon is rotten or whatever, but don't I bear some of the responsibility too?

That's just the objective.  Next come strategies for implementation.
Strategy 3. Prepare more graduates for employment in areas of vital interest to the United States, especially in critical-need languages, mathematics, and the sciences.
The Department will encourage students to pursue course work in critical-need foreign languages, mathematics, and the sciences by awarding grants to undergraduate and graduate students in these fields. The SMART grant program will award grants to Pell-eligible third- and fourth-year bachelor’s degree students majoring in the fields of the sciences, mathematics, technology, engineering and critical foreign languages. In addition, priority will be given to those languages and world regions identified as most critical to national interests.
Well, I got what I wished for.  This outlines objectives that are clearly strategic in nature, and relatively easy to measure the success of.  We can count the number of students who learn Arabic or graduate in Engineering.
Strategy 5 mentions learning outcomes for the first time, but the hammer falls in the next one:
Strategy 6. Expand the use of data collection instruments, such as the Integrated Postsecondary Education Data System (IPEDS), to assess student outcomes.
The Department will collaborate with the higher education community to develop and refine practices for collecting and reporting data on student achievement. To encourage information sharing, the Department will provide matching funds to colleges, universities, and states that collect and publicly report student learning outcomes.
We're going to put learning outcomes in IPEDS???  Didn't Conrad write about this? Where's all that useful stuff I dreamed up in one minute about employment, salaries, citizenship, and such?  Those pieces of the mission statement aren't even addressed.  Or maybe I missed them due to the lack of oxygen in my brain after reading strategy six.  It seems perfectly obvious that IPEDS will want comparable learning outcomes, which can only mean standardized tests like the ones the VSA requires.  If so, it won't be the end of the world, but it will be negative.  Beyond the expense and missed opportunity, we will institutionalize the kind of misplaced reasoning that in the earlier example would lead one to believe that Bob the new pilot can fly an SR-71 Blackbird. 

Coming full circle to Kagan's book, he remarks that
A scientific procedure that later research revealed to be an insensitive source of evidence for a concept, because the procedure was based on false assumptions, is analogous to a magical ritual. 
He gives the example of Rorschach's inkblot test.  That's hardly an apt analogy for standardized tests reported out nationally as institutional measures of success, however; inkblot tests are too subjective.  I suggest phrenology as a better comparison.

Postscript. As I was writing this, my daughter Epsilon was studying for her French "End of Course" standardized test (see this post).  I went down and reviewed a few sections with her.  She didn't know the names of the months very well, so I suggested making some flashcards.  She told me this: "I don't need to be able to spell them or tell you what they are, I only need to be able to recognize them.  It's multiple-choice."  The test only assesses reading and listening.  Speaking and writing are not part of it.  Since speech production and understanding are in different parts of the brain, we actually have some clue about the process in this case, and it tells us that the test is inadequate.  This may be okay--maybe those parts are covered somewhere else.  The point is that the subtleties matter.  If I said "Epsilon aced her French final," you might be tempted to think that she knows the names of the months.

She's down there making flash cards now. 

Wednesday, May 12, 2010

A Paradox

Blogger informs me that this is the 300th post to this blog.  By way of celebration, I gave it a face lift using the more modern tools available now, so that the recent posts are displayed on the left as well as the tag list.  Ain't technology grand?

I came across an interesting paradox on Reddit, and have improved it (I think).  Here it is:
If you were to choose one of the responses below using a uniformly random selection, what is the probability that you would choose a correct answer?

A) 0%
B) 25%
C) 25%
D) 50%


A will be guessed 25% of the time, so A can't be the correct answer: if it were, the probability of guessing correctly would be 25%, contradicting A's own claim of 0%.

Suppose B is correct.  Then C must also be correct, since it gives the same value.  But together they make up 50% of the choices, so the probability of guessing a correct answer would be 50%, not 25%--neither of them can be correct.

Clearly D can't be correct: it occurs only once, so it would be guessed just 25% of the time, not the 50% it claims.

So none of A-D can be correct, and there is a zero chance of guessing the answer, right?  But wait--that means that A was correct after all!  So there's a 25% chance of guessing right, but wait--that means B and C are correct, but we'll guess that 50% of the time, but wait--....
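
In case the case-by-case argument feels slippery, here's a quick brute-force check -- a Python sketch of my own, with the percentages written as fractions -- that no probability is consistent with the four printed answers:

from fractions import Fraction

# The four printed answer values.
options = [Fraction(0), Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

# A probability p is self-consistent only if the options equal to p make up
# exactly the fraction p of the list.  Any value not printed would have zero
# matching options, forcing p = 0, and 0 is already on the list, so testing
# the printed values covers every case.
consistent = [p for p in set(options)
              if Fraction(sum(v == p for v in options), len(options)) == p]

print(consistent)   # prints [] -- no answer can be consistently correct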

How can you get out of this?  I have a proposed solution below the break, but it's kind of a cop-out.

The problem seems to be with self-reference.  The question "what is the probability that you would choose a correct answer?" means "what is the probability that you would choose a correct answer to this problem?"  We could create similar problems using it as a model.  For example:
The correct answer to this question is 'False'
___True
___False

That one's not very subtle.  It's just a variation on the classic Liar Paradox.  You can read about it and theories for resolution on Wikipedia's page, and there's another nice article on it here.  Note that Stephen Yablo creates such a paradox without self-reference (link), using an infinite list of linked references.

Let me say up front that I'm not a logician.  I will undoubtedly either 1. trip over some elementary mistake, or 2. repeat something discovered 35 years ago.  I won't let that spoil the fun; paradoxes are too entertaining to leave to the logicians and philosophers.

First, forget about Truth.  We already know, thanks to Gödel, that even modestly complex axiomatic systems like Peano Arithmetic cannot prove their own consistency.  So if we're honest, a proposition in some axiomatic system could have these truth values:
  • Proven true with a deductive argument based on axioms, contingent on not being contradicted.
  • Proven false with a deductive argument based on axioms, contingent on not being contradicted.
  • Proven contradictory: proofs for true and false both exist (this is usually a BAD THING)
  • Proven Indeterminate (a logical proof exists that we can't assign true or false--more on this later)
  • Unknown (we have no proof one way or the other)
That sound you heard was the Law of the Excluded Middle popping.  We can imagine that we've numbered all the possible well-formed propositions like "1+1=3" (provably false) and so on in a loooong list.  We make a table out of those, numbering the propositions 1,2,3... and list beside each one the status.  There are infinitely many true ones and infinitely many false ones, and we hope no contradictory ones.

So about this indeterminacy thing.  To approach that, we need to construct a proof machine for finding if a proposition p is contingent-true.  We'll call it T.  It works like this:  given a number n that corresponds to a proposition p, it starts assembling arguments from axioms in a systematic way so that all possible arguments are covered, building ever more complex ones, to see if p is true or false.  That is, it tries to find arguments for p and for ~p systematically.  We can imagine it alternating between the two.  Of course, T is completely impractical--it's just a theoretical tool.  If we give T a proposition p that's not provable, it will run forever and not give us an answer.  Nevertheless, it's very useful to imagine.  Let's say that our first two propositions are number zero, which is just False, and number one, which is True.  So if T ever stops running, it will return a zero for false and a one for true, the numbers representing the propositions.

For example, if we are clever and figure out that a proposition has a proof, then we can be sure that T(n) = 1.  If proposition 123 is "1+1=3", we can be sure that T(123) = 0 even if we don't actually run the program described; it would eventually find the same proof we did, or a shorter one.
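
For the concrete-minded, here is a toy Python sketch of T -- my own illustration, not a formal construction.  The "arguments" are just strings over a placeholder alphabet, enumerated from simplest to most complex, and the helpers checks(arg, p) and negation(p) are stand-ins for "arg is a valid derivation of proposition p" and "the proposition ~p":

from itertools import count, product

def T(p, checks, negation):
    # Systematically enumerate ever more complex arguments, looking for one
    # that establishes p (return 1) or one that establishes ~p (return 0).
    for length in count(1):
        for symbols in product("ab", repeat=length):
            arg = "".join(symbols)
            if checks(arg, p):
                return 1
            if checks(arg, negation(p)):
                return 0
    # If p is neither provable nor refutable, the loop simply never ends.

# Throwaway demo with a fake checker: an "argument" proves a proposition
# if it literally contains it.
print(T("a", lambda arg, q: q in arg, lambda q: "~" + q))   # prints 1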

Now that we know what we're doing, we will allow the T function to be included in propositions.  In order to do this, we have to imagine that T can run a recursive copy of itself when needed.  So, for example, it can resolve this situation, where the numbers on the left are the proposition index numbers.
12: T(13) = 1         in other words, proposition 13 is true
13: 1+1=2
In order to find T(12), we have to imagine the computation of T(T(13)).  The 'outside' T has to pause and wait for the 'inside' T to finish.  If it ever does finish, it will have a zero or one presented to it.  In this case,
T(12) = T(T(13)) = T(T(1))
Since the truth of 0 or 1 is trivial to ascertain, T(T(1)) = 1, and T(T(0)) = 0.  Now suppose for some proposition k we have:
k: T(k) = 1
If we want to say that proposition k is true (contingent, of course), we have to be sure that T would eventually find a proof for it, if left to run long enough.  Would it?  Let's see what would happen.
T(k) = T(T(k)) = T(T(T(k))) = ...
Recall that in order to call proposition k true, we have to be able to argue that T will terminate with a 1 result.  Clearly it won't.  It also won't terminate with a 0 result.   So proposition k is provably indeterminate.  
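
A tiny Python sketch of the same recursion, assuming we encode proposition k as the string "k" and the trivial propositions False and True as "0" and "1":

def T(n):
    if n == "k":
        # Proposition k asserts T(k) = 1, so judging k means computing T(k) first.
        return T(T("k"))
    return {"0": 0, "1": 1}[str(n)]   # the trivial propositions False and True

try:
    T("k")
except RecursionError:
    print("T(k) never terminates -- k is indeterminate in this sense")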

We could argue thus: since T(T(n)) = T(n), we can just telescope all those Ts and say that T(k) = 1.  But if we do that, we've constructed new forms of argument--a meta-logic outside the system described above.  Of course, k is only provably indeterminate in the meta-logic too.  There are choices we can make about what flavor of logic we like.

The Liar's Paradox becomes trivial--it's the same example with a logical twist.
k: T(k) = 0
The English version is "This statement is false."  T still recurses infinitely, and we have to say that proposition k is provably indeterminate (from our point of view). 

We can create variations that include loops that are not strictly self-referential:
m: T(n) = 0       another way to write this is ~T(n)
n: T(m) = 0
This is like two squabbling teens pointing at each other saying "he's wrong!" and "no, she's wrong!".   What's the value of T(m)?  We can watch it self-destruct:
T(m) = T(~T(n)) = T(~T(~T(n))) = ...
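
Here is the same loop as a Python sketch, again with my own string encoding of the two propositions:

def T(p):
    if p == "m":
        return 1 if T("n") == 0 else 0   # m asserts that n is false
    if p == "n":
        return 1 if T("m") == 0 else 0   # n asserts that m is false

try:
    T("m")
except RecursionError:
    print("the mutual reference keeps T from ever terminating")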
It should be clear that any self-referential or looped sequence of propositions will never allow T to terminate.  The example at the top of the post would clearly qualify, although the rules for the propositions would have to be clearly articulated in a language.  This may sound unsatisfactory as a punch line. 

We could go further.  We could imagine a proposition with broader aim:
g: for any number n, T(n) = 1 implies T(~n) = 0  and T(~n) = 0 implies T(n) = 1
This proposition claims that there are no contradictions. [Edit: not really, since by definition T(n) and T(~n) return the same value.  We'd need a new "contradiction operator," but it's not clear that this is definable in a useful way.  This is probably a severe limitation.]  That would be nice.  But if n can equal g, then we have an infinite recursion, and g is indeterminate.  But here's an idea.  What if we try to exclude g from the proposition to avoid the recursion?
g: for any number n other than g, T(n) = 1 implies T(~n) = 0  and T(~n) = 0 implies T(n) = 1
But we still have to worry because some of those other propositions out there might contain a reference to g, and still suck us into a loop.  In fact, because all propositions are eventually on the list somewhere, there's sure to be an infinite number of them.  In order to get anywhere, we'd probably have to exclude all references to the T operator.
g: for any number n, where proposition n doesn't use T, T(n) = 1 implies T(~n) = 0  and T(~n) = 0 implies T(n) = 1
Even this won't work because we have an infinite number of cases to check, which is impossible for T in finite time.  So this form of proposition can only be determined  by T if the set is finite.

Let's look at Yablo's paradox.  He constructs a list of propositions like this:
2: for all k > 2, T(k) = 0
3: for all k > 3, T(k) = 0
4: for all k > 4, T(k) = 0
etc
This doesn't leave much room for other propositions, but we can work around that.  Suppose one of these, say proposition n, is true.  Then all of the following ones are false.  But if proposition n+1 is false, some proposition after n+1 must be true, which contradicts n.  So none of these can be true, to avoid a contradiction.  But if, say, number 2 is false, there must exist at least one later proposition with T(k) = 1, another contradiction.  What to do?

If we imagine our T machine trying to evaluate T(2), it's obvious that it will never finish because it has to check an infinite number of cases.  So all of these are indeterminate.

Maybe this construction throws the proverbial baby out with the bathwater.  But it does seem to take care of lots of paradoxes.


Concluding note:  when I think about a paradox like Liar's, my brain finds itself in a loop "true means false, false means true, etc."  This is similar to constructions in programming languages that are useful for various things:
x := ~x
This flips the variable x back and forth between true and false each time the statement executes.  So what is quite normal in programming seems paradoxical in ordinary logic.  The difference is time.  If we allow variables (truth, even) to change over time, we can accommodate a paradox.  Here's a fanciful notion:  maybe time is just the universe's way of resolving paradox.
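
A throwaway Python version of that idea, with x flipping as a loop steps through "time":

x = True
for step in range(4):
    x = not x           # x asserts its own negation...
    print(step, x)      # ...so its value just alternates: False, True, False, True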

Update: There's a problem with the definition of T, but I think it's fixable. Having T simultaneously try to find a proof for p and a proof for ~p is cute, but it causes problems.  The most severe is that if we ourselves find a proof for p, we cannot assume that T(p) = 1, because T just might find a shorter proof for ~p.  This can't happen unless the axioms contradict themselves, but we haven't assumed that.  One way out is to assume up front that the axioms are not contradictory--kind of an insurance rider: "in case of axiom failure, all results below are invalid."  This is a practical approach, and the kind of thing we have to implicitly assume anyway.  Another solution is to have T only check for provability of p and not for ~p.  This is more powerful because it allows us to entertain propositions like g above.  I assume that we adopt the convention that "~p" means "the number of a proposition that corresponds to the logical negation of the proposition numbered p." 

Readers who are interested in this topic may also find Löb's Theorem interesting.

Tuesday, May 11, 2010

Barry Stein on the CAT

This guest post is by Dr. Stein at Tennessee Tech University.  He is co-creator of the CAT instrument I wrote about in "Pizza, Dinosaurs, and Critical Thinking."

David,

I enjoyed your recent blog on critical thinking. When we were developing the CAT instrument, we knew there was no way we could measure all aspects of what faculty in all disciplines call “critical thinking.” We searched for areas of faculty agreement across disciplines (not an easy task either) and skills that could be measured in a reliable way in a relatively short period of time with mostly short answer essay questions that would reveal something about how students think. This is clearly a subset of all critical thinking skills, but a subset that may be useful for general assessment purposes because it cuts across many disciplines. There is no silver bullet for measuring all critical thinking skills and we recommend that institutions try to use multiple measures that can provide converging evidence of student learning.

We also agree that an important feature of the CAT instrument is that it is scored by an institution’s own faculty. Engaging faculty in the scoring process clarifies what skills are being measured by the instrument and allows faculty to directly observe students’ strengths and weaknesses in those areas. We have found that these experiences make it much easier to engage faculty in discussions about ways to better prepare students to think critically than simply learning that your students are at, below, or above some national norm. The CAT instrument also provides a model for how faculty can develop discipline specific activities and assessments that are relevant in the courses they teach in their own discipline. This is important because how instructors assess student learning greatly affects what students will try to learn. If tests are geared toward rote retention then those expectations tend to drive student learning toward memorizing information. We hope that the CAT instrument provides a model of how to move away from just assessing and encouraging the rote retention of information. This does not mean that discipline specific knowledge is not important – as you point out discipline specific knowledge is an important part of critical thinking but students need to learn how to use that discipline specific knowledge to think critically and solve real-world problems. We have found that faculty can help their students make gains on the CAT by teaching critical thinking skills that are relevant to their discipline.

Although the CAT instrument involves mostly essay type questions, we do believe that scores assigned at one institution are comparable to those assigned at another. It is possible to compare scores across institutions because of the extensive work that went into refining the detailed scoring guide/training materials, the 2-day regional training workshops for preparing representatives at each institution to lead scoring workshops on their own campus in a consistent way, the use of multiple scorers for each question, and the scoring accuracy checks that we conduct on a random sample of tests scored at each institution to help insure scoring accuracy. The latter feature is relatively unique for a test of this type. The evidence that we have collected, to date, indicates most collaborating institutions are scoring student responses similarly with less than 5% error when compared to scores assigned by our expert scorers at TTU (see a more recent article on our website www.CriticalThinkingTest.org).

The CAT instrument is not appropriate for all institutions, but it may be a useful tool for institutions that want to better connect the assessment of critical thinking skills with faculty efforts to improve those important skills across a variety of disciplines.

Barry Stein
Tennessee Tech University
www.CriticalThinkingTest.org