Thursday, December 14, 2006

Complexity and Recognition

I actually had time to think today. Many days feel like being the subject of one of Skinner's experiments.

So I sat outside and ate the sandwich Adelheid made me for lunch. It was a sunny 65 or so. I was reading a book on the philosophy of science, whose title eludes me at the moment. It was weeded out of the library collection, and I rescued it from the give-away cart. In the introduction, the author writes about perception and understanding, although he's a lot more concrete about it. He uses examples like Tycho Brahe and Johannes Kepler looking at the sunrise. One sees evidence that the earth moves around a fireball, and the other evidence that the fireball moves around the earth. That is, the sensations hitting their respective retinas are roughly the same, but their understanding of the matter is quite different. Are they seeing the same thing? In the prosaic sense, yes, but in a higher-order sense of seeing, they are not. Why not?

This is obviously going to be speculative, but here's a thought. What if we perceive by applying simple rules to the input that comes in on our optic nerve, etc.? So if we see an asymmetrical blob loping along the ground, we might identify the stick-like pieces as legs, notice that there are four, and identify it as a quadruped. The size of the thing and the length of its legs narrow it down quickly to a few possible animals, since it's animated. With a small amount of information we can pinpoint it as dog or dingo, depending on what continent we live on.

This suggests that our brains do a continuous data compression of input. Patterns that are identified reinforce the existing definitions, and those that aren't must be decided upon. Are they noise or are they meaningful? Is the creak in the middle of the night just the house settling, or is it a madman with a power tool? This brings us back to Skinner. If a cause is associated with an effect (immediately in the simplest case), then perhaps we file it away as meaningful. Otherwise it's ignored.

But effects have to be noticed. In this over-simplistic model, the ability to entertain the notion that an effect has been observed drives one's acquisition of knowledge (i.e. rules of inference). I imagine that some people have lower thresholds than others. If it's too low, you'd see portents in every random occurrence. Perhaps you'd become paranoid and removed from (average) reality. The opposite affliction might be to continually ignore signs that your internal beliefs were wrong--like Brahe's were. Finding the middle ground and staying there probably requires something special. I wonder what it is, and I wonder if it can be taught.

UPDATE: The book is Patterns of Discovery by Norwood Russell Hanson, Cambridge University Press 1969.

Tuesday, December 12, 2006

Back from Orlando

The SACS conference was short (for me) but sweet. I've learned over the last few years how to get the most out of it. Rule #1 is don't miss the round table discussions, and get there early. I forgot to get a wake-up call for Monday and almost overslept.

I ran into several old and new friends and colleagues, but didn't have nearly enough time to socialize and talk shop over a glass of wine. My wife and daughter came along on this trip in order to take advantage of the theme parks, so I spent some time driving them around.

The shuttle launch alone was worth the trip. There's something about seeing it in person that's very moving. I went out to the parking lot about 10 minutes before launch and, like a few other people, asked the staff for the general direction to look. I kept looking at my watch and squinting at the horizon to see a blip of light. I needn't have bothered. Half the sky lit up in orange, and the rocket was almost too bright to look at.

I talked to several experienced folks about the 3.3.1 standard, and confirmed for myself that it really isn't being applied as written. I included that in my presentation. You can see the slides and copious notes if you look here.

Friday, December 08, 2006

SACS Conference

I give a talk on Monday at the annual meeting of the Southern Association of Colleges and Schools. It's in Orlando, and I'm hoping to see a shuttle launch while I'm there. I'm putting my meeting notes here. The slides aren't finished yet. They'll be uploaded when I get back.

Thursday, November 16, 2006

Coker College Cited in NSSE Report

Our core (liberal arts) skills assessment, which we call Faculty Assessment of Core Skills (FACS), was cited in the 2006 Annual NSSE Report.

In an innovative approach to measuring value-added learning, the Coker College institutional research director coordinates faculty assessments of core general education skills (FACS) of analytical thinking, creative thinking, effective speaking, and effective writing. Each course has rubrics to determine the levels of achievement for areas relevant to the course. Faculty rate student performance in these areas at the end of the term, with the ratings being independent of grades. As expected, students’ FACS scores increase from their first year to senior year. Individual NSSE items related to general education outcomes positively correlate at modest levels with FACS scores, except for the item, “coming to class without completing assignments” which was negatively correlated. The NSSE self-reported gain item, “acquiring a broad general education,” correlated at .25 (p<.02) with FACS. Other relationships between FACS and NSSE items were generally in the expected direction, suggesting that engagement matters to desired learning outcomes.
You can find out more about this process, which I call Assessing the Elephant in presentations, here.
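For the curious, the correlation quoted above is just an ordinary Pearson r computed over matched student records. Here's a minimal sketch of that calculation in Python; the file and column names are invented for illustration and aren't our actual data layout.

  # Sketch of the FACS/NSSE correlation described in the report excerpt.
  # File and column names are hypothetical placeholders.
  import pandas as pd
  from scipy.stats import pearsonr

  facs = pd.read_csv("facs_scores.csv")   # columns: student_id, facs_mean
  nsse = pd.read_csv("nsse_items.csv")    # columns: student_id, broad_gen_ed

  # Match the two instruments student by student, then correlate.
  merged = facs.merge(nsse, on="student_id").dropna()
  r, p = pearsonr(merged["facs_mean"], merged["broad_gen_ed"])
  print(f"r = {r:.2f}, p = {p:.3f}")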

Tuesday, October 24, 2006

Second Ed

I was talking to a friend the other day over lunch about the educational possibilities of Second Life (SL). Neither of us being experts in the current uses of SL for teaching, we wondered if there was the equivalent of a whiteboard for in-world lectures. This evening, as I was looking over Moodle plug-ins, I found one for SL that linked to what looks like PowerPoint for SL.

Included: a site devoted to using SL with Moodle, with a whitepaper on the subject.

Sunday, October 01, 2006

Creating Talent

A while back I blogged about a New York Times piece by the authors of Freakonomics, giving a fascinating example of how talent is made. This questions the common conception of talented individuals being distinguished by some almost mythical attribute, "talent." The New Scientist recently published a similar article called "How to Be a Genius" (subscription required for the full article). The author, David Dobbs, cites a recently published book, The Cambridge Handbook of Expertise and Expert Performance, summarizing research in the field. One review reads, in part:
"[the book] makes a rather startling assertion: the trait we commonly call talent is highly overrated. Or, put another way, expert performers - whether in memory or surgery, ballet or computer programming - are nearly always made, not born. And yes, practice does make perfect."
After mulling this over for a while, I started to wonder why we don't pay more attention to attitude in education. That is, we try to teach content and abilities, and we brag about good teachers who can motivate students, but we don't (as far as I know) take student attitudes seriously enough to try to measure them. If we could measure them, maybe we could improve them.

For example, general education at a liberal arts college is supposed to expose students to a range of subjects. Usually unstated is the goal that we hope a student will find a subject that really interests her, and she'll pursue that through a degree program. And this may even work, but I'm sure that we've never tried to find out if this happens. Moreover, the general education program as a whole isn't really designed with that goal in mind.

The convenient bureaucracy of learning divided up into neat classes and programs largely ignores learning that happens outside the classroom. A student who is good at cramming for tests looks the same as one who slowly mulls over ideas and really immerses himself in a subject--as long as the test scores are the same.

So suppose that in parallel to our existing way of looking at education as a collection of knowledge and skills, we began to find ways to affect attitudes of students toward intellectual activity. The objective is to start the kind of positive feedback loop that is described in the articles cited: interest leads to practice, which leads to ability, which leads to enhanced interest. If an intellectual activity is pleasurable, one is more likely to spend one's own time doing it.

I chair our Institutional Effectiveness Committee, and proposed such a program at our last meeting. As usual, I'm impatient to get to the 'doing' part of any project, so working with committees is usually frustrating. I have to say that the idea didn't get an overwhelming reception, probably because it's new and radical, and (more importantly) work for someone. Since I'm the Institutional Research director, I'll just forge ahead with some trials. I already built a little rating system in Flash. You can see it here. I'll probably attach it to the student's portal home page and put some test items up to see what they tell me.

I've already changed the way I teach my math class this semester, based on these ideas. The first thing I did was to distribute the New Scientist article and discuss it with the class. I'm also spending more time taking little detours into the really cool corners of math and computer science that have the potential to hook them on the subject. But mainly I stress the need to actually practice to become good. It takes a lot of work to be innately talented!

There is yet another article on this theme, "What Kind of Genius Are You?", which appears in Wired and compares famous artists' productivity as they aged. The author, Daniel H. Pink, found that

[...] genius – whether in art or architecture or even business – is not the sole province of 17-year-old Picassos and 22-year-old Andreessens. Instead, it comes in two very different forms, embodied by two very different types of people. “Conceptual innovators,” as Galenson calls them, make bold, dramatic leaps in their disciplines. They do their breakthrough work when they are young. Think Edvard Munch, Herman Melville, and Orson Welles. They make the rest of us feel like also-rans. Then there’s a second character type, someone who’s just as significant but trudging by comparison. Galenson calls this group “experimental innovators.” Geniuses like Auguste Rodin, Mark Twain, and Alfred Hitchcock proceed by a lifetime of trial and error and thus do their important work much later in their careers. Galenson maintains that this duality – conceptualists are from Mars, experimentalists are from Venus – is the core of the creative process.
This resonates with Lee Smolin's two types of scientists--those who solve existing problems, and those who go looking for new ones. He explores this and much more in his excellent new book The Trouble with Physics.

Friday, August 25, 2006

Online Whiteboard

Last night I taught online for the first time. I had to miss class today to go to a meeting, so I scheduled an online chat with the students at vyew.com. After you register, you can create a session and then invite other people. At your disposal are a chat room and an interactive whiteboard. You can see where two students worked problems, and my comments.

This went along with the chat window (names changed here):

Me: Allen, I'm going to have to do yours in two steps to see what gives...
Me: notice how the negation (minus sign) attaches itself to each thing inside--just like normal algebra?
Me: Now I've pulled the negation inside the parentheses using de Morgan's rule a second time
Me: You got it.
Me: See--I gave you the hard one!
Allen: ok, i just didnt see the part about switching signs
Me: It's very easy to miss stuff with logic.
Kate: hi im back sry
Me: Hi Kate, and hi Scott!
Steve: hey, I'm trying to see what we're doing here
Me: We've been doing problems on de Morgan's rule--see page 24
Steve: ok, thanks
Me: Okay--let's go back to Janell's question. Scroll up and take a look.
Me: Can somebody tell us what characterizes associativity? It's true in logic and ordinary algebra.
Me: Mark, wanna take a shot at it?
Mark: both sides of the equation are equal?
Janet: they can be flip to equal the other side
Me: Important point--for each of these, it's two ways of writing the same thing.
Me: And so you can flip them around, yes. What else? Steve?
Me: Take a look at the parentheses in the associative law.
Steve: they can be switched around
Janet: something about the first letter being the parentheses
Me: exactly--associativity means moving around parenthesees
Me: Now compare this to the commutative laws...Kate--what's the difference?
Kate: im not sure
Janet: would you be moving the letters
Me: Okay--notice there aren't any parentheses to move, right?
Janet: yes
Me: So what moves?
Janet: the letter
Janet: or is it the sign
Me: Right--the variables change sides.
Janet: is that true for all the cases?
Me: So, associativity = moving parentheses, commutativity = moving variables.
Me: Yes, it's always true.
Janet: ok
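For reference, since the chat assumes the textbook is open to page 24, the three laws we kept circling back to are, in standard propositional notation (LaTeX source shown; the variable names are generic):

  % De Morgan's laws: the negation attaches to each piece inside,
  % and AND flips to OR (and vice versa)
  \neg(p \wedge q) \equiv \neg p \vee \neg q
  \neg(p \vee q) \equiv \neg p \wedge \neg q

  % Associativity: only the parentheses move
  (p \wedge q) \wedge r \equiv p \wedge (q \wedge r)

  % Commutativity: only the variables change sides
  p \wedge q \equiv q \wedge p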

Comparing Course Management Systems

Here's a great resource if you're in the market for a CMS. I'm going to install Moodle on my virtual machine and see how it goes. I really like the way they built math notation into the system. Also, it's compatible with Blackboard courseware.

Friday, June 09, 2006

General Education Assessment

We now have three years' data on our general education assessment model. You can read about it in great detail in Assessing the Elephant. I have so far focused on checking the validity of the data to see if it actually means anything. I'm beginning to think it does. Here's a graph showing the distribution of scores over time as a class ages.
The data here aren't truly longitudinal--they're a composite of three classes, combined to give a good sample size and show performance shifts over four years. A similar approach compares students based on their overall grade point averages. All students in these data sets were still attending as of Spring 2006, so survivorship issues are controlled for.
Here you can see that the better students actually seem to be learning. The middle group of students learns more slowly, and the students with GPA < 2 don't seem to be improving at all. These studies and others seem to validate our approach. The holy grail of this investigation is to be able to pinpoint the contribution of individual courses to a student's skill improvement. I've worked on that quite a bit but have concluded that I don't have enough samples yet, and I haven't developed a sophisticated enough approach to the problem. Stay tuned.
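For concreteness, here's roughly how the composite view above gets assembled, sketched in Python. The column names (entry_year, rating_year, facs_score, gpa) are placeholders for illustration, not our actual extract.

  # Sketch of the composite-cohort view: stack three entering classes
  # by "years since entry" and break out the averages by GPA band.
  # Column names are hypothetical placeholders.
  import pandas as pd

  ratings = pd.read_csv("facs_ratings.csv")

  # Years elapsed since the student's first term, so different entering
  # classes can be combined into one composite class as it ages.
  ratings["year_in_program"] = ratings["rating_year"] - ratings["entry_year"] + 1

  # GPA bands: below 2.0, 2.0 to 3.0, 3.0 and above.
  ratings["gpa_band"] = pd.cut(ratings["gpa"], bins=[0.0, 2.0, 3.0, 4.0],
                               labels=["< 2.0", "2.0-3.0", ">= 3.0"])

  # Average FACS score per GPA band per year in program.
  summary = (ratings
             .groupby(["gpa_band", "year_in_program"], observed=True)["facs_score"]
             .mean()
             .unstack("year_in_program"))
  print(summary.round(2))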

Wednesday, June 07, 2006

Article on the Commission on Higher Ed

The article in University Business is out now. The online version has a cool Flash-driven interface that lets you flip pages. I had a conversation with a friend--I'll just call him Bob--the other night about the topic. He was pretty critical of the idea of using earned income as a proxy for learning. Bob was also quite dubious about the prospect of trying to isolate the effect of college from other factors. This problem would, of course, be shared by any other approach as well. The reason is that people who choose to go to college are different from those who do not. How much difference there is might be detectable by looking at partial completion, two years vs. four years, etc. I think this can be overcome, and it would certainly be easier using income rather than scores on a test.

Another problem is how to account for graduates who start their own business or whose actual "income" isn't comparable to a salaried position. And what about people who intentionally choose lower-paying jobs for personal reasons? Someone who graduates and heads off on a Peace Corps mission can't really be judged simply by looking at salary. This begins to prompt questions about what it is we really value. That discussion should precede any measurements and judgments we make about the effectiveness of higher education, and certainly precede actions.

Friday, June 02, 2006

WEAVEonline

I went to a very good conference at Elon University on Wednesday--the North Carolina Independent Schools and Universities Assessment Conference. One of the speakers was Jean Yerian from Virginia Commonwealth University. They have developed their effectiveness planning management system into a commercial product that she demonstrated for us. You can get more information at www.weaveonline.net. I'm not sure what the cost is, other than that it's FTE-driven. The strengths of the system seem to be its rich reporting features, including an administrative overview that quickly shows compliance status. So if the English Department is slow about putting their plans, assessments, actions, or mission on the system, it shows up as Not Begun, In Progress, or Complete. They are planning to add a curriculum mapping feature that creates a matrix of courses and content per discipline. There is a comment box that allows notes about the budgetary impact of activities and follow-up plans. As yet they do not have a document repository built in, but I think that may be in the works too.

This product has many more features than openIGOR does, and it's designed with a slightly different purpose in mind. The functionality of openIGOR starts with the file repository and builds up from there, whereas WEAVEonline goes in the other direction. A weakness of both systems currently is that they do not tie directly to evidence of student learning without extra work. The logical extension of each of these systems would be a link to a portfolio system. We currently have an electronic portfolio system, but it's not connected directly to assessment reporting yet. I hope to have that programming done this summer, though.

Thursday, May 18, 2006

Welcome to IGOR

I got permission from our president to modify and release the College's code for our institutional effectiveness system. I hope to officially roll it out in December at the SACS meeting. I started a blog on it at openIGOR.blogspot.com. The acronym, by the way, stands for Integrated Goals and Objectives Reporting.

Dropping SAT

In my last post I wondered if there wasn't a pool of applicants who are undervalued because of low SAT scores. In our own predictions of college success, SAT doesn't explain much of the variance, yet we continue to use it (although high school GPA is weighted much higher).
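If you want to check this at your own institution, the calculation is straightforward. Here's a hedged sketch in Python; the file and column names are invented for illustration, and your mileage will vary with how you define success.

  # Compare variance explained (R-squared) with and without SAT,
  # predicting first-year GPA. File and column names are hypothetical.
  import pandas as pd
  import statsmodels.api as sm

  df = pd.read_csv("admitted_students.csv").dropna(
      subset=["first_year_gpa", "hs_gpa", "sat_total"])

  y = df["first_year_gpa"]
  base = sm.OLS(y, sm.add_constant(df[["hs_gpa"]])).fit()
  full = sm.OLS(y, sm.add_constant(df[["hs_gpa", "sat_total"]])).fit()

  print(f"R^2, HS GPA only:        {base.rsquared:.3f}")
  print(f"R^2, HS GPA plus SAT:    {full.rsquared:.3f}")
  print(f"Extra variance from SAT: {full.rsquared - base.rsquared:.3f}")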

This article at Newsday.com talks about Drew University's decision to stop requiring SATs.

With test scores optional, Drew's admissions officers are concentrating more closely on how well applicants did in high school. They pay special attention to whether students took tough classes, and for leadership qualities.
Their enrollment is up, which may have something to do with their decision.

Tuesday, May 09, 2006

The Myth of Talent

There's an article in the New York Times by Stephen J. Dubner and Steven D. Levitt, authors of Freakonomics. This article is about the reality behind what we commonly call "talent." They reference Anders Ericsson, whom they describe as the "ringleader of what might be called the Expert Performance Movement," and who concludes that:

[E]xpert performers — whether in memory or surgery, ballet or computer programming — are nearly always made, not born. And yes, practice does make perfect.

And because people will naturally spend more time practicing things they like to do, Ericsson concludes that there are implications for education.

Students should be taught to follow their interests earlier in their schooling, the better to build up their skills and acquire meaningful feedback.

This raises some interesting questions. First, it confirms what any teacher knows--making learning more interesting is more effective. My 8-year-old daughter used to love math. After her bedtime story, I would give her simple algebra problems to solve, and she relished the challenge of finding two numbers that add to 20 and subtract to give 4. But now that she's in the third grade and confronted with things like long division, her attitude about the whole subject is souring. It's not the fault of the school, which is a very good one, or the teachers. It may be that the whole way we think about teaching rote skills like long division needs to be rethought.

Now for the flip side of the coin. There are indications that IQ, or general intelligence 'g', is partially dependent upon brain physiology. A recent article in New Scientist is one example. Others I've seen recently relate brain volume or the physiology of particular parts of the brain to IQ. If we accept at face value that the physical nature of one's brain has a lot to do with academic "ability," it raises interesting questions.

Do admissions criteria like SAT scores or high school GPA derive more from static IQ-like abilities or from a student's potential to become interested in and eventually expert in a body of knowledge? Most teachers have probably had the frustrating experience of having a student who is very quick to grasp new ideas and connections--even brilliant--but who simply will not take the time to develop these gifts by doing assignments. On the other hand, we have found that our predictions of student success, based on SAT and GPA, are rather weak. In fact, students admitted with the lowest possible scores still succeed 50% of the time, some of them quite dramatically. We graduated a student this year with a perfect 4.0 who was admitted under a special program for under-qualified applicants.

This suggests an opportunity for institutions seeking to improve or enlarge their admissions pool. Because instruments like SAT in particular may not give enough weight to the potential of students with high interest/curiosity/drive, this may well represent a distortion in the admissions market.

Students are given scholarships based on their apparent ability. Hence students with high SAT scores must be competed for. But there may well be a group of "underpriced" applicants out there who do not show up on the SAT radar, but who would excel at your institution. How to go about finding those applicants would be a very interesting research project.

Possible actions following this line of thought include:

  • Place value on how interesting students find courses, particularly survey courses designed to introduce students to a field of study.
  • Try to find indicators like school attendance (perhaps) that may predict positive attitudes in students. Give these indicators weight for admissions purposes.
  • Involve students outside the classroom, ideally with faculty.
  • Create Web 2.0-like communities of study so that students can feed off of each others' interest. We've had some success in that area by making physical spaces available for Science, Math, etc.

This reminds me of an interview I heard on the radio with a former English teacher who had his best success by asking students to write an excuse why they couldn't do the assignment!

Wednesday, May 03, 2006

HigherEd BlogCon Link

My article on HigherEd BlogCon is here. It's a tongue-in-cheek account of getting the library and IT to play nice together to help us through accreditation and other natural disasters.

I also have an article on the Commission on the Future of Higher Education and its talk of standardized testing. This article is slated to appear in the June edition of U. Business.

Institutional Repositories

I got an email survey the other day from a group that's studying institutional repositories. One of the questions asked which repository we use. I printed out the list to bring home, but don't actually have the name or URL of the group doing the study. I'll post that here when I get back to work. Anyway, here's the list. The hyperlinking and descriptions are mine. Note that these repositories are mostly designed to hold academic works. They're more similar to online journals than archives.

ARNO From the Netherlands. Designed to archive academic research papers. Looks like open source software.

bePress The Berkeley Electronic Press. Journals and institutional repositories (via licensed code, it looks like):

The Berkeley Electronic Press works with universities, research centers, institutes, departments and other hubs of knowledge to create electronic publications series. Many institutions license our repository technology, offered since 2004 in conjunction with ProQuest Information and Learning. Learn more about the DigitalCommons@ Institutional Repositories. Content from the following repositories and publication series are available freely and without restriction to all readers. [link]

CDSWare CERN Document Server:


...to run your own electronic preprint server, an online digital library catalogue or a document repository on the web. It complies with the Open Archives Initiative (OAI) and uses MARC21 as its underlying bibliographic standard. [more]

ContentDM From their website:

Setting industry standards for digital collection management, CONTENTdm provides tools for everything from organizing and managing to publishing and searching digital collections over the Internet.

The most powerful and flexible digital collection management package on the market today, CONTENTdm handles it all—documents, PDFs, images, video, and audio files. CONTENTdm is used by libraries, universities, government agencies, museums, corporations, historical societies, and a host of other organizations to support hundreds of diverse digital collections. Visit our customers' collections or learn more.

DigiTool From their website:

DigiTool (R) is an enterprise solution for the management of digital assets in libraries and academic environments.

DigiTool enables institutions to create, manage, preserve, and share locally administered digital collections. By improving the integration of digital collections with institutional portals and e-learning systems, institutions running DigiTool provide users with a seamless working environment.
DiVa This one seems like a specific application, not licensed. You can read about it here.

Documentum This looks like it's for really large applications. Probably expensive.


Dpubs Open source. From their website:

DPubS (Digital Publishing System) is a powerful and flexible open-source system for publishing digital documents... (more)
DSpace An open source project developed by MIT and HP. From their website:

DSpace is a groundbreaking digital repository system that captures, stores, indexes, preserves, and redistributes an organization's research data.
Fedora No, this isn't Red Hat. It's another open source solution. From their website:

[...] This unique combination of features makes Fedora an attractive solution in a variety of domains. Some examples of applications that are built upon Fedora include library collections management, multimedia authoring systems, archival repositories, institutional repositories, and digital libraries for education.

GNU Eprints Free software. From their website:

[...] enables research to be accessible to all, and provides the foundation for all academic institutions to create their own research repositories.
Greenstone Open source. From their website:
Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.
HarvestRoad This looks like a commercial solution. From their website:

HarvestRoad Hive® is a federated digital repository system that manages sharing and reuse of any form of content in any online learning environment across any number of locations or countries and integrates with any Learning Management or ERP System.
Innovative Interfaces A commercial solution targeted at libraries.

i-Tor Open source. From their website:

i-Tor is an open-source (free) technology that enables you to create websites. They may be straightforward web pages, or information from a database, an Open Archive*, or some other file. i-Tor can also be used to make modifications: the creator of a web page can manage it directly on the site, either alone or in collaboration with others.
Luna Insight Looks like it's mainly designed for images. Commercial.


MyCoRe Open source. From the website:

MyCoRe is an Open Source project for the development of Digital Library and archive solutions [...]

OPUS This seems to be open source, but I couldn't find a home page for it.

Sun SITE From Sun software and the UC Berkeley library. I'm not sure that you can use this for an institutional repository, though.

Virginia Tech ETD Software. ETD stands for electronic theses and dissertations. I'm not sure if you can have their code, though. If so, I haven't found the right link.

While tracking these down, I came across A Guide to Institutional Repository Software. It has details about several of the above.

Saturday, April 15, 2006

SACS 3.3.1

The Commission on Colleges is reviewing its Principles of Accreditation. Below is a comment I sent them on section 3.3.1, which deals with institutional effectiveness. I was motivated by my experiences during our recent accreditation, where I encountered different interpretations of what IE is, and how it should be done.

I’m writing to suggest that 3.3.1 be clarified either in the statement of the section itself or in secondary documentation.

The language in both the old and new drafts is similar:

"The institution identifies expected outcomes, assesses whether it achieves these outcomes, and provides evidence of improvement based on analysis of the results..."

The problem is with the "evidence of improvement" part. The requirement seems very reasonable until you actually try to put it into practice, at which point a sticky logical problem arises. The assessment loop described in the statement looks like this:

1. Set a goal

2. Measure attainment of the goal and do some analysis

3. Provide evidence that the goal is now being attained better

The problem is that there is a logical gap between 2 and 3. In my conversations with others, there seems to be wide divergence on the meaning.

The missing steps that are implied between 2 and 3 are inserted below:

1. Set a goal.

2. Measure attainment of the goal and do some analysis.

* Take actions that may lead to improvement.

** Measure goal attainment again, and compare to the first time you did it.

3. Provide evidence that the goal is now being attained better.

Example:

1. Goal: reduce attrition

2. Measurement: currently the 5 yr grad rate is 60%

* Take action: raise admission standards

** Measure again: now the 5 yr rate is 62%

3. Evidence: before and after rates constitute evidence of improvement

Notes: without * there is no guarantee that the unit is doing anything but measuring. A unit could simply measure a lot of things, identify a couple that randomly improved, and report that as evidence of compliance with 3.3.1.

An emphasis on taking action intended to result in improvement is lacking in the current version of 3.3.1. This has led to strange interpretations of the standard (in my opinion). For example, a consultant with a degree in Institutional Research, who had served on a SACS visiting team, described to me an IE ‘success’ story, recounted below as I remember it.

At a certain college, there is a required freshman course on a particular subject. Upon graduation, the students were tested on retained knowledge in that subject, and the results turned out to be disappointingly low. The action the administration took was to teach the course in the junior year instead of the freshman year. Since the students had less time to forget the material, test scores went up, and everyone celebrated.

If you accept my version at face value, it illustrates my point. Is it an appropriate action to shorten the length of time between study and assessment to improve scores? Maybe so, but it seems like there ought to be more complex considerations. Why were the students taking the course? Did they perhaps need the knowledge in their freshman year? If the result is to simply ensure that graduates have memorized some facts, then maybe the action was the correct one. The point is that without any emphasis on what actions are taken and why, it’s possible to conform to the letter of 3.3.1 without actually accomplishing anything.

Of course, anyone familiar with the theory and practice of institutional effectiveness will understand that 3.3.1 implies that there are necessary actions between steps 2 and 3. But I would like to convince you that step 3 should actually be replaced in the statement of 3.3.1 by a statement about action:

Proposed version:

The institution identifies expected outcomes, assesses whether it achieves these outcomes, and provides evidence of actions intended to produce improvements. These actions should be based on analysis of the results...

This is a more accurate statement of what most IE activities actually are. There are several related points:

  1. There is never a guarantee of action producing favorable results. That is, you can have the best assessment plan in the world, the best intentions, and reasonable actions, and still fail to produce improvements through no fault of your own. Many things are exceedingly hard to predict (like the stock market, the price of oil, or the weather, all of which affect institutions). A properly functioning IE system will still make many mistakes because of that. A realistic 3.3.1 would take this into account.

  2. As it stands in both the new and old versions of 3.3.1, the modifying phrase “based on analysis of the results” is ambiguous. Does it mean that the evidence is based on analysis of the results, or that the improvement is? That is, which is true:
    1. “The institution identifies expected outcomes, assesses whether it achieves these outcomes, and provides evidence of [improvement based on analysis of the results]…”,

      In this case, the improvements must be based on analysis of results, implying that the actions leading to improvement were evidence-based. Analysis led to action.

    2. “The institution identifies expected outcomes, assesses whether it achieves these outcomes, and provides evidence [based on analysis of the results] of improvement…”

      Here it’s the evidence, rather than any improvement, which is based on analysis of results.

      This interpretation implies that the data demonstrate improvement.

To someone who doesn’t actually have to put this into practice, this may seem like counting the number of angels who can sit on the head of a pin. But for the practitioner there is a crucial philosophical difference between the two interpretations, one that must eventually be confronted. It boils down to these two different assessment loops:

(a) First interpretation (action-based):
1. Set a goal
2. Measure attainment of the goal
3. Take reasonable actions, based on what you’ve learned, to try and improve the situation

(b) Second interpretation (evidence-based):
1. Set a goal
2. Measure attainment of the goal
3. Prove from analysis of the measurements that you’ve improved the situation

As I’ve already pointed out, the second interpretation leaves out some steps: taking action and measuring goal attainment a second time. I am arguing that this second interpretation is impractical in many situations. Unfortunately, in conversations with others in the business, it seems like the second interpretation may be more common. At the very least, this should be clarified.

Let me refer to the first, action-based, interpretation as AB, and the second, evidence-based as EB, for convenience.

While EB may sound reasonable as a policy, it rules out many popular kinds of assessment. For example, many institutions rely on surveys for formative assessments. Under EB, one would have to do a scientifically valid before-and-after survey to show compliance. This is difficult in practice, since the surveys are usually not done in a rigorous scientific manner. In general, surveys are better used to form ideas about what actions might be useful than they are at precisely identifying goal attainment. In fact, most subjective, formative assessments like focus groups, suggestion boxes, etc., would be invalid for most uses under the EB interpretation of 3.3.1 because they cannot give reliable before-and-after comparisons with statistical accuracy (at least under the conditions in which IR offices normally operate).

Despite this, practitioners and reviewers of IE programs often see this as a valid assessment loop:

1. Set a goal.

2. Measure aspects of goal achievement by surveying constituents.

3. Analyze survey results to identify an action to take.

4. Take action.

Notice, however, that this is not EB, since there is no evidence that any improvements are made. That is, unless you take the leap of faith that the actions taken in response to the survey improve goal attainment. But without direct evidence to that point, this is a reach. On the other hand, this is clearly AB.

Strict adherence to EB still leaves plenty of summative assessment techniques, usually a calculation like the number of graduates, the discount rate, standardized test scores, etc. With these, one can obtain before-and-after comparisons that are sometimes meaningful.

Of course, one can use both kinds of assessment.

In an ideal world, the assessment loop would look like this:

  1. Set a goal.
  2. Measure goal attainment with a summative assessment.
  3. Identify possible strategies with a formative assessment.
  4. Take actions based on analysis of the formative assessment.
  5. Measure the goal attainment again with the summative assessment.
  6. Show evidence of improvement, based on comparison of the summative assessments.

I’ll refer to this as the AB+EB model. This is probably the best approach for those goals for which a summative assessment is available. It is not, however, what is described in 3.3.1, which omits any reference to taking action.

It must be admitted, however, that even pursuing AB+EB is still no guarantee of success. Just because we set a goal doesn’t mean that goal is even theoretically attainable, let alone that we will find a way to attain it practically.

Viewed in this light, the requirement in 3.3.1 of evidence of improvement seems too strong for many situations. If the goal is simply completion of a task, and by completion we improve some situation, then that goal trivially satisfies 3.3.1 as soon as the task is completed. But for more complex issues, like retention or student learning outcomes, what is really called for is an active process of trying things out to see what works. This is often a subjective judgment.

There are many situations where AB+EB is impractical. For example, learning outcomes are often measured in subjective ways that are not amenable to summative assessment. Our Art program has a very good sophomore review process, during which the faculty interview students and review their work and progress. This is highly individual per student, and the actions taken are customized accordingly. As an assessment process, it works very well, but could not be standardized to the point where the EB component would be possible.

The misplaced emphasis on EB creates a climate where excessive standardized testing for learning outcomes is favored over more effective subjective assessments. This is unfortunate, because the validity of these instruments is often questionable.

As an example, at a round-table discussion at the annual SACS meeting, several IR directors were discussing measuring general education outcomes. Several were of the opinion that you could just take a standardized test off the shelf, which had already been proven valid, and use it. This readiness to accept validity from a third party belies the very meaning of validity: “Does this test give meaningful answers to the questions I’m asking?” A test-maker isn’t the one asking the questions and setting the goals. Without a very thorough review of these issues, an artificial summative measure is likely to appear to give meaning where there actually is none.

To summarize: demonstration of action is more important and more practical than demonstration of success via a summative assessment. The AB model should be the standard described in 3.3.1 instead of the ambiguous language that can be interpreted as AB or EB (but probably EB).

One final note: I think Trudy Banta is right when she calls for more scholarship of assessment. The community would benefit from more published research, widely disseminated, on these topics. In order for higher education to adequately respond to calls for accountability (such as from the Commission on the Future of Higher Education), we need wide, intelligent, practical discourse on the subject of what constitutes effectiveness.

Sunday, March 26, 2006

HigherEd BlogCon

I'm going to have a paper presented in the online conference HigherEdBlogCon.

HigherEdBlogCon 2006

The paper is about raising the visibility of the library by leveraging talent in organization and metadata. The library and information services part is supposed to run April 10-14, 2006. I volunteered to do chat as well, but I don't know if that'll happen or not.

Sunday, March 12, 2006

New Institutional Effectiveness Manual

I just updated our IE manual. There was some interest at the last SCICU institutional research meeting to see what our online system looks like. There are a bunch of screenshots and documentation thereof in the manual. You can get it here. Because of the embedded graphics, it's kind of big: 1.1MB. Below is a snip from the interface.

Saturday, February 11, 2006

Letter to the Commission

I wrote letters to several of the members of the Commission on the Future of Higher Education. It's not all that easy to find their email addresses, and I found only six in the time I spent looking. Here's the letter.

I'm writing to you regarding the progress of the Commission on the Future of Higher Education. I applaud the efforts of your group to improve our educational system, but I have some concerns based on news reports I've read.

It seems that the Commission is considering using standardized testing (instruments like the Collegiate Learning Assessment (CLA)) as a metric to compare institutions. If so, this is a seriously flawed approach that could become an expensive lost opportunity.

Standardized testing is convenient, but not at all suitable for judging complex skills. In contrast, the National Council of Teachers of English (NCTE) has a very good set of guidelines for assessing writing. For example, students are ideally taught to write deliberately, to revise, and to use social connections to improve their work (short of plagiarism, obviously). None of this is possible in a standardized testing environment. Standardized tests may be convenient and reliable, but they suffer from a lack of validity. Sometimes test makers try to assure us that the "guys in white coats" have done all that technical stuff. But taking that on faith contradicts the very notion of validity--the meaning of the results. In fact, from a teacher's point of view, it's hard to figure out what a standardized test score *means* about the curriculum, unless one simply addresses the very narrow skills that the test focuses on.

There are better, more valid, ways to assess student learning. For example, many colleges and universities now use student portfolios in association with specific learning objectives in order to set goals and assess them. This has obvious validity since it links outcomes to a huge database of direct evidence, and we have found at Coker College that inter-rater reliability is acceptable also. Of course, comparing such measures across institutions would be a nightmare. But the truth is that parents don't care about *learning* per se (for the most part), and perhaps your commission shouldn't either.

What parents care about is how much earning potential their graduates will have. This is far easier to measure than "learning", and lends itself to the kinds of calculations that can validate or invalidate public financial aid. Are the costs of student loans justified by the increased earning power of graduates (after subtracting the opportunity costs)? There's no way to answer that question based on averages of standardized tests.

Since the US government already has on hand tax data and financial aid data, one would imagine that very detailed studies by institution, by state, by demographic, by major, and so on, could be accomplished relatively cheaply. Conventional wisdom is that a college degree is the best investment a young man or woman can make. A definitive study of that would be very enlightening and could guide policy. Conversely, we could maximize scores on some computer-scored test and end up with a generation of students who can only write poetry. (Not that there's anything wrong with poetry, but we need engineers and doctors too.)

I'm a mathematics teacher by training, and I still believe that a good education is about learning, not money. That's a healthy attitude from the perspective of the classroom. However, at the national level, when we make the hard decisions about how much financial aid is "worth it," or about which institution does a better job of preparing students for an economically productive career, we're not really talking about learning anymore, but about the *effects* of learning. From public documents it seems to me that the Commission may be conflating these. I hope you will reconsider before advocating more standardized testing.
It will be interesting to see if I get a response.

Friday, February 10, 2006

Higher Profile for Testing Proposal

Currently the most emailed New York Times article (subscription may be required) is entitled "Panel Explores Standard Tests for Colleges." A quote:
A higher education commission named by the Bush administration is examining whether standardized testing should be expanded into universities and colleges to prove that students are learning and to allow easier comparisons on quality.
This is of course the Commission on the Future of Higher Education, chaired by Charles Miller. Oddly, he's quoted as saying "There is no way you can mandate a single set of tests, to have a federalist higher education system." I agree completely with that statement, but I'm not sure how they propose to achieve their stated goal of comparing institutions if each uses a different metric.

The article quotes Leon Botstein and David L. Warren staking out a reasonable position from the perspective of those actually doing the educating, opposing "a single, national, high-stakes, one-size-fits-all, uber-outcome exam."

Predictably, standardized test fans like the proposal. Here's a sample:

Any honest look at the new adult literacy level data for recent college grads leaves you very queasy. And the racial gaps are unconscionable. So doing something on the assessment side is probably important. The question is what and when.
And there's a connection to the CLA, which I've written about before here.

Mr. Miller, in his recent memorandum and in the interview, pointed to the recently developed Collegiate Learning Assessment test as a breakthrough. The exam, developed by the Council for Aid to Education, a former division of the RAND Corporation, asks students to write essays and solve complex problems.
The central problem is that in order for a uniform exam to have high reliability the scoring has to be uniform. In the best case this is done by a computer. Unfortunately, this algorithmic 'right or wrong' scoring is not aligned with the typical goal of higher education to deliver higher-order cognitive skills. What is 2+2? Correct answers include 4, IV, 100 (binary), vier (German), and 1 (modulo 3). Standardized testing by its very nature ignores any component of imagination that makes its way into the answer if that imagination introduces unwanted variability (which seems terribly likely).
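To make the point concrete, here's a toy sketch of what algorithmic scoring amounts to. It is deliberately simple-minded; real scoring engines are more sophisticated, but the 'right or wrong' matching is the point here.

  # Toy exact-match scorer: perfectly reliable, and it marks every
  # imaginative-but-defensible answer wrong. Purely illustrative.
  answer_key = {"What is 2+2?": "4"}

  responses = ["4", "IV", "100 (binary)", "vier", "1 (modulo 3)"]
  for r in responses:
      score = 1 if r == answer_key["What is 2+2?"] else 0
      print(f"{r!r}: {score}")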

See my post on validity for more. The National Council of Teachers of English has a very well-considered set of standards for judging writing. There's no way that they translate into standardized testing, however, which means they would be ruled out in the proposal under discussion.

Critical thinking is one of the abilities that some standardized tests purport to assess. Unfortunately, the decision seems to be driven by the economics of test-giving (it's easy to score bubble-sheets), and a misplaced importance on reliability. One would hope that the Commission will apply some critical thinking to this problem.

Sunday, February 05, 2006

Economics and Higher Education

Daniel S. Hamermesh at UT-Austin writes about the economic value of a college degree. On his blog at NCTE, Paul Bodmer comments that
He is an economist, so he gives economics answers. And that is good, particularly from my point of view. The quick answer to the first question is that the payoff is better for students and the public now than in the 1970’s.
This strikes me as good starting place to take a broad look at college education.

Validity in Assessment

At a round table discussion at the SACS conference last year an Institutional Effectiveness guy was expressing the same kinds of frustrations we all feel in trying to comply with requirements to assess everything. He was advocating the use of convenient off-the-shelf solutions in the form of standardized tests. What got my attention was his statement to the effect that "these things have already been proven valid."

Validity, of course, is the notion that an assessment should measure what it purports to. Despite the simplicity of the idea, it's actually quite subtle. After all, smart people have been arguing about epistemology for millennia. Assessment is no different from any other kind of science (if the assessment is done with serious intent)--there will often be areas of doubt where individual judgment prevails. This is particularly true when using someone else's design to assess your own students/programs/etc.

My response to the IE guy was this. The validity of an instrument depends on what it's used for. The SAT may be marginally useful in predicting success in college, but it's of no use at all if used to predict how well one plays baseball. The test-makers' claims to validity need to be examined carefully. As a case in point, see the Applied Research Center's argument that California's CBEST test is not being used in a valid way.

Validity is particularly hard to show for tests that assess complex skills like writing. Interestingly, although course grades are often dismissed as not being good assessment data, it seems to be common to use grades to validate these tests. Examples include the GRE-W writing assessment, as described here.

In the Atlantic article What Does College Teach, there's a pretty strong condemnation of grades:
[T]here is the direct assessment of student learning that takes place constantly on college campuses, usually symbolized by grades and grade point averages. For our purposes these are nearly useless as indicators of overall educational quality—and not only because grade inflation has rendered GPAs so suspect that some corporate recruiters ask interviewees for their SAT scores instead. Grades are a matter of individual judgment, which varies wildly from class to class and school to school; they tend to reduce learning to what can be scored in short-answer form; and an A on a final exam or a term paper tells us nothing about how well a student will retain the knowledge and tools gained in coursework or apply them in novel situations.
In the printed literature for the Collegiate Learning Assessment, which the author is hawking, we're assured that
The CLA measures were designed by nationally recognized experts in psychometrics and assessment, and field tested in order to ensure the highest levels of validity and reliability.
That's apparently all the public information there is about the validity of their instrument.

In contrast to these standardized tests, the National Council of Teachers of English has published a serious set of standards for writing assessment, some of which the CLA obviously violates. If you take a look at these, you'll see that they are quite sensible. The only problem (from a particular point of view) is that they are not easily used to assess legions of students in a short time (sorry, even timed essays are out). I'm embarrassed to say that I had not seen these standards when we developed our own assessment of writing and other skills. We did, however, reach most of the same conclusions independently.

The NCTE and our own model attempt to ensure that context and authenticity are paramount. This means (for us) that assessment of student learning is based on data taken from coursework. This is saved in a portfolio system for longitudinal assessment.

Authentic data gathered in a meaningful context goes a long way toward validating logical uses of an assessment. In the end, there are no short cuts to making the crucial judgment about the results: "what does it really mean?" That's just not something that can be shrink-wrapped.

Saturday, January 28, 2006

Standardization of Higher Education

I came across a “Memo from the Chairman” by Charles Miller, who seems to be fond of standardized testing. He’s now the Chairman of the U.S. Secretary of Education’s Commission on the Future of Higher Education, created by Education Secretary Margaret Spellings.


Miller is obviously of the opinion that there’s not enough oversight of higher education:


[W]e need to improve, and even fix, current accountability processes, such as accreditation, to ensure that our colleges and universities are providing the highest quality education to their students.


His solution?


Very recently, new testing instruments have been developed which measure an important set of skills to be acquired in college: critical thinking, analytic reasoning, problem solving, and written communications.


And:


An evaluation of these new testing regimes provides evidence of a significant advancement in measuring student learning — especially in measuring the attainment of skills most needed in the future.


I’m not sure how he knows what’s going to be needed in the future. It’s not footnoted. But he lists his favorite standardized tests, including:

A multi-year trial by the Rand Corporation, which included 122 higher education institutions, led to the development of a test measuring critical thinking, analytic reasoning and other skills. As a result of these efforts, a new entity called Collegiate Learning Assessment has been formed by researchers involved and the tests will now be further developed and marketed widely.

That sounded familiar. Sure enough, he’s talking about the same standardized test I found a few days ago and wrote a post about. The article was by Richard H. Hersh, recently president of Trinity College (here’s an article about his resignation).


Hersh is now a senior fellow at the Council for Aid to Education (CAE). In the November 2005 Atlantic Monthly article I wrote about, Hersh subtitles his piece with It's time to put an end to "faith-based" acceptance of higher education's quality. He concludes that what we need is a standardized test of which he is co-director: the Collegiate Learning Assessment (CLA). CLA was cited in a 2001 report called Measuring Up, comparing state higher educational achievement, which laments the lack of a national measure of learning.


The president of CAE is Roger Benjamin. Here’s an article he wrote in the same vein as the Atlantic one, describing the CLA. He notes that:


Student responses can be graded by a trained reader or by a computer.


I have to comment on that. We tried out ETS’s Criterion computerized essay grader last year. It was interesting, but overly simplistic, and we decided it didn’t suit our needs. One problem with timed essays is that we try to teach students that they shouldn’t be in a hurry when they write: revise, revise, revise is the mantra. So how does that square with a timed assessment? With difficulty, I imagine.


Chairman Miller and the CLA’s Benjamin both talked at a Columbia University press briefing on the future of higher education. Miller is listed as being associated with Meridian National, Inc.


It seems to me that public policy is being steered toward adoption of standardized testing nationally or state-wide for higher education.


Other links:


Testimony by Miller, citing Benjamin and Hersh


Article on ‘value added assessment’ by Benjamin and Hersh in AACU’s Peer Review


Article by Benjamin from CAE about new assessment techniques.


CAE website


Hersh slams higher education on PBS.

Sunday, January 22, 2006

Visualizing Retention

Here's a great way to take a look at your institution's retention history using pivot tables. First, you'll need data, of course. I use grade records because they seem to be the 'gold standard' for accuracy. It's difficult to keep financial aid records cleaned up, and financial records can be very complex. Of course, any student who leaves without receiving a grade doesn't show up on this particular radar. But you can use any method you like. Flag each student in a database or spreadsheet with exactly one of: graduated, attrited, still attending. One of those three things must be true. In my case, I use Perl scripts to process the raw data, which is filtered into a flat file and from there loaded into a database.
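To make the flagging step concrete, here's a rough sketch of the kind of script I mean. The file names, the column layout, and the rule that any grade in the current year counts as 'still attending' are placeholders for illustration, not my actual scripts; adjust them to match whatever your registrar's extract looks like.

#!/usr/bin/perl
# Sketch only: flag each student as exactly one of Grad, Attrit, or Attend.
# Assumes a tab-delimited grade extract (student_id, calendar_year, term, grade)
# and a one-column list of graduated student IDs. File names and layout are
# hypothetical placeholders.
use strict;
use warnings;

my $current_year = 2006;    # a grade this year counts as "still attending"

# Load the list of graduates.
my %graduated;
open my $gfh, '<', 'graduates.txt' or die "graduates.txt: $!";
while (<$gfh>) {
    chomp;
    $graduated{$_} = 1;
}
close $gfh;

# Scan the grade records: the first year seen is the cohort year, and the
# last year seen tells us whether the student is still enrolled.
my (%first_year, %last_year);
open my $fh, '<', 'grades.tsv' or die "grades.tsv: $!";
while (<$fh>) {
    chomp;
    my ($id, $year, $term, $grade) = split /\t/;
    $first_year{$id} = $year if !exists $first_year{$id} || $year < $first_year{$id};
    $last_year{$id}  = $year if !exists $last_year{$id}  || $year > $last_year{$id};
}
close $fh;

# Emit one row per student with exactly one flag set.
print join("\t", qw(id start_year grad attrit attend)), "\n";
for my $id (sort keys %first_year) {
    my ($grad, $attend) = (0, 0);
    if    ($graduated{$id})                  { $grad   = 1 }
    elsif ($last_year{$id} >= $current_year) { $attend = 1 }
    my $attrit = ($grad || $attend) ? 0 : 1;
    print join("\t", $id, $first_year{$id}, $grad, $attrit, $attend), "\n";
}

The output is one row per student with a 0/1 value in each of the three columns, which is exactly the shape the pivot table wants: averaging a 0/1 column gives you the percentage directly.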

Here's a small part of a spreadsheet downloaded from the database, to illustrate.

The year is the first (calendar) year that the student got a grade. All of the students shown left before graduation.


There are lots of other data columns, including financial aid, GPA, and the like. You can obviously add what's important to you. Once the data set is ready in Excel, click Data->PivotTable. If you haven't run one of these before, go learn how. Create a chart that uses Grad, Attrit, and Attend as data items and set the field properties to Average, using the Percent display option. Drag the start year over to the left, and you should create something like this:
You can sort the data columns however you like. The one above shows graduates in blue, non-returners in yellow, and students still in attendance in magenta. This latter band widens out on the right because those are recent enrollees. The year of the 'class' appears at the bottom.

Once the chart is up and running, you can use page selects or other options to narrow the scope of samples to a particular major or student demographic. You can also see side-by-side comparisons for males and females, for example.

I create two of these charts. One shows percentages, like the one above. The other shows absolute numbers of students so you can see total enrollment for the group selected.
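If you'd rather do the roll-up outside Excel, here's a rough sketch that reproduces what the two charts show: the percentage of each starting cohort that graduated, left, or is still attending, plus the absolute head count. It assumes the hypothetical flat-file layout from the sketch above.

#!/usr/bin/perl
# Sketch only: summarize the per-student flags by cohort year, the same
# roll-up the pivot charts display. Assumes the hypothetical flags.tsv
# layout from the earlier sketch (id, start_year, grad, attrit, attend).
use strict;
use warnings;

my %tally;    # start_year => { n, grad, attrit, attend }

open my $fh, '<', 'flags.tsv' or die "flags.tsv: $!";
my $header = <$fh>;    # skip the header row
while (<$fh>) {
    chomp;
    my ($id, $year, $grad, $attrit, $attend) = split /\t/;
    $tally{$year} ||= { n => 0, grad => 0, attrit => 0, attend => 0 };
    my $t = $tally{$year};
    $t->{n}++;
    $t->{grad}   += $grad;
    $t->{attrit} += $attrit;
    $t->{attend} += $attend;
}
close $fh;

# One line per cohort: head count plus the three percentages.
print join("\t", qw(year n grad% attrit% attend%)), "\n";
for my $year (sort { $a <=> $b } keys %tally) {
    my $t = $tally{$year};
    printf "%s\t%d\t%.1f\t%.1f\t%.1f\n",
        $year, $t->{n},
        100 * $t->{grad}   / $t->{n},
        100 * $t->{attrit} / $t->{n},
        100 * $t->{attend} / $t->{n};
}

Narrowing the scope to a particular major or demographic, like the page selects in the pivot table, is just an extra test inside the read loop.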

Saturday, January 14, 2006

CopyMyth

Part of being a library manager is dealing with copyright issues. It's complicated, and here's a page to sort out some of the truth from myth.

Saturday, January 07, 2006

What Does College Teach?

In a November 2005 Atlantic Monthly article, Richard Hersh subtitles his piece with "It's time to put an end to 'faith-based' acceptance of higher education's quality." The thesis is that colleges and universities don't actually know, because they can't measure, whether or not students are learning. After raising the question in rather breathless terms, the author summarizes the state of assessment and then gets predictably bogged down in trying to propose a solution.

Hersh starts with the question "What makes your college worth $35,000 a year?" I suppose this was meant to be provocative, but he could hardly have chosen an easier question to answer. He says it's a hard question for college presidents to answer. Why?

About five seconds of searching on Google turns up a suitable response:

Any way you measure it, a college degree is the best investment of your life. In today's dollars, a bachelor's degree is worth more than $2.1 million over 40 years. "Having that post-secondary diploma can make such a difference in lifetime earnings," said Washington, D.C.-based Employment Policy Foundation President Ed Potter.
Never mind the fact that few students pay full sticker price. Even at the full $35,000 a year, four years comes to $140,000, set against an extra $2.1 million in lifetime earnings: roughly a fifteen-fold return, and an ROI that any CFO would fantasize about. But read a little further and you discover that Hersh really means something else. He wants a measure of learning, not worth.

[W]e have little direct evidence of how any given school contributes to students' learning.
Apparently what he's really after is a way to rate colleges after taking incoming student quality into account. Again, you could produce such ratings by looking at the salary histories of graduates, normalized by the socio-economic status of their parents or guardians. But despite his original question, he's not really interested in economic worth but in improvement in learning.

We might well ask what the source of college graduates' extra income is. Is the marketplace so easily duped that a college degree represents nothing about the skills and abilities of graduates, and that the mere slip of sheepskin is enough to accrue the megabucks that accompany a bachelor's degree? That seems exceedingly unlikely. If graduates were no better than non-graduates, US companies would very quickly stop paying them a premium. So even though we haven't yet discussed direct measures of learning, there is certainly good circumstantial evidence of it.

Hersh cites four current categories of measures of college quality: actuarial data (including graduation rates, admissions selectivity and the like), expert ratings by college administrators and professors, student/alumni surveys (including NSSE), and direct assessment (grades). Hersh is openly contemptuous of grades:
For our purposes [grades and grade point averages] are nearly useless as indicators of overall educational quality--and not only because grade inflation has rendered GPAs so suspect that some corporate recruiters ask interviewees for their SAT scores instead. Grades are a matter of individual judgment, which varies wildly from class to class and school to school; they tend to reduce learning to what can be scored in short-answer form; and an A on a final exam or a term paper tells us nothing about how well a student will retain the knowledge and tools gained in coursework or apply them in novel situations.

You see what I mean about 'breathless'. Where to begin? First, there are no references to support any of this, and my own experience (about 20 years in the classroom) doesn't jibe with it. Generally speaking, 'A' students are going to be better than 'D' students. I'm also trying to imagine what 'short answer' means for a class on assembly language or mathematical analysis. If we accept this paragraph as an emotional outburst rather than a rational argument, I think we can boil it down to this: the author doesn't find it acceptable that student GPAs aren't useful for comparing one school to another.

Of course they're not! The whole endeavor of trying to rank one college against another is daft--anybody who's ever been to college knows that some programs are better than others. What good would an average program quality be to anybody, even if you contrived to compute it? Moreover, such rankings are by nature one-dimensional. Imagine if we had to rank people--your friends and co-workers--on such a scale. Bob is a 4.5, but Mary is 7.2. Ridiculous.

Again, the list of assessments makes no mention of post-college performance in the workplace, although Hersh alludes to it disparagingly: "What is worth learning cannot be measured, some say, or becomes evident only long after the undergraduate years are over."

In the following paragraphs, Hersh plays a switcheroo. After arguing that there's no good metric for comparing colleges against each other, he acts as though he has shown that no good assessment is going on even at the classroom or program level: "[C]umulative learning is rarely measured."

Huh? Anyone who's been to a conference on education lately (SACS, for example) knows that half the sessions are devoted to this very topic. Not only are people interested in it, it's required by most accrediting bodies (all the ones I know of). After saying that cumulative learning is rarely measured, the author revises this to "[M]easuring cumulative learning hasn't been tried and found wanting: it has been found difficult and untried."

I don't know about your institution, but every academic program at mine has a capstone course sequence. In mathematics, for example, students take three semesters of independent research with a faculty advisor, during which we rate them on various skills and abilities using a scale that ranges from 'remedial' to 'graduate'. Math is like most disciplines: without cumulative learning you simply can't progress. How could you hope to pass differential equations if you haven't got a clue about basic calculus?

Now that the author's point has wandered off into the landscape of student and program assessment (rather than college rankings), he finds some promising approaches, including portfolios. These have been around for a long time, of course, and there are some very good electronic portfolio vendors out there (just google 'electronic portfolio'). I don't know what the statistics are on usage, but a lot of schools are using them to track student work. We built our own in 2005.

We finally reach the punch line of the article: a pitch for a product the author co-directs, the Collegiate Learning Assessment Project. The last two sevenths of the piece are devoted to this product. As a matter of fact, my copy of the article came bundled in an envelope with more advertising for the CLA surveys.

So the answer to the college ratings 'problem' is a standardized test? The accompanying literature assures us that it has been proven valid and reliable. As convenient as that is, validity is not something that can be externally determined because it depends on the question you're trying to answer by giving the test. That's a subject for another post...