Saturday, October 17, 2009

Dim Dim

It just struck me that I probably now have no way to access the first code I wrote--programs for the Apple II--back in the 1980s. I suppose some of it is on floppy disks in my parents' basement, but I imagine they've leaked more than a few bits by now.

I was led to this thought as I typed the title of this piece, because DIM is BASIC syntax for declaring an array. It stands for dimension, and that's what prompted me to put fingers to keyboard. Not the headache-inducing 10 dimensions of physical spacetime that string theory (or M-theory) hypothesizes, or the span of the multiverse required to banish the cosmological anthropic principle, but ordinary everyday uses and abuses. Well, mostly abuses.

I was amazed to learn over lunch this week that our course numbering scheme is more or less restricted so that the middle digit is the number of credit hours of the course. Given that the first digit is used to denote the level of difficulty, this leaves only one digit out of three to actually identify the course content. I was made aware of this when a faculty member asked me if we could increase it to four digits. The problem is obvious: if you have linked course content across levels, like Math 305 Intro to Analysis, then it's nice to number Intermediate Analysis as 405, and so on. But if you have only ten course numbers available per level and credit count, that becomes difficult. Since most courses are three credits, the information in the second digit is mostly wasted. One might say that it's completely wasted, since the number of credit hours is always listed next to the course anyway. There are in effect three dimensions crammed into one number:
  1. First digit is level of course
  2. Second digit is number of credits
  3. Third digit identifies the actual course
It would be better to separate out all three into three fields. Tradition dictates that this will never happen with the first one, of course.
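The decoupling described above can be sketched in a few lines. This is a hypothetical schema, not any real registrar's system; the field names and the `packed_code` helper are my own illustration of keeping the three dimensions separate while still being able to reproduce the traditional packed number for display.

```python
# Hypothetical sketch: store each dimension in its own field instead of
# packing three meanings into one three-digit course number.
from dataclasses import dataclass

@dataclass(frozen=True)
class Course:
    level: int      # difficulty level (was the first digit)
    credits: int    # credit hours (was the second digit)
    course_id: int  # identifies the actual content (was the last digit)
    title: str

def packed_code(c: Course) -> int:
    """Rebuild the traditional packed code for catalogs that expect it."""
    return c.level * 100 + c.credits * 10 + c.course_id

intro = Course(level=3, credits=3, course_id=5, title="Intro to Analysis")
inter = Course(level=4, credits=3, course_id=5, title="Intermediate Analysis")
# With separate fields, linked courses share course_id across levels,
# and the number of credits no longer eats a digit of the identifier.
```

The packed code becomes a derived display value rather than the primary key, so nothing runs out of digits.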

Another example is common in any database where code tables are used. Suppose you want to categorize college applicants by day, evening, accepted, rejected, withdrawn, awarded aid, or declined aid. Sometimes you see these all crammed into the same code table with either numbers or letter abbreviations to identify them. It becomes a terrible mess to run reports with such an arrangement. In this case, day/evening is one dimension and should get its very own field. Same with accepted/rejected. Withdrawn needs a field because it could happen before or after being accepted. And financial aid needs a whole bunch of fields, of course.
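Here's a minimal sketch of that separation, with hypothetical field names of my own choosing: each independent dimension gets its own column, so reports become simple filters instead of code-table archaeology.

```python
# Hypothetical sketch: one field per independent dimension, rather than
# one overloaded "status code" table mixing them all together.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Session(Enum):
    DAY = "day"
    EVENING = "evening"

class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class Applicant:
    name: str
    session: Session                      # day/evening: one dimension
    decision: Decision                    # accepted/rejected: another
    withdrawn: bool = False               # can happen before or after a decision
    aid_awarded: Optional[float] = None   # financial aid really needs several fields
    aid_declined: bool = False

applicants = [
    Applicant("A", Session.DAY, Decision.ACCEPTED, withdrawn=True),
    Applicant("B", Session.EVENING, Decision.ACCEPTED, aid_awarded=5000.0),
]

# "Accepted evening students who haven't withdrawn" is now a plain filter:
accepted_evening = [a for a in applicants
                    if a.session is Session.EVENING
                    and a.decision is Decision.ACCEPTED
                    and not a.withdrawn]
```

Notice that `withdrawn` composes freely with `decision` -- exactly the combination a single crammed code table can't represent cleanly.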

The examples above are more a nuisance than anything else, but they show something about how our brains work. We like to be clever and combine things, even when they shouldn't be combined. Once different qualities are packaged together, we sometimes have trouble even imagining that they are separate. In an ACM article by Lance Fortnow called "Viewpoint: Time for computer science to grow up" there is evidence that this tendency runs deep indeed.

The author describes how computer science has not grown up around the journal culture, as older disciplines have. Instead, researchers use conferences and conference proceedings as a way to disseminate ideas and document professional competence. The problem he presents is that there are too many conferences, which fragments the community, and that it's too expensive and arbitrary--who gets to present at a conference is not determined as fairly as one might wish. He argues for the growth of a strong journal publication system as a solution.

Maybe you noticed in the second sentence of the paragraph above that there are two goals in two dimensions: disseminate ideas and document competence (e.g. for tenure review). The second goal has been attached to the first for so long--like embedding a difficulty level in a course number--that we don't necessarily notice they're two very different things. Daniel Tunkelang notes this in a comment on the source article, with two points I'll highlight here:
  1. [I]t makes no sense for publishers to act as filters in an age of nearly-free digital distribution.
  2. The peer-review process (and review processes in general) should serve to endorse content--and ideally even to improve it--rather than to filter it.
In the days of yore, publishing was expensive, and so only so much stuff could be published. (How long did it take them to catch up to everything Euler wrote?) Therefore, filters on what got published were reasonable. Filters imply standards, and, turning things around, "published => worthwhile" became accepted. It's high time those two dimensions were decoupled. Why on Earth would we want to filter work before it's published if we don't have to? The Internet has lifted all reasonable restrictions on the ease of disseminating ideas. Figuring out how to judge quality is a whole other problem, and I venture to say the answer that evolves will be far more robust than the slow and arbitrary peer review that created the journal culture.

The physics preprint site arXiv is perhaps a prototype for the publication model. There are modest barriers to publication there (you have to get endorsed once, based on a glance at your work to make sure it's not a drawing of a spider or something silly). Do people look at it, or is it a "write-only" repository? Here's Alexa, comparing ACM, IEEE, and arXiv.
The bottom line: measured in page views, arXiv draws a third to a quarter of the traffic of the other two sites, reaching (per another graph) .0078% of global Internet users. I couldn't figure out Alexa's interface well enough to find the raw number of page views, but Wolfram Alpha came to the rescue (reporting Alexa stats, ironically), putting arXiv at about 100,000 daily visitors, with about two page views each. That seems pretty healthy to me.

It isn't hard to imagine how ratings or rankings might get layered on top of such an archive, but I hope that doesn't happen. Much more meaningful would be hot-linked citations, keywords, and other richly-connected data, perhaps including comments (NOT anonymous ones, though). Determining the merit of a scholar based on his or her work will always be a messy, subjective, and political process. The tenure review process needs a reboot. The incentive is that those who figure out the best way of doing it get to attract and keep the best researchers. That's how evolution works...


  1. I see what you did there... moving us from simple data structures through tenure to evolution.

    But the democratization of publishing is what snagged my interest here.

    I'm a big fan of the Assessomat. Having thousands of brains chewing through material and bringing up the worthy chunks while burying the buriable is a fine use for massively parallel brain activity. The presentation layer of the Assessomat is yet immature: it is fragmented and unreliable (sometimes Zombies emerge which should have remained buried, requiring me to evaluate and rebury them on my own... my contribution to the Assessomat network). But its power grows with the Internet and proliferation of serious-minded online publishing from the heart of the academy.

    The evolution of the Assessomat, its ability to survive in a changing environment, will track along with an increase in the number of humans who publish seriously, and those who apply to it an evaluative rigor that, in the aggregate, is an analog of the print industry's filtering effect which is, as you noted, a messy, subjective, and political process. The Assessomat brings some useful transparency to what has traditionally been a closed-door and mysterious process.

    In many ways, submitting material to the Assessomat is like being punted alone into the Roman Coliseum to face an endless stream of gladiators. Only the strong survive and the crowd cheers all outcomes.


  2. JUS, I think you pulled off a very difficult trick there. I googled "assessomat" and got exactly ONE hit. There should be a name for that: a monogooglism, perhaps. Of course, it won't work after this page is indexed. Better grab while supplies last.

    I see the filtering working like you describe it. Sources like reddit already feed me a regular diet of stuff I would never have found on my own. I do imagine professional organizations will want indices and cross-references and archives for academic works, rather than facing linkrot. Something ramped up from arxiv, perhaps. A harder question is how writers of long works will be rewarded for their efforts. I speculated about that in a post (The Future of Self) a couple of weeks ago. In this case, evolution may favor shorter and fewer works.