Saturday, October 24, 2009

Automated Reporting

One of the most time-consuming tasks of an IR office is to provide answers to a constant stream of questions like "how many female psychology majors do we have?" For advisors, there are detailed questions about courses, student schedules, and course histories that are essential to have timely answers to as well, and these come all at once. Of course, any college nowadays has and electronic system of databases and interfaces that allows answers to such questions. The ones I'm familiar with, however, are much better at storing information than reporting it back--it can be done, it's just not always as easy as one might like.

What do you do if you want to find a water fountain in a building? This seems like a non sequitur, but bear with me for a moment. There are two things I always emphasize to my young daughter when we're out and about. The first is, turn around once in a while so you'll know what your path looks like on your way back, and it will be familiar. The second is "think like the designer." If you want to know where the water fountain is, try to think like the architect. A water fountain is an add-on, so it's likely to be close to other water sources: the bathrooms, for example. It's likely to be on a traveled hallway, etc... This works with interfaces too. The best software interfaces seem effortless because the designer put a lot of thought into consistency and ease of use. Most university software systems like Datatel, Banner, Jenzabar, aren't as easy to use. If you want to figure out why a particular thing is the way it is, you should keep in mind that the designer was probably a programmer, not a user. There's a big difference.

Programmers like modes, for example. Switching from one thing to another. Does your system have "modules?" You can bet some programmer cooked up that idea. Do you have to switch screens three times to do the job of one? Same thing. What's convenient for a programmer is not usually convenient for a user. An example is Datatel's implementation of its windows-based user interface. For years we used a terminal mode that looked like it was out of the 1960s, and probably was. It was ugly, but it worked, and it was fast. It was programmer-designed, no doubt, but it had the advantage that the network demands were very small for sending little packets of text back and forth and throwing them up on the screen. Response time was great, and once you learned the dictionary of short-cut keys, users could fly through the screens. Not a perfect situation, but acceptable. Datatel was probably taking heat for it though--everybody wants a mouse interface, pull-down menus, and a nice GUI look, right? Rather than starting from scratch and building the thing, as far as I can tell, they just layered it on top of the existing software stack. What this means to the user is that it was very slooooooow. After our implementation, the staff complained bitterly about it, in some cases reporting tasks that took more than five times as long as before. Of course we passed this on to the technical reps at Datatel, but they'd heard it all before, and had little tweaks instead of solutions. I hope they've fixed it by now, but this is a good example of a programmer's "solution" to a user problem--it actually made things worse. Note that this isn't an indictment of the whole Datatel system--the database itself was very stable and usable. With a package to get data out, reporting is bearable. There have been some very unfortunate design decisions in the last releases from the user point of view, but underneath I found it to be very solid (after about eight years of use as IR guy and at one point sysadmin).

So if you want to minimize calls to IR and you want to provide data to users, ease of use and response time are premium qualities to keep in mind. A few tenths of a second on a web page refresh make a difference in the real world... Simple and to the point is good.

Here's a partial solution to the problem of getting what you want out of a cranky database. It's not perfect, and it's designed by a programmer (me), but one who is also a user of the system. So it's a compromise between time programming and elegance of design. The users love it, once they get over the small learning curve.

Schematically, here's the architecture. openIGOR is my open-source project for a document repository, but I also use it to push out portal functions. It uses perl's DBI interface to talk to the databases it needs to. All of them are SQL, so this is pretty straightforward. Note that it's not at all straightforward if you mix linux and windows, but I've now ported openIGOR to windows and will be releasing that code soon.
The warehouse database contains a Reports table that has the name of the report, SQL code for the report, a list of field names to assign to the columns, and a report type. The SQL code can have variables embedded in it, which are interpolated before being sent out. For example:
SELECT student_id FROM Students WHERE (Start_Term eq '_current_term');
This is an invented example where the _current_term will be interpolated by the perl script to put in the actual current term: a literal like "09FA" or something. This makes the reports general and customizable.

In addition, the field names from the report table can have markups in them. The field names are used to label columns on the report that gets cranked out, but there are two specifications for filtering that can be included. First I use this syntax:
(simple_date) Date
To tell the reporter to run the subroutine "simple_date" on the data before sending it to output. In this case, the date that comes out of the database is a long ugly thing and I want to simplify it down to day/month/year. The second type of markup is a bit more complicated. For example,
(sub_2) [Schedule]
tells the reporter that instead of printing data in this field, it is to be a sub-report. A list of students could include a field that hyperlinks to that student's current course schedule, for example. The "2" means to hyperlink to sub-report number 2, and the brackets mean to provide a hyperlink instead of the actual data.

So how does all this programese translate to user experience? Here's what it looks like. After a pulldown menu to choose the report, the user gets a list of possible fields to choose from. Here's the one for Current Course Enrollment:

The interface is a bit busy, but it's simple to use. Select the boxes next to fields you want to see, and enter any restrictive filters on the right. Here, I've chosen to see all math classes. Here's a portion of the report that comes back:
The green hyperlinks take you to the student roster for that class. Once there, you can click to see an individual student's course schedule or see their whole course history with grades (there are some security precautions there for FERPA--I had to work it out with the college lawyer).

The requests for these reports use the cgi GET method, which means in practice that every report can be bookmarked for later use. Compared to trying to get this information out of our Jenzabar interface, this is MUCH easier.

The ability to deliver reports through the web opens up even more powerful reporting methods. For example, Excel has the ability to look at a table on a web page and grab data off of it to populate a field. Then you can do calculations. So if you wanted to know the total number of students enrolled in either math or English courses, with a few clicks you could generate those reports, embed the results into a spreadsheet and calculate the total.

But here's the really cool part. If you look at the report designer again you'll see a box at the bottom that produces the output in a text delimited format, rather than as HTML. This is great for feeding into other number crunchers. You could use Yahoo Pipes, for example, if the system weren't locked behind a firewall for security. This week I got a request for a custom report: financial aid needs to know right away if a student crosses the 12-credit mark in either direction. If they drop below, they're no longer full time, and if they go from 9 credits to 12 they become full time. Either way, there are financial aid ramifications.

Normally, solving this reporting problem would start with identifying the SQL query to pull down the data. I don't have to do that now because I have an interface. I just clicked through the report generator to ask for student identification information and the number of enrolled hours, and for good measure added the course schedule link for each student. Then I created a perl script, using the LWP web module, to run the web request, pull back the data in delimited form, parse it out, and generate a file with student ID and number of credits. Each day at 5:30 a cron job runs to check the current data against the day before, and if there's a student who passed the 12 credit threshold, email the right people. It takes perhaps a second longer to run the http request than to run a direct SQL query from the database. But in this context, a second delay means nothing.

Advantages. In general, the architecture presented above has some significant advantages. Database security is restricted to the the point of query--the reporter script--which runs on a secured database. So rather than checking each user's credentials for each report, this can be handled through openIGOR and not touch the Jenzabar credentials table at all. In addition to security, it's much more accessible: you don't need a specialized executable sitting on your desktop to run the report, just a web browser. You can do it from your phone. From the perspective of the IR office, imagine being able to send a hyperlink to a live report back to a requestor, rather than a number that will only satisfy them for the moment. Most users are sophisticated enough to generate their own reports after being shown the interface.

Disadvantages. The reporter does not try to be fancy about SQL queries. It works from the one given in the report definition, which generally pulls a swath of data on a topic, like enrollment. The filtering happens only after the whole table has been downloaded. This is an inefficiency because it's a lot faster to know up front if you only want math classes, for example--then you can as Jenzabar (or other database) for just the math classes, rather than all classes. Therefore, the reporter as described would not be suitable for very heavily used reports, running all the time; those you'd want to optimize. Rather, it's designed to be very flexible and to make it easy to drop in new reports.

Implementation. The code for running all this is not all that complex. I've made it work for Datatel and Jenzabar, and in principle it works for any reasonable configuration of database tables. Setting it up to work the first time requires some expertise--it's not plug and play. But I'd be happy to help others do that if there's demand.

No comments:

Post a Comment