Friday, February 22, 2008

Upgrading to YUI 2.5.0

I'm upgrading the site from YUI 2.4.1 to YUI 2.5.0. I decided to do this because the new version has many more data table features, and much of the utility of the TreeTapper site comes from interaction with data tables. The upgrade isn't too bad. The only problem so far involves initialRequest, whose new default of "null" gets appended to the datasource URL. The fix is either to add a "?" at the end of the datasource file name and pass the query through initialRequest, or at least to set initialRequest to "" explicitly [under 2.4.1,

this.myDataSource = new YAHOO.util.DataSource("templates/vocabtable_js.php?table=applicationkind");

worked, while under 2.5.0, it is transmitted as

[URL]/templates/vocabtable_js.php?table=applicationkindnull

which won't work. The solution is to do either

this.myDataSource = new YAHOO.util.DataSource("templates/vocabtable_js.php?");
[...]
this.myDataTable = new YAHOO.widget.DataTable("applicationkind", myColumnDefs, this.myDataSource, {initialRequest: "table=applicationkind"});

or

this.myDataSource = new YAHOO.util.DataSource("templates/vocabtable_js.php?table=applicationkind");
[...]
this.myDataTable = new YAHOO.widget.DataTable("applicationkind", myColumnDefs, this.myDataSource, {initialRequest: ""});

The first option seems better].
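The stray "null" on the URL is what plain JavaScript string concatenation would produce if a null request were joined onto the datasource URL. The sketch below shows only that language behavior; appendRequest is a hypothetical stand-in, not YUI's actual internals, and it needs no YUI to run.

```javascript
// Hypothetical stand-in for however the library joins the datasource URL
// and the initial request. This is NOT YUI code, just string concatenation.
function appendRequest(baseUri, request) {
  return baseUri + request;
}

var full = "templates/vocabtable_js.php?table=applicationkind";
var bare = "templates/vocabtable_js.php?";

// A null request turns into the literal text "null" on the end of the URL.
console.log(appendRequest(full, null));

// An empty-string request leaves the URL untouched (second option above).
console.log(appendRequest(full, ""));

// A bare "?" plus the query as the request gives the intended URL (first option).
console.log(appendRequest(bare, "table=applicationkind"));
```

Both workarounds end up with the same URL; the first keeps the query out of the datasource definition, so the same datasource can serve other requests.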

The new version also has parsers (perhaps they were there before, but I missed them) that allow text coming over XHR to be converted to numbers, allowing numerical sorting.
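To see why the conversion matters for sorting, here is a minimal sketch in plain JavaScript (no YUI). The toNumber function is a hypothetical parser written for illustration, not the one YUI ships, and the values are made up.

```javascript
// Cell values as they might arrive over XHR: always text.
var rawValues = ["10", "9", "100", "25"];

// Generic comparison: lexicographic for strings, numeric for numbers.
function compare(a, b) {
  return a < b ? -1 : (a > b ? 1 : 0);
}

// Hypothetical parser: convert text to a number, or null if it is not numeric.
function toNumber(text) {
  var n = parseFloat(text);
  return isNaN(n) ? null : n;
}

// Sorting the raw text compares character by character, so "100" lands before "25".
var sortedAsText = rawValues.slice().sort(compare);

// Sorting the parsed values gives the numerical order a user expects.
var sortedAsNumbers = rawValues.map(toNumber).sort(compare);

console.log(sortedAsText);    // [ '10', '100', '25', '9' ]
console.log(sortedAsNumbers); // [ 9, 10, 25, 100 ]
```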

The new version has at least a couple of downsides. First, making the tables seems slower (see this discussion), which is a problem when you need big tables, as TreeTapper does. Second, column headers are now drawn separately, and it can take a long time (>5 seconds) for them to line up with the corresponding columns in the table.

Tuesday, February 19, 2008

Google charts


I've added some Google charts to the front page of the TreeTapper site. I'm using googlechartseasyphpclass v 1.02 to generate the code that calls the charts more easily (for one thing, it involves converting numbers to letters for plotting). The class does limit flexibility a bit, but the source code of the PHP script can be modified easily. YUI also has a new charts API, but it requires a very recent version of Flash (more recent than I had in Firefox); Google simply creates a PNG, which is convenient. The code that builds the data passed to the charts takes a little while to run (dozens of Postgres calls), so I might just update the data daily and have the site call a saved version. The charts I've made track how much the database has grown in the previous month and break down references by year.
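The "numbers to letters" step refers to the Google Chart API's simple encoding, where each data point becomes one character from A-Z, a-z, 0-9 (62 levels) and an underscore marks a missing value. Here is a minimal sketch of that scaling in JavaScript rather than PHP; it is not the code from googlechartseasyphpclass, and the function name and sample values are made up for illustration.

```javascript
// The 62 characters of the simple encoding, in order of increasing value.
var SIMPLE_ENCODING =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Scale each value onto 0..61 relative to maxValue and map it to a character.
// Anything that is not a non-negative finite number becomes "_" (missing).
function simpleEncode(values, maxValue) {
  var chars = [];
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    if (typeof v === "number" && isFinite(v) && v >= 0 && maxValue > 0) {
      var capped = Math.min(v, maxValue);
      var index = Math.round((SIMPLE_ENCODING.length - 1) * capped / maxValue);
      chars.push(SIMPLE_ENCODING.charAt(index));
    } else {
      chars.push("_");
    }
  }
  // "s:" is the prefix that marks simple encoding in the chart data parameter.
  return "s:" + chars.join("");
}

// Made-up counts, scaled against a maximum of 61.
console.log(simpleEncode([0, 26, null, 61], 61)); // s:Aa_9
```

With only 62 levels, large counts lose resolution, which is one way such a helper class trades flexibility for convenience.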

Citation counts

When deciding which method to use, or which areas to focus development on, the popularity of related references matters. For example, if method A is used 20 times more frequently than method B to answer a given question, then in the absence of other information a naive user should probably use method A. Just counting total citations could be misleading, though: a good new method available only in the last year will take some time to gain as many citations as a poorer older method, despite acquiring citations at a faster rate. Thus, both number and rate matter. I'm thinking of showing both the total number of citations and the rate of gain of citations over a year or so. Another way of displaying related info would be to compare the number of citations for a paper with the median number of citations for papers published in the same year (perhaps limiting the reference set to papers similar in scope to the paper of interest).

To get this info, I'll need citation data, which is not generally available (see earlier post). Instead, I'm using the number of hits in Yahoo, obtained through its search API (which gives slightly different numbers than its HTML form search results), for a query made of an article's title as a phrase, the last name of the first author, and the publication year. I record hits both for all pages and for PDF-formatted pages only. The tricky parts of getting this to work were converting apostrophes to HTML characters and making sure to include the title as a phrase, rather than as a string of words.

This approach has a few disadvantages. Web hits probably correlate with how many times a paper is cited (early work suggests this is roughly true), but the correlation is not wonderful (though web hits can pick up interesting new papers faster than waiting for later citing papers to appear). Web hits are also easier to manipulate: I could add my papers' titles, authors, and years as signatures to all my posts and then start posting on various forums, and TreeTapper would see my papers as very popular.
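The search query described above (title as a phrase, first author's last name, publication year) could be assembled along these lines. This is only a sketch: buildCitationQuery is a hypothetical name, the example paper is invented, no Yahoo API call is made, and the apostrophe handling here is URL percent-encoding, which is an assumption and may differ from the conversion TreeTapper actually uses.

```javascript
// Build a URL-safe query: quoted title phrase, author last name, year.
function buildCitationQuery(title, firstAuthorLastName, year) {
  // Strip embedded double quotes so the title stays a single quoted phrase.
  var phrase = '"' + title.replace(/"/g, "") + '"';
  var query = [phrase, firstAuthorLastName, String(year)].join(" ");
  // encodeURIComponent leaves apostrophes alone, so encode them explicitly.
  return encodeURIComponent(query).replace(/'/g, "%27");
}

// Invented example: both the title and the author name contain apostrophes.
console.log(buildCitationQuery("A hypothetical paper's title", "O'Brien", 2008));
// %22A%20hypothetical%20paper%27s%20title%22%20O%27Brien%202008
```

Without the quotes around the title, the engine would match pages containing the words anywhere, which inflates hit counts for papers with common words in their titles.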
Perhaps with the upcoming release of the new CiteSeerX, I can use that system to track citations better. I've set up TreeTapper to store the number of hits for each paper every two weeks; this will let me track how the popularity of articles changes through time and so recover the rate, rather than just the number, of citations. As this recording is new, for now I'm showing only total hits until data have been recorded over more time intervals.
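Once a few of the biweekly snapshots exist, the rate could be recovered roughly as follows. This is a sketch under assumptions: the snapshot shape ({date, hits} with ISO dates), the function name, and the numbers are invented for illustration, and the rate is a simple first-to-last difference scaled to a year.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Estimate hits gained per year from stored snapshots.
// Returns null when there is too little data to compute a rate.
function hitsPerYear(snapshots) {
  if (snapshots.length < 2) {
    return null;
  }
  // Sort a copy by date so the caller's array is left untouched.
  var sorted = snapshots.slice().sort(function (a, b) {
    return Date.parse(a.date) - Date.parse(b.date);
  });
  var first = sorted[0];
  var last = sorted[sorted.length - 1];
  var days = (Date.parse(last.date) - Date.parse(first.date)) / MS_PER_DAY;
  if (!(days > 0)) {
    return null;
  }
  return (last.hits - first.hits) / days * 365;
}

// Invented data: 14 new hits over 14 days.
var example = [
  { date: "2008-01-15", hits: 114 },
  { date: "2008-01-01", hits: 100 }
];
console.log(hitsPerYear(example)); // 365
```

Using only the first and last snapshots is noisy when hit counts fluctuate between searches; fitting a line through all the snapshots would be a natural refinement once more intervals are recorded.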