Wednesday, February 13, 2008

Added people

I've gotten the reference parsing to work and have added 1545 references, from which 1840 authors were extracted. To get the references, I searched several online reference databases (PubMed, ISI Web of Science) for articles about methods, selecting them by title. I searched for particular authors and browsed particular journals (for example, issues of Systematic Biology from the past several years and articles in Bioinformatics that mentioned phylogenies). I downloaded these citations, imported them in their various formats into EndNote, and exported a RIS-formatted file. I then uploaded that file to the website (in several chunks), converted it to XML with bibutils, and parsed the result with a custom XML parser written using SimpleXML in PHP. The XML from bibutils is also saved with each reference in the database, which should make it easier to convert user-selected references for export in various formats (I hope, but we'll see when I code that).

I've also created templates on the development site to automatically display information on the included authors in a RESTful way. http://treetapper.nescent.org/person displays a paginated table of all the authors in the database with the number of references, methods, and software each has in the database. Clicking on an author's name goes to a page listing her or his coauthors (ranked by number of papers in common) and references. For example, http://treetapper.nescent.org/person/23 is the page for Mike Sanderson. You can then link from person to person in this way.
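The coauthor ranking on each person page can be sketched roughly as follows. This is only an illustration in Python, not the PHP code running on the site: the function name `rank_coauthors`, the list-of-author-lists input, and the sample names are all assumptions made up for the example, and the real version reads from the database.

```python
from collections import Counter


def rank_coauthors(references, author):
    """Return (coauthor, shared_paper_count) pairs, most shared papers first.

    `references` is a list of author lists, one list per paper
    (a hypothetical in-memory stand-in for the database tables).
    """
    counts = Counter()
    for authors in references:
        unique = set(authors)  # guard against a name listed twice on one paper
        if author in unique:
            counts.update(unique - {author})
    # Sort by descending count, then by name so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample data, not taken from the actual database.
references = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author B", "Author C"],
    ["Author A", "Author D"],
]

for name, shared in rank_coauthors(references, "Author A"):
    print(name, shared)
```

Breaking ties alphabetically is a choice made for this sketch; the post does not say how the site orders coauthors with the same number of shared papers.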

Left to do: add XML output as an option, rather than just the HTML output with datatables; get the datatables to sort properly (currently, Yahoo User Interface datatables (version 2.4.1) sort numbers lexically, so sorting [5, 200, 12] gives [12, 200, 5], and author names aren't sorting properly either); and build tables and a REST interface for references (I also want to be able to autocomplete on an author's name and then display just the relevant references). It might be interesting at some point to add a way to output files for visualizing author relationships (perhaps with Graphviz); this could also provide another way to navigate the database.
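The sorting problem is easy to reproduce outside the datatable. Here is a minimal sketch in Python (not the YUI JavaScript itself) of the difference between comparing the cell values as strings and comparing them as numbers; the variable names are just for illustration.

```python
# Cell values as they arrive from the page: strings, not numbers.
cells = ["5", "200", "12"]

# Lexical sort compares character by character, so "12" < "200" < "5".
lexical = sorted(cells)

# Numeric sort converts each cell to an integer before comparing.
numeric = sorted(cells, key=int)

print(lexical)
print(numeric)
```

The fix on the site would presumably be along the same lines: make sure the count columns are treated as numbers before they are compared.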
