One thing that bothers me is that so far, my project seems to be about database and website design and coding, not science. However, the science comes later: adding items to the DB is at least related to scientific methods, and once the DB is full enough, I'll be able to use it to figure out what new methods need to be created to answer questions (the real goal of the project). I'm also still doing science, despite the impression blog readers might get: this week, I have done a series of likelihood bootstraps on my ant data (I had to move a couple of intron boundaries based on info from GenBank, which then required a new partitioned analysis), started doing the power/bias tests for new methods of trait evolution I've developed, worked on analyses for a paper on fish evolution with Dave Collar using new methods in my program Brownie, talked to a student about models of gene evolution [see the published appendix I wrote on this], and twiddled my thumbs waiting for reviews on a species delimitation paper (>9 weeks in review so far [but at least it's in review]).
Repeat of the design goals for viewing missing methods/software:
Allow users to see on a tree (using branch coloring) which questions don't have methods, which methods aren't in software, etc.
Allow users to arrange the order in which things are displayed: question->criterion->method, or question->character type 1 -> tree type -> branch length type -> data format -> software
Allow users to filter by option (only show methods relating to DNA data, for example)
Make it fast, intuitive, etc.
I think a way to address these is to create table views: one has all the available methods ("actualmethods"), one has all the available methods and software ("actualsoftware"), and one could have all the imaginable combinations of all options ("biggie"). That way, all the database logic for combining the primitive tables (method->methodtotreetypetobranchlengthtype->treetype, for example) is taken care of at the view creation step, rather than requiring it all to be created on the fly when a user re-orders options. [Aha: so this is a reason for using MVC]. To draw the tree of methods/software based on user choice, one gets the tree structure by looking in the [actually hypothetical] "biggie" table; combinations (edges on the display tree) present in the "actualsoftware" table are shown in the "+methods +software" color (purple?), combinations not present there but in the "actualmethods" table get the "+methods -software" color (black?), and others get the "-methods -software" color (gray?). The only problem with this is the size of the "biggie" table view: except for built-in relations between general and posed questions, and between posed questions and relevant combinations of characters (see schema), it's basically a massive cross join. For example, if there were only 2 data formats, 6 tree formats, and 7 platforms, the table would have 2 x 6 x 7 = 84 rows. The actual "biggie" table, having info on all the imaginable options (input formats, tree types, character types, criteria, etc.), would currently have 1,361,817,600 (about 1.36 billion) distinct rows. Instead of creating such a huge table, I will have a view ("generaltoposedtochartype") containing the essential relations between general question, posed question, and character combinations (only 2,480 rows currently) and then just have the program that returns possible branches for the tree know that all the other options can essentially be cross joined.
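To make the cross-join arithmetic and the three-color rule concrete, here is a minimal Python sketch. It is not TreeTapper's code: the option values, the column names, and the idea of holding the "actualmethods" and "actualsoftware" rows as in-memory tuples over the same columns are all assumptions made for illustration. Combinations are generated lazily instead of being stored, and an edge of the display tree is treated as a prefix of a full combination.

```python
from itertools import product
from math import prod

# Hypothetical option lists, sized to match the 2 x 6 x 7 example above.
OPTIONS = {
    "data_format": ["format%d" % i for i in range(1, 3)],
    "tree_format": ["tree%d" % i for i in range(1, 7)],
    "platform": ["platform%d" % i for i in range(1, 8)],
}


def count_combinations(options):
    """Number of rows a full cross join of the option lists would have."""
    return prod(len(values) for values in options.values())


def iter_combinations(options, order):
    """Lazily yield every combination as a tuple, in the chosen column order."""
    return product(*(options[name] for name in order))


def prefixes(rows):
    """All leading sub-tuples of the rows; each one is an edge of the tree."""
    return {tuple(row[:depth]) for row in rows for depth in range(1, len(row) + 1)}


def edge_color(path, method_prefixes, software_prefixes):
    """Color one edge: purple if software exists, black if only a method, else gray."""
    if path in software_prefixes:
        return "purple"  # +methods +software
    if path in method_prefixes:
        return "black"   # +methods -software
    return "gray"        # -methods -software
```

With these assumed lists, count_combinations(OPTIONS) gives the 84 rows from the example, while iter_combinations never holds more than one row in memory at a time. Precomputing the prefix sets once per user ordering keeps each edge lookup to a set membership test.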
I had a bit of trouble creating the initial "actualmethods" view efficiently; Hilmar Lapp, an IT guru here at NESCent (codes for BioPerl, organizes hackathons, organizes people), edited the query to make it more efficient and to eliminate the return of duplicate rows (without using "distinct"). Below I've posted the SQL statement used to make these views in case it's useful for others (or for me in the future).
I'm giving a talk at Evolution 2008 in June on TreeTapper and information learned about missing methods/software so far (the idea is that there will be something learnable by that point, besides the proper use of YUI APIs). No word on time/session yet.
It appears the strategy of using Google Maps with YUI display of items to add to the map will basically work. Rather than doing drag and drop between YUI datatables, I'm using just the drag and drop YUI code on a list of options, each with the possibility of limiting it to one item (for example, one can first organize by optimality criterion, deciding to show all or just likelihood). The interface is based on the YUI example, but with just one list, with one element of a different color so that options placed above this element appear on the tree while ones below do not (inspired by "the line" on Google Summer of Code's mentorship application). I originally thought of having two lists side by side, allowing people to move elements from one list to the other, but this was too wide for some screens once the possibility of options selection was added.
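The single-list-with-a-line idea can be sketched on the server side as follows. This is only an illustration in Python, not the YUI code: the DIVIDER marker, the (name, limit) pairs, and the visible_options helper are hypothetical names, and a limit of None is assumed to mean "show all values".

```python
# Hypothetical marker standing in for the differently colored list element.
DIVIDER = "--- the line ---"


def visible_options(ordered_items):
    """Return the (name, limit) pairs that sit above the divider, in order.

    Each option is a (name, limit) pair; limit is None to show all values
    or a single value (for example "likelihood") to restrict the display.
    If the divider is missing, every option is treated as visible.
    """
    shown = []
    for item in ordered_items:
        if item == DIVIDER:
            break
        shown.append(item)
    return shown


# Example ordering as a user might leave it after dragging.
user_list = [
    ("criterion", "likelihood"),  # organize by criterion, likelihood only
    ("method", None),             # then by method, showing all of them
    DIVIDER,
    ("software", None),           # below the line: not drawn on the tree
]
```

Here visible_options(user_list) keeps the criterion and method entries and drops software, so the order of the returned list can be used directly as the order of levels on the tree.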
I've also gotten a plain white Google Map (to replace traditional geographic maps) working, as well as overlays. The Google Maps API has a wrapper for XMLHttpRequest called GXmlHttp that should make refreshing the chart based on user-sorted options possible. Now the question is how to efficiently recover information from the database to draw the tree, highlighting which branches lead to software+methods, just methods, or nothing.
For Google Summer of Code, NESCent had 11 project ideas, 31 applicants, and just 5 slots. Paul McMillan, an undergraduate student at UC Berkeley, was one of the applicants and proposed working on the WebDot navigation of databases project (though he might end up using GraphViz directly, rather than WebDot). His application was detailed and showed good background knowledge; more impressive were his conversations (over IRC) regarding the problem, where it became evident that he had given it a lot of thought and certainly had the background to do this. This project should help with TreeTapper navigation (looking at coauthor networks, for example) and become an easy-to-use solution for other website developers. Congrats to Paul.
This was my first year with Google Summer of Code. I was impressed by the quality of the applications NESCent-affiliated projects received and how passionate the students are about them: several whose projects didn't get funding have volunteered to work on them anyway, which is amazing, since they'll have to do something else for money and so will have less time.
The key interesting thing about TreeTapper for me is the ability to find missing methods or software. Any list of software and methods will tell you what's available (and isn't trivial to make), but for developers, finding what doesn't exist yet is key. At first, I was just doing a typical treeview (not in the phylogeny sense, but in the nested series of folders sense):
I started adding the beginnings of bar plots (the red squares above) to show the number of techniques/question available for each topic. The problem with this is that it's very hard to get a quick overview of what's missing: a user has to drill down into each section and remember what's there (sensible display of some of this with bar plots might help, but it's still not great). Thinking about it, what would be good to show is an actual tree: a given starting point (such as a topic: speciation rate) and then all possible descendants (such as all possible questions for this topic). Those descendants available in methods/software get one branch color (say, black), those not available get another branch color (gray) [though it might be good to distinguish those present in methods but not software]. Here's a hand-drawn example for the basic idea:
And with colors and labels:
Under this approach, it's easy to distinguish areas with methods/software available (black/solid) from those lacking methods or software (gray/translucent). In the example above, the central dot represents a topic, the first circle represents questions, the second circle criteria, the third perhaps character type, etc. Derrick Zwickl had the good suggestion to allow users to set the order in which options are plotted; I'd also like to allow users to fix certain values (only look for missing methods under a likelihood criterion, for example).
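A rough Python sketch of user-set ordering plus fixed values, under assumptions: rows are plain dictionaries such as a database query might return, and the column names and the build_tree helper are made up for illustration. Each level of nesting corresponds to one circle in the drawing, and a fixed value simply filters rows before the tree is built.

```python
def build_tree(rows, order, fixed=None):
    """Nest rows into a dict of dicts, one level per column named in order.

    rows  -- iterable of dicts mapping column name to value (assumed shape)
    order -- column names, from the innermost circle outward
    fixed -- optional {column: value} filter, e.g. {"criterion": "likelihood"}
    """
    fixed = fixed or {}
    tree = {}
    for row in rows:
        # Skip rows that do not match every fixed value.
        if any(row.get(column) != value for column, value in fixed.items()):
            continue
        node = tree
        for column in order:
            # Reuse the child if it exists, otherwise create an empty one.
            node = node.setdefault(row[column], {})
    return tree


# Hypothetical rows; real ones would come from the database views.
rows = [
    {"question": "q1", "criterion": "likelihood", "character": "DNA"},
    {"question": "q1", "criterion": "parsimony", "character": "DNA"},
    {"question": "q2", "criterion": "likelihood", "character": "morphology"},
]
```

Calling build_tree(rows, ["question", "criterion"]) puts questions on the first circle and criteria on the second; swapping the order, or passing fixed={"criterion": "likelihood"}, redraws the same data without changing the rows themselves.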
Well, we'll see how it goes. As with all posts, please feel free to make suggestions in the comments.