Thursday, April 24, 2008

Viewing missing methods/software, part III: Table views

Repeat of the design goals for viewing missing methods/software: 
  1. Allow users to see on a tree (using branch coloring) which questions don't have methods, which methods aren't in software, etc.
  2. Allow users to arrange the order in which things are displayed: question->criterion->method, or question->character type 1 -> tree type -> branch length type -> data format -> software
  3. Allow users to filter by option (only show methods relating to DNA data, for example)
  4. Make it fast, intuitive, etc.
I think a way to address these is to create table views; one has all the available methods ("actualmethods"), one has all the available methods and software ("actualsoftware"), and one could have all the imaginable combinations of all options ("biggie"). That way, all the database logic for combining the primitive tables (method->methodtotreetypetobranchlengthtype->treetype, for example) is taken care of at the view creation step, rather than requiring it all to be created on the fly when a user re-orders options. [Aha: so this is a reason for using MVC]. To draw the tree of methods/software based on user choice, one gets the tree structure by looking in the [actually hypothetical] "biggie" table; combinations (edges on the display tree) present in the "actualsoftware" table are shown in the "+methods +software" color (purple?), combinations not present there but in the "actualmethods" table get the "+methods -software" color (black?), and others get the "-methods -software" color (gray?). The only problem with this is the size of the "biggie" table view: except for built-in relations between general and posed questions and posed questions and relevant combinations of characters (see schema), it's basically a massive cross join. That means that if there are 2 data formats, 6 tree formats, and 7 platforms, the table has 2 x 6 x 7 = 84 rows.  The actual "biggie" table, having info on all the imaginable options (input formats, tree types, character types, criteria, etc.) would currently have 1,361,817,600 (1.3 billion) distinct rows. Instead of creating such a huge table, I will have a view ("generaltoposedtochartype") containing the essential relations between general question, posed question, and character combinations (only 2,480 rows currently) and then just have the program returning possible branches for the tree know that all the other options can essentially be cross joined.

I had a bit of trouble creating the initial "actualmethods" view efficiently; Hilmar Lapp, an IT guru here at NESCent (codes for BioPerl, organizes hackathons, organizes people) edited the query to make it more efficient and eliminate return of duplicate rows (without using "distinct"). Below I've posted the sql statement used to make these views in case it's useful for others (or for me in the future).

-- All actual methods
CREATE VIEW 
actualmethods
(
actualmethods_generalquestion,
actualmethods_posedquestion,
actualmethods_char1,
actualmethods_char2,
actualmethods_char3,
actualmethods_treetype,
actualmethods_branchlengthtype,
actualmethods_criterion,
actualmethods_method
)
AS
SELECT
pq.posedquestion_generalquestion,
pq.posedquestion_id,
  cc.charactercombination_char1, 
cc.charactercombination_char2, 
cc.charactercombination_char3,
mttblt.methodtotreetypetobranchlengthtype_treetype,
mttblt.methodtotreetypetobranchlengthtype_branchlengthtype,
   mc.methodtocriterion_criterion,

mttblt.methodtotreetypetobranchlengthtype_method

FROM
methodtotreetypetobranchlengthtype mttblt,
methodtocriterion mc,
methodtocharactercombination mcc,
methodtoposedquestion mpq,
posedquestion pq,
charactercombination cc,
posedquestiontocharactercombination pqcc
WHERE
methodtocriterion_method=methodtotreetypetobranchlengthtype_method
AND  
methodtocharactercombination_method=methodtotreetypetobranchlengthtype_method
AND  
methodtocharactercombination_charactercombination=charactercombination_id
AND  
methodtoposedquestion_method=methodtotreetypetobranchlengthtype_method
AND
methodtoposedquestion_posedquestion=posedquestion_id
AND
posedquestiontocharactercombination_posedquestion=posedquestion_id
AND  
posedquestiontocharactercombination_charactercombination=charactercombination_id
;

-- all actual methods and software
CREATE VIEW 
actualsoftware
(
actualsoftware_generalquestion,
actualsoftware_posedquestion,
actualsoftware_char1,
actualsoftware_char2,
actualsoftware_char3,
actualsoftware_treetype,
actualsoftware_branchlengthtype,
actualsoftware_criterion,
actualsoftware_method,
actualsoftware_dataformat,
actualsoftware_treeformat,
actualsoftware_applicationkind,
actualsoftware_platform,
actualsoftware_program
)
AS
SELECT
pq.posedquestion_generalquestion,
pq.posedquestion_id,
cc.charactercombination_char1, 
cc.charactercombination_char2, 
cc.charactercombination_char3,
mttblt.methodtotreetypetobranchlengthtype_treetype,
mttblt.methodtotreetypetobranchlengthtype_branchlengthtype,
mc.methodtocriterion_criterion,

mttblt.methodtotreetypetobranchlengthtype_method,

pdf.programtodataformat_dataformat, 
ptf.programtotreeformat_treeformat, 
ppak.programtoplatformappkind_applicationkind, 
ppak.programtoplatformappkind_platform, 

  pmcc.programtomethodtocharactercombination_program
FROM
methodtotreetypetobranchlengthtype mttblt,
methodtocriterion mc,
methodtocharactercombination mcc,
methodtoposedquestion mpq,
posedquestion pq,
charactercombination cc,
posedquestiontocharactercombination pqcc,
programtodataformat pdf,
programtotreeformat ptf,
programtomethodtocharactercombination pmcc,
programtoplatformappkind ppak
WHERE
methodtocriterion_method=methodtotreetypetobranchlengthtype_method
AND  
methodtocharactercombination_method=methodtotreetypetobranchlengthtype_method
AND  
methodtocharactercombination_charactercombination=charactercombination_id
AND  
methodtoposedquestion_method=methodtotreetypetobranchlengthtype_method
AND 
methodtoposedquestion_posedquestion=posedquestion_id
AND 
posedquestiontocharactercombination_posedquestion=posedquestion_id
AND  
posedquestiontocharactercombination_charactercombination=charactercombination_id
AND 
programtomethodtocharactercombination_methodtocharactercombination=methodtocharactercombination_id
AND 
programtomethodtocharactercombination_program=programtodataformat_program
AND 
programtodataformat_program=programtotreeformat_program
AND 
programtotreeformat_program=programtoplatformappkind_program
;

-- all general+posedquestions+chartypes
CREATE VIEW
generaltoposedtochartype
(
generaltoposedtochartype_generalquestion,
generaltoposedtochartype_posedquestion,
generaltoposedtochartype_char1,
generaltoposedtochartype_char2,
generaltoposedtochartype_char3
)
AS
SELECT
generalquestion_id,
posedquestion_id, 
charactercombination.charactercombination_char1, 
charactercombination.charactercombination_char2, 
charactercombination.charactercombination_char3
FROM
charactercombination,
posedquestiontocharactercombination,
posedquestion,
generalquestion
WHERE
posedquestion_generalquestion=generalquestion_id
AND
posedquestiontocharactercombination_charactercombination=charactercombination_id
AND
posedquestiontocharactercombination_posedquestion=posedquestion_id
;

-- all conceivable combinations of parameters other than question and chartype (a massive cross join, probably not used)
CREATE VIEW
crossjoinoptions
(
crossjoinoptions_treetype,
crossjoinoptions_branchlengthtype,
crossjoinoptions_criterion,
crossjoinoptions_dataformat,
crossjoinoptions_treeformat,
crossjoinoptions_applicationkind,
crossjoinoptions_platform
)
AS
SELECT 
treetype_id, 
branchlengthtype_id, 
criterion_id,

dataformat_id,
treeformat_id,
applicationkind_id,
platform_id
FROM
treetype,
branchlengthtype,
criterion,
dataformat,
treeformat,
applicationkind,
platform
;

No comments: