One of the things that keeps bothering me is the lack of compelling ways to visualise information in phylogenetic databases. Trees themselves are, I feel, pretty awful objects to work with. They are large, and displaying them takes up a lot of screen real estate. Yet, in many ways, the more of the tree one sees, the less one gains from the experience. For example, CAIDA's Walrus tool (right), used by Tim Hughes to display large trees, looks fabulous, but is it useful? By which I mean, can we use it to find out about stuff, or do we just spin it around and go "ooh, isn't it pretty?"
Treemaps are another tool I've looked at, but I've never been terribly impressed by them. However, quantum treemaps, described in "Ordered and Quantum Treemaps: Making Effective Use of 2D Space to Display Hierarchies", look potentially useful. To quote from the paper:
The goal of the Quantum Treemap algorithm is similar to other treemap algorithms, but instead of generating rectangles of arbitrary aspect ratios, it generates rectangles with widths and heights that are integer multiples of a given elemental size. In this manner, it always generates rectangles in which a grid of elements of the same size can be laid out. Furthermore, all the grids of elements will align perfectly with rows and columns of elements running across the entire series of rectangles. It is this basic element size that cannot be made any smaller that led to the name of Quantum Treemaps.
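The key constraint in that description, rectangles whose sides are whole numbers of elements, is easy to sketch. This is not Bederson's algorithm (which, as I understand it, is a variation on ordered treemaps); it is a minimal Python illustration assuming a target aspect ratio for each group and a single horizontal strip of groups. The names `quantize` and `strip_layout` are hypothetical.

```python
import math
from typing import List, Tuple


def quantize(n: int, target_aspect: float = 1.0) -> Tuple[int, int]:
    """Pick a grid (cols, rows) that can hold n equal-sized elements.

    cols * rows >= n, and cols / rows is as close as possible to
    target_aspect (width / height). Ties go to the grid with fewer
    empty cells. Hypothetical helper, not taken from the paper.
    """
    if n <= 0:
        return (0, 0)
    best_key, best = None, (0, 0)
    for cols in range(1, n + 1):
        rows = math.ceil(n / cols)
        # Log scale, so being 2x too wide is as bad as being 2x too tall.
        aspect_error = abs(math.log((cols / rows) / target_aspect))
        empty_cells = cols * rows - n
        key = (aspect_error, empty_cells)
        if best_key is None or key < best_key:
            best_key, best = key, (cols, rows)
    return best


def strip_layout(sizes: List[int], rows: int) -> List[Tuple[int, int, int, int]]:
    """Lay groups left to right in a strip that is `rows` elements tall.

    Each group gets just enough whole columns to hold its elements, so
    every rectangle is an integer number of elements wide and tall, and
    element rows line up across all rectangles in the strip.
    Returns (x, y, width, height) for each group, in element units.
    """
    if rows <= 0:
        raise ValueError("rows must be positive")
    rects = []
    x = 0
    for n in sizes:
        cols = math.ceil(n / rows)
        rects.append((x, 0, cols, rows))
        x += cols
    return rects
```

For example, `quantize(7)` picks a 3 by 3 grid (two empty cells) rather than a 7 by 1 strip, and `strip_layout([5, 3, 8], 2)` gives rectangles 3, 2 and 4 elements wide that all share the same two rows. The empty cells are the price paid for keeping every element the same size.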
Quantum treemaps have made their way into PhotoMesa, Bederson's zoomable image browser.
So, here's my thought. What if we used a quantum treemap to browse TreeBASE? Suppose we have a mapping between TreeBASE taxa and the NCBI taxonomy (or any other taxonomy, it doesn't really matter). If we then have some notion of what taxa each TreeBASE study is mainly about, then we could display a quantum treemap of studies rooted at any node in the NCBI taxonomy. For example, studies on mammals, grouped by order. The point here is not to see the tree, but to navigate through the studies using the tree.
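Whereas treemaps usually display a nested hierarchy, my sense is that quantum treemaps are used to display only the children of a single node, rather than the whole tree. I think this is because the final size of a quantum treemap is unpredictable: each rectangle is enlarged to hold a whole number of elements, so the finished layout may not fit exactly into the space it was given, which makes nesting one layout inside another awkward.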
The mapping of TreeBASE names to NCBI tax_ids is not trivial, but I've got most of it done. Mapping studies to taxa needs a little thought. One approach is to take a tree from a study, relabel it with NCBI tax_ids, then find the least common ancestor (LCA) in the NCBI taxonomy of the centroid of the tree. The idea is that the centroid lies at the core of the tree, and hence should capture what the tree is about. Finding the LCA of the root (that is, of all the taxa in the tree) would be the obvious thing to do, but if a tree comprises mostly vertebrates and is rooted with a bacterium, then the root LCA is the root of life, which isn't a terribly accurate summary of the tree.
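One way to make this concrete, assuming the study tree is rooted, written as nested tuples, and already relabelled with tax_ids: take the "centroid" to be the deepest clade that still contains more than half of all the leaves, then compute the LCA of that clade's tax_ids in the taxonomy (given as a child-to-parent dictionary). That reading of "centroid", the tree format, and the function names are my assumptions for the sake of a sketch.

```python
from typing import Dict, Hashable, List, Optional


def leaves(tree) -> List[Hashable]:
    """Leaf labels of a tree written as nested tuples, e.g. ("a", ("b", "c"))."""
    if isinstance(tree, tuple):
        out = []
        for child in tree:
            out.extend(leaves(child))
        return out
    return [tree]


def leaf_centroid(tree):
    """Deepest clade holding a strict majority of all the leaves.

    Starting at the root, step into the child with the most leaves for
    as long as that child holds more than half of them. Recounting
    leaves at each step is quadratic, which is fine for a sketch.
    """
    total = len(leaves(tree))
    node = tree
    while isinstance(node, tuple):
        heavy = max(node, key=lambda child: len(leaves(child)))
        if 2 * len(leaves(heavy)) <= total:
            break
        node = heavy
    return node


def path_to_root(tax_id: Hashable, parent: Dict) -> List[Hashable]:
    """[tax_id, parent, ..., root]; the root is its own parent or has none."""
    path = [tax_id]
    while tax_id in parent and parent[tax_id] != tax_id:
        tax_id = parent[tax_id]
        path.append(tax_id)
    return path


def lca(tax_ids: List[Hashable], parent: Dict) -> Optional[Hashable]:
    """Lowest common ancestor of tax_ids in the taxonomy given by `parent`."""
    paths = [path_to_root(t, parent) for t in tax_ids]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first taxon, the first shared node is the deepest.
    for t in paths[0]:
        if t in shared:
            return t
    return None  # no common ancestor (disconnected taxonomy)
```

For a tree of the form (bacterium, (vertebrates...)), `lca(leaves(tree), parent)` returns the root of the taxonomy, whereas the centroid clade lies inside the vertebrates, so `lca(leaves(leaf_centroid(tree)), parent)` returns a vertebrate ancestor instead.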
I've been playing with generating quantum treemaps, based on a C++ port of some Java code written by Ben Bederson. The next step is to bolt this together into a demo of how it might be used to navigate TreeBASE.