Wednesday, April 30, 2008

Paper published


Bit of a rarity these days. My paper on identifiers in biodiversity informatics, which I mentioned earlier when I deposited the preprint at Nature Precedings, has been published in Briefings in Bioinformatics (doi:10.1093/bib/bbn022).

Here's the abstract:
A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that a lot of information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers [such as Digital Object Identifiers (DOIs) and Life Science Identifiers (LSIDs)], and the implementation of services that link those identifiers.

Monday, April 28, 2008

Google Code wiki using Subversion

For some time now Google Code has been displaying the message:

The web interface for wiki content is currently READ-ONLY for maintenance.
You may still add comments, and members may add, edit, or delete wiki pages via svn. Learn more.

This is a bit of a pain as I've recently put the code for my LSID tester into Google Code (the project is here). Since having a simple wiki is part of the attraction of Google Code, I decided to finally figure out how to add a wiki via Subversion. Turns out it is pretty straightforward. I created a folder called "wiki" and added a file with some wiki markup. I then added it to the repository:
svn import -m "Trying to add wiki" \
  wiki https://lsid-php.googlecode.com/svn/wiki/ \
  --username USERNAME

Run this from the folder containing "wiki", not from within the "wiki" folder itself. The command adds the contents of the wiki folder to the Google Code repository. You can then check this out:
svn checkout \
  https://lsid-php.googlecode.com/svn/wiki/ \
  lsid-php-wiki --username USERNAME
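
For anyone repeating this, here is a rough sketch of how the local "wiki" folder might be prepared before the import. The page name GettingStarted.wiki and its contents are made up for illustration, and the heading markup is my assumption about what the wiki accepts; only the folder name "wiki" and the import command come from the steps above. The import itself is printed rather than run, since it needs network access and a real username.

```shell
# Create the folder that will be imported (the name "wiki" is from the post).
mkdir -p wiki

# Hypothetical page name and placeholder content, for illustration only.
cat > wiki/GettingStarted.wiki <<'EOF'
= Getting started =
Some wiki markup goes here.
EOF

# Print the import command instead of executing it. Run the real command
# from the folder that contains "wiki", with USERNAME replaced by your own.
echo 'svn import -m "Trying to add wiki" wiki https://lsid-php.googlecode.com/svn/wiki/ --username USERNAME'
```

Once the checkout exists, further pages would presumably be added inside lsid-php-wiki with the usual svn add and svn commit.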

This probably seems obvious to many, but I'm used to CVS, having run a CVS repository since the late 1990s when Mike Charleston and I were working on TreeMap. I've been resisting moving to Subversion simply because of the hassle of learning stuff that doesn't actually make my life any easier. That said, Google Code is a nice way to host projects.

Thursday, April 03, 2008

Biodiversity informatics: the challenge of linking data and the role of shared identifiers

The manuscript for Briefings in Bioinformatics that I alluded to earlier has been accepted for publication. I've put a preprint up at Nature Precedings (hdl:10101/npre.2008.1760.1). The final version will appear in print later this year.