Cynthia Sims Parr
Informatics

(page under development)

The term informatics can be applied to two different areas of inquiry. The first would be the development and use of computational or mathematical techniques for analysis or modeling in science. This would include the in silico biological research on ecological interactions described above. The second involves the development and use of database-related technologies in support of such scientific endeavors. In this latter area my efforts fall into two distinct categories.

Information visualization. My colleagues and I at the Human-Computer Interaction Lab are designing and evaluating highly visual, interactive software to allow better access to and understanding of biological information. Domains include biological taxonomies and phylogenies (e.g. TaxonTree and DoubleTree), biological ontologies, and ecological interaction datasets (EcoLens and TreePlus). We have found significant advantages in user speed and accuracy using incremental, label-based tree browsing over more traditional force-directed graph layouts, particularly for high-density datasets. Our emphasis has been on exploration of details rather than on overviews. We characterized biodiversity information-seekers, and developed a taxonomy of tasks by which to evaluate tree and graph visualizations.
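As a rough illustration of what incremental, label-based tree browsing means, the sketch below shows only the path from the root to a focus node plus that node's immediate children, each as a text label, instead of laying out the whole tree at once. It is a minimal sketch under stated assumptions, not the TaxonTree or DoubleTree implementation: the names `TaxonNode`, `find_path`, `browse_view`, and `example_tree` are hypothetical, and the tiny classification is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonNode:
    """One labeled node in a classification tree (hypothetical structure)."""
    label: str
    children: List["TaxonNode"] = field(default_factory=list)


def find_path(root, target):
    """Return the nodes from root down to the node labeled target, or []."""
    if root.label == target:
        return [root]
    for child in root.children:
        sub_path = find_path(child, target)
        if sub_path:
            return [root] + sub_path
    return []


def browse_view(root, focus):
    """Return indented label lines for the incremental view of one focus node.

    Only the ancestors of the focus, the focus itself, and its immediate
    children are shown; everything else stays collapsed.
    """
    path = find_path(root, focus)
    if not path:
        return []
    lines = [("  " * depth) + node.label for depth, node in enumerate(path)]
    child_indent = "  " * len(path)
    for child in sorted(path[-1].children, key=lambda n: n.label):
        lines.append(child_indent + child.label)
    return lines


def example_tree():
    """Build a very small illustrative classification."""
    return TaxonNode("Animalia", [
        TaxonNode("Chordata", [TaxonNode("Mammalia"), TaxonNode("Aves")]),
        TaxonNode("Arthropoda", [TaxonNode("Insecta")]),
    ])
```

Calling `browse_view(example_tree(), "Chordata")` lists Animalia, Chordata, and the two children of Chordata, while the Arthropoda branch stays hidden until the user navigates to it.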

The techniques we develop and evaluate are valuable quite broadly to any domain requiring tree and graph visualization, particularly for large, multidimensional datasets. TaxonTree is now an alternative browsing interface to the Animal Diversity Web, and is being adapted for use on the LepTree project. The other products are available for download at http://www.cs.umd.edu/hcil/biodiversity. Other recent work includes the visualization of uncertainty in hierarchies, for example in multiple evolutionary trees (in collaboration with Microsoft Research).

Since July 2007 I have been working with the Smithsonian's INOTAXA project as a consultant to Information International Associates to develop requirements, use cases, and a prototype system for searching and browsing information in the taxonomic literature. We are starting with Biologia Centrali-Americana, which is already available digitally but not in a form that is easily integrated with related resources such as other publications, classifications, images, and specimen data.

Data sharing and artificial intelligence. Previous work ranged from collaboratively built online encyclopedias (Animal Diversity Web) to handheld data collection and online data collation software (BioKIDS). We are currently working on semantically rich data sharing for ecologists (SPIRE) and molecular systematists (LepTree). In particular, we are critically examining the Semantic Web as a means of data sharing and integration. My colleagues and I are creating several biological ontologies (machine-readable, logic-rich controlled vocabularies) that address phylogenetic, natural history, ecological, behavioral, and morphological information. We are tying them to active web content management systems. Some will dynamically feed the ecological models described above (the SPIRE project); others will enable tests of automatically integrating morphological character data and determining homology (the NSF Assembling the Tree of Life LepTree project).
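To make the idea of a machine-readable, logic-rich vocabulary concrete, the following sketch stores a few statements as subject-predicate-object triples and treats `is_a` as transitive, so a question asked about a broad group can be answered from data recorded about a single species. It is only a plain-Python stand-in under stated assumptions: it does not use the project ontologies or any Semantic Web library, the names `Triple`, `ancestors`, and `diet_of_group` are hypothetical, and the three example statements are illustrative.

```python
from collections import namedtuple

# A statement in subject-predicate-object form (hypothetical structure).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Illustrative data only; not taken from any project dataset.
EXAMPLE_TRIPLES = [
    Triple("Danaus plexippus", "is_a", "Lepidoptera"),
    Triple("Lepidoptera", "is_a", "Insecta"),
    Triple("Danaus plexippus", "eats", "Asclepias"),
]


def ancestors(triples, term):
    """Return every group reachable from term by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if (t.subject == current and t.predicate == "is_a"
                    and t.obj not in found):
                found.add(t.obj)
                frontier.append(t.obj)
    return found


def diet_of_group(triples, group):
    """Return sorted (consumer, food) pairs for consumers that belong to group.

    Membership is inferred through the transitive is_a relation, so the
    group never has to be stated directly on the feeding record.
    """
    return sorted(
        (t.subject, t.obj)
        for t in triples
        if t.predicate == "eats" and group in ancestors(triples, t.subject)
    )
```

Here `diet_of_group(EXAMPLE_TRIPLES, "Insecta")` finds the feeding record for Danaus plexippus even though that record never mentions Insecta, which is the kind of integration that shared vocabularies are meant to support.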

Combining these two interests, we are developing ways for the public to share semantically annotated outdoor observations (e.g. images or text reports) online with each other and with scientists. This citizen science technology, Spotter, should provide data for scientists studying climate change and invasive species. It requires innovations in human-computer interaction (e.g. microformat or ontology editors for the general public) and applications of cutting-edge data mining and semantic technologies to enable what I call a "global human sensor net." Pilot testing involves several blogs, including FieldMarking and the Howard County Conservancy.