BIP-Splice Use Cases

BIP-Splice has several use cases. To analyze them, we must first identify the types of users of the system (researchers, administrators, and developers) and then the types of database they work with (general purpose, and single or limited purpose).

Researchers
These are the end users of the system. They determine which information is needed to create a BIP-Splice database and which questions the data must answer. Researchers typically do not interact with the database directly; they use the web interface to perform some queries and send more complex queries to the developers, who formulate custom scripts. Researchers also interpret the data returned by queries to the BIP-Splice database(s).

Administrators
Admins are the people responsible for running the BIP-Splice tools to create and maintain BIP-Splice databases. In most cases they will also be responsible for retrieving the input data specified by the researchers.

Developers
Developers create specialized queries based on the needs of the researchers and run these queries against the database directly. Developers use a combination of SQL and a programming language with database access, such as Perl, Java, or PHP, to create scripts that run the researchers' queries.

BIP-Splice databases may be general purpose, incorporating an organism's entire transcriptome, or more focused, using only transcripts selected by criteria such as functional annotation, library/tissue, etc. The differences between the two types are described below.

General purpose
The general purpose BIP-Splice database aims to capture as much transcript data for one organism as possible.  Upon creation, all transcripts from desired data sources are downloaded and aligned to the genome.  Transcripts which do not pass the quality filters remain in the database, unmapped.  This database type is expected to be the most used, and most practical.
Advantages:
This method produces the most flexible database.  A new database does not need to be created each time the requirements change; only the queries need to be modified.  Also, transcript clustering and splice analysis over the full data set give a much more complete picture of what is happening in the organism.
Disadvantages:
The main disadvantage is storage space.  The database holds many transcripts and their related data even if only a small subset will ever be studied.  Also, since more transcripts mean more processing time, this type of database takes longer to build.

Single or limited purpose
A limited database uses input data which is pre-filtered based on certain criteria.  Transcripts which are used for this database may be chosen based on the functional annotation, library/tissue, sequence length, date of sequence record, etc.
This case will be less used than general purpose, but it will be useful in certain situations.
Advantages:
Build time will be reduced.  Queries can be simpler because they do not need to filter the data based on the criteria which formed the pre-build filters.  This database will occupy less storage space as well.
Disadvantages:
This is not the most flexible option.  If the requirements change at a later date, a new BIP-Splice database needs to be created using new input data.  The gains in processing time and storage space may be reversed when time and space need to be used for a new database.
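
The pre-build filtering described for a limited database can be sketched as follows. This is only an illustration in Python, not part of the BIP-Splice tools: the `Transcript` record, its field names, and the `prefilter` function are all hypothetical, and the criteria simply mirror the ones listed above (library/tissue, functional annotation, sequence length, date of sequence record).

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Transcript:
    # Hypothetical record layout; a real input file may carry other fields.
    accession: str
    library: str
    annotation: str
    length: int
    record_date: date


def prefilter(
    transcripts: Iterable[Transcript],
    library: Optional[str] = None,
    annotation: Optional[str] = None,
    min_length: int = 0,
    since: Optional[date] = None,
) -> List[Transcript]:
    """Keep only the transcripts that satisfy every criterion that was given."""
    selected = []
    for t in transcripts:
        # Case-insensitive substring match on the library/tissue name.
        if library is not None and library.lower() not in t.library.lower():
            continue
        # Case-insensitive substring match on the functional annotation.
        if annotation is not None and annotation.lower() not in t.annotation.lower():
            continue
        if t.length < min_length:
            continue
        if since is not None and t.record_date < since:
            continue
        selected.append(t)
    return selected


# Made-up sample records, for illustration only.
sample = [
    Transcript("T1", "adult liver", "kinase", 800, date(2005, 3, 1)),
    Transcript("T2", "brain", "kinase", 1200, date(2006, 1, 15)),
    Transcript("T3", "fetal liver", "transporter", 300, date(2004, 7, 9)),
]

liver_long = prefilter(sample, library="liver", min_length=500)
print([t.accession for t in liver_long])
```

If the criteria change later, the same function can be rerun with new arguments, but the database itself still has to be rebuilt from the new selection.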

One other aspect of BIP-Splice usage relates to where the database is stored.  If the database is stored at the researchers' location, then they have the authority to decide which data is used as input, as well as the operating parameters, etc.  BIP-Splice databases may also be created at another location and made accessible only via the web interface.

Here are two examples of BIP-Splice database creation:

General purpose
Pre-filtered input (single/limited purpose)
Once built, the database can be browsed or queried with the web interface, custom scripts, or direct SQL statements.

Query examples:
Counting the number of clusters for the organism.
SQL statement:
select count(cluster_id) from cluster;

Determining the amount of alternative splicing present. Count both variant and invariant clusters.
SQL statement:
select count(cluster_id) from cluster where variant='t';
select count(cluster_id) from cluster where variant='f';
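
A custom script can combine the two counts into a single figure. The following is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for a real BIP-Splice database. Only the `cluster` table and its `cluster_id` and `variant` columns come from the statements above; the sample rows, the storage of `variant` as the text values 't' and 'f', and the function name `variant_fraction` are assumptions made for illustration.

```python
import sqlite3


def variant_fraction(conn):
    """Return (variant count, invariant count, fraction of clusters that are variant)."""
    cur = conn.cursor()
    # One grouped query replaces the two separate count statements.
    cur.execute("select variant, count(cluster_id) from cluster group by variant")
    counts = dict(cur.fetchall())
    variant = counts.get("t", 0)
    invariant = counts.get("f", 0)
    total = variant + invariant
    fraction = variant / total if total else 0.0
    return variant, invariant, fraction


# Stand-in database with made-up rows (assumption: 't'/'f' stored as text).
conn = sqlite3.connect(":memory:")
conn.execute("create table cluster (cluster_id integer primary key, variant text)")
conn.executemany(
    "insert into cluster values (?, ?)",
    [(1, "t"), (2, "f"), (3, "t"), (4, "f")],
)

print(variant_fraction(conn))  # prints (2, 2, 0.5)
```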


Finding the clusters in a particular tissue which have alternative splicing events:
SQL statement:
select distinct cluster.cluster_id from cluster, clone where clone.library ilike '%liver%' and clone.cluster_id=cluster.cluster_id and cluster.variant='t';
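
When the same tissue query is run from a script, the tissue name can be passed as a bound parameter instead of being written into the statement. The sketch below is a Python illustration against an in-memory SQLite stand-in, not the project's own code. The `cluster` and `clone` tables and the `cluster_id`, `variant`, and `library` columns come from the statements above; the `clone_id` column, the sample rows, and the function name are hypothetical. SQLite has no `ilike`, so the sketch relies on SQLite's `like`, which is case-insensitive for ASCII text by default.

```python
import sqlite3


def variant_clusters_in_tissue(conn, tissue):
    """Return ids of variant clusters that have at least one clone from the tissue."""
    sql = (
        "select distinct cluster.cluster_id "
        "from cluster join clone on clone.cluster_id = cluster.cluster_id "
        "where clone.library like ? and cluster.variant = 't' "
        "order by cluster.cluster_id"
    )
    # The tissue name is bound as a parameter rather than pasted into the SQL.
    return [row[0] for row in conn.execute(sql, (f"%{tissue}%",))]


# Stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table cluster (cluster_id integer primary key, variant text)")
conn.execute(
    "create table clone (clone_id integer primary key, cluster_id integer, library text)"
)
conn.executemany("insert into cluster values (?, ?)", [(1, "t"), (2, "f"), (3, "t")])
conn.executemany(
    "insert into clone (cluster_id, library) values (?, ?)",
    [(1, "Adult Liver"), (1, "fetal liver"), (2, "liver"), (3, "brain")],
)

print(variant_clusters_in_tissue(conn, "liver"))  # prints [1]
```

Cluster 2 also has a liver clone but is invariant, so it is excluded by the `variant` condition.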

Here is a more complex analysis pipeline which can be implemented using a BIP-Splice database:

  1. For each cluster, analyze gene variations within the transcripts. Clusters and transcripts can be selected according to some criteria such as library/tissue, chromosome, etc. Determine whether splice variations occur within the coding region, and if the variations cause frameshifts.
  2. Analyze the coding region using Interpro or similar tools. Determine the impact of the splice variation: does it change functional domains or cause other effects in the protein?
  3. Use PHD (Rost, 2003) and SigPep (Nielsen et al 1997, Saxova et al 2003) to predict protein secondary structure, peptide leader sequences, and buried vs. exposed residues. Determine which secondary structure features overlap with gene variations.
  4. Compare the translated protein with sequences from the PDB database of 3D protein structures (Berman et al 2002, Westbrook et al 2003) and with 3D models from MODBASE (Pieper et al 2002). If there is a solved structure or a model, determine whether the gene variation overlaps with surface or interior residues.
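
The frameshift test in step 1 can be sketched under a simplifying assumption: a variation is treated as a single contiguous segment that is inserted or skipped, and it shifts the reading frame when the number of coding bases it covers is not a multiple of three. The Python below is an illustration only. The function names and the half-open coordinate convention are hypothetical, and real variations that span the start or stop codon, or that combine several events, need more careful handling.

```python
def coding_overlap(var_start, var_end, cds_start, cds_end):
    """Number of bases shared by the variation and the coding region.

    All intervals are half-open: [start, end).
    """
    return max(0, min(var_end, cds_end) - max(var_start, cds_start))


def classify_variation(var_start, var_end, cds_start, cds_end):
    """Label a variation as 'noncoding', 'in-frame', or 'frameshift'."""
    overlap = coding_overlap(var_start, var_end, cds_start, cds_end)
    if overlap == 0:
        return "noncoding"
    # A coding change whose length is a multiple of 3 keeps the reading frame.
    return "in-frame" if overlap % 3 == 0 else "frameshift"


# Made-up coordinates: coding region covers bases 100..399.
print(classify_variation(150, 180, 100, 400))  # 30 coding bases: in-frame
print(classify_variation(150, 181, 100, 400))  # 31 coding bases: frameshift
print(classify_variation(10, 50, 100, 400))    # outside the coding region: noncoding
```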

For any comments or suggestions, send an e-mail to: BIP_FEEDBACK@asu.edu
Last updated: 12/12/2006

©2006 BIP