TY  - JOUR
T1  - Archaeosortases and exosortases are widely distributed systems linking membrane transit with posttranslational modification.
JF  - J Bacteriol
Y1  - 2012
A1  - Haft, Daniel H
A1  - Payne, Samuel H
A1  - Selengut, Jeremy D
KW  - Amino Acid Sequence
KW  - Aminoacyltransferases
KW  - Archaeal Proteins
KW  - Bacterial Proteins
KW  - Cell Membrane
KW  - Cysteine Endopeptidases
KW  - Gene Expression Regulation, Archaeal
KW  - Gene Expression Regulation, Bacterial
KW  - Gene Expression Regulation, Enzymologic
KW  - Molecular Sequence Data
KW  - Protein Processing, Post-Translational
AB  -

Multiple new prokaryotic C-terminal protein-sorting signals were found that reprise the tripartite architecture shared by LPXTG and PEP-CTERM: motif, TM helix, basic cluster. Defining hidden Markov models were constructed for all. PGF-CTERM occurs in 29 archaeal species, some of which have more than 50 proteins that share the domain. PGF-CTERM proteins include the major cell surface protein in Halobacterium, a glycoprotein with a partially characterized diphytanylglyceryl phosphate linkage near its C terminus. Comparative genomics identifies a distant exosortase homolog, designated archaeosortase A (ArtA), as the likely protein-processing enzyme for PGF-CTERM. Proteomics suggests that the PGF-CTERM region is removed. Additional systems include VPXXXP-CTERM/archaeosortase B in two of the same archaea and PEF-CTERM/archaeosortase C in four others. Bacterial exosortases often fall into subfamilies that partner with very different cohorts of extracellular polymeric substance biosynthesis proteins; several species have multiple systems. Variant systems include the VPDSG-CTERM/exosortase C system unique to certain members of the phylum Verrucomicrobia, VPLPA-CTERM/exosortase D in several alpha- and deltaproteobacterial species, and a dedicated (single-target) VPEID-CTERM/exosortase E system in alphaproteobacteria. Exosortase-related families XrtF in the class Flavobacteria and XrtG in Gram-positive bacteria mark distinctive conserved gene neighborhoods. A picture emerges of an ancient and now well-differentiated superfamily of deeply membrane-embedded protein-processing enzymes. Their target proteins are destined to transit cellular membranes during their biosynthesis, during which most undergo additional posttranslational modifications such as glycosylation.

VL  - 194
CP  - 1
M3  - 10.1128/JB.06026-11
ER  -

TY  - JOUR
T1  - Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.
JF  - PLoS Negl Trop Dis
Y1  - 2012
A1  - Ricaldi, Jessica N
A1  - Fouts, Derrick E
A1  - Selengut, Jeremy D
A1  - Harkins, Derek M
A1  - Patra, Kailash P
A1  - Moreno, Angelo
A1  - Lehmann, Jason S
A1  - Purushe, Janaki
A1  - Sanka, Ravi
A1  - Torres, Michael
A1  - Webster, Nicholas J
A1  - Vinetz, Joseph M
A1  - Matthias, Michael A
KW  - DNA, Bacterial
KW  - Evolution, Molecular
KW  - Gene Transfer, Horizontal
KW  - Genome, Bacterial
KW  - Genomic Islands
KW  - Humans
KW  - Leptospira
KW  - Molecular Sequence Data
KW  - Multigene Family
KW  - Prophages
KW  - Sequence Analysis, DNA
KW  - Virulence Factors
AB  -

The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae consisting of a 6-gene cluster that is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics.

VL  - 6
CP  - 10
M3  - 10.1371/journal.pntd.0001853
ER  -

TY  - JOUR
T1  - Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function.
JF  - BMC Bioinformatics
Y1  - 2010
A1  - Selengut, Jeremy D
A1  - Rusch, Douglas B
A1  - Haft, Daniel H
KW  - Algorithms
KW  - Amino Acid Sequence
KW  - Gene Expression Profiling
KW  - Molecular Sequence Data
KW  - Phylogeny
KW  - Proteins
KW  - Sequence Analysis, Protein
KW  - Structure-Activity Relationship
AB  -

BACKGROUND: Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets.

RESULTS: Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization.

CONCLUSIONS: SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites.
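The SIMBAL abstract describes two mechanisms: a Partial Phylogenetic Profiling (PPP) score that uses binomial statistics to choose the similarity cutoff that best separates TRUE from FALSE members of a partitioned training set, and a sliding window that applies that score to each subsequence of a query. The Python sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the toy `ungapped_identity` score is a hypothetical stand-in for the sequence similarity searches the published method runs, the window and step sizes are arbitrary, all function names are hypothetical, and reporting each window as -log10 of its best binomial tail probability is an assumed convention.

```python
from math import comb, log10


def binomial_tail(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff_score(hits, n_true: int, n_total: int) -> float:
    """PPP-style score for one search.

    hits: list of (similarity, is_true) pairs, one per training protein.
    Walks down the hits ranked by similarity and, at every candidate cutoff,
    asks how surprising the number of TRUE hits above it is, given the
    background TRUE fraction. Returns -log10 of the smallest tail probability.
    """
    p_background = n_true / n_total
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    best = 1.0
    true_so_far = 0
    for n, (sim, is_true) in enumerate(ranked, start=1):
        true_so_far += is_true
        # Only evaluate at a score boundary so tied hits are never split.
        if n == len(ranked) or ranked[n][0] != sim:
            best = min(best, binomial_tail(true_so_far, n, p_background))
    return -log10(best)


def ungapped_identity(window: str, target: str) -> int:
    """Hypothetical toy similarity (stand-in for a real alignment search):
    best count of identical residues over all ungapped placements."""
    length = len(window)
    best = 0
    for start in range(len(target) - length + 1):
        segment = target[start:start + length]
        best = max(best, sum(a == b for a, b in zip(window, segment)))
    return best


def simbal_profile(query: str, training, window: int = 5, step: int = 1):
    """Slide a window along the query and score each subsequence against
    the partitioned training set, given as (sequence, is_true) pairs.
    Returns a list of (window_start, score)."""
    n_true = sum(is_true for _, is_true in training)
    n_total = len(training)
    profile = []
    for start in range(0, len(query) - window + 1, step):
        sub = query[start:start + window]
        hits = [(ungapped_identity(sub, seq), is_true) for seq, is_true in training]
        profile.append((start, best_cutoff_score(hits, n_true, n_total)))
    return profile


if __name__ == "__main__":
    # Hypothetical toy training set: TRUE members share an N-terminal motif.
    training = [
        ("MKTAYIAKQR", True),
        ("MKTAYLAKQR", True),
        ("MKTAYIGKQR", True),
        ("GGSAYIPLQW", False),
        ("PPLMNVWQRS", False),
        ("GGSLMNVWEE", False),
    ]
    for start, score in simbal_profile("MKTAYIAKQR", training, window=4):
        print(start, round(score, 3))
```

Evaluating cutoffs only at boundaries between distinct similarity values keeps tied hits together, so the chosen cutoff never depends on the arbitrary order of equally similar training proteins. Windows whose high-scoring hits are enriched for TRUE members receive the largest scores, which is the sense in which a short subsequence can outperform its full-length parent.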

VL  - 11
M3  - 10.1186/1471-2105-11-52
ER  -

TY  - JOUR
T1  - Unexpected abundance of coenzyme F(420)-dependent enzymes in Mycobacterium tuberculosis and other actinobacteria.
JF  - J Bacteriol
Y1  - 2010
A1  - Selengut, Jeremy D
A1  - Haft, Daniel H
KW  - Actinobacteria
KW  - Amino Acid Sequence
KW  - Binding Sites
KW  - Coenzymes
KW  - Flavonoids
KW  - Gene Expression Profiling
KW  - Gene Expression Regulation, Bacterial
KW  - Genome, Bacterial
KW  - Molecular Biology
KW  - Molecular Sequence Data
KW  - Molecular Structure
KW  - Mycobacterium tuberculosis
KW  - Phylogeny
KW  - Protein Conformation
KW  - Riboflavin
AB  -

Regimens targeting Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), require long courses of treatment and a combination of three or more drugs. An increase in drug-resistant strains of M. tuberculosis demonstrates the need for additional TB-specific drugs. A notable feature of M. tuberculosis is coenzyme F(420), which is distributed sporadically and sparsely among prokaryotes. This distribution allows for comparative genomics-based investigations. Phylogenetic profiling (comparison of differential gene content) based on F(420) biosynthesis nominated many actinobacterial proteins as candidate F(420)-dependent enzymes. Three such families dominated the results: the luciferase-like monooxygenase (LLM), pyridoxamine 5'-phosphate oxidase (PPOX), and deazaflavin-dependent nitroreductase (DDN) families. The DDN family was determined to be limited to F(420)-producing species. The LLM and PPOX families were observed in F(420)-producing species as well as species lacking F(420) but were particularly numerous in many actinobacterial species, including M. tuberculosis. Partitioning the LLM and PPOX families based on an organism's ability to make F(420) allowed the application of the SIMBAL (sites inferred by metabolic background assertion labeling) profiling method to identify F(420)-correlated subsequences. These regions were found to correspond to flavonoid cofactor binding sites. Significantly, these results showed that M. tuberculosis carries at least 28 separate F(420)-dependent enzymes, most of unknown function, and a paucity of flavin mononucleotide (FMN)-dependent proteins in these families. While prevalent in mycobacteria, markers of F(420) biosynthesis appeared to be absent from the normal human gut flora. These findings suggest that M. tuberculosis relies heavily on coenzyme F(420) for its redox reactions. This dependence and the cofactor's rarity may make F(420)-related proteins promising drug targets.

VL  - 192
CP  - 21
M3  - 10.1128/JB.00425-10
ER  -

TY  - JOUR
T1  - Three genomes from the phylum Acidobacteria provide insight into the lifestyles of these microorganisms in soils.
JF  - Appl Environ Microbiol
Y1  - 2009
A1  - Ward, Naomi L
A1  - Challacombe, Jean F
A1  - Janssen, Peter H
A1  - Henrissat, Bernard
A1  - Coutinho, Pedro M
A1  - Wu, Martin
A1  - Xie, Gary
A1  - Haft, Daniel H
A1  - Sait, Michelle
A1  - Badger, Jonathan
A1  - Barabote, Ravi D
A1  - Bradley, Brent
A1  - Brettin, Thomas S
A1  - Brinkac, Lauren M
A1  - Bruce, David
A1  - Creasy, Todd
A1  - Daugherty, Sean C
A1  - Davidsen, Tanja M
A1  - DeBoy, Robert T
A1  - Detter, J Chris
A1  - Dodson, Robert J
A1  - Durkin, A Scott
A1  - Ganapathy, Anuradha
A1  - Gwinn-Giglio, Michelle
A1  - Han, Cliff S
A1  - Khouri, Hoda
A1  - Kiss, Hajnalka
A1  - Kothari, Sagar P
A1  - Madupu, Ramana
A1  - Nelson, Karen E
A1  - Nelson, William C
A1  - Paulsen, Ian
A1  - Penn, Kevin
A1  - Ren, Qinghu
A1  - Rosovitz, M J
A1  - Selengut, Jeremy D
A1  - Shrivastava, Susmita
A1  - Sullivan, Steven A
A1  - Tapia, Roxanne
A1  - Thompson, L Sue
A1  - Watkins, Kisha L
A1  - Yang, Qi
A1  - Yu, Chunhui
A1  - Zafar, Nikhat
A1  - Zhou, Liwei
A1  - Kuske, Cheryl R
KW  - Anti-Bacterial Agents
KW  - Bacteria
KW  - Biological Transport
KW  - Carbohydrate Metabolism
KW  - Cyanobacteria
KW  - DNA, Bacterial
KW  - Fungi
KW  - Genome, Bacterial
KW  - Macrolides
KW  - Molecular Sequence Data
KW  - Nitrogen
KW  - Phylogeny
KW  - Proteobacteria
KW  - Sequence Analysis, DNA
KW  - Sequence Homology
KW  - Soil Microbiology
AB  -

The complete genomes of three strains from the phylum Acidobacteria were compared. Phylogenetic analysis placed them as a unique phylum. They share genomic traits with members of the Proteobacteria, the Cyanobacteria, and the Fungi. The three strains appear to be versatile heterotrophs. Genomic and culture traits indicate the use of carbon sources that span simple sugars to more complex substrates such as hemicellulose, cellulose, and chitin. The genomes encode low-specificity major facilitator superfamily transporters and high-affinity ABC transporters for sugars, suggesting that they are best suited to low-nutrient conditions. They appear capable of nitrate and nitrite reduction but not N(2) fixation or denitrification. The genomes contained numerous genes that encode siderophore receptors, but no evidence of siderophore production was found, suggesting that they may obtain iron via interaction with other microorganisms. The presence of cellulose synthesis genes and a large class of novel high-molecular-weight excreted proteins suggests potential traits for desiccation resistance, biofilm formation, and/or contribution to soil structure. Polyketide synthase and macrolide glycosylation genes suggest the production of novel antimicrobial compounds. Genes that encode a variety of novel proteins were also identified. The abundance of acidobacteria in soils worldwide and the breadth of potential carbon use by the sequenced strains suggest significant and previously unrecognized contributions to the terrestrial carbon cycle. Combining our genomic evidence with available culture traits, we postulate that cells of these isolates are long-lived, divide slowly, exhibit slow metabolic rates under low-nutrient conditions, and are well equipped to tolerate fluctuations in soil hydration.

VL  - 75
CP  - 7
M3  - 10.1128/AEM.02294-08
ER  -