AureoWiki - The Wiki way of Staphylococcus aureus annotation

AureoWiki is the first tool to support curated annotation of S. aureus biological entities in a Wiki-based manner. In its current state, AureoWiki provides information for an S. aureus pan-genome of 33 strains. In addition, it provides detailed information on the genes and proteins of S. aureus strains COL, NCTC8325, N315, Newman and USA300_FPR3757. These five strains were selected because of their detailed annotation (N315) or their clinical or scientific importance. The AureoWiki gene pages combine information from a variety of sources, including data on protein function and localization, transcriptional regulation, and gene expression. Orthologous genes of the individual strains are linked by a pan-genome gene identifier and a unified gene name. The main objective of AureoWiki is to facilitate the transfer of knowledge gained in studies with different S. aureus strains, thus supporting functional annotation and a better understanding of this organism. The following direct links give an impression of typical gene / protein pages: hld, eno, hlY, splA, pflB, saeR, sarA

The scientific community is invited to extend the current information. This can be done via the Edit icon at the top right of each page or via the [edit] links at the top of each section or inside some sections. Changes can be made as a registered user (you can set up your own account using the button with the small down arrow next to the Log in link) or by answering an expert question related to the Staphylococcus research field before the change is saved. Any user can easily add information to AureoWiki as described here.

Information preceded by a filled bullet point is fed from the database behind AureoWiki and cannot be changed, because we hold that sequence-based information in particular should be maintained consistently with the RefSeq annotation and other databases. An [edit] link makes it possible to add user-generated content or comments. User-added information is indicated by open bullets. In some places we added placeholders asking for user input (e.g. in the phenotype section of the strain-specific gene pages) to complement the currently available information.

The Main Page presents an easy-to-use search mask, in which the user can search for locus tags, gene symbols or keywords. The [Getting Started] button brings you to the present page.

The tab bar at the top of the AureoWiki pages allows switching between the pan-genome page and the strain-specific pages for a gene. The pages are linked via an orthologue reference table that was constructed as part of our work and can be downloaded using the [Downloads] button. The following sections explain what kind of information AureoWiki contains and where it comes from.

The S. aureus Pan Genome and the AureoWiki Pan Genome Pages

For a comparative analysis of S. aureus strains, an overall alignment of a publicly available set of 33 strain-specific S. aureus genomes from NCBI RefSeq was performed.

RefSeq accession   Strain
NC_002951          COL
NC_002745          N315
NC_007795          NCTC_8325
NC_009641          Newman
NC_007793          USA300_FPR3757
NC_017340          04_02981
NC_018608          08BA02176
NC_017351          11819_97
NC_022222          6850
NC_017673          71193
NC_017343          ECT_R 2
NC_017337          ED133
NC_013450          ED98
NC_017763          HO_5096_0412
NC_009632          JH1
NC_009487          JH9
NC_017341          JKD6008
NC_017338          JKD6159
NC_017349          LGA251
NC_016928          M013
NC_002952          MRSA252
NC_016941          MSHR1132
NC_002953          MSSA476
NC_009782          Mu3
NC_002758          Mu50
NC_003923          MW2
NC_007622          RF122
NC_017333          ST398
NC_017347          T0131
NC_017342          TCH60
NC_017331          TW20
NC_010079          USA300_TCH1516
NC_016912          VC40

On this basis it became possible to construct a positionally corrected pan-genome containing all genes. This was used to assign homologous genes (at least 50% identity at the DNA level and 70% identity at the protein level) of different S. aureus strains to orthologous gene groups, each identified by a unified pan-genome gene ID. The corresponding pan-genome gene symbols are unique gene names, unified across the species, that result from a manual curation effort. In brief, pan gene symbols were assigned based on strain-specific gene symbols from NCBI RefSeq and/or relevant publications or, alternatively, on the name of the orthologous Bacillus subtilis gene if it is not a “y”-name.
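The identity thresholds mentioned above can be illustrated with a minimal sketch. It assumes that both sequences are already aligned (gaps written as "-") and that identity is counted over the columns in which neither sequence has a gap; the text does not state how identity was actually computed, so this definition and the function names are assumptions for illustration only.

```python
DNA_MIN_IDENTITY = 50.0      # percent, threshold taken from the text above
PROTEIN_MIN_IDENTITY = 70.0  # percent, threshold taken from the text above


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns without a gap in either sequence.

    Assumption: this is one common definition of identity, not necessarily
    the one used for the pan-genome construction.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_homologous(dna_a: str, dna_b: str, prot_a: str, prot_b: str) -> bool:
    """True if both the DNA and the protein threshold are met."""
    return (
        percent_identity(dna_a, dna_b) >= DNA_MIN_IDENTITY
        and percent_identity(prot_a, prot_b) >= PROTEIN_MIN_IDENTITY
    )
```

Note that passing both thresholds is only the first step described above; the refinement of orthologue groups with respect to gene synteny is not covered by this sketch.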

The pan-genome was determined by the group of Kay Nieselt (Center for Bioinformatics, University of Tübingen) using a progressiveMauve [1] whole-genome alignment, followed by an iterative refinement of the orthologue groups with special attention to gene synteny, using OrthologyPredator and PanGee (L. Backert, A. Henning, unpublished data).

The so-called pan-genome pages, which can be accessed via the leftmost tab at the top of the AureoWiki pages, present the following information:

Each page starts with a unique pan-genome gene ID (pan ID) resulting from the order of all genes within the S. aureus pan-genome, followed by the pan-genome gene symbol and a list of gene descriptions extracted from the RefSeq annotations of all 33 S. aureus genomes. Next come the strand and the start and end positions of the pan-genome gene within the positionally standardized pan-genome, as well as the so-called synteny blocks, which define phylogenetically conserved genome regions. The occurrence entry shows in how many of the 33 S. aureus genomes used for the pan-genome construction the gene has an orthologue.

In the following Orthologs section, all 33 S. aureus strains are listed in a fixed order and, if the gene is present in the respective strain, the locus tag is shown together with the strain-specific gene name, if assigned, from the RefSeq annotation.
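How the occurrence entry and the Orthologs listing relate to an orthologue table can be sketched as follows. The table layout, the pan IDs, the locus tags and the strain order shown here are all hypothetical placeholders; the real orthologue reference table may be organized differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed display order (only 5 of the 33 strains, for brevity).
STRAIN_ORDER: List[str] = ["COL", "N315", "NCTC_8325", "Newman", "USA300_FPR3757"]

# Hypothetical orthologue table: pan ID -> {strain: locus tag}.
# All pan IDs and locus tags are made-up placeholders.
ORTHOLOGUES: Dict[str, Dict[str, str]] = {
    "PAN_0001": {"COL": "COL_x0001", "N315": "N315_x0001", "Newman": "NWMN_x0001"},
    "PAN_0002": {"USA300_FPR3757": "USA300_x0002"},
}


def occurrence(pan_id: str) -> int:
    """Number of strains in which the pan-genome gene has an orthologue."""
    return len(ORTHOLOGUES[pan_id])


def orthologs_in_order(pan_id: str) -> List[Tuple[str, Optional[str]]]:
    """List every strain in the fixed order; None marks an absent gene."""
    by_strain = ORTHOLOGUES[pan_id]
    return [(strain, by_strain.get(strain)) for strain in STRAIN_ORDER]


def find_pan_id(locus_tag: str) -> Optional[str]:
    """Map a strain-specific locus tag back to its pan ID, if any."""
    for pan_id, by_strain in ORTHOLOGUES.items():
        if locus_tag in by_strain.values():
            return pan_id
    return None
```

The reverse lookup in `find_pan_id` corresponds to switching from a strain-specific page to the pan-genome page of the same gene.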

A graphical display of the respective genomic region of the N315, COL, NCTC8325, Newman and USA300_FPR3757 strains highlights phylogenetic conservation. The colors encode gene functional assignments (see table below), which were made using the TIGRFAMs collection of hidden Markov models.

Meta Function            Gene Functional Class (TIGRFam Main Role)                     Color Code
Envelope                 Cell envelope
Cellular processes       Cellular processes
Metabolism               Amino acid biosynthesis
Metabolism               Biosynthesis of cofactors, prosthetic groups, and carriers
Metabolism               Central intermediary metabolism
Metabolism               Energy metabolism
Metabolism               Fatty acid and phospholipid metabolism
Metabolism               Purines, pyrimidines, nucleosides, and nucleotides
Metabolism               Transport and binding proteins
Genetic Info processing  DNA metabolism
Genetic Info processing  Mobile and extrachromosomal element functions
Genetic Info processing  Protein fate
Genetic Info processing  Protein synthesis
Regulation               Regulatory functions
Regulation               Signal transduction
Unknown function         Hypothetical proteins
Unknown function         Unknown function
RNAs                     RNA genes
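The grouping of TIGRFam main roles into meta functions shown in the table above can be written as a simple lookup. This is only a sketch of the mapping itself; the function names are hypothetical and the color values are not included because they are not given in the text.

```python
from collections import Counter
from typing import Iterable

# TIGRFam main role -> meta function, as listed in the table above.
META_FUNCTION_BY_MAIN_ROLE = {
    "Cell envelope": "Envelope",
    "Cellular processes": "Cellular processes",
    "Amino acid biosynthesis": "Metabolism",
    "Biosynthesis of cofactors, prosthetic groups, and carriers": "Metabolism",
    "Central intermediary metabolism": "Metabolism",
    "Energy metabolism": "Metabolism",
    "Fatty acid and phospholipid metabolism": "Metabolism",
    "Purines, pyrimidines, nucleosides, and nucleotides": "Metabolism",
    "Transport and binding proteins": "Metabolism",
    "DNA metabolism": "Genetic Info processing",
    "Mobile and extrachromosomal element functions": "Genetic Info processing",
    "Protein fate": "Genetic Info processing",
    "Protein synthesis": "Genetic Info processing",
    "Regulatory functions": "Regulation",
    "Signal transduction": "Regulation",
    "Hypothetical proteins": "Unknown function",
    "Unknown function": "Unknown function",
    "RNA genes": "RNAs",
}


def meta_function(main_role: str) -> str:
    """Return the meta function for a main role (KeyError if not in the table)."""
    return META_FUNCTION_BY_MAIN_ROLE[main_role]


def count_by_meta_function(main_roles: Iterable[str]) -> Counter:
    """Summarize a list of main roles at the meta level."""
    return Counter(meta_function(role) for role in main_roles)
```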

The Strain-Specific Pages

The Summary section at the top of each page contains the locus tag, the gene name and the function of the gene product from the RefSeq annotation, as well as the pan locus tag and the pan gene symbol. The Genome View that follows (rendered as SVG vector graphics) provides condensed genome information, initially aligned to the position of the respective gene. The position in the genome browser can be changed by dragging the slider. Clicking on a gene arrow takes the user to the corresponding gene page, which makes it possible to walk through the genome page by page. Colors correspond to the gene functional categories described above. Finally, for each strain the genome browser combines the well-established RefSeq annotation with the new RefSeq annotation introduced in 2015 [2], thus directly showing the differences in gene content and/or coordinates.

The Gene section contains the information about the gene. It covers basic information as in the Summary section, complemented by the gene coordinates, gene length, essentiality, DNA sequence, and external accession numbers with links to the gene-specific database entries.

The largest section of the page, the Protein section, is devoted to the encoded protein. It shows, among other things, the protein length, molecular weight and isoelectric point, the catalyzed reaction, protein function assignments (described below), protein interaction partners, and subcellular localization. Finally, the Protein section contains database links (NCBI Protein database and UniProt), the protein sequence, and experimental data including protein localization [3] and absolute quantification of cytoplasmic proteins [4].
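As an illustration of two of the sequence-derived values listed above, the following sketch computes protein length and an average molecular weight from a protein sequence. It assumes the 20 standard amino acids and uses standard average residue masses; how AureoWiki itself derives these values is not stated, so this is not a description of its method.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free chain termini


def protein_length(sequence: str) -> int:
    """Number of amino acid residues."""
    return len(sequence)


def molecular_weight(sequence: str) -> float:
    """Average molecular weight in daltons of an unmodified protein chain."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
```

Post-translational modifications and non-standard residues are ignored here, so values for real proteins can differ from those shown on the gene pages.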

Functional assignments have been generated as follows:

  1. For enzymes, the catalytic activity is given by the EC number (extracted from the NCBI RefSeq and UniProt databases), complemented by the corresponding enzyme name and reaction equation (extracted from ExPASy).
  2. The assignment of protein sequences to TIGRFAMs protein families [5] is based on TIGRFAMs hidden Markov models (HMMs), using hmmscan from the HMMER3 software package [6]. TIGRFAMs are displayed in order of their HMM scores, which serve as a significance measure of the assignment, in a tree-like structure that includes the TIGR role categories (main role and sub role) and an added meta level summarizing the TIGR main roles (see also the color table above).
  3. As for TIGRFAMs, the assignment of sequences to Pfam protein families [7] is based on HMMs and uses the HMMER package. Pfam families with the highest HMM scores are shown first. Many Pfam families are grouped into clans (evolutionarily related families), which are displayed at the top of the Pfam annotation.
  4. Assignment of predicted protein functions is obtained from the SEED database [8], a comparative genomics database based on expert annotation of subsystems (sets of related functional roles). By default the lists of assigned predicted functions are collapsed and show only the hit with the highest score, but can be expanded by clicking the plus sign.
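The score-based ordering of HMM hits described in items 2 and 3 can be sketched as follows. The example assumes that hmmscan was run with its `--tblout` option, which writes one whitespace-separated line per hit with the family name in the first column, the query in the third, and the full-sequence E-value and score in the fifth and sixth. The family and protein names in the sample are made-up placeholders, and the parsing code is an illustration, not the AureoWiki pipeline.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    family: str
    query: str
    evalue: float
    score: float
    description: str


# Made-up sample in the layout of an hmmscan --tblout file (placeholder names).
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu ov env dom rep inc description of target
FAM_A  ACC_A  protein_1  -  1.0e-20  70.5  0.1  2.0e-20  69.8  0.1  1.0  1  0  0  1  1  1  1  placeholder family A
FAM_B  ACC_B  protein_1  -  1.0e-50  170.2  0.0  1.5e-50  169.9  0.0  1.0  1  0  0  1  1  1  1  placeholder family B
FAM_C  ACC_C  protein_2  -  3.0e-10  35.0  0.2  4.0e-10  34.6  0.2  1.1  1  0  0  1  1  1  1  placeholder family C
"""


def parse_tblout(text: str) -> List[Hit]:
    """Parse per-target hit lines, skipping comment and blank lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed fields, then a free-text description that may contain spaces.
        fields = line.split(maxsplit=18)
        description = fields[18] if len(fields) > 18 else ""
        hits.append(Hit(fields[0], fields[2], float(fields[4]), float(fields[5]), description))
    return hits


def ranked_hits(hits: List[Hit], query: str) -> List[Hit]:
    """Hits for one protein, highest full-sequence score first."""
    return sorted((h for h in hits if h.query == query), key=lambda h: h.score, reverse=True)
```

Taking only the first element of `ranked_hits` corresponds to the collapsed default view that shows the hit with the highest score.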

The next section of the gene page provides information on Expression & Regulation, including the predicted operon structure obtained from MicrobesOnline [9], the regulation of gene expression, and gene expression profiles. Data on transcription factor regulons, alternative RNA polymerase sigma factors (SigB and SigH) and regulatory RNAs were retrieved from the RegPrecise database [10] and published literature. Gene expression data from two large-scale studies [11] covering a wide range of growth, stress and infection-related conditions are included in AureoWiki as graphical representations of condition-dependent mRNA levels and protein induction profiles on the gene pages of strains NCTC 8325 and COL, respectively.

All data are provided with links to the external data sources, including various databases and published literature. References are indicated by the book symbol. Details of the corresponding publication are displayed by mouse-over. A list of additional literature is found under "Relevant publications" at the bottom of the page.

References

  1. Aaron E Darling, Bob Mau, Nicole T Perna
    progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement.
    PLoS One: 2010, 5(6);e11147
    [PubMed:20593022]
  2. Tatiana Tatusova, Michael DiCuccio, Azat Badretdin, Vyacheslav Chetvernin, Eric P Nawrocki, Leonid Zaslavsky, Alexandre Lomsadze, Kim D Pruitt, Mark Borodovsky, James Ostell
    NCBI prokaryotic genome annotation pipeline.
    Nucleic Acids Res: 2016, 44(14);6614-24
    [PubMed:27342282]
  3. Dörte Becher, Kristina Hempel, Susanne Sievers, Daniela Zühlke, Jan Pané-Farré, Andreas Otto, Stephan Fuchs, Dirk Albrecht, Jörg Bernhardt, Susanne Engelmann, Uwe Völker, Jan Maarten van Dijl, Michael Hecker
    A proteomic view of an important human pathogen--towards the quantification of the entire Staphylococcus aureus proteome.
    PLoS One: 2009, 4(12);e8176
    [PubMed:19997597]
    Annette Dreisbach, Kristina Hempel, Girbe Buist, Michael Hecker, Dörte Becher, Jan Maarten van Dijl
    Profiling the surfacome of Staphylococcus aureus.
    Proteomics: 2010, 10(17);3082-96
    [PubMed:20662103]
    Kristina Hempel, Florian-Alexander Herbst, Martin Moche, Michael Hecker, Dörte Becher
    Quantitative proteomic view on secreted, cell surface-associated, and cytoplasmic proteins of the methicillin-resistant human pathogen Staphylococcus aureus under iron-limited conditions.
    J Proteome Res: 2011, 10(4);1657-66
    [PubMed:21323324]
  4. Daniela Zühlke, Kirsten Dörries, Jörg Bernhardt, Sandra Maaß, Jan Muntel, Volkmar Liebscher, Jan Pané-Farré, Katharina Riedel, Michael Lalk, Uwe Völker, Susanne Engelmann, Dörte Becher, Stephan Fuchs, Michael Hecker
    Costs of life - Dynamics of the protein inventory of Staphylococcus aureus during anaerobiosis.
    Sci Rep: 2016, 6;28172
    [PubMed:27344979]
  5. Daniel H Haft, Jeremy D Selengut, Roland A Richter, Derek Harkins, Malay K Basu, Erin Beck
    TIGRFAMs and Genome Properties in 2013.
    Nucleic Acids Res: 2013, 41(Database issue);D387-95
    [PubMed:23197656]
  6. Robert D Finn, Jody Clements, Sean R Eddy
    HMMER web server: interactive sequence similarity searching.
    Nucleic Acids Res: 2011, 39(Web Server issue);W29-37
    [PubMed:21593126]
  7. Robert D Finn, Penelope Coggill, Ruth Y Eberhardt, Sean R Eddy, Jaina Mistry, Alex L Mitchell, Simon C Potter, Marco Punta, Matloob Qureshi, Amaia Sangrador-Vegas, Gustavo A Salazar, John Tate, Alex Bateman
    The Pfam protein families database: towards a more sustainable future.
    Nucleic Acids Res: 2016, 44(D1);D279-85
    [PubMed:26673716]
  8. Ross Overbeek, Tadhg Begley, Ralph M Butler, Jomuna V Choudhuri, Han-Yu Chuang, Matthew Cohoon, Valérie de Crécy-Lagard, Naryttza Diaz, Terry Disz, Robert Edwards, Michael Fonstein, Ed D Frank, Svetlana Gerdes, Elizabeth M Glass, Alexander Goesmann, Andrew Hanson, Dirk Iwata-Reuyl, Roy Jensen, Neema Jamshidi, Lutz Krause, Michael Kubal, Niels Larsen, Burkhard Linke, Alice C McHardy, Folker Meyer, Heiko Neuweger, Gary Olsen, Robert Olson, Andrei Osterman, Vasiliy Portnoy, Gordon D Pusch, Dmitry A Rodionov, Christian Rückert, Jason Steiner, Rick Stevens, Ines Thiele, Olga Vassieva, Yuzhen Ye, Olga Zagnitko, Veronika Vonstein
    The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.
    Nucleic Acids Res: 2005, 33(17);5691-702
    [PubMed:16214803]
  9. Paramvir S Dehal, Marcin P Joachimiak, Morgan N Price, John T Bates, Jason K Baumohl, Dylan Chivian, Greg D Friedland, Katherine H Huang, Keith Keller, Pavel S Novichkov, Inna L Dubchak, Eric J Alm, Adam P Arkin
    MicrobesOnline: an integrated portal for comparative and functional genomics.
    Nucleic Acids Res: 2010, 38(Database issue);D396-400
    [PubMed:19906701]
  10. Pavel S Novichkov, Alexey E Kazakov, Dmitry A Ravcheev, Semen A Leyn, Galina Y Kovaleva, Roman A Sutormin, Marat D Kazanov, William Riehl, Adam P Arkin, Inna Dubchak, Dmitry A Rodionov
    RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.
    BMC Genomics: 2013, 14;745
    [PubMed:24175918]
  11. Stephan Fuchs, Daniela Zühlke, Jan Pané-Farré, Harald Kusch, Carmen Wolf, Swantje Reiß, Le Thi Nguyen Binh, Dirk Albrecht, Katharina Riedel, Michael Hecker, Susanne Engelmann
    Aureolib - a proteome signature library: towards an understanding of staphylococcus aureus pathophysiology.
    PLoS One: 2013, 8(8);e70669
    [PubMed:23967085]
    Ulrike Mäder, Pierre Nicolas, Maren Depke, Jan Pané-Farré, Michel Debarbouille, Magdalena M van der Kooi-Pol, Cyprien Guérin, Sandra Dérozier, Aurelia Hiron, Hanne Jarmer, Aurélie Leduc, Stephan Michalik, Ewoud Reilman, Marc Schaffer, Frank Schmidt, Philippe Bessières, Philippe Noirot, Michael Hecker, Tarek Msadek, Uwe Völker, Jan Maarten van Dijl
    Staphylococcus aureus Transcriptome Architecture: From Laboratory to Infection-Mimicking Conditions.
    PLoS Genet: 2016, 12(4);e1005962
    [PubMed:27035918]