- 1 AureoWiki - The Wiki way of Staphylococcus aureus annotation
- 2 The S.aureus Pan Genome and the AureoWiki Pan Genome Pages
- 3 The Strain Specific Pages
- 3.1 Summary
- 3.2 Genome View
- 3.3 Gene
- 3.4 Protein
- 3.5 Expression and regulation
- 3.6 Biological material
- 3.7 Other information
- 3.8 References
- 4 References
- 5 Mediawiki Help and support
AureoWiki - The Wiki way of Staphylococcus aureus annotation
AureoWiki is the first tool to support curated annotation of S. aureus biological entities in a wiki-based, community-driven fashion. In its current, initial state AureoWiki provides information for a first draft of the S. aureus pan genome. In addition, it provides information on the S. aureus strains COL, NCTC8325 (HG001), N315, Newman and USA300_FPR3757. These five strains were selected because of their detailed annotation (N315) or their importance for the scientific community (NCTC8325, COL). The main objective of AureoWiki is a step-by-step, community-driven description and annotation of S. aureus biological entities, and to push forward the process of making knowledge from lab-based research available to the interested public. The following direct links give an impression of how a typical gene / protein page is structured: hld, eno, hlY, splA, pflB, saeR, sarA
Every interested scientist is invited to complement the information shown. This can be done by using the edit icon at the top right of each page or the [Edit] links beside the subheaders or inside some chapters. Changes can be made as a registered user (you can set up your own account by using the button with the small down arrow to the left of the [login] link at the top of every page) or by answering an expert question from the field of Staphylococcus research before the saving process is initiated. Any user who is experienced in editing Wikipedia articles will easily be able to add information to AureoWiki. AureoWiki pages show a lot of information from a variety of sources.
Information preceded by a filled bullet point is fed from a database behind AureoWiki and cannot be changed. This is because we assume that sequence-based information in particular will not change very often and will be maintained in a more or less static database. An [Edit] link makes it possible to add new user-generated AureoWiki content. User-added information is indicated by white bullets. In some places we added placeholders (e.g. in the phenotype section of the strain specific pages) to encourage users to complement the currently available information with further input. The AureoWiki pages will grow iteratively, depending on the engagement of the community.
The Main Page presents a clear and easy to use search form. Here the interested user can search for locus tags, gene symbols or keywords. The [Getting Started] button brings you to this page.
The tab bar of the AureoWiki pages lets the user switch between the Pan Genome Pages and the Strain Specific Pages. The pages are linked via an orthologue reference table that is used behind the scenes and was constructed as part of our work. The following chapters explain what kind of information you can find in AureoWiki and where it comes from:
The S. aureus Pan Genome and the AureoWiki Pan Genome Pages
For a comparative analysis of S. aureus strains, an overall alignment of a publicly available set of 32 strain specific S. aureus genomes was performed.
On this basis it became possible to construct a positionally corrected pan genome containing all genes. This was used to assign homologous genes (at least 50% identity on DNA level and 70% identity on protein level) of different S. aureus strains to so-called unified pan genome gene IDs and pan genome gene symbols, which may simplify knowledge transfer between scientists working with different S. aureus strains.
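The two identity thresholds can be illustrated with a minimal Python sketch. It is not the pipeline used for AureoWiki: the function names are hypothetical, and the way identity is counted (matching columns divided by aligned columns without gaps) is an assumption, since the text does not define it.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and use '-'
    as the gap character. The exact identity definition is not given in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_homology_thresholds(dna_identity: float, protein_identity: float,
                               min_dna: float = 50.0,
                               min_protein: float = 70.0) -> bool:
    """Apply the thresholds quoted above: >= 50% DNA and >= 70% protein identity."""
    return dna_identity >= min_dna and protein_identity >= min_protein
```

A gene pair would then be a candidate for the same pan genome gene ID only if `passes_homology_thresholds` returns `True` for its DNA and protein alignments.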
Finding these unified pan genome based gene symbols was the major challenge in our efforts. In a first round, gene names with an intuitively understandable meaning were extracted from three S. aureus strains and assigned to the corresponding pan genome genes. The best annotated strain, N315, followed by the COL, NCTC8325 and USA300_FPR3757 strains, served as the basis for the initial pan genome gene naming. In a second round, genes that still had no easily understandable symbol were named by using the whole S. aureus alignment, iteratively complemented by gene names from S. epidermidis and B. subtilis and, finally, by names of other bacterial homologues stored in GenBank.
The result of our work is presented in AureoWiki on the so-called pan genome pages, which can be accessed by using the leftmost tab at the top of the AureoWiki pages. The pan genome pages present the following information:
A unique pan genome ID resulting from the order of all genes within the S. aureus pan genome, followed by the pan genome gene symbol and the source of its name (the N315, COL or NCTC8325 strain, other S. aureus strains or other bacterial sources). This is complemented by a list of gene descriptions that were extracted from all available S. aureus genomes and stem in part from independent annotation efforts. Because the annotations are partly independent, this list is a useful collection of possible functions, especially for genes with vague functional annotation.
Within the positionally standardized pan genome you can find the strand and the pan genome start and end positions of the pan genome genes, as well as the so-called synteny blocks, whose IDs define phylogenetically conserved genome regions. The occurrence entry shows how often a gene has orthologous genes in the 32 analyzed S. aureus genomes that were used for the pan genome construction. A value of 100% means that the gene is part of the core genome, lower values define genes of the dispensable fraction of the pan genome, and genes with an occurrence of less than 6% may be considered orphans.
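The occurrence categories can be written down as a short Python sketch. The function names and category labels are illustrative only. With 32 genomes, one genome corresponds to 1/32 = 3.125% and two genomes to 2/32 = 6.25%, so the 6% cut-off effectively selects genes found in a single genome.

```python
def occurrence_percent(genomes_with_gene: int, total_genomes: int = 32) -> float:
    """Share of the analyzed genomes (32 in the text) that carry an orthologue."""
    if not 0 <= genomes_with_gene <= total_genomes:
        raise ValueError("gene count must lie between 0 and the number of genomes")
    return 100.0 * genomes_with_gene / total_genomes


def classify_occurrence(percent: float) -> str:
    """Map an occurrence value to the categories described above."""
    if percent >= 100.0:
        return "core"
    if percent < 6.0:
        return "orphan candidate"
    return "dispensable"
```

For example, `classify_occurrence(occurrence_percent(32))` yields the core category, while a gene present in only one of the 32 genomes falls below the 6% threshold.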
|Meta Function||Gene Functional Class (TIGRFam Main Role)||Color Code|
|Cellular processes||Cellular processes|| |
|Metabolism||Amino acid biosynthesis|| |
| ||Biosynthesis of cofactors, prosthetic groups, and carriers|| |
| ||Central intermediary metabolism|| |
| ||Fatty acid and phospholipid metabolism|| |
| ||Purines, pyrimidines, nucleosides, and nucleotides|| |
| ||Transport and binding proteins|| |
|Genetic Info processing||DNA metabolism|| |
| ||Mobile and extrachromosomal element functions|| |
|Unknown function||Hypothetical proteins|| |
A graphical display of the region around the gene of interest in the N315, COL, NCTC8325 and USA300_FPR3757 strains highlights phylogenetically conserved genomic regions. The colors used encode gene functional assignments (see table). Gene sequences were assigned to functions by using the collection of TIGRFams Hidden Markov Models.
Some technical remarks on the pan genome construction:
The pan genome was determined on the basis of a MAUVE based whole genome alignment, followed by an iterative assignment of single genes to the core genome by using GenomeRing and OrthologyPredator (Linus Backert, unpublished data).
The pan genome alignment was developed in very close collaboration with the group of Kay Nieselt in Tübingen (L. Backert, A. Hennig, A. Herbig, K. Nieselt).
The Strain Specific Pages
Much of the strain specific data comes from GenBank and is complemented by data from many other sources, which are explained later. The gene summary at the top of each strain specific gene page condenses the most relevant information on the gene currently shown. It is enriched by the unique pan genome gene ID and the pan genome gene symbol.
Genome View
For AureoWiki we developed a space-saving genome viewer that reduces the genome information to a minimum. It is based on a vector graphics (SVG) file format and is initially aligned to the position of the gene currently shown. The position in the genome browser can be changed by grabbing the slider or by moving the mouse wheel on the left side of the genome browser. Clicking on a gene arrow will bring you to the corresponding gene page, which makes it possible to walk through the genome step by step or page by page. Colors encode the gene functional categories as explained above.
Here we display much of the basic information on S. aureus genes. This includes the strain specific gene coordinates, the gene length, the gene type (CDS, mRNA ...) and the strain specific gene symbol. Furthermore, we extracted the description, the strand and the encoding replicon. This GenBank based information was enriched by essentiality information extracted from DEG, which in turn collected it from published essentiality studies.
Accession numbers
Accession numbers link to a selection of relevant external cross-references. We plan to extend this section step by step.
The phenotype of mutants of the gene of interest can normally only be described in free text. The availability of phenotype information in public database resources is very limited. For this reason we would appreciate the active engagement of the research community interested in S. aureus in sharing this knowledge with all interested scientists. We also plan to extract phenotype information from the literature for genes we are interested in.
This shows the exact DNA sequence of a gene.
General information such as the protein symbol (three letters plus an extension of capital letters), description, sequence and length in aa was extracted from GenBank and implemented in AureoWiki. MW and pI were calculated directly from the amino acid sequence.
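How a molecular weight can be derived from the sequence alone is sketched below in Python. This is an illustration under assumptions, not the calculation used for AureoWiki: it sums standard average residue masses and adds one water molecule for the free termini, ignores modifications, and leaves out the pI, which additionally requires a set of pK values. The function name is hypothetical.

```python
# Average residue masses in Da (amino acid minus one water), standard values.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.015  # one water for the free N- and C-terminus


def molecular_weight(protein_sequence: str) -> float:
    """Average molecular weight in Da of an unmodified protein sequence."""
    total = WATER_MASS
    for residue in protein_sequence.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError(f"unsupported residue: {residue!r}")
        total += RESIDUE_MASS[residue]
    return total
```

Calling `molecular_weight` on a one-letter amino acid string returns the mass in Da. Ambiguous symbols such as X raise an error rather than being guessed.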
Functional assignments, or data that may help with a functional assignment, have been generated and extracted in several ways:
The EC numbers were extracted directly from the GenBank sequence information. Additionally, ECs from a TIGRFam functional classification have been inserted. EC numbers are complemented by the corresponding reaction equations and, if available, by enzyme names.
TIGRFams were established by TIGR and are based on the scheme for gene functional classification first established for Escherichia coli by Riley. TIGRFams describe families of homologous proteins with defined functions in the metabolism or physiology of bacteria, e.g. an enzymatic activity, a transporter with defined specificity, a protein with a structural function and many more. For each family a so-called TIGRFam Hidden Markov Model (HMM), a general description of family specific sequence features, has been established; it can be used by software such as HMMER to assign sequences to TIGRFam functions automatically. Because TIGRFams are systematically ordered in a hierarchical system of nested functions (enolase is part of glycolysis, which is part of energy metabolism), these data are displayed in a tree-like structure as known from computer file system hierarchies. Pressing the [+] sign gives access to all assignments to TIGRFams and the branches of the hierarchical system. For clarity we inserted a meta level summarizing the TIGR main roles with their sub roles and TIGRFam functions. The color codes within the genome viewer are based on these meta roles (orange brown - Metabolism, blue shades - Genetic information processing, green tones - Signal Processing, pink - Cell Wall and Envelope, red - Cellular Processes, black - RNA Genes, grey - Hypothetical and Proteins with no known function). The TIGRFams are displayed in order of their HMMER scores, which give a measure of the quality of the gene <-> TIGR HMM alignment. Especially for genes of unknown function, even low scores may give a hint at a function.
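The nested, score-ordered display described here can be sketched in a few lines of Python. The data layout (dictionaries with `main_role`, `sub_role`, `family` and `score` keys) and both function names are assumptions made for this illustration and do not reflect the AureoWiki database schema.

```python
from collections import defaultdict


def build_role_tree(hits):
    """Group HMM hits as main role -> sub role -> [(family, score), ...].

    Hits are sorted by descending score first, so the best-scoring family
    comes first in every branch.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        tree[hit["main_role"]][hit["sub_role"]].append((hit["family"], hit["score"]))
    return tree


def render_tree(tree):
    """Return the tree as indented text lines, mimicking expandable [+] nodes."""
    lines = []
    for main_role, sub_roles in tree.items():
        lines.append(f"[+] {main_role}")
        for sub_role, families in sub_roles.items():
            lines.append(f"    [+] {sub_role}")
            for family, score in families:
                lines.append(f"        {family} (score {score:.1f})")
    return lines
```

Because low-scoring hits are kept rather than filtered out, weak matches remain visible at the bottom of their branch, which matches the remark that low scores may still hint at a function.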
The SEED is an open project for expert-driven functional classification of DNA/protein sequences. Like TIGRFams, this system is hierarchically structured. Multiple assignments can be accessed by expanding the entry with the [+] sign. Because the assignment is based on manual curation, no further quality measure is available.
As described for TIGRFams, the assignment of sequences to Pfam protein families is based on HMMs. Because Pfams are organized in clans, the results are displayed hierarchically with the Pfam clans on the top level and Pfams as sublevels. The HMM scores reflect the quality of the assignments, with the best HMM scores shown on top.
Structure, modifications and interactions
Information on domains, protein modifications and cofactors is not yet included but will be incorporated soon.
Some results from N315 based studies on regulators and their effectors, which are stored in RegPrecise, are shown. Effectors are biomolecules that modulate the activity of other biomolecules, in this case of regulators. The N315 results have been transferred to the other strains on the basis of sequence homology of the regulators of the other S. aureus strains.
A study in strain MRSA252, which is not yet included in AureoWiki, revealed a set of several thousand protein interactions. By using orthologue mapping, these interactions have been included for the available S. aureus strains and are indicated accordingly.
Several algorithms are available for predicting the (extra)cellular localization of proteins. We applied three of them (PSORTb, LocateP, SignalP) and present their results. The TMHMM tool predicts the number of transmembrane domains, which is also shown. Clicking [+] reveals more detailed data, which served as the basis for predicting the proteins' theoretical localization.
Accession numbers
A collection of accession numbers or cross references
Protein sequence from GenBank
If there is experimental evidence that the protein as a whole, or one or more of its peptides, is actually synthesized or accumulated, this is stated. For this purpose a collection of global proteome analysis papers has been analyzed. The papers are given as references.
Expression and regulation
Operons have been imported from Microbes Online . All operon members are clickable and give access to their AureoWiki pages.
Regulatory data have been extracted from RegPrecise. RegPrecise currently maintains S. aureus regulatory data exclusively for strain N315. By homology analysis the N315 data have been transferred to the other strains of AureoWiki; such transferred data are indicated accordingly.
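The transfer of N315 annotations to other strains can be pictured as a lookup through an orthologue table. The Python sketch below is a minimal illustration: the table layout, the function name and all identifiers used with it are hypothetical, and the actual homology analysis behind AureoWiki is not described in this text.

```python
def transfer_regulation(n315_regulation, orthologues, target_strain):
    """Copy N315 regulator assignments to the orthologous genes of another strain.

    n315_regulation: dict mapping an N315 locus tag to a regulator name.
    orthologues: dict mapping a pan genome ID to {strain name: locus tag}.
    Returns a dict keyed by the target strain's locus tags. Every entry is
    flagged as transferred so it can be told apart from primary data.
    """
    transferred = {}
    for members in orthologues.values():
        source_tag = members.get("N315")
        target_tag = members.get(target_strain)
        if source_tag in n315_regulation and target_tag is not None:
            transferred[target_tag] = {
                "regulator": n315_regulation[source_tag],
                "evidence": "transferred from N315 by homology",
            }
    return transferred
```

Genes without an N315 orthologue, or without an orthologue in the target strain, simply receive no entry. The explicit `evidence` field mirrors the statement that transferred data are indicated as such.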
Transcription pattern
Protein Synthesis
Half Life
Biological material
Other information
A list of papers from which data have been inserted into AureoWiki. These references are indicated on the wiki pages by the [Citations symbol]. On mouse-over the corresponding citation is shown or highlighted.
Further reading
Here a list of papers is shown, especially of general analyses concerning the presented gene / protein.
Aaron C E Darling, Bob Mau, Frederick R Blattner, Nicole T Perna
Mauve: multiple alignment of conserved genomic sequence with rearrangements.
Genome Res.: 2004, 14(7);1394-403
[PubMed:15231754]
A Herbig, G Jäger, F Battke, K Nieselt
GenomeRing: alignment visualization based on SuperGenome coordinates.
Bioinformatics: 2012, 28(12);i7-15
[PubMed:22689781]
Hao Luo, Yan Lin, Feng Gao, Chun-Ting Zhang, Ren Zhang
DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements.
Nucleic Acids Res.: 2014, 42(Database issue);D574-80
[PubMed:24243843]
R Allyn Forsyth, Robert J Haselbeck, Kari L Ohlsen, Robert T Yamamoto, Howard Xu, John D Trawick, Daniel Wall, Liangsu Wang, Vickie Brown-Driver, Jamie M Froelich, Kedar G C, Paula King, Melissa McCarthy, Cheryl Malone, Brian Misiner, David Robbins, Zehui Tan, Zhan-yang Zhu Zy, Grant Carr, Deborah A Mosca, Carlos Zamudio, J Gordon Foulkes, Judith W Zyskind
A genome-wide strategy for the identification of essential genes in Staphylococcus aureus.
Mol. Microbiol.: 2002, 43(6);1387-400
[PubMed:11952893]
Roy R Chaudhuri, Andrew G Allen, Paul J Owen, Gil Shalom, Karl Stone, Marcus Harrison, Timothy A Burgis, Michael Lockyer, Jorge Garcia-Lara, Simon J Foster, Stephen J Pleasance, Sarah E Peters, Duncan J Maskell, Ian G Charles
Comprehensive identification of essential Staphylococcus aureus genes using Transposon-Mediated Differential Hybridisation (TMDH).
BMC Genomics: 2009, 10;291
[PubMed:19570206]
M H Serres, S Gopal, L A Nahum, P Liang, T Gaasterland, M Riley
A functional update of the Escherichia coli K-12 genome.
Genome Biol.: 2001, 2(9);RESEARCH0035
[PubMed:11574054]
Robert D Finn, Jody Clements, Sean R Eddy
HMMER web server: interactive sequence similarity searching.
Nucleic Acids Res.: 2011, 39(Web Server issue);W29-37
[PubMed:21593126]
Artem Cherkasov, Michael Hsing, Roya Zoraghi, Leonard J Foster, Raymond H See, Nikolay Stoynov, Jihong Jiang, Sukhbir Kaur, Tian Lian, Linda Jackson, Huansheng Gong, Rick Swayze, Emily Amandoron, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Osvaldo Santos-Filho, Peter Axerio-Cilies, Kendall Byler, William R McMaster, Robert C Brunham, B Brett Finlay, Neil E Reiner
Mapping the protein interaction network in methicillin-resistant Staphylococcus aureus.
J. Proteome Res.: 2011, 10(3);1139-50
[PubMed:21166474]
Stephan Fuchs, Daniela Zühlke, Jan Pané-Farré, Harald Kusch, Carmen Wolf, Swantje Reiß, Le Thi Nguyen Binh, Dirk Albrecht, Katharina Riedel, Michael Hecker, Susanne Engelmann
Aureolib - a proteome signature library: towards an understanding of staphylococcus aureus pathophysiology.
PLoS ONE: 2013, 8(8);e70669
[PubMed:23967085]
Mediawiki Help and support
Wikimedia help
Consult the User's Guide for information on using the wiki software.