AureoWiki - The Wiki way of Staphylococcus aureus annotation

AureoWiki is the first tool to support curated annotation of S. aureus biological entities in a wiki-based, community-driven fashion. In its current, initial state AureoWiki provides information for a first draft of the S. aureus pan genome. In addition, it provides information on the S. aureus strains COL, NCTC8325 (HG001), N315, Newman and USA300_FPR3757. These five strains were selected because of their detailed annotation (N315) or because of their importance for the scientific community (NCTC8325, COL). The main objective of AureoWiki is a step-by-step, community-driven description and annotation of S. aureus biological entities, and to push forward the process of making knowledge from lab-based research available to the interested public. The following direct links give an impression of how a typical gene / protein page is structured: hld, eno, hlY, splA, pflB, saeR, sarA

Every interested scientist is invited to complement the information shown. This can be done by using the edit icon at the top right of each page or the [Edit] links beside the subheaders or inside some chapters. Changes can be made either as a registered user (you can set up your own account by using the button with the small down arrow to the left of the [login] link at the top of every page) or by answering an expert question from the field of Staphylococcus research before the saving process is initiated. Any user who is experienced in editing Wikipedia articles will easily be able to add information to AureoWiki. AureoWiki pages show a lot of information from a variety of sources.

Information preceded by a filled bullet point is fed from a database behind AureoWiki and cannot be changed. This is because we assume that sequence-based information in particular will not change very often and is best maintained in a more or less static database. An [Edit] link makes it possible to add new user-generated AureoWiki content. User-added information is indicated by white bullets. In some places we added placeholders (e.g. in the phenotype section of the strain-specific pages) to encourage users to complement the currently available information with further interesting input. The AureoWiki pages will grow iteratively, depending on the engagement of the community.

The Main Page presents a clear and easy-to-use search mask. Here the interested user can search for locus tags, gene symbols or keywords. The [Getting Started] button brings you to this page.

The bar of register tabs on the AureoWiki pages lets the user switch between the pan genome pages and the strain-specific pages. The pages are linked via an orthologue reference table that is used behind the scenes and was constructed as part of our work. What kind of information you can find in AureoWiki, and where this information comes from, is explained in the following chapters:

The S. aureus Pan Genome and the AureoWiki Pan Genome Pages

For a comparative analysis of S. aureus strains, an overall alignment of a publicly available set of 32 strain-specific S. aureus genomes was performed.

Accession  Strain          Accession  Strain          Accession  Strain        Accession  Strain
NC_007793  USA300_FPR3757  NC_010079  USA300_TCH1516  NC_002951  COL           NC_007795  NCTC_8325
NC_016912  VC40            NC_009641  Newman          NC_017331  TW20          NC_017341  JKD6008
NC_017347  T0131           NC_017351  11819_97        NC_002953  MSSA476       NC_003923  MW2
NC_013450  ED98            NC_017343  ECT_R_2         NC_009487  JH9           NC_009632  JH1
NC_017340  04_02981        NC_002745  N315            NC_002758  Mu50          NC_009782  Mu3
NC_016928  M013            NC_017349  LGA251          NC_007622  RF122         NC_017337  ED133
NC_016941  MSHR1132        NC_017338  JKD6159         NC_017763  HO_5096_0412  NC_002952  MRSA252
NC_017342  TCH60           NC_017673  71193           NC_017333  ST398         NC_018608  08BA02176

On this basis it became possible to construct a positionally corrected pan genome containing all genes. This was used to assign homologous genes (at least 50% identity at the DNA level and 70% identity at the protein level) of different S. aureus strains to so-called unified pan genome gene IDs and pan genome gene symbols, which may simplify knowledge transfer between scientists working with different S. aureus strains.
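
The identity thresholds mentioned above can be illustrated with a small sketch. It assumes two already aligned sequences of equal length with "-" as the gap character, and a simple definition of identity (matching columns divided by alignment length). The alignment method and the exact identity definition used for AureoWiki are not stated on this page, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue.

    Assumption: identity = matching non-gap columns / total alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_homologous(dna_a: str, dna_b: str, prot_a: str, prot_b: str,
                  min_dna: float = 50.0, min_protein: float = 70.0) -> bool:
    """Apply both thresholds: >= 50% DNA identity and >= 70% protein identity."""
    return (percent_identity(dna_a, dna_b) >= min_dna
            and percent_identity(prot_a, prot_b) >= min_protein)
```

Both conditions have to hold, so a gene pair with similar DNA but a diverged protein sequence would not be assigned to the same pan genome gene ID in this sketch.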

Finding these unified pan genome gene symbols was the major challenge in our efforts. In a first round, gene names with an intuitively understandable meaning were extracted from three S. aureus strains and assigned to the corresponding pan genome genes. The best-annotated strain, N315, followed by the COL, NCTC8325 and USA300_FPR3757 strains, served as the basis for the initial pan genome gene naming. In a second round, genes that still lacked an easily understandable symbol were named by using the whole S. aureus alignment, iteratively complemented by gene names from S. epidermidis and B. subtilis and, finally, by names of other bacterial homologues stored in GenBank.
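
The first naming round can be sketched as a priority lookup over the strains in the order given above. This is only an illustration: the rule for what counts as an "intuitively understandable" name is not specified on this page, so the sketch assumes that a name is uninformative if it looks like a locus tag (letters followed by at least three digits). The function name and the pattern are hypothetical.

```python
import re

# Strain order as described in the text: N315 first, then the other strains.
STRAIN_PRIORITY = ["N315", "COL", "NCTC8325", "USA300_FPR3757"]

# Assumption: locus-tag-like names (e.g. letters, optional underscore,
# three or more trailing digits) carry no intuitive meaning.
LOCUS_TAG_LIKE = re.compile(r"^[A-Za-z]+\d*_?\d{3,}$")


def pick_symbol(names_by_strain):
    """Return (symbol, source_strain) from the highest-priority strain
    that offers an informative gene name, or None if there is none."""
    for strain in STRAIN_PRIORITY:
        name = names_by_strain.get(strain)
        if name and not LOCUS_TAG_LIKE.match(name):
            return name, strain
    return None
```

Genes for which this returns None would correspond to the second round, in which names are taken from other S. aureus strains or other bacteria.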

The result of our work is presented in AureoWiki on the so-called pan genome pages, which can be accessed by using the leftmost register tab at the top of the AureoWiki pages. The pan genome pages present the following information:

A unique pan genome ID resulting from the order of all genes within the S. aureus pan genome, followed by the pan genome gene symbol and the source of its naming (the N315, COL or NCTC8325 strain, other S. aureus strains or other bacterial sources). This is complemented by a list of gene descriptions that were extracted from all available S. aureus genomes and stem in part from independent annotation efforts. Because the annotations are partly independent, this list is a useful collection of possible functions, especially for genes with vague functional annotation.

Within the positionally standardized pan genome you can find the strand and the pan genome start and end positions of the pan genome genes, as well as the so-called synteny blocks, whose IDs define phylogenetically conserved genome regions. The occurrence entry shows how often a gene has orthologous genes in the 32 analyzed S. aureus genomes that were used for the pan genome construction. A value of 100% means that the gene is part of the core genome, lower values define genes of the dispensable fraction of the pan genome, and genes with occurrences of less than 6% may be considered orphans.
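
The occurrence categories described above can be expressed as a short sketch. It assumes that occurrence is simply the share of the 32 genomes that contain an orthologue; the function names and the category labels are illustrative, not taken from AureoWiki.

```python
def occurrence_percent(genomes_with_orthologue: int, total_genomes: int = 32) -> float:
    """Share of analyzed genomes (in percent) that contain an orthologue of the gene."""
    if total_genomes <= 0:
        raise ValueError("total_genomes must be positive")
    if not 0 <= genomes_with_orthologue <= total_genomes:
        raise ValueError("genomes_with_orthologue out of range")
    return 100.0 * genomes_with_orthologue / total_genomes


def classify_gene(occurrence: float) -> str:
    """Map an occurrence value to the categories used in the text.

    Orphans are a subset of the dispensable fraction; they are reported
    separately here only for illustration.
    """
    if occurrence >= 100.0:
        return "core"
    if occurrence < 6.0:
        return "orphan"
    return "dispensable"
```

With 32 genomes, a gene found in a single genome has an occurrence of about 3% and falls below the 6% limit, while a gene found in two genomes (about 6%) is already above it.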

Meta Function            Gene Functional Class (TIGRFam Main Role)                     Color Code
Envelope                 Cell envelope                                                 pink
Cellular processes       Cellular processes                                            red
Metabolism               Amino acid biosynthesis                                       orange brown
                         Biosynthesis of cofactors, prosthetic groups, and carriers
                         Central intermediary metabolism
                         Energy metabolism
                         Fatty acid and phospholipid metabolism
                         Purines, pyrimidines, nucleosides, and nucleotides
                         Transport and binding proteins
Genetic Info processing  DNA metabolism                                                blue shades
                         Mobile and extrachromosomal element functions
                         Protein fate
                         Protein synthesis
Regulation               Regulatory functions                                          green tones
                         Signal transduction
Unknown function         Hypothetical proteins                                         grey
                         Unknown function
RNAs                     RNA genes                                                     black

A graphical display of the region around the gene of interest in the N315, COL, NCTC8325 and USA300_FPR3757 strains highlights phylogenetically conserved genomic regions. The colors encode gene functional assignments (see table). Gene sequences were assigned to functions by using a collection of hidden Markov models from TIGRFams [1].

Some technical remarks on the pan genome construction:

The pan genome was determined on the basis of a MAUVE [2] whole-genome alignment, followed by an iterative assignment of single genes to the core genome using GenomeRing [3] and OrthologyPredator (Linus Backert, unpublished data).

The pan genome alignment was developed in very close collaboration with the group of Kay Nieselt in Tübingen (L. Backert, A. Hennig, A. Herbig, K. Nieselt).

The Strain Specific Pages

Summary

Much of the strain-specific data comes from GenBank [4] and is complemented by data from many other sources, explained later. The gene summary at the top of the strain-specific gene pages condenses the most relevant information on the gene currently shown. This is enriched by the unique pan genome gene ID and the pan genome gene symbol.

Genome View

For AureoWiki we developed a space-saving genome viewer that reduces the genome information to a minimum. It is based on a vector (SVG) file format and is initially aligned to the position of the gene currently shown. The position in the genome browser can be changed by grabbing the slider or by moving the mouse wheel on the left side of the genome browser. Clicking on a gene arrow takes you to the corresponding gene page, which makes it possible to walk through the genome step by step, page by page. Colors encode the gene functional categories as explained above.

Gene

General

Here we display much of the basic information on S. aureus genes. This includes the strain-specific gene coordinates, the gene length, the gene type (CDS, mRNA ...) and the strain-specific gene symbol. Furthermore, we extracted the description, the strand and the encoding replicon. This GenBank-based information was enriched with essentiality information extracted from DEG [5]; [6], which collected the essentiality information from [7]; [8].

Accession numbers

Accession numbers link to a selection of relevant external cross-references. We plan to extend this section step by step.

Phenotype

The phenotype of mutants in the gene of interest can normally only be described in "free text". The availability of phenotype information in public database resources is very limited. For this reason we would appreciate the active engagement of the research community interested in S. aureus in sharing its knowledge with all interested scientists. We also plan to extract phenotype information from the literature for genes we are interested in.

Sequence

This shows the exact DNA sequence of a gene.

Protein

General

General information such as the protein symbol (three letters plus capital-letter extension), description, sequence and length in aa was extracted from GenBank and implemented in AureoWiki. MW and pI were calculated directly from the amino acid sequence.
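
As an illustration of how a molecular weight can be derived from the amino acid sequence alone, the following sketch sums commonly tabulated average residue masses and adds one water molecule for the free termini. The method actually used for AureoWiki is not described on this page, so this is an assumption; the function name is hypothetical, and the pI calculation is left out because it depends on the chosen set of pKa values.

```python
# Average residue masses in Da (amino acid minus water), as commonly tabulated.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-terminus


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        residue_sum = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq)
    except KeyError as exc:
        # Ambiguity codes such as X or B are not handled in this sketch.
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return residue_sum + WATER_MASS
```

For a single glycine this gives about 75.07 Da, the mass of free glycine. Post-translational modifications and signal peptide cleavage are ignored.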

Function

Functional assignments, or data that may help with a functional assignment, were generated and extracted in several ways:
The EC numbers were extracted directly from the GenBank sequence information. Additionally, EC numbers from a TIGRFam [9] functional classification were inserted. EC numbers are complemented by the corresponding reaction equations and, if available, by enzyme names.

TIGRFams were established by TIGR and are based on the Riley scheme for gene functional classification, first established for Escherichia coli by Riley [10]. TIGRFams describe families of homologous proteins with defined functions in the metabolism / physiology of bacteria, e.g. an enzymatic activity, a transporter with defined specificity, a protein with a structural function and many more. For each family a so-called TIGRFam hidden Markov model (HMM), a general description of family-specific sequence features, has been established; it can be used for an AI (artificial intelligence) based assignment of sequences to TIGRFam functions, e.g. with HMMER [11]. Because TIGRFams are systematically ordered in a hierarchical system of nested functions (enolase is part of glycolysis, which is part of energy metabolism), these data are displayed in a tree-like structure, as known from computer file system hierarchies. Pressing the [+] sign gives access to all assignments to TIGRFams and to the branches of the hierarchical system. For clarity we inserted a meta level that summarizes the TIGR main roles with their sub roles and TIGRFam functions. The color codes within the genome viewer are based on these meta roles (orange brown - Metabolism, blue shades - Genetic Information Processing, green tones - Signal Processing, pink - Cell Wall and Envelope, red - Cellular Processes, black - RNA Genes, grey - Hypothetical Proteins and Proteins with no known function). The TIGRFams are displayed in order of their HMMER scores, which measure the quality of the alignment between the gene and the TIGR HMM. Especially for genes of unknown function, even low scores may give a hint of a function.
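
The score-ordered, tree-like display described above can be sketched as follows. The record layout, the field names and the function name are assumptions made for this illustration; the sketch only shows how hits can be sorted by score and nested under main role and sub role, not how AureoWiki stores its data.

```python
from typing import Dict, Iterable, List, NamedTuple


class TigrfamHit(NamedTuple):
    accession: str   # placeholder family identifier
    main_role: str   # top level of the hierarchy
    sub_role: str    # second level of the hierarchy
    score: float     # HMM score; higher means a better alignment


def build_role_tree(hits: Iterable[TigrfamHit]) -> Dict[str, Dict[str, List[TigrfamHit]]]:
    """Nest hits as main role -> sub role -> hits, best score first.

    Python dicts keep insertion order, so main roles and sub roles
    appear in the order of their best-scoring hit.
    """
    tree: Dict[str, Dict[str, List[TigrfamHit]]] = {}
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        tree.setdefault(hit.main_role, {}).setdefault(hit.sub_role, []).append(hit)
    return tree
```

Low-scoring hits are kept rather than filtered out, in line with the remark that they may still hint at a function for genes of unknown function.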

The SEED [12] is an open project for specialist-driven functional classification of DNA/protein sequences. Like TIGRFams, this system is hierarchically structured. Multiple assignments can be accessed by expanding the entry with the [+] sign. Because the assignment is based on manual curation, no further quality measure is available.

As described for TIGRFams, the assignment of sequences to Pfam [13] protein families is based on HMMs. Because Pfam families are organized in clans, the results are displayed hierarchically, with the Pfam clans at the top level and the Pfam families as sublevels. The HMM scores reflect the quality of the assignments, with the best HMM scores shown on top.

Structure, modifications and interactions

Information on domains, protein modifications and cofactors is not yet included but will be incorporated soon.

Some results from N315-based studies on regulators and their effectors, which are stored in RegPrecise [14], are shown. Effectors are biomolecules that modulate the activity of other biomolecules, in this case of regulators. The N315 results have been transferred to the other strains on the basis of sequence homology to the other S. aureus strains' regulators.

A study in strain MRSA252 [15], a strain that is not yet included in AureoWiki, revealed a set of several thousand protein interactions. By using orthologue mapping, these interactions have been included for the available S. aureus strains and are indicated correspondingly.
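
The orthologue mapping mentioned above can be pictured as a lookup that translates both partners of an interaction into the target strain. The sketch below assumes a simple one-to-one mapping from source locus tags to target locus tags. The function name and the locus tags in the usage example are hypothetical, and the real orthologue reference table may be structured differently.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def transfer_interactions(
    interactions: Iterable[Tuple[str, str]],
    orthologue_of: Dict[str, str],
) -> Set[FrozenSet[str]]:
    """Map protein interactions from a source strain to a target strain.

    An interaction is transferred only if both partners have an orthologue
    in the target strain. Pairs are stored unordered, so (a, b) and (b, a)
    count as the same interaction.
    """
    transferred: Set[FrozenSet[str]] = set()
    for gene_a, gene_b in interactions:
        target_a = orthologue_of.get(gene_a)
        target_b = orthologue_of.get(gene_b)
        if target_a is None or target_b is None:
            continue  # no orthologue known for at least one partner
        transferred.add(frozenset((target_a, target_b)))
    return transferred


# Usage with made-up locus tags:
example = transfer_interactions(
    [("SRC_1", "SRC_2"), ("SRC_2", "SRC_1"), ("SRC_1", "SRC_9")],
    {"SRC_1": "TGT_7", "SRC_2": "TGT_8"},
)
```

Interactions transferred this way are predictions based on orthology and not direct experimental evidence in the target strain, which is presumably why they are marked accordingly on the pages.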

Localization

Several algorithms are available for predicting the (extra)cellular localization of proteins. We applied three of them (PSORTb [16], LocateP [17], SignalP [18]) and present their results. The TMHMM [19] tool predicts the number of transmembrane domains, which is also shown. Clicking [+] reveals the more detailed data that served as the basis for predicting the protein's theoretical localization.

Accession numbers

A collection of accession numbers or cross-references.

Sequence

Protein sequence from GenBank.

Peptides

If there is experimental evidence that the protein as a whole, or one or more of its peptides, is really synthesized or accumulated, this is stated. For this purpose a collection of global proteome analysis papers has been analyzed. The papers are given as references.

Expression and regulation

Operon

Operons have been imported from Microbes Online [20]. All operon members are clickable and give access to their AureoWiki pages.

Regulation

Regulatory data have been extracted from RegPrecise [21]. RegPrecise currently maintains S. aureus regulatory data exclusively for N315. By homology analysis the N315 data have been transferred to the other strains of AureoWiki; this is indicated on the pages.

Transcription pattern

Protein Synthesis

A large set of protein data from S. aureus COL comes from AureoLib [22] [23].

Half Life

Biological material

Other information

References

Citations

A list of papers from which data have been inserted into AureoWiki. These references are indicated on the wiki pages by the [Citations symbol]. On "mouse over" the corresponding citation is shown or highlighted.

Further reading

Here a list of papers is shown, especially papers presenting general analyses of the gene / protein in question.

References

  2. Aaron C E Darling, Bob Mau, Frederick R Blattner, Nicole T Perna
    Mauve: multiple alignment of conserved genomic sequence with rearrangements.
    Genome Res.: 2004, 14(7);1394-403
    [PubMed:15231754]
  3. A Herbig, G Jäger, F Battke, K Nieselt
    GenomeRing: alignment visualization based on SuperGenome coordinates.
    Bioinformatics: 2012, 28(12);i7-15
    [PubMed:22689781]
  6. Hao Luo, Yan Lin, Feng Gao, Chun-Ting Zhang, Ren Zhang
    DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements.
    Nucleic Acids Res.: 2014, 42(Database issue);D574-80
    [PubMed:24243843]
  7. R Allyn Forsyth, Robert J Haselbeck, Kari L Ohlsen, Robert T Yamamoto, Howard Xu, John D Trawick, Daniel Wall, Liangsu Wang, Vickie Brown-Driver, Jamie M Froelich, Kedar G C, Paula King, Melissa McCarthy, Cheryl Malone, Brian Misiner, David Robbins, Zehui Tan, Zhan-yang Zhu, Grant Carr, Deborah A Mosca, Carlos Zamudio, J Gordon Foulkes, Judith W Zyskind
    A genome-wide strategy for the identification of essential genes in Staphylococcus aureus.
    Mol. Microbiol.: 2002, 43(6);1387-400
    [PubMed:11952893]
  8. Roy R Chaudhuri, Andrew G Allen, Paul J Owen, Gil Shalom, Karl Stone, Marcus Harrison, Timothy A Burgis, Michael Lockyer, Jorge Garcia-Lara, Simon J Foster, Stephen J Pleasance, Sarah E Peters, Duncan J Maskell, Ian G Charles
    Comprehensive identification of essential Staphylococcus aureus genes using Transposon-Mediated Differential Hybridisation (TMDH).
    BMC Genomics: 2009, 10;291
    [PubMed:19570206]
  10. M H Serres, S Gopal, L A Nahum, P Liang, T Gaasterland, M Riley
    A functional update of the Escherichia coli K-12 genome.
    Genome Biol.: 2001, 2(9);RESEARCH0035
    [PubMed:11574054]
  11. Robert D Finn, Jody Clements, Sean R Eddy
    HMMER web server: interactive sequence similarity searching.
    Nucleic Acids Res.: 2011, 39(Web Server issue);W29-37
    [PubMed:21593126]
  15. Artem Cherkasov, Michael Hsing, Roya Zoraghi, Leonard J Foster, Raymond H See, Nikolay Stoynov, Jihong Jiang, Sukhbir Kaur, Tian Lian, Linda Jackson, Huansheng Gong, Rick Swayze, Emily Amandoron, Farhad Hormozdiari, Phuong Dao, Cenk Sahinalp, Osvaldo Santos-Filho, Peter Axerio-Cilies, Kendall Byler, William R McMaster, Robert C Brunham, B Brett Finlay, Neil E Reiner
    Mapping the protein interaction network in methicillin-resistant Staphylococcus aureus.
    J. Proteome Res.: 2011, 10(3);1139-50
    [PubMed:21166474]
  23. Stephan Fuchs, Daniela Zühlke, Jan Pané-Farré, Harald Kusch, Carmen Wolf, Swantje Reiß, Le Thi Nguyen Binh, Dirk Albrecht, Katharina Riedel, Michael Hecker, Susanne Engelmann
    Aureolib - a proteome signature library: towards an understanding of staphylococcus aureus pathophysiology.
    PLoS ONE: 2013, 8(8);e70669
    [PubMed:23967085]

MediaWiki Help and support

Wikimedia help

Consult the User's Guide for information on using the wiki software.

Getting started