Progress 07/01/12 to 06/30/17
Outputs Target Audience: The main target audience for our work is research scientists working on molecular functions of microbes and microbiomes. We also hope to have reached scientists interested in evaluating functional effects of genomic variants regardless of organism. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? In the course of this project, one Ph.D. student defended his thesis on the basis of his fusion and mi-faser work. One more Ph.D. thesis, in collaboration with a group at the Technical University of Munich, is in progress. Multiple undergraduates were involved in developing various components of the tools (including two whose names appear on resulting publications). How have the results been disseminated to communities of interest? All tools were presented at relevant conferences and appeared in corresponding publications. All are also available online, free to non-commercial entities, at services.bromberglab.org. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Overall, we accomplished all goals of this project, albeit with different resource names and slightly different implementations of the main ideas. A summary of the accomplishments is below:
1. We built a predictor, fun-TRP (functional toggle rheostat predictor), for finding functional residues in proteins. We identified two types of functional residues -- those that are critical for function (on/off toggle switches) and those that are necessary for fine-tuning the activity (rheostats). We showed that our approach facilitates prediction of functional effects of variants in protein sequences, allows tracing the evolutionary history of molecular functions, and facilitates targeted synthetic biology construction of specific functions in enzymes.
2. Rather than focusing on functional signatures of proteins, we opted for building a predictor of functional similarity between proteins. This predictor was optimized to compare short peptides (which may or may not carry functional signatures) to full-length proteins in order to identify the functional similarity of the peptides' parent proteins. This approach makes our method, faser (functional annotation of sequence reads), applicable to the analysis of metagenomic data, potentially leading to the discovery of new functions.
3. We built fusionDB (functional-- a database of functional clusters of all bacterial proteins by first comparing them for functional similarity and then clustering the network of functionally similar proteins. This approach allows for mapping new microorganism sequences into a framework of older/already-annotated bacteria to better understand the molecular functionality encoded in the new genomes.
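The two-step construction described for fusionDB (pairwise functional similarity, then clustering of the resulting network) can be illustrated with a minimal sketch. The report does not state which clustering algorithm was used, so the sketch assumes the simplest stand-in, connected components of the similarity network; the function name and protein identifiers are hypothetical.

```python
from collections import defaultdict


def functional_clusters(similar_pairs):
    """Group proteins into clusters, taken here to be the connected
    components of the similarity network (an assumed stand-in, not
    necessarily the clustering used for fusionDB)."""
    graph = defaultdict(set)
    for a, b in similar_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collects every protein reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairs of proteins judged functionally similar.
pairs = [("protA", "protB"), ("protB", "protC"), ("protD", "protE")]
clusters = functional_clusters(pairs)
```

Under this assumption, a new protein joins an existing cluster as soon as it is found similar to any member, which is one way to map new sequences into a framework of already-annotated ones.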
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2017
Citation:
Zhu, C., Mahlich, Y., Miller, M., Bromberg, Y. (2017) fusionDB: assessing microbial diversity and environmental preferences via functional similarity networks. Database: http://services.bromberglab.org/fusiondb. Nucleic Acids Research, gkx1060.
- Type:
Journal Articles
Status:
Published
Year Published:
2017
Citation:
Miller, M., Bromberg, Y., Swint-Kruse, L. (2017) Variant effect prediction methods fail for rheostat positions. Nat Scientific Reports 7, 41329
- Type:
Journal Articles
Status:
Published
Year Published:
2017
Citation:
Zhu, C., Miller, M., Marpaka, S., Vaysberg, P., Ruhlemann, M.C., Wu, G., Heinsen, F.A., Tempel, M., Zhao, L., Lieb, W., Franke A., Bromberg, Y. (2017) Functional sequencing read annotation for high precision microbiome analysis. Nucleic Acids Research, gkx1209
|
Progress 10/01/15 to 09/30/16
Outputs Target Audience: A wide range of microbiologists interested in accessing functionality of their proteins and researchers interested in type III secretion systems and bacterial pathogenicity. We also organized a microbiology workshop at the Pacific Symposium on Biocomputing, which likely attracted a diverse audience. Changes/Problems: We have decided to move away from the functional signature approach to functional annotation of proteins and to switch to a more reliable alignment-based technique, which allows for exploration of microbiome contents. This is a major change in algorithms, but the approach is conceptually very similar to what was originally described in the proposal. What opportunities for training and professional development has the project provided? There is a graduate student and two visiting research scientists working on the project. The graduate student will use this work to defend his thesis at the end of this academic year. The research scientists will be applying the developed tools in their home labs in Germany, helping advance their own research. How have the results been disseminated to communities of interest? Via published journal articles and the organized workshop mentioned above (as well as via our lab website). What do you plan to do during the next reporting period to accomplish the goals? We will continue with development and refinement of the software and the database.
Impacts What was accomplished under these goals?
We have demonstrated that there are two types of functionally important positions in the protein sequence -- rheostats and toggles. We have demonstrated that these have different effects on function when mutated, and that recognizing the type of position prior to further analysis is both necessary and not currently accomplished by any available method. We are in the process of building a computational classifier to recognize these types of positions. We are also continuing with our efforts to build read-based annotation software and to create a database of functions available to microbes and microbiomes in our training sets. These will be accessible via an interface that is also currently under development.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Greene, C.S., Foster, J.A., Stanton, B.A., Hogan, D.A., Bromberg, Y. (2016) Computational approaches to study microbes and microbiomes. Pac Symp Biocomput 2016. :557-567
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Goldberg, T., Rost, B., Bromberg, Y. (2016). Computational prediction shines light on type III secretion origins. Nat Sci Rep, 6, 34516.
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Rost, B., Radivojac, P., Bromberg, Y. (2016) Protein function in precision medicine: deep understanding with machine learning. FEBS Lett. 590(15): p. 2327-41
|
Progress 10/01/14 to 09/30/15
Outputs Target Audience: Microbiologists interested in analyzing functional similarity of bacteria as encoded by bacterial proteins. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? Via a publication and the database referenced above. What do you plan to do during the next reporting period to accomplish the goals? We will continue with development and refinement of the database.
Impacts What was accomplished under these goals?
We built a new system for recognizing functional similarity of microorganisms on the basis of the proteins that their genomes encode. While not a direct goal for this project, it is a useful contribution to the goals. This system is located at http://bromberglab.org/databases/fusiondb and was developed by Chengsheng Zhu at the Bromberg Lab. fusionDB currently contains 1,374 bacterial genomes annotated with temperature, oxygen requirement and habitat metadata. Bacterial proteins are assigned to functional clusters, and each organism is thus mapped to a set of functions. fusionDB allows searching for organism names combined with specific environment metadata, and creates an XML-formatted network file (fusion+ network, C. Zhu et al.) of selected organisms that can be visualized with Gephi. In fusion+ networks, organisms cluster on the basis of shared function, which allows for exploration of the specific environmental factor(s) that drive microbial diversification. It offers a fast and simple way to detect pan-function (all functions of a set of organisms) and core-function (all functions found in every organism of a set) repertoires, as well as traces of horizontal gene transfer.
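The pan-function and core-function definitions above correspond to a set union and a set intersection over the organisms' function sets. The following is a minimal sketch of that reading; the organism names and function cluster identifiers are hypothetical and are not taken from fusionDB.

```python
def pan_functions(organism_functions):
    """All functions found in at least one organism of the set (union)."""
    return set().union(*organism_functions.values())


def core_functions(organism_functions):
    """All functions found in every organism of the set (intersection)."""
    function_sets = [set(funcs) for funcs in organism_functions.values()]
    if not function_sets:
        return set()
    return set.intersection(*function_sets)


# Hypothetical mapping of organisms to functional cluster identifiers.
organisms = {
    "organism_1": {"F1", "F2", "F3"},
    "organism_2": {"F2", "F3", "F4"},
    "organism_3": {"F2", "F3", "F5"},
}
pan = pan_functions(organisms)
core = core_functions(organisms)
```

Functions in the pan repertoire but outside the core are the ones that distinguish organisms from each other, which makes them natural starting points when asking which environmental factor separates two groups.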
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2015
Citation:
Zhu C, Delmont TO, Vogel TM, Bromberg Y (2015) Functional Basis of Microorganism Classification. PLoS Comput Biol 11(8): e1004472. doi: 10.1371/journal.pcbi.1004472
|
Progress 10/01/13 to 09/30/14
Outputs Target Audience: Protein scientists interested in in-depth annotation of their proteins or peptides. Changes/Problems: We found more promising directions in identifying per-residue protein activity from structural alignments. We hope to be able to transfer the information we learn from these alignments into a sequence-based annotation, but we have not yet developed a framework for how this will be done. This new direction will contribute to the overall goals of the project, but may prevent us from completing the aims as described. We are also experiencing issues recruiting students/post-docs interested in and qualified for computational tool development as described in this project. We hope to overcome this challenge by reaching out to bioinformatics communities world-wide, but, at this point, most of the project work is accomplished via collaborations with other labs. What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? Results have been disseminated via publications, conferences, and via an informal report at the VarI-SIG'14 (former SNP-SIG), a meeting co-organized by PI-Bromberg and attended by >100 computational biologists in Boston, Jul 2014. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Further work was done in development of the metric (SaHLe) for measuring functional similarity of protein structural folds. Once functionally similar folds are identified, we hope to be able to transfer this knowledge into sequence-based identification of active sites and motifs. This work is not along the originally outlined project goals, but we believe it will contribute significantly to the ultimate goal of the project -- per-residue classification of protein activity. Additionally, we implemented the visualization of SNAP predictions for whole-protein in silico mutagenesis as part of the PredictProtein pipeline. The pipeline is freely available to all academic researchers and could be used for in-depth study of specific proteins.
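Whole-protein in silico mutagenesis, as referred to above, starts by enumerating every single amino-acid substitution at every position of a sequence. The sketch below shows only that enumeration step; the call to a predictor such as SNAP is omitted because its interface is not described in this report, and the function name and example sequence are hypothetical.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(sequence):
    """Yield every single-residue substitution as '<wild type><position><mutant>',
    with positions counted from 1."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


# A hypothetical three-residue peptide gives 3 x 19 = 57 variants.
variants = list(single_substitutions("MKT"))
```

Each variant would then be scored by the prediction method, giving a position-by-substitution matrix that can be visualized for the whole protein.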
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2014
Citation:
Yachdav, Guy, et al. "PredictProtein -- an open resource for online prediction of protein structural and functional features." Nucleic acids research (2014): gku366.
|
Progress 10/01/12 to 09/30/13
Outputs Target Audience:
Nothing Reported
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? Results have been disseminated via publications in major journals, and via an informal report at the SNP-SIG'13, a meeting co-organized by PI-Bromberg and attended by >100 computational biologists in Berlin, Jul 2013. What do you plan to do during the next reporting period to accomplish the goals? We will continue developing our methods for identifying protein functional sites.
Impacts What was accomplished under these goals?
Significant progress was made toward the goals of aim 1 -- we have developed a method for annotating protein functional site residues. We also started working towards identifying the functional site signatures -- so far in structure and limited to metal-containing proteins, but moving towards broader sequence-based annotation.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Senn, S., Nanda, V., Falkowski, P., and Bromberg, Y. (2013). Function-based assessment of structural similarity measurements using metal co-factor orientation. Proteins.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Bromberg, Y., Kahn, P.C., and Rost, B. (2013). Neutral and weakly nonneutral sequence variants may define individuality. Proc Natl Acad Sci U S A 110, 14255-14260.
- Type:
Book Chapters
Status:
Published
Year Published:
2013
Citation:
Bromberg, Y. (2013). Chapter 15: disease gene prioritization. PLoS Comput Biol 9, e1002902.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Bromberg, Y. (2013). Building a Genome Analysis Pipeline to Predict Disease Risk and Prevent Disease. J Mol Biol 425, 3993-4005.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Hecht, M., Bromberg, Y., and Rost, B. (2013). News from the Protein Mutability Landscape. J Mol Biol.
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Capriotti, E., Altman, R.B., and Bromberg, Y. (2013). Collective judgment predicts disease-associated single nucleotide variants. BMC Genomics 14 Suppl 3, S2.
|
Progress 10/01/11 to 09/30/12
Outputs OUTPUTS: As a direct result of my research for the FuSeS project, the annual SNP-SIG meeting that I co-chair (2012 edition, Long Beach, CA) had a specific subfocus on the impact of mutations in functionally significant sites. PARTICIPANTS: Yana Bromberg -- Principal Investigator, Rutgers. Chris Rusnak -- Undergraduate student, Rutgers; data extraction and data model building. Burkhard Rost -- non-formal collaborator, Technical University of Munich. Christian Schaefer -- co-supervised Ph.D. student in the lab of Dr. Burkhard Rost, Technical University of Munich; data collection, manuscript write-up. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
Impacts The first stage of the FuSeS approach requires identifying functionally significant residues in protein sequences. We started this project with the assumption that, in human proteins, sequence positions often altered in disease are likely functionally significant. We further looked for a way to quantify and augment this significance with computational methods (which could then be applied to non-human organisms). Relatively few human mutations reported in databases such as OMIM, PMD and Swiss-Prot are experimentally assessed for their disease-causing impact. We made computational predictions of the functional impact of disease-annotated mutations and non-disease variants collected from these databases using SNAP, our in-house neural network-based annotation program. Most disease-causing mutations were predicted to severely impact protein function. In fact, the raw prediction scores for disease-causing mutations were higher than the scores for the function-altering data set originally used for SNAP development. This finding means that, on average, disease mutations are severely deleterious to the function of the affected protein, as indicated by the absolute value of the SNAP score. The enrichment of neutral SNAP scores in the set of nsSNPs not currently linked to disease suggests that strong disease associations among these are unlikely. Our research suggests that (1) disease-causing nsSNPs are well identified by SNAP, even though it was developed to predict the impact of mutations on protein function, and (2) screening naturally occurring variants (whether in wild-type or phenotypically different organisms) for high SNAP scores can serve as an initial filter for functionally significant sites. Using a gold standard set of functional site residues, extracted from the Catalytic Site Atlas and Swiss-Prot, we will compare the computational site predictions made in this manner to the approach we proposed in the initial write-up and are currently developing (i.e., in silico mutagenesis). We expect our in silico mutagenesis technique to outperform this baseline method. (Note that SNAP scores for all mutants used in this study are available via SNPdbe, a database developed previously, at http://www.rostlab.org/services/snpdbe/.)
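The baseline described above, screening observed variants for high predicted-impact scores to flag candidate functional sites, can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name, the score values, and the threshold are hypothetical toy inputs, not SNAP outputs or a cutoff used in this work.

```python
from collections import defaultdict


def candidate_sites(variant_scores, threshold):
    """Return the sorted sequence positions at which at least one observed
    variant has a predicted-impact score at or above `threshold`."""
    best_score = defaultdict(lambda: float("-inf"))
    for (position, _mutant), score in variant_scores.items():
        best_score[position] = max(best_score[position], score)
    return sorted(pos for pos, score in best_score.items() if score >= threshold)


# Toy scores keyed by (position, mutant residue); values are illustrative only.
scores = {(10, "A"): 80, (10, "G"): 20, (25, "L"): -40, (31, "P"): 55}
sites = candidate_sites(scores, threshold=50)
```

Predictions made this way could then be compared against a gold standard set of functional residues to measure how well the filter recovers known sites.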
Publications
- Schaefer, C., Bromberg, Y., Achten, D., and Rost, B. (2012) Disease-related mutations predicted to impact protein function. BMC Genomics 13 Suppl 4, S11
|