Progress 05/01/14 to 08/31/17
Outputs Target Audience: The target audiences include the following groups: 1) researchers in agricultural fields who use various genomics approaches in their research and development; 2) students and teachers who use genome data, especially genome data from agriculture-related species, as teaching materials. The target audiences have been reached through meetings and conferences, journal papers, and direct software support. In July 2014 in Kansas City, at the JAM conference and joint NIFA PD meeting, we made a poster presentation to other PDs and researchers in the field. In October 2015 in DC, at the Gathering On Functional Annotation of ANimal Genomes workshop (GO-FAANG), we presented our development to the meeting participants. In January 2016 in San Diego, at the Plant and Animal Genomics (PAG) conference, we gave a poster presentation and a live computer demonstration to a large audience. In January 2016 in San Diego, at the NIFA PD meeting, we made a poster presentation to other PDs and researchers in the field. In January 2017 in San Diego, at the Plant and Animal Genomics (PAG) conference, we gave a poster presentation. In January 2017 in San Diego, at the NIFA PD meeting, we gave a talk and made a poster presentation to other PDs and researchers. In 2016, the main paper describing the web portal was published in BMC Genomics, and our web portal was fully released to the community. Some other papers were also published. Since the portal release, we have supported many users in their data analysis by email, solving software issues, adding reference data, adding new tools, etc.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? One undergraduate student from the University of California, San Diego worked as an intern on this project. His major is computer science and bioinformatics. Through this project he was trained and gained professional skills in genomics, computational biology, and software development. He is a co-author of one of the publications.
How have the results been disseminated to communities of interest? As described in the Target Audience section, we demonstrated the web portal and software tools to communities of interest at the Plant and Animal Genomics (PAG) conferences (2016, 2017), the NIFA PD meetings (2014-2017), and a related workshop (2015) through oral presentations, poster presentations, and computer demonstrations. We also tried to reach the communities through our own website and third-party websites (e.g. the Galaxy project web site, YouTube) by providing general information about our software and its documentation. We published the paper describing our project in BMC Genomics in 2016. We have communicated with our users by email and provided user support for researchers who used our web portal.
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
The goal of this project is to develop a web portal with integrated tools for RNA-seq based gene expression analysis for agriculturally important animal species. We originally proposed 18 tasks under 3 major objectives: 1) improve genome annotation of agriculturally important animal species, including (but not limited to) cattle, pig, chicken, turkey, horse, sheep, and goat as well as catfish; 2) develop and integrate needed bioinformatics tools and pipelines, visualization interfaces, and statistical methods; 3) build a web portal that enables RNA-seq based transcriptomics analysis in the aforementioned animal species. These tasks are:
- Task 1.1: Integrate existing genomic data for agricultural animals
- Task 1.2: Set up methods for regularly updating public genomic data
- Task 1.3: Search and obtain third party genomic data
- Task 1.4: In-house RNA-seq sequencing
- Task 1.5: Improve genome assembly and annotation
- Task 1.6: Distribute integrated and improved genome data
- Task 2.1: Download, install, configure and test individual computational tools
- Task 2.2: Parallelize the algorithms so that they can run on a computer cluster
- Task 2.3: Implement a configurable RNA-seq read mapping pipeline
- Task 2.4: Implement a configurable pipeline for de novo RNA-seq assembly
- Task 2.5: Implement a configurable pipeline for genome-dependent RNA-seq assembly
- Task 2.6: Implement a post-analysis pipeline
- Task 2.7: Distribute integrated software tools
- Task 3.1: Design the complete web portal and backend systems
- Task 3.2: Implement web interface to run the configurable pipelines
- Task 3.3: Implement web interface for standalone tools
- Task 3.4: Implement programmable web services
- Task 3.5: Distribute integrated software tools and user support
Within the first part of the project, we finished tasks 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, 2.4 and 3.1 as planned. For task 1.2, we used Ensembl as the primary source for genome data and set up a regular update schedule.
For task 1.3, we downloaded third-party goat genome and annotation data from the Kunming Institute of Zoology in China (http://goat.kiz.ac.cn/GGD/) and the International Goat Genome Consortium (http://www.goatgenome.org/home.html). For task 1.4, we generated additional RNA-seq data. Over 120 RNA-seq samples were sequenced with >2 GB per sample. The sequenced species include caprine, ovine, bovine, porcine and human, from various tissue types. We implemented the read mapping pipeline and the assembly pipelines (tasks 2.3 & 2.4). To improve the compute infrastructure and reduce cost, we adopted Amazon cloud resources as our development and production environment. We further developed the workflow engine to run in the cloud. Together with the cloud cluster management software StarCluster, this significantly improved the efficiency of our computer hardware and software maintenance. For task 3.1, after extensive testing and validation, we selected Galaxy to implement our web portal and used our in-house workflow tools for pipeline management. In the middle term of the project, we continued tasks 1.5, 2.5, 2.6, 3.2, and 3.3. For task 1.6, all the genome data we downloaded and further processed, including the genome data formatted and indexed for BWA, Bowtie2, STAR, RSEM, BLASTN, BLASTP and IGV, were made available for download through both our web and FTP servers. We spent most of our effort on pipeline and portal development, including task 2.5, genome-dependent RNA-seq assembly; task 2.6, post-analysis pipeline; and tasks 3.2 & 3.3, web interface for workflows and standalone tools. We further integrated the pipelines and tools and reorganized them into three end-to-end workflows. The first workflow uses the Tuxedo suite of tools (TopHat, Cufflinks, Cuffmerge and Cuffdiff). The second workflow deploys Trinity for de novo assembly and uses RSEM for transcript quantification and edgeR for differential analysis. The third combines STAR, RSEM, and edgeR for data analysis.
All these workflows support multiple samples and multiple groups of samples and perform differential analysis between groups in a single workflow job submission. In the final stage of the project, we continued all the recurring tasks, including the regular updates of genomic data (task 1.2) and user support (task 3.5). Since the publication of our BMC Genomics paper, many users have started to use our web portal and download the data sets. We put major effort into user support and web portal maintenance, and we improved the user data upload interface. We also worked on sharing the software with the public (task 3.5) through the release of the web portal. For task 1.5 (improve genome annotation), this project contributed by providing the workflow tools to the communities and supporting their users. Beyond the originally proposed goals, we made significant new developments. In addition to the three major workflows (Tuxedo, Trinity and STAR), we added another workflow based on HISAT. Like the others, the HISAT workflow supports multiple samples and multiple groups of samples and performs differential analysis between groups in a single workflow job submission.
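The multi-group design described above can be illustrated with a minimal sketch. It is not the portal's code: the sample sheet layout, the group names, and the function name are hypothetical, and it assumes every pair of groups is compared, which the report does not state. It only shows which group-versus-group comparisons a single job submission with several sample groups would imply.

```python
from itertools import combinations


def pairwise_comparisons(sample_groups):
    """Return every unordered pair of group names, keeping input order."""
    # A group must contain at least one sample to take part in a comparison.
    names = [name for name, samples in sample_groups.items() if samples]
    return list(combinations(names, 2))


# Hypothetical sample sheet: group name -> RNA-seq sample identifiers.
groups = {
    "control": ["c1", "c2", "c3"],
    "infected": ["i1", "i2", "i3"],
    "treated": ["t1", "t2"],
}

comparisons = pairwise_comparisons(groups)
for first, second in comparisons:
    print(f"{first} vs {second}: "
          f"{len(groups[first])} vs {len(groups[second])} samples")
```

With three groups this yields three comparisons; each pair would then be handed to whichever differential analysis tool the chosen workflow uses.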
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Weizhong Li, R. Alexander Richter, Yunsup Jung, Qiyun Zhu and Robert W. Li. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species. BMC Genomics (2016) 17:761. DOI 10.1186/s12864-016-3118-z. PMID: 27678198, PMCID: PMC5039875.
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Robert W. Li, Weizhong Li, Jiajie Sun, Peng Yu, Ransom L. Baldwin, Joseph F. Urban. The effect of helminth infection on the microbial composition and structure of the caprine abomasal microbiome. Scientific Reports (2016) 6:20606
|
Progress 05/01/16 to 04/30/17
Outputs Target Audience: The target audiences include the following groups: 1) researchers in agricultural fields who use various genomics approaches in their research and development; 2) students and teachers who use genome data, especially genome data from agriculture-related species, as teaching materials. In this period, we demonstrated the web portal we developed to researchers and students at the Plant and Animal Genomics (PAG) conference (San Diego, Jan 2017), where we gave a poster presentation to a large audience. The PD gave an oral presentation and a poster presentation to colleagues at NIFA's PD meeting (San Diego, Jan 2017). The paper describing the web portal, published last year, also reached our audiences, and many researchers have started to use the web portal.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? We demonstrated the web portal and software tools to communities of interest at the Plant and Animal Genomics (PAG) conference (San Diego, Jan 2017), where we gave a poster presentation to a large audience. We also presented our software to researchers at NIFA's PD meeting. We tried to reach the communities through our own website and third-party websites (e.g. the Galaxy project web site, YouTube) by providing general information about our software and its documentation. We published the paper describing our project in BMC Genomics in 2016. We have communicated with our users by email and provided user support for researchers who used our web portal.
What do you plan to do during the next reporting period to accomplish the goals? After this reporting period, we will have 4 months before the end of the project. For the remaining 4 months, we will focus on supporting the portal users and making the software more accessible to the community. We are currently working on a few manuscripts and plan to finish them to further increase the impact of the project.
Impacts What was accomplished under these goals?
In this project period, we continued all the recurring tasks, including the regular updates of genomic data. Since the publication of our BMC Genomics paper, many users have started to use our web portal and download the data sets. We put major effort into user support and web portal maintenance, and we improved the user data upload interface. In addition to the three major workflows (Tuxedo, Trinity and STAR), we added another workflow based on HISAT. All these workflows support multiple samples and multiple groups of samples and perform differential analysis between groups in a single workflow job submission.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Weizhong Li, R. Alexander Richter, Yunsup Jung, Qiyun Zhu and Robert W. Li. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species. BMC Genomics (2016) 17:761. DOI 10.1186/s12864-016-3118-z. PMID: 27678198, PMCID: PMC5039875.
|
Progress 05/01/15 to 04/30/16
Outputs Target Audience: The target audiences include the following groups: 1) researchers in agricultural fields who use various genomics approaches in their research and development; 2) students and teachers who use genome data, especially genome data from agriculture-related species, as teaching materials. In this period, we demonstrated the web portal we developed to researchers and students at the Plant and Animal Genomics (PAG) conference (San Diego, Jan 2016), where we gave a poster presentation and a computer demonstration to a large audience. During the last year, we showed our development to colleagues at the PAG meeting, at NIFA's PD meeting (San Diego, Jan 2016), and at the Gathering On Functional Annotation of ANimal Genomes workshop (GO-FAANG, DC, October 2015).
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? One undergraduate student from the University of California, San Diego worked as an intern on this project last year. His major is computer science and bioinformatics. Through this project he was trained and gained professional skills in genomics, computational biology, and software development.
How have the results been disseminated to communities of interest? We demonstrated the web portal and software tools to communities of interest at the Plant and Animal Genomics (PAG) conference (San Diego, Jan 2016), where we gave a poster presentation and a computer demonstration to a large audience. We also presented our software to researchers at NIFA's PD meeting and at the animal genomics annotation workshop. We tried to reach the communities through our own website and third-party websites (e.g. the Galaxy project web site, YouTube) by providing general information about our software and its documentation.
What do you plan to do during the next reporting period to accomplish the goals? We will continue maintaining and developing the web portal and the underlying software tools for RNA-seq data analysis. We will provide communities with software support. We will continue updating the genome reference databases that serve the computational pipelines behind the web portal. We will closely watch for new animal genomes being sequenced and add them to our portal when they become available. We will add new data analysis tools to the web portal according to the needs and feedback of the user communities.
Impacts What was accomplished under these goals?
Impact of the project: The software tools and pipelines for RNA-seq data analysis for animal species have been developed and released through our web portal for public use. The portal provides researchers worldwide in the animal field with effective tools for applying genomics approaches in animal study, research and development.
Accomplishments of the project: We originally proposed 18 tasks under 3 major objectives. In the previous report periods, we reported the completion of several tasks, including task 1.1, integrate existing agricultural animal genomic data; task 1.2, quarterly update of genomic data; task 1.3, obtain third-party genomic data; task 1.4, in-house RNA-seq sequencing; task 2.1, master individual computational tools; task 2.2, parallelization of tools; tasks 2.3 & 2.4, read mapping pipeline and assembly pipeline; and task 3.1, portal design. In this project period, we continued the recurring task 1.2, quarterly update of genomic data. All the animal genome data are up to date, including chicken, cow, duck, goat, pig, horse, rabbit, sheep, and turkey, as well as several other model organisms. All the genome data we downloaded and further processed, including the genome data formatted and indexed for BWA, Bowtie2, STAR, RSEM, BLASTN, BLASTP and IGV, are available for download through both our web and FTP servers (task 1.6). We spent most of our effort on pipeline and portal development, including task 2.5, genome-dependent RNA-seq assembly; task 2.6, post-analysis pipeline; and tasks 3.2 & 3.3, web interface for workflows and standalone tools. We further integrated the pipelines and tools and reorganized them into three end-to-end workflows. The first workflow uses the Tuxedo suite of tools (TopHat, Cufflinks, Cuffmerge and Cuffdiff). The second workflow deploys Trinity for de novo assembly and uses RSEM for transcript quantification and edgeR for differential analysis. The third combines STAR, RSEM, and edgeR for data analysis.
All these workflows support multiple samples and multiple groups of samples and perform differential analysis between groups in a single workflow job submission. This greatly reduces the time and effort users need to spend on our web portal. To further improve the performance of the web portal and reduce the compute cost of large-scale RNA-seq data processing, we significantly improved our cyberinfrastructure in the Amazon cloud environment. We used the Galaxy and StarCluster software tools and further developed our in-house workflow engine.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Robert W. Li, Weizhong Li, Jiajie Sun, Peng Yu, Ransom L. Baldwin, Joseph F. Urban. The effect of helminth infection on the microbial composition and structure of the caprine abomasal microbiome. Scientific Reports (2016) 6:20606
- Type:
Journal Articles
Status:
Under Review
Year Published:
2016
Citation:
Weizhong Li, R. Alexander Richter, Yunsup Jung, Robert W Li. Web-based bioinformatics workflows for end-to-end RNA-seq data computation and analysis in agricultural animal species. BMC Genomics, under review
|
Progress 05/01/14 to 04/30/15
Outputs Target Audience: During the NIFA Joint Animal Nutrition, Growth and Lactation; Feed Efficiency; and Animal Genomics PD meeting, held in conjunction with the 2014 JAM in July 2014, we made a poster presentation on the project, pipeline and computational tools to other PDs and researchers in the field.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? We will continue the project according to our original development plans and timelines to implement the software and portal, conduct the research, and produce the proposed deliverables.
Impacts What was accomplished under these goals?
We originally proposed 18 tasks under 3 major objectives, along with a time frame for development. In the last report period, we reported the completion of several tasks, including task 1.1, integrate existing agricultural animal genomic data; task 1.2, quarterly update of genomic data; task 2.1, master individual computational tools; task 2.2, parallelization of tools; tasks 2.3 & 2.4, read mapping pipeline and assembly pipeline; and task 3.1, portal design. In this project period, we continued the recurring tasks and also performed new tasks. Task 1.2, quarterly update of genomic data. We have used Ensembl as the primary source for genome data and have updated it quarterly. Task 1.3, obtain third-party genomic data. Goat genome and annotation data were downloaded from the Kunming Institute of Zoology in China (http://goat.kiz.ac.cn/GGD/) and the International Goat Genome Consortium (http://www.goatgenome.org/home.html). We searched the literature and repositories for available animal RNA-seq data. We found about 2000 animal RNA-seq runs in NCBI SRA, have downloaded several datasets, and will use these for other tasks. Task 1.4, in-house RNA-seq sequencing. Besides the RNA-seq data from public sources and the data generated in our previous studies, we generated additional RNA-seq data. Over 120 RNA-seq samples were sequenced with >2 GB per sample. The sequenced species include caprine, ovine, bovine, porcine and human, from various tissue types. Tasks 2.3 & 2.4, read mapping pipeline and assembly pipeline. To make these pipelines more robust and more scalable, after extensive testing and development, we adopted Amazon cloud resources as our development and production environment. We further developed the workflow engine to run in the cloud. Together with the cloud cluster management software StarCluster, this significantly improved the efficiency of our computer hardware and software maintenance. Task 3.1, portal design.
We continued to optimize web portal development by using new public software tools, which have been evolving rapidly. We found that Galaxy provides more extensive features and functions for web portal implementation, data sharing, and user management than some of the older frameworks we used earlier (e.g. the CAMERA cyberinfrastructure). We tested the Galaxy software in our project and adopted Galaxy as the main portal.
Publications
- Type:
Journal Articles
Status:
Under Review
Year Published:
2015
Citation:
R Li, S Wu, C Li, W Li, and S Schroeder. Splice variants and regulatory networks associated with host resistance to the intestinal worm Cooperia oncophora in cattle. Veterinary Parasitology, under review
|