Source: UNIV OF MINNESOTA submitted to
BIOINFORMATICS TOOLS AND METHODOLOGY FOR GENOMIC APPLICATION TOWARDS LIVESTOCK IMPROVEMENT
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
REVISED
Funding Source
Reporting Frequency
Annual
Accession No.
1005870
Grant No.
(N/A)
Project No.
MIN-16-041
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Feb 19, 2015
Project End Date
Sep 30, 2018
Grant Year
(N/A)
Project Director
Da, YA, .
Recipient Organization
UNIV OF MINNESOTA
(N/A)
ST PAUL,MN 55108
Performing Department
Animal Science
Non Technical Summary
Genomic data with SNP genotypes and sequence information are growing at a fast pace due to the rapidly decreasing DNA sequencing cost and the industry acceptance of genomic selection. This data growth presents unprecedented opportunity for genomic application towards livestock genetic improvement but also presents tremendous bioinformatic challenges of big data analysis. The overall objective is to deliver community bioinformatics tools for big data analysis of genomic discovery and application towards livestock genetic improvement with the capability to address the most challenging data analysis problems and to address the needs of broad applications in research, education and animal breeding practice. Specific aims supporting the overall objective include the following: 1) Develop mixed model methods for genomic prediction and variance component estimation of complex genetic effects, and 2) Develop computing tools for genomic prediction and variance component estimation of complex genetic effects.
Animal Health Component
0%
Research Effort Categories
Basic
30%
Applied
70%
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
303 | 7410 | 1080 | 100%
Knowledge Area
303 - Genetic Improvement of Animals;

Subject Of Investigation
7410 - General technology;

Field Of Science
1080 - Genetics;
Goals / Objectives
The overall objective of this proposed research is to deliver community bioinformatics tools for big data analysis of genomic application towards livestock genetic improvement with the capability to address the most challenging data analysis problems and to address the needs of broad applications in research, education and animal breeding practice. Specific aims supporting the overall objective include:
Aim 1: Develop mixed model methods for genomic prediction and variance component estimation of complex genetic effects.
Aim 2: Develop computing tools for genomic prediction and variance component estimation of complex genetic effects.
Project Methods
A traditional quantitative genetics model with complex genetic effects will be used as the genomic model for deriving mixed model methods for genomic application. The computer package implementing the new mixed model methods will be developed using Message Passing Interface (MPI) parallel computing.
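The methods statement mentions MPI parallel computing without further detail. One reason genomic relationship computations parallelize well is that a VanRaden-style genomic additive relationship matrix, G = WW'/(2Σp(1-p)) with W the 0/1/2 allele-count matrix centered by 2p, is a sum of per-SNP contributions. The Python sketch below illustrates this under those assumptions: all function names are hypothetical, no MPI library is used, and it is not the project's implementation. It splits the SNPs into chunks, computes each chunk's partial product W_kW_k' as a separate MPI rank might, and sums the parts the way an MPI reduce would.

```python
def allele_freqs(genotypes):
    """Frequency of the counted allele at each SNP; genotypes coded 0/1/2."""
    n = len(genotypes)
    return [sum(row[j] for row in genotypes) / (2.0 * n)
            for j in range(len(genotypes[0]))]


def partial_cross_product(genotypes, freqs, snp_range):
    """W_k W_k' for one block of SNP columns (the work of one hypothetical rank)."""
    n = len(genotypes)
    out = [[0.0] * n for _ in range(n)]
    for j in snp_range:
        # Center each allele count by 2p so W has zero column means.
        centered = [row[j] - 2.0 * freqs[j] for row in genotypes]
        for a in range(n):
            for b in range(n):
                out[a][b] += centered[a] * centered[b]
    return out


def grm_by_chunks(genotypes, n_chunks):
    """G = W W' / (2 * sum p(1-p)), accumulated chunk by chunk."""
    freqs = allele_freqs(genotypes)
    m, n = len(freqs), len(genotypes)
    bounds = [m * k // n_chunks for k in range(n_chunks + 1)]
    total = [[0.0] * n for _ in range(n)]
    for k in range(n_chunks):
        part = partial_cross_product(genotypes, freqs,
                                     range(bounds[k], bounds[k + 1]))
        # In an MPI program this element-wise sum would be a reduce step
        # (e.g. MPI_Reduce or MPI_Allreduce with a sum operation).
        for a in range(n):
            for b in range(n):
                total[a][b] += part[a][b]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [[x / scale for x in row] for row in total]
```

Because each chunk needs only its own SNP columns plus the allele frequencies, the partial products can be computed independently and combined with a single reduction; any number of chunks gives the same G up to floating-point rounding.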

Progress 10/01/16 to 09/30/17

Outputs
Target Audience: Research community of animal genetics and genomics; animal breeding industry.
Changes/Problems: Nothing Reported.
What opportunities for training and professional development has the project provided? The project supported one graduate student at a 50% appointment.
How have the results been disseminated to communities of interest? A journal article and conference presentations.
What do you plan to do during the next reporting period to accomplish the goals? Continue work on Aims 1 and 2.

Impacts
What was accomplished under these goals? The work on Aim 1 focused on analyzing genomic data using our methods and computing tools. We spent much of our research time analyzing the USDA Holstein genomic data, the largest of its kind in the world, for genomic discovery and prediction. We conducted a large-scale genome-wide association study (GWAS) using 294,079 cows, the largest sample size ever used for an animal GWAS. The GWAS results provided rich information about genetic variants affecting dairy traits and about the genetic mechanisms of those variants. We then integrated the GWAS results with the results of a selection signature analysis from a previous NIFA-funded project, and this integrated analysis provided novel findings about how genetic selection has affected the genome and the genetic variants affecting dairy traits. We conducted a preliminary analysis of genomic selection using our novel method of haplotype analysis and our own computing tool. We produced about half a dozen test samples with small to medium sample sizes and SNP densities ranging from 60K to high density (HD), and we obtained encouraging results. In addition to the Holstein analysis, we completed and published the results of a joint genomic prediction and GWAS study using our own methods and computing tools. The work on Aim 2 focused on developing a computing pipeline for genomic prediction and discovery using our novel method of haplotype analysis, and solid progress was made on this pipeline.
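For readers unfamiliar with GWAS, the simplest form of the per-SNP test is an ordinary least-squares regression of the phenotype on the 0/1/2 allele count at each SNP. The sketch below shows only that core step. It is a simplified assumption for illustration: it ignores relatedness among cows, fixed effects, and whatever model was actually used in the Holstein analysis, which the report does not specify. The function name is hypothetical.

```python
import math


def single_snp_scan(genotypes, phenotypes):
    """OLS test of each SNP under y = mu + b*x + e.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    Returns a list of (b_hat, t_stat) per SNP. Relatedness and other
    fixed effects are ignored in this simplified sketch.
    """
    n = len(phenotypes)
    y_mean = sum(phenotypes) / n
    results = []
    for j in range(len(genotypes[0])):
        x = [row[j] for row in genotypes]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0.0:  # monomorphic SNP: no association test possible
            results.append((0.0, 0.0))
            continue
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, phenotypes))
        b = sxy / sxx  # estimated allele substitution effect
        rss = sum((yi - y_mean - b * (xi - x_mean)) ** 2
                  for xi, yi in zip(x, phenotypes))
        se = math.sqrt(rss / (n - 2) / sxx)  # standard error of b
        t = b / se if se > 0.0 else math.copysign(math.inf, b)
        results.append((b, t))
    return results
```

In practice, large cattle samples contain many close relatives, so a real analysis would account for relatedness (for example through a genomic relationship matrix) rather than rely on this independent-observations test.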

Publications

  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Tan, C., Z. Wu, J. Ren, Z. Huang, D. Liu, X. He, D. Prakapenka, R. Zhang, N. Li, Y. Da, and X. Hu. 2017. Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing. Genetics Selection Evolution 49:35.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Dzianis Prakapenka, Li Ma and Yang Da. Chromosome-specific genomic relationships using haplotypes for genomic prediction and variance component estimation. Poster presentation at Plant and Animal Genome XXV, January 13-18, 2017, San Diego.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Cheng Tan, Dzianis Prakapenka, Li Ma, Zhenfang Wu, Xiaoxiang Hu, Yang Da. JBLUP: the joint best linear unbiased prediction using BLUP and GBLUP solutions. Poster presentation at Plant and Animal Genome XXV, January 13-18, 2017, San Diego.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Li Ma, Jicai Jiang, Dzianis Prakapenka, Melvin E. Tooker, Paul M. VanRaden, John B. Cole, Yang Da. Large-scale GWAS reveals reason for the DGAT1 significance and identifies new SNP effects in Holstein cattle. Poster presentation at Livestock High-Throughput Phenotyping and Big Data Analytics. November 13-14, 2017, Beltsville MD
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2017 Citation: J. Jiang, L. Ma, D. Prakapenka, M.E. Tooker, P.M VanRaden, J.B. Cole, Y. Da. Extreme antagonistic pleiotropy effects of DGAT1 on fat, milk and protein yields. Conference proceedings article accepted by World Congress on Genetics Applied to Livestock Production, February 11-16, 2018, Auckland, New Zealand.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Li Ma, Yang Da, Joao Durr, Paul VanRaden, and John Cole. Analytics of the large U.S. dairy cattle genomics and phenotype database. Oral presentation at Livestock High-Throughput Phenotyping and Big Data Analytics, November 13-14, 2017, Beltsville MD.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2017 Citation: Yang Da. Selection limits in dairy cattle. Invited talk at The 19th Conference on Animal Breeding and Genetics, October 14-16, Nanjing, China.


Progress 10/01/15 to 09/30/16

Outputs
Target Audience: Research community of animal genetics and genomics; animal breeding industry.
Changes/Problems: Nothing Reported.
What opportunities for training and professional development has the project provided? The project supported one BICB graduate student and hosted two visiting graduate students and a visiting scholar, who were supported by their own funding.
How have the results been disseminated to communities of interest? Journal publications and conference presentations.
What do you plan to do during the next reporting period to accomplish the goals? Continue work on Aims 1 and 2.

Impacts
What was accomplished under these goals? The work on Aim 1 was the development of a new method for genomic prediction that uses genomic and pedigree information simultaneously, as an alternative to single-step genomic prediction. This new approach was named the joint best linear unbiased prediction using BLUP and GBLUP solutions, abbreviated as 'JBLUP'. A considerable amount of work on the evaluation of JBLUP was done using swine genomic selection data, and the results led to a presentation that was submitted to and accepted by the 2017 Plant and Animal Genome meeting. The work on Aim 2 was in two areas: the development of a computing pipeline for genomic prediction and estimation using haplotypes of genome-wide SNP markers, and the development of computing tools to implement JBLUP. The GVCHAP program is the main program of the computing pipeline for haplotype analysis and is fully functional. A number of supporting and utility programs were developed. The GVCHAP program was extensively tested using the Framingham Heart Study data with 500,000 SNPs. The JBLUP program was close to completion and was used to generate results for the 2017 PAG presentation on JBLUP. Supercomputer data analysis for validating and evaluating our methods and tools was a major component of our research activity, and this work used about 160,000 supercomputer hours, mostly on the UMN Mesabi machine, the most capable supercomputer at UMN. We applied our own methods and tools to the analysis of wild and captive panda populations. We found that wild pandas from the four largest habitats were genetically unrelated, that most pandas 200 km apart shared no common ancestral alleles, and that the Qinling wild panda population, known to have experienced habitat loss, and the Liangshan wild panda population, one of the smallest wild populations, had high levels of inbreeding.
This genomic analysis of wild panda populations warned of the potential existence of hidden inbreeding in current panda breeding practice and called for genome-guided breeding and conservation. The analysis of the captive panda population showed that the small wild panda populations were severely under-represented in the captive population, and that the current breeding recommendations would either continue this trend or further decrease the representation of the smallest wild panda populations in the captive population. We developed three habitat-controlled breeding plans to minimize the risk of hidden inbreeding and to increase the representation of the smallest wild panda populations in the captive population.
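The panda findings rest on genomic measures of inbreeding. As a hedged illustration of one common measure, not necessarily the estimator used in the panda study, the sketch below computes a method-of-moments inbreeding coefficient from observed versus expected homozygous SNP counts, in the spirit of PLINK's --het statistic: F = (O - E)/(N - E), where O is the observed number of homozygous SNPs, N the number of polymorphic SNPs, and E = Σ[1 - 2p(1-p)] the expected homozygous count under Hardy-Weinberg proportions. The allele frequencies p are assumed to come from a reference sample (for example, all genotyped pandas), and the function name is hypothetical.

```python
def homozygosity_f(genotype_row, freqs):
    """Method-of-moments inbreeding estimate F = (O - E) / (N - E).

    genotype_row: one individual's 0/1/2 allele counts.
    freqs: reference allele frequencies, one per SNP (an assumption).
    Unlike PLINK, no small-sample correction is applied to E here.
    """
    observed = 0      # O: homozygous SNPs (coded 0 or 2)
    expected = 0.0    # E: expected homozygous count under HWE
    used = 0          # N: polymorphic SNPs included
    for g, p in zip(genotype_row, freqs):
        if p <= 0.0 or p >= 1.0:  # monomorphic in the reference: skip
            continue
        used += 1
        if g != 1:
            observed += 1
        expected += 1.0 - 2.0 * p * (1.0 - p)
    if used == 0:
        raise ValueError("no polymorphic SNPs to evaluate")
    return (observed - expected) / (used - expected)
```

Positive F indicates more homozygosity than expected from the reference frequencies (consistent with inbreeding), while negative F indicates excess heterozygosity. The result depends on which population supplies p, which matters when habitats are genetically distinct.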

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Yang, J., F. Shen, R. Hou, and Y. Da. 2016. Genetic composition of captive panda population. BMC Genetics: 17(1):1-9. doi: 10.1186/s12863-016-0441-y.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Da, Y., C. Tan, and D. Prakapenka. Integrated SNP-haplotype genomic selection based on the invariance property of GBLUP and GREML to duplicate SNPs. The 2016 Joint Annual Meeting ASAS-ADSA-CSAS-WSASAS, July 19-23. Salt Lake City.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Tan, C., Y. Da, Z. Wu, D. Liu, N. Li, and X. Hu. Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping by sequencing. The 2016 Joint Annual Meeting ASAS-ADSA-CSAS-WSASAS, July 19-23. Salt Lake City.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Garbe, J. R., D. Prakapenka, J. Yang, C. Tan, C. Wang, and Y. Da. Genomic Inbreeding and Relationships in Wild Panda Populations. Abstract P0694, Plant and Animal Genome XXIV Conference, January 9-13, 2016. San Diego.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Prakapenka, D. GVCHAP: A Computer Package for Genomic Prediction and Estimation Using Haplotypes and Single SNPs. Abstract P0365, Plant and Animal Genome XXIV Conference, January 9-13, 2016. San Diego.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Tan, C., Y. Da, Z. Wu, D. Liu, N. Li, and X. Hu. Genotyping-By-Sequencing for Genomic Evaluation in Pigs. Abstract P0365, Plant and Animal Genome XXIV Conference, January 9-13, 2016. San Diego.
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Garbe, J.R., D. Prakapenka, C. Tan, and Y. Da. 2016. Genomic inbreeding and relatedness in wild panda populations. PLoS ONE 11(8): e0160496. doi:10.1371/journal.pone.0160496.


Progress 02/19/15 to 09/30/15

Outputs
Target Audience: Scientists and researchers in genetics and genomics.
Changes/Problems: Nothing Reported.
What opportunities for training and professional development has the project provided? The program trained two Ph.D. students and an M.S. student.
How have the results been disseminated to communities of interest? The new theory and method for using functional and structural information in genomic selection were published, and the validation results were reported at two major conferences (PAG and ADSA/ASAS). The Locusmaps program was applied to improve the swine genome assembly, and we have just started planning an analysis to improve the cattle genome assembly at the request of a cattle genome coordinator, with the caveat that access to the USDA dairy genomic databases is still under discussion and we cannot guarantee that access will be granted. Regardless of the outcome of our data access request, the cattle genome coordinator's request to use the Locusmaps program shows the usefulness of the program for improving genome assembly.
What do you plan to do during the next reporting period to accomplish the goals? Utilizing functional genomic information for genomic selection will be the focus for the next year. We expect to release or publish most of the computing tools we developed this past year.

Impacts
What was accomplished under these goals? For Aim 1, we published our new theory and methods for integrating functional and structural genomic information in genomic selection, and we conducted extensive validation studies that yielded promising results, as reported at PAG-2015 and ADSA/ASAS-2015. The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements, such as all genes of the genome, can be one approach to integrating functional and structural genomic information into genomic selection. Towards this goal, we developed a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h-1 additive effects, where h is the number of alleles (haplotypes), and each dominance value is expressed as a function of h(h-1)/2 dominance effects. For a sample of q individuals, the maximum number of effects is 2q-1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as the product of the additive model matrix and the h-1 additive effects, and dominance values are factorized as the product of the dominance model matrix and the h(h-1)/2 dominance effects. The genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and the genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects.
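As a worked check of the effect counts above (our own arithmetic, not an excerpt from the published article): a locus with h haplotypes has h(h+1)/2 possible genotypes, hence h(h+1)/2 - 1 genotypic degrees of freedom beyond the mean, and the h-1 additive and h(h-1)/2 dominance effects account for them exactly. The last line shows where the 2q-1 limit comes from: q diploid individuals carry 2q haplotype copies, so at most 2q distinct haplotypes can be observed.

```latex
\begin{aligned}
(h-1) + \frac{h(h-1)}{2} &= \frac{(h-1)(h+2)}{2} = \frac{h(h+1)}{2} - 1, \\
h = 4:\quad 3 + 6 &= 9 = \frac{4 \cdot 5}{2} - 1, \\
h \le 2q \;\Rightarrow\; h - 1 &\le 2q - 1 .
\end{aligned}
```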
Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly uses haplotypes and single markers was established, including two computing strategies for genomic prediction and variance component estimation that give identical results. The multi-allelic genetic partition fills a theoretical gap by providing general formulations for partitioning multi-allelic genotypic values, and it provides a haplotype method based on the quantitative genetics model for utilizing functional and structural genomic information in genomic prediction and estimation. We conducted extensive validation studies to evaluate this new theory and method. We evaluated five types of haplotype analysis, separately or jointly with single-SNP analysis: haplotypes of autosomal genes, trait-related genes reported in the literature, ChIP-seq blocks, evenly divided haplotype blocks, and high-heritability SNP blocks identified by our own formulations for estimating the heritability of each SNP or haplotype block. The results showed that joint haplotype and single-SNP analysis improved prediction accuracy in almost all cases, haplotype analysis performed better than single-SNP analysis in most cases, and the high-heritability SNPs and haplotype blocks had the highest prediction accuracies. In collaboration with China Agricultural University, we also analyzed genomic selection using genotyping by sequencing (GBS) in swine with our new method and existing methods, and this swine analysis provided additional promising results for the new method. For Aim 2, we continued developing the GVCHAP computer program, which implements our new mixed model method for joint analysis of haplotypes and single SNPs for genomic prediction and estimation. This program is now fully functional and has been used in our extensive validation studies.
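The report states that two computing strategies give identical results but does not describe them. As a hedged illustration of how two routes can agree exactly, the sketch below uses a familiar textbook pair, which is not necessarily what GVCHAP implements. It assumes phenotypes y already corrected for the mean, a centered marker (or haplotype-indicator) matrix Z, and a known variance ratio lam = σe²/σb². The individual-model route computes ZZ'(ZZ' + λI)⁻¹y, the effect-model route computes Z(Z'Z + λI)⁻¹Z'y, and the matrix identity Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹ makes the two predictions equal. All function names are hypothetical.

```python
def solve(a, b):
    """Solve A X = B (A: n x n, B: n x k) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add_ridge(a, lam):
    """A + lam * I for a square matrix A."""
    return [[a[i][j] + (lam if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]


def gblup(z, y, lam):
    """Individual-model route: u = ZZ' (ZZ' + lam I)^-1 y (n x n system)."""
    zzt = matmul(z, transpose(z))
    alpha = solve(add_ridge(zzt, lam), [[v] for v in y])
    return [r[0] for r in matmul(zzt, alpha)]


def snp_blup(z, y, lam):
    """Effect-model route: b = (Z'Z + lam I)^-1 Z'y, then u = Z b (p x p system)."""
    zt = transpose(z)
    b = solve(add_ridge(matmul(zt, z), lam), matmul(zt, [[v] for v in y]))
    return [r[0] for r in matmul(z, b)]
```

The individual route solves an n × n system and the effect route a p × p system, so which one is cheaper depends on whether there are more animals or more marker and haplotype effects.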
Several utility programs in support of the GVCHAP program were also developed. Work continued to develop additional functions of the Locusmaps program for genome-wide linkage analysis to improve the genome assembly, and to develop a new version of the Pedigraph program that has been used in many species.

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Ma L, Sonstegard TS, VanTassell CP, Cole JB, Wiggans GR, Crooker BA, Ponce De Leon FA, Da Y. Selection signature analysis in Holstein cattle identified genes known to affect reproduction. ADSA/ASAS 2015, Orlando, July 12-16 2015. Abstract T102. http://aipl.arsusda.gov/publish/jds/2015/JDS98_Suppl2_350_AbstrT102-T103.pdf
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Tan C, Ren J, Huang Z, Zhao Y, Da Y, Hu X. An improved approach for swine SNP genotyping using Genotyping-by-Sequencing. ADSA/ASAS 2015, Orlando, July 12-16 2015. Abstract M78. http://m.jtmtg.org/abs/t/65275
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Da, Y. Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. BMC Genetics 2015, 16:144.
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Ma, L., J.R. O'Connell, P.M. VanRaden, B. Shen, A. Padhi, C. Sun, D.M. Bickhart, J.B. Cole, D.J. Null, G.E. Liu, Y. Da, and G.R. Wiggans. 2015. Cattle sex-specific recombination and genetic control from a large pedigree analysis. PLoS Genet 11(11): e1005387.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Da Y, Wang C, Tan C, Prakapenka D, Shigematsu M, Garbe JR, Ma L. Multi-allelic haplotype model to integrate functional genomic information with genomic prediction and estimation. Abstract P1176. Plant and Animal Genome XXIII, January 10-14, 2015. San Diego. https://pag.confex.com/pag/xxiii/webprogram/Paper14435.html
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Tan C, Prakapenka D, Wang C, Ma L, Garbe JR, Ma L. Integration of haplotype analysis of functional genomic information with single SNP analysis improved accuracy of genomic prediction. ADSA/ASAS 2015, Orlando, July 12-16 2015. Abstract M84. http://m.jtmtg.org/abs/t/65063.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Da, Y. Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. ADSA/ASAS 2015, Orlando, July 12-16 2015. Abstract 540. http://www.jtmtg.org/JAM/2015/abstracts/577.pdf