Source: UNIV OF WISCONSIN submitted to
DEVELOPMENT OF EFFICIENT DESIGN AND STATISTICAL ANALYSIS STRATEGIES FOR GENOME-WIDE ASSOCIATION STUDIES IN LIVESTOCK
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0218979
Grant No.
(N/A)
Project No.
WIS01433
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Oct 1, 2009
Project End Date
Sep 30, 2013
Grant Year
(N/A)
Project Director
Rosa, G.
Recipient Organization
UNIV OF WISCONSIN
21 N PARK ST STE 6401
MADISON, WI 53715-1218
Performing Department
Dairy Science
Non Technical Summary
The identification of genes affecting complex phenotypes, such as disease resistance, production, and reproduction traits, is extremely important for a better understanding of the genetic architecture of complex traits and for the development of efficient strategies for the genetic improvement of livestock. The advent of high-throughput SNP genotyping technologies has made genome-wide association studies (GWAS) feasible, providing a powerful tool for identifying genetic variants associated with complex traits. The potential of GWAS, however, can only be fully exploited with efficient experimental design, which should include calculating sample sizes adequate to attain a desired statistical power to detect important genes, and with a powerful data mining methodology for mapping such genes. In this research project, we will develop efficient experimental design techniques, such as selective genotyping strategies and statistical methodology for power and sample size calculations, as well as specific statistical models for appropriate data mining of such experiments.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3999 | 2090 | 20%
304 | 7310 | 2090 | 80%
Goals / Objectives
In this research project, we will develop efficient experimental design techniques, such as selective genotyping strategies and statistical methodology for power and sample size calculations, as well as specific statistical models for appropriate data mining of genome-wide association studies in livestock. As an instructional byproduct, the project will train a PhD fellow, and undergraduate students will also have the opportunity to further develop their scientific skills and genetic knowledge.
Project Methods
The first applications of genome-wide association studies (GWAS) in livestock identified some important QTL and demonstrated the potential of this approach for gene mapping. However, they used very simplistic statistical methodologies, most often significance tests of one locus (or marker) at a time, which do not fully exploit the potential of GWAS, especially for studying gene-gene interactions. Moreover, no formal statistical power or sample size calculations were conducted or provided. In this project we will develop optimal design and data analysis strategies for the efficient application of GWAS in livestock genetics studies, making optimal use of funding and animal resources. For power and sample size calculations, methods will be developed within a false discovery rate (FDR) framework, considering alternative experimental design strategies such as selective genotyping and selective phenotyping. From a data analysis standpoint, appropriate data mining techniques will be developed, with a focus on different distributions of the phenotypic traits and on dimension reduction techniques using machine learning approaches.
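To make the FDR-based power and sample size idea concrete, here is a minimal Python sketch; it is an illustration, not the project's own method. It assumes m independent single-marker z-tests, m1 truly associated markers each explaining a fraction q of phenotypic variance, and a normal approximation to the test statistic. The per-marker significance level alpha is chosen so that the expected number of false discoveries, (m - m1) x alpha, is the target fraction FDR of all expected discoveries. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def per_marker_alpha(m: int, m1: int, power: float, fdr: float) -> float:
    """Per-marker significance level giving an expected FDR of `fdr`.

    Expected true discoveries:  r1 = power * m1
    Expected false discoveries: (m - m1) * alpha
    Solving false / (false + true) = fdr for alpha gives
    alpha = fdr * r1 / ((m - m1) * (1 - fdr)).
    """
    r1 = power * m1
    return fdr * r1 / ((m - m1) * (1.0 - fdr))


def marker_power(n: int, alpha: float, q: float) -> float:
    """Approximate power of a two-sided z-test for one SNP explaining a
    fraction q of phenotypic variance; the statistic has mean sqrt(n*q/(1-q))."""
    shift = math.sqrt(n * q / (1.0 - q))
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def sample_size(m: int, m1: int, power: float, fdr: float, q: float) -> int:
    """Smallest n (normal approximation) giving `power` for each associated
    SNP while keeping the expected FDR at `fdr`."""
    alpha = per_marker_alpha(m, m1, power, fdr)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = STD_NORMAL.inv_cdf(power)
    # Require sqrt(n * q / (1 - q)) >= z_crit + z_power and solve for n.
    return math.ceil((z_crit + z_power) ** 2 * (1.0 - q) / q)
```

For example, `sample_size(m=50_000, m1=10, power=0.8, fdr=0.05, q=0.01)` approximates the number of animals needed for a hypothetical 50K panel with ten associated SNPs, each explaining 1% of phenotypic variance. Designs based on selective genotyping or selective phenotyping would require adapting these formulas to the sampling scheme.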

Progress 10/01/09 to 09/30/13

Outputs
Target Audience: Geneticists and breeders who use high-throughput genotyping technology for gene discovery and for prediction of complex traits may benefit from the models and design strategies developed in this project. Changes/Problems: Nothing Reported. What opportunities for training and professional development has the project provided? As an instructional byproduct, the project trained a graduate student (Huihui Duan), whose research was directly related to the project, as well as a number of visiting scholars and postdoctoral fellows who were partially involved in specific objectives of the project (e.g., Boligon AA, Okut H, Wu XL, Sun C, and Perez-Cabal MA). How have the results been disseminated to communities of interest? The results have been disseminated mainly through peer-reviewed articles in scientific journals and through seminars and conference talks. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported.

Impacts
What was accomplished under these goals? The project was carried out satisfactorily, accomplishing many of its goals and in some areas going beyond what was initially proposed. This included the development and comparison of selective genotyping strategies for genome-enabled prediction of complex traits, and the development of statistical models for appropriate data mining of genome-wide association studies in livestock. In addition, many machine learning approaches were adapted for use with livestock genomic data, and a number of dairy cattle, beef cattle, and poultry data sets were analyzed with these methodologies. Lastly, software implementing Bayesian regularized neural networks was developed in the statistical programming language R and is freely available from the Comprehensive R Archive Network (CRAN).
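Selective genotyping strategies of the kind compared in this project can be viewed as rules for choosing which phenotyped animals to genotype under a fixed budget. The minimal Python sketch below shows only that selection step, with a hypothetical function name and three illustrative strategies (random sampling, the upper phenotypic tail, and both phenotypic tails); the strategies actually compared in the project may differ.

```python
import random
from typing import List, Sequence


def select_for_genotyping(phenotypes: Sequence[float], budget: int,
                          strategy: str = "two_tails", seed: int = 0) -> List[int]:
    """Return sorted indices of the animals to genotype (hypothetical helper)."""
    n = len(phenotypes)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of animals")
    # Animal indices ordered from lowest to highest phenotype.
    order = sorted(range(n), key=lambda i: phenotypes[i])
    if strategy == "random":
        return sorted(random.Random(seed).sample(range(n), budget))
    if strategy == "top":
        # Upper tail only, e.g. the best-performing animals.
        return sorted(order[-budget:])
    if strategy == "two_tails":
        # Split the budget between the lowest and highest phenotypes.
        low = budget // 2
        high = budget - low  # always >= 1, so order[-high:] is never the whole list
        return sorted(order[:low] + order[-high:])
    raise ValueError(f"unknown strategy: {strategy}")
```

The returned indices identify the training animals whose genotypes would then be used to fit a genome-enabled prediction model; comparing prediction accuracy across strategies at the same budget is the kind of assessment described above.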

Publications

  • Type: Theses/Dissertations Status: Published Year Published: 2013 Citation: Duan, H. Whole Genome Prediction Within and Across Environments: An Application to Wheat Yield. MS Dissertation, University of Wisconsin-Madison, 2012.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Okut, H., Wu, X.-L., Rosa, G. J. M., Bauck, S., Woodward, B. W., Schnabel, R. D., Taylor, J. F. and Gianola, D. Predicting expected progeny difference for marbling score in Angus cattle using artificial neural networks and Bayesian regression models. Genetics Selection Evolution 45: 34, 2013.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Morota, G., Koyama, M., Rosa, G. J. M., Weigel, K. A. and Gianola, D. Predicting complex traits using a diffusion kernel on genetic markers with an application to dairy cattle and wheat data. Genetics Selection Evolution 45: 17, 2013.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Peñagaricano, F., Weigel, K. A., Rosa, G. J. M. and Khatib, H. Inferring quantitative trait pathways associated with bull fertility from a genome-wide association study. Frontiers in Genetics 3: 307, 2013. doi: 10.3389/fgene.2012.00307.
  • Type: Other Status: Other Year Published: 2013 Citation: Rosa, G. J. M. Statistical and Computational Approaches for Whole-Genome Prediction of Complex Traits. Workshop on Genomic Tools for Improving Beef Cattle Production, Jaboticabal - Brazil, August 12-13, 2013.
  • Type: Other Status: Other Year Published: 2013 Citation: Rosa, G. J. M. Whole-Genome Prediction of Complex Phenotypes. 58th Annual Meeting of the Brazilian Region (RBRAS), Campina Grande - Brazil, July 22-26, 2013.


Progress 01/01/12 to 12/31/12

Outputs
OUTPUTS: During this reporting period, we worked on statistical and computational methods for predicting complex traits from whole-genome molecular marker information. Two specific topics investigated were the imputation of missing genotypes using an ensemble-based approach, and the implementation of parallel Markov chain Monte Carlo for high-performance Bayesian computation in genomic selection and GWAS. Two other areas of research developed this year were a comparison of different selective genotyping strategies for genomic selection, and alternative cross-validation designs for assessing genomic selection models. PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project, working on the development of statistical and computational models for gene mapping, transcriptional profiling, and prediction of complex traits in poultry and other livestock species. Collaborators in the project include Drs. D. Gianola, K. Weigel, and X. Wu from the University of Wisconsin - Madison. Some additional collaborators who benefited from training during the reporting period include graduate students (Arione Boligon and Huihui Huang) and postdoctoral fellows (Drs. C. Sun and M. Perez-Cabal). TARGET AUDIENCES: Geneticists and breeders who use high-throughput genotyping technology for gene discovery and for prediction of complex traits would benefit from the models and design strategies we are developing in our project. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
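Alternative cross-validation designs differ mainly in how records are assigned to test folds. The minimal Python/NumPy sketch below contrasts random k-fold assignment with a grouped layout that keeps whole groups (for example, a hypothetical sire-family label) in the same fold, and scores a layout by the mean correlation between predicted and observed phenotypes. Ridge regression on centered marker codes is an assumed stand-in for the genomic prediction model, and all function names are hypothetical.

```python
import numpy as np


def make_folds(n, k, groups=None, seed=0):
    """Split n records into k test folds (arrays of record indices).

    Without `groups`, records are assigned at random. With `groups`, whole
    groups stay together, so relatives sharing a label never appear in both
    the training and the test set of the same fold.
    """
    rng = np.random.default_rng(seed)
    if groups is None:
        return np.array_split(rng.permutation(n), k)
    groups = np.asarray(groups)
    labels = rng.permutation(np.unique(groups))
    if len(labels) < k:
        raise ValueError("need at least k distinct groups")
    return [np.flatnonzero(np.isin(groups, part)) for part in np.array_split(labels, k)]


def ridge_predict(X_train, y_train, X_test, lam):
    """Ridge regression on marker codes centered with training-set means."""
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_train.shape[1]),
                           Xc.T @ (y_train - y_train.mean()))
    return y_train.mean() + (X_test - x_mean) @ beta


def cv_predictive_correlation(X, y, k=5, lam=1.0, groups=None, seed=0):
    """Mean correlation between predicted and observed phenotypes over folds."""
    corrs = []
    for test_idx in make_folds(len(y), k, groups, seed):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        y_hat = ridge_predict(X[train], y[train], X[test_idx], lam)
        corrs.append(np.corrcoef(y_hat, y[test_idx])[0, 1])
    return float(np.mean(corrs))
```

Running `cv_predictive_correlation` once with `groups=None` and once with a family label on the same data shows how much the estimated accuracy depends on the layout, since random folds let close relatives of test animals remain in the training set.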

Impacts
The identification of genes affecting complex phenotypes, such as disease resistance, production, and reproduction traits, is extremely important for a better understanding of the genetic architecture of complex traits and for the development of efficient strategies for the genetic improvement of livestock. The advent of high-throughput SNP genotyping technologies has made genome-wide association studies (GWAS) feasible, providing a powerful tool for identifying genetic variants associated with complex traits. In addition, molecular marker information can be used in genome-wide marker-assisted selection (so-called genomic selection, GS) to improve the accuracy of genetic merit prediction for selection purposes in agriculture. The potential of GWAS and GS, however, can only be fully exploited with efficient experimental design for data collection and a powerful data mining methodology for mapping major genes or predicting the genetic merit of selection candidates. This research project focuses on this area, aiming to develop efficient experimental and data mining approaches for GWAS and GS applications in livestock.

Publications

  • Boligon, A. A., Long, N., Albuquerque, L. G., Weigel, K. A., Gianola, D. and Rosa, G. J. M. Comparison of selective genotyping strategies for prediction of breeding values in a population undergoing selection. Journal of Animal Science, 2012 (in press)
  • Vazquez, A. I., de los Campos, G., Klimentidis, Y. C., Rosa, G. J. M., Gianola, D., Yi, N. and Allison, D. B. A comprehensive genetic approach for improving prediction of skin cancer risk in humans. Genetics 192: 1493-1502, 2012.
  • Wu, X.-L., Sun, C., Beissinger, T. M., Rosa, G. J. M., Weigel, K. A., de Leon, N. and Gianola, D. Parallel Markov chain Monte Carlo - bridging the gap to high-performance Bayesian computation in animal breeding and genetics. Genetics Selection Evolution 44: 29, 2012.
  • Sun, C., Wu, X. L., Weigel, K. A., Rosa, G. J. M., Bauck, S., Woodward, B. W., Schnabel, R. D., Taylor, J. F. and Gianola, D. An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle. Genetics Research 94: 133-150, 2012.
  • Perez-Cabal, M. A., Vazquez, A. I., Gianola, D., Rosa, G. J. M., and Weigel, K. A. Accuracy of genome-enabled prediction in a dairy cattle population using different cross-validation layouts. Frontiers in Genetics 3:27, 2012.


Progress 01/01/11 to 12/31/11

Outputs
OUTPUTS: During this reporting period, we worked on machine learning techniques, such as artificial neural networks and support vector regression, for predicting complex traits from whole-genome molecular marker information while accounting for non-additive genetic effects on the phenotypic traits. In addition, we performed a series of Monte Carlo simulations to assess the long-term impacts of genomic selection in breeding programs, and developed high-throughput computing strategies for implementing genomic selection in livestock. Additional work addressed QTL mapping of multiple traits using a factor analysis approach. PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project, working on the development of statistical and computational models for gene mapping, transcriptional profiling, and prediction of complex traits in poultry and other livestock species. Collaborators in the project include Drs. D. Gianola, K. Weigel, H. Khatib and X. Wu from the University of Wisconsin - Madison. Some additional collaborators who benefited from training during the reporting period include graduate students (Nanye Long and Huihui Huang) and postdoctoral fellows (Drs. Fabyano Silva and Hayrettin Okut). TARGET AUDIENCES: Geneticists and breeders who use high-throughput genotyping technology for gene discovery and for prediction of complex traits would benefit from the models and design strategies we are developing in our project. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
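A neural network with weight decay gives a flavor of how such models can capture non-additive effects. The minimal NumPy sketch below fits a one-hidden-layer tanh network by full-batch gradient descent on squared error plus an L2 penalty on the weights, which corresponds to a posterior mode under Gaussian priors on the weights. Fully Bayesian regularized networks additionally estimate the amount of regularization from the data; that step is omitted here. Genotypes are assumed coded as -1, 0, 1, and all names and tuning values are illustrative assumptions.

```python
import numpy as np


def nn_predict(params, X):
    """Network output: tanh hidden layer followed by a linear output unit."""
    W1, b1, w2, b2 = params
    return np.tanh(X @ W1 + b1) @ w2 + b2


def fit_weight_decay_nn(X, y, n_hidden=5, decay=0.01, lr=0.05, epochs=2000, seed=0):
    """Minimize 0.5 * mean((pred - y)^2) + 0.5 * decay * (||W1||^2 + ||w2||^2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1 = rng.normal(scale=1.0 / np.sqrt(p), size=(p, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = float(y.mean())
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                       # hidden activations, n x n_hidden
        g_out = (H @ w2 + b2 - y) / n                  # d(loss)/d(prediction)
        g_hid = np.outer(g_out, w2) * (1.0 - H ** 2)   # back-propagated through tanh
        # Gradient steps; the decay terms come from the L2 penalty.
        w2 = w2 - lr * (H.T @ g_out + decay * w2)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (X.T @ g_hid + decay * W1)
        b1 = b1 - lr * g_hid.sum(axis=0)
    return W1, b1, w2, b2
```

Because the hidden units are non-linear functions of several markers at once, the fitted network can represent marker-by-marker interactions that a purely additive linear model cannot; whether that improves prediction depends on the trait and data.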

Impacts
The identification of genes affecting complex phenotypes, such as disease resistance, production, and reproduction traits, is extremely important for a better understanding of the genetic architecture of complex traits and for the development of efficient strategies for the genetic improvement of livestock. The advent of high-throughput SNP genotyping technologies has made genome-wide association studies (GWAS) feasible, providing a powerful tool for identifying genetic variants associated with complex traits. In addition, molecular marker information can be used in genome-wide marker-assisted selection (so-called genomic selection, GS) to improve the accuracy of genetic merit prediction for selection purposes in agriculture. The potential of GWAS and GS, however, can only be fully exploited with efficient experimental design for data collection and a powerful data mining methodology for mapping major genes or predicting the genetic merit of selection candidates. This research project focuses on this area, aiming to develop efficient experimental and data mining approaches for GWAS and GS applications in livestock.

Publications

  • Okut, H., Gianola, D., Rosa, G. J. M., and Weigel, K. A. 2011. Prediction of body mass index in mice using dense molecular markers and a regularized neural network. Genet. Res. 93: 189-201.
  • Silva, F. F., Rosa, G. J. M., Guimaraes, S. E. F., Lopes, P. S. and de los Campos, G. 2011. Three-step Bayesian factor analysis applied to QTL detection in crosses between outbred pig populations. Livestock Science 142: 210-215.
  • Silva, F. F., Varona, L., Resende, M. D. V., Bueno Filho J. S. S., Rosa, G. J. M. and Viana, J. M. S. 2011. A note on accuracy of Bayesian LASSO regression in GWS. Livestock Science 142: 310-314.
  • Wu, X.-L., Beissinger, T. M., Bauck, S., Woodward, B., Rosa, G. J. M., Weigel, K. A., de Leon, N. and Gianola, D. 2011. A primer on high-throughput computing for genomic selection. Frontiers in Genetics 2:4. (doi: 10.3389/fgene.2011.00004)
  • Gianola, D., Okut, H., Weigel, K. A. and Rosa, G. J. M. 2011. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics 12:87.
  • Long, N., Gianola, D., Rosa, G. J. M., and Weigel, K. A. 2011. Application of support vector regression to genome-assisted prediction of quantitative traits. Theor. Appl. Genet. 123: 1065-1074.
  • Long, N., Gianola, D., Rosa, G. J. M., and Weigel, K. A. 2011. Dimension reduction and variable selection for genomic selection: application to predicting milk yield in Holsteins. J. Anim. Breed. Genet. 128: 247-257.
  • Long, N., Gianola, D., Rosa, G. J. M., and Weigel, K. A. 2011. Long-term impacts of genome-enabled selection. J. Appl. Genetics 52(4): 467-480.
  • Long, N., Gianola, D., Rosa, G. J. M., and Weigel, K. A. 2011. Marker-assisted prediction of non-additive genetic values. Genetica 139(7): 843-854.


Progress 01/01/10 to 12/31/10

Outputs
OUTPUTS: During this reporting period, we developed a statistical methodology for evaluating candidate gene effects by combining results from multiple experiments in a meta-analysis framework. In addition, we performed a genome-wide association study using a selective DNA pooling strategy to identify candidate markers for fertility in Holstein cattle. Moreover, in the context of genomic selection, we assessed the predictive ability of subsets of single nucleotide polymorphisms (SNP) with and without parent average in US Holsteins, and the accuracy of direct genomic values derived from imputed SNP genotypes in Jersey cattle. Additional work addressed machine learning and data mining strategies for genomic selection, including semi-parametric methods based on reproducing kernel Hilbert spaces, radial basis function regression approaches, and L2-Boosting algorithms. PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project, working on the development of statistical and computational models for gene mapping, transcriptional profiling, and prediction of complex traits in poultry and other livestock species. Collaborators in the project include Drs. D. Gianola, K. Weigel, H. Khatib and X. Wu from the University of Wisconsin - Madison. Some additional collaborators who benefited from training during the reporting period include current and former graduate students (Ana Vazquez, Gustavo de los Campos, Nanye Long, and Huihui Huang) and a postdoctoral fellow (Dr. Oscar Gonzalez-Recio). TARGET AUDIENCES: Geneticists and breeders who use high-throughput genotyping technology for gene discovery and for prediction of complex traits would benefit from the models and design strategies we are developing in our project. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
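Semi-parametric RKHS methods can be illustrated with kernel ridge regression, one common form of RKHS regression. In the minimal NumPy sketch below, a Gaussian kernel K on marker genotypes measures similarity between animals, genetic values are modeled as f = K alpha, and alpha solves (K + lambda I) alpha = y - mu, which minimizes ||y - mu - K alpha||^2 + lambda alpha' K alpha. The kernel, bandwidth, and penalty lambda are illustrative assumptions rather than the settings used in the cited work, the intercept mu is simply fixed at the phenotypic mean, and the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(X1, X2, bandwidth=1.0):
    """K[i, j] = exp(-d2(i, j) / bandwidth), where d2 is the squared Euclidean
    distance between genotype vectors divided by the number of markers."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2) / X1.shape[1]
    return np.exp(-d2 / bandwidth)


def rkhs_fit(X, y, bandwidth=1.0, lam=1.0):
    """Return the intercept and kernel coefficients alpha."""
    K = gaussian_kernel(X, X, bandwidth)
    mu = float(y.mean())
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y - mu)
    return mu, alpha


def rkhs_predict(X_new, X_train, mu, alpha, bandwidth=1.0):
    """Predicted values: intercept plus kernel-weighted training coefficients."""
    return mu + gaussian_kernel(X_new, X_train, bandwidth) @ alpha
```

Because the Gaussian kernel is a non-linear function of all markers jointly, the model is not restricted to additive marker effects; the bandwidth and lambda would normally be tuned, for example by cross-validation.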

Impacts
The identification of genes affecting complex phenotypes, such as disease resistance, production, and reproduction traits, is extremely important for a better understanding of the genetic architecture of complex traits and for the development of efficient strategies for the genetic improvement of livestock. The advent of high-throughput SNP genotyping technologies has made genome-wide association studies (GWAS) feasible, providing a powerful tool for identifying genetic variants associated with complex traits. The potential of GWAS, however, can only be fully exploited with efficient experimental design, which should include calculating sample sizes adequate to attain a desired statistical power to detect important genes, and with a powerful data mining methodology for mapping such genes. In addition, molecular marker information can be used in genome-wide marker-assisted selection (so-called genomic selection) to improve the accuracy of genetic merit prediction for selection purposes in agriculture. This research project addresses the development of efficient experimental and data mining approaches for GWAS and genomic selection applications in livestock.

Publications

  • de los Campos, G., Gianola, D., Rosa, G. J. M., Weigel, K. A. and Crossa, J. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genetics Research 92:295-308.
  • Weigel, K. A., de los Campos, G., Vazquez, A. I., Rosa, G. J. M., Gianola, D. and Van Tassell, C. P. 2010. Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. Journal of Dairy Science 93:5423-5435.
  • Long, N. Y., Gianola, D., Rosa, G. J. M., Weigel, K. A., Kranis, A. and Gonzalez-Recio, O. 2010. Radial basis function regression methods for predicting quantitative traits using SNP markers. Genetics Research 92:209-225.
  • Gonzalez-Recio, O., Weigel, K. A., Gianola, D., Naya, H. and Rosa, G. J. M. 2010. L2-Boosting algorithm applied to high-dimensional problems in genomic selection. Genetics Research 92:227-237.
  • Wu, X. L., Gianola, D., Rosa, G. J. M. and Weigel, K. A. 2010. Bayesian model averaging for evaluation of candidate gene effects. Genetica 138:395-407.
  • Vazquez, A. I., Rosa, G. J. M., Weigel, K. A., de los Campos, G., Gianola, D. and Allison, D. B. 2010. Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. Journal of Dairy Science 93:5942-5949.
  • Huang, W., Kirkpatrick, B. W., Rosa, G. J. M. and Khatib, H. 2010. A genome-wide association study using selective DNA pooling identifies candidate markers for fertility in Holstein cattle. Animal Genetics 41:570-578.


Progress 01/01/09 to 12/31/09

Outputs
OUTPUTS: During the first two months of this project, we conducted a study to assess the predictive ability of genomic selection models using selected subsets of genetic markers from a high-density single nucleotide polymorphism marker panel, and we also compared classification methods for detecting associations between such markers and binary traits. PARTICIPANTS: Dr. Guilherme J. M. Rosa: Dr. Rosa is the PI of the project, working on the development of statistical and computational models for gene mapping, transcriptional profiling, and prediction of complex traits in poultry and other livestock species. TARGET AUDIENCES: Geneticists and breeders who use high-throughput genotyping technology for gene discovery and for prediction of complex traits would benefit from the models and design strategies we are developing in our project. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
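Assessing predictive ability with a selected subset of markers can be sketched as follows: rank markers by absolute marginal correlation with the phenotype in the training set, keep the top k, fit a ridge regression on that subset, and measure predictive ability as the correlation between predicted and observed phenotypes in a test set. This minimal NumPy sketch uses hypothetical function names, and ridge regression is an assumed stand-in for the genomic selection models actually evaluated. Markers are selected from training records only, because letting test phenotypes influence the selection would bias the accuracy estimate upward.

```python
import numpy as np


def top_k_markers(X, y, k):
    """Indices of the k markers with the largest absolute marginal
    correlation with the phenotype; monomorphic markers score zero."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sd = Xc.std(axis=0)
    score = np.zeros(X.shape[1])
    ok = sd > 1e-12
    # |cov(marker, y)| / sd(marker) is proportional to |corr(marker, y)|.
    score[ok] = np.abs(Xc[:, ok].T @ yc) / sd[ok]
    return np.argsort(score)[::-1][:k]


def subset_predictive_ability(X_train, y_train, X_test, y_test, k, lam=1.0):
    """Select k markers in the training set, fit ridge regression on them,
    and return the predicted-versus-observed correlation in the test set."""
    idx = top_k_markers(X_train, y_train, k)
    Xs, Xt = X_train[:, idx], X_test[:, idx]
    x_mean = Xs.mean(axis=0)
    Xc = Xs - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(len(idx)),
                           Xc.T @ (y_train - y_train.mean()))
    y_hat = y_train.mean() + (Xt - x_mean) @ beta
    return float(np.corrcoef(y_hat, y_test)[0, 1])
```

Repeating the call for several values of k traces how predictive ability changes with panel size, which is the kind of comparison the subset study describes.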

Impacts
The identification of genes affecting complex phenotypes, such as disease resistance, production, and reproduction traits, is extremely important for a better understanding of the genetic architecture of complex traits and for the development of efficient strategies for the genetic improvement of livestock. The advent of high-throughput SNP genotyping technologies has made genome-wide association studies (GWAS) feasible, providing a powerful tool for identifying genetic variants associated with complex traits. The potential of GWAS, however, can only be fully exploited with efficient experimental design, which should include calculating sample sizes adequate to attain a desired statistical power to detect important genes, and with a powerful data mining methodology for mapping such genes. This research project addresses the development of efficient experimental design techniques, such as selective genotyping strategies and statistical methodology for power and sample size calculations, as well as specific statistical models for appropriate data mining of such experiments.

Publications

  • Weigel, K. A., de los Campos, G., Gonzalez-Recio, O., Naya, H., Wu, X. L., Long, N., Rosa, G. J. M. and Gianola, D. 2009. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J. Dairy Sci. 92: 5248-5257.
  • Wang, X., Schutzkus, V., Huang, W., Rosa, G. J. M. and Khatib, H. 2009. Analysis of segregation distortion and association of the bovine FGF2 with fertilization rate and early embryonic survival. Anim. Genetics 40: 722-728.
  • De los Campos, G., Gianola, D. and Rosa, G. J. M. 2009. The linear model of quantitative genetics is a reproducing kernel Hilbert spaces regression. J. Anim. Sci. 87: 1883-1887.
  • Long, N., Gianola, D., Rosa, G. J. M., Weigel, K. A. and Avendano, S. 2009. Comparison of classification methods for detecting associations between SNPs and chick mortality. Genetics Selection Evolution 41:18.
  • Gonzalez-Recio, O., Gianola, D., Rosa, G. J. M., Weigel, K. A. and Kranis, A. 2009. Genome-assisted prediction of a quantitative trait measured in parents and progeny: application to food conversion rate in chickens. Genetics Selection Evolution 41:3.
  • Driver, A. M., Huang, W., Gajic, S., Monson, R. L., Rosa, G. J. M. and Khatib, H. 2009. Effects of the progesterone receptor variants on fertility traits in cattle. Journal of Dairy Science 92: 4082-4085.