Source: UNIVERSITY OF FLORIDA submitted to
IMPROVING BREEDING EFFICIENCY IN AUTOTETRAPLOIDS WITH GENOME-WIDE PREDICTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
EXTENDED
Funding Source
Reporting Frequency
Annual
Accession No.
1004074
Grant No.
2014-67013-22418
Project No.
FLAW-2014-04349
Proposal No.
2014-04349
Multistate No.
(N/A)
Program Code
A1141
Project Start Date
Sep 1, 2014
Project End Date
Aug 31, 2018
Grant Year
2014
Project Director
Munoz, P.
Recipient Organization
UNIVERSITY OF FLORIDA
207 GRINTER HALL
GAINESVILLE, FL 32611-0001
Performing Department
AG-AGRONOMY
Non Technical Summary
Alfalfa, potato, and blueberry are major crops for the U.S. agricultural economy, with a combined farm-gate value of $15 billion in 2012. Despite their differences, all three crops are autotetraploids, meaning that each linkage group contains four homologous chromosomes. The complex inheritance of autotetraploids limits the rate of genetic improvement compared with diploids. While genomic information has revolutionized diploid breeding, these successes are not transferable to autotetraploids. The primary goal of this proposal is to accelerate autotetraploid breeding by developing the capacity to predict complex traits with genomic information. To reach this goal we will 1) optimize the use of next-generation sequencing to accurately call the number of copies of each allele (dosage) present at a given gene or locus; 2) develop prediction models that incorporate genomic information with dosage; and 3) create user-friendly software that incorporates these advances, and train breeders in its use. This project is a unique collaboration among plant breeders working on different species, with expertise in genomics, who are united by the goal of improving the efficiency of variety development in autotetraploid crops. We therefore expect to benefit all alfalfa, potato, and blueberry breeding programs through the development and release of software and tutorials for the newly created methods.
Animal Health Component
0%
Research Effort Categories
Basic
60%
Applied
10%
Developmental
30%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
201 | 1120 | 1081 | 20%
201 | 1310 | 1081 | 40%
201 | 1640 | 1081 | 40%
Goals / Objectives
Our primary goal is to accelerate autotetraploid breeding by developing the capacity to predict complex traits with genome-wide markers. The following three objectives were formulated to enable us to achieve this goal of more efficient tetraploid breeding:
1. Optimize the use of next-generation sequencing to assign high-quality tetraploid genotypes in alfalfa, potato, and blueberry.
2. Develop genome-wide prediction methods appropriate for autotetraploids and compare their accuracy in populations of alfalfa, potato, and blueberry.
3. Create user-friendly software that includes SNP calling and genome-wide selection (GWS) methods for autotetraploid crops, and train agronomic and horticultural plant breeders in its application.
Project Methods
The project will use next-generation sequencing to generate millions of sequencing reads for each alfalfa, blueberry, and potato individual genotyped. These reads will be used to call SNP markers and their dosage. Dosage calls will be validated with high-resolution melting (HRM). The resulting genotypes will be tested in different genome-wide selection approaches, using diploid and tetraploid genomic relationship matrices, to evaluate whether dosage information improves the accuracy of phenotype prediction. All methods will be compiled in an R package and made available online for the scientific community.
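Calling dosage from sequencing reads can be illustrated with a simple read-count model. The sketch below is a minimal illustration, not the project's pipeline. It assumes that each read independently samples one of the four homologs, that sequencing errors flip the observed allele with a fixed per-read probability `error` (an assumed parameter), and that all dosages 0 to 4 are equally likely a priori. The function names are hypothetical.

```python
import math


def dosage_log_likelihoods(alt_reads, depth, ploidy=4, error=0.01):
    """Binomial log-likelihood of alt_reads out of depth for each dosage 0..ploidy.

    The binomial coefficient is omitted because it is identical for every dosage.
    """
    logs = []
    for k in range(ploidy + 1):
        frac = k / ploidy
        # Expected fraction of alternate reads given k alternate copies,
        # allowing a symmetric per-read error rate (requires 0 < error < 0.5).
        p = frac * (1 - error) + (1 - frac) * error
        logs.append(alt_reads * math.log(p) + (depth - alt_reads) * math.log(1 - p))
    return logs


def call_dosage(alt_reads, depth, ploidy=4, error=0.01):
    """Return (most probable dosage, its posterior probability) under a flat prior."""
    logs = dosage_log_likelihoods(alt_reads, depth, ploidy, error)
    top = max(logs)
    weights = [math.exp(ll - top) for ll in logs]  # subtract the max to avoid underflow
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(ploidy + 1), key=posterior.__getitem__)
    return best, posterior[best]


if __name__ == "__main__":
    # Same 50:50 read ratio at increasing depth: confidence in the duplex call grows.
    for alt, depth in [(2, 4), (10, 20), (25, 50)]:
        dosage, prob = call_dosage(alt, depth)
        print(f"{alt}/{depth} alt reads -> dosage {dosage} (posterior {prob:.3f})")
```

At low depth a balanced read ratio remains compatible with simplex, duplex, and triplex genotypes. This depth dependence is the same concern behind the 40X versus 50X sequencing-depth discussion in the 2015 report.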

Progress 09/01/16 to 08/31/17

Outputs
Target Audience: Plant breeders, plant geneticists, and graduate students working in autotetraploid species. In the long term, our results will affect autotetraploid plant breeding and thus the development of new cultivars/varieties, which will benefit crop producers in the field.
Changes/Problems: A one-year no-cost extension was requested and approved. The final report will be submitted by the new deadline.
What opportunities for training and professional development has the project provided? The project supported the training of two graduate students and two postdocs at UW-Madison during the reporting period, focusing on field research and data analysis methods. At UF, the project supported the training of one graduate student, two undergraduate students, and two postdocs.
How have the results been disseminated to communities of interest? During the reporting period, one new paper was published in Theoretical and Applied Genetics, an important journal for the plant breeding and genetics community, and four presentations were made at scientific conferences.
What do you plan to do during the next reporting period to accomplish the goals? A postdoc at UW-Madison is completing the analysis of genotyping-by-sequencing (GBS) data on 80 potato clones and comparing it with SNP array data. A manuscript and conference presentation on this research are planned for the next reporting period. At UF, the analysis of the blueberry and alfalfa genomic selection data should be completed and the manuscript submitted. Conference presentations are planned to report the results of the improved models to the community.

Impacts
What was accomplished under these goals?
1. Optimize the use of next-generation sequencing to assign high-quality tetraploid genotypes. Progress on this objective was hampered by the sudden dismissal of a UW-Madison PhD student due to disruptive behavior. A postdoc was hired to complete this research during the one-year extension period.
2. Develop genome-wide prediction methods appropriate for autotetraploids. New theory was developed to include dominance effects in tetraploid genome-wide prediction models and was used to assess the potential of genomic selection to improve potato breeding. The training set contained 544 elite clones, phenotyped over five years and genotyped with an Infinium SNP array to produce 5278 markers with accurate allele dosage information. The reliability of breeding value predictions in the training set was consistently higher than the narrow-sense heritability, but results in unselected F1 populations were more variable, underscoring the need for further research to optimize the design of potato breeding programs. Genomic selection methods were implemented, and no significant differences were detected compared with the diploid models. However, there are indications of non-additive effects, which will be explored in the next stage. A manuscript is being prepared, and we expect to submit it in spring 2018.
3. Create software for SNP calling and genome-wide prediction methods and train others in its application. New software to make tetraploid genotype calls from SNP array data was developed; it uses hierarchical clustering and multiple F1 populations to calibrate the relationship between signal intensity and allele dosage. The R package, named ClusterCall, can be downloaded along with a reference manual and two vignettes from potatobreeding.cals.wisc.edu/software.
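One reason F1 populations can help calibrate dosage calls is that the parental dosages determine which offspring dosage classes can occur and in what proportions. The sketch below computes these expected proportions for a tetraploid F1, assuming random chromosome segregation with no double reduction. It illustrates the underlying genetics only and is not ClusterCall's code; the function names are hypothetical.

```python
from itertools import product
from math import comb


def gamete_probs(parent_dosage, ploidy=4):
    """P(gamete carries g alternate alleles), for g = 0..ploidy/2.

    Assumes random chromosome segregation: each gamete receives ploidy/2 of the
    parent's homologs, drawn without replacement (no double reduction).
    """
    half = ploidy // 2
    n_gametes = comb(ploidy, half)
    return [comb(parent_dosage, g) * comb(ploidy - parent_dosage, half - g) / n_gametes
            for g in range(half + 1)]


def f1_dosage_probs(dosage1, dosage2, ploidy=4):
    """Expected frequency of each offspring dosage 0..ploidy in an F1 cross."""
    g1, g2 = gamete_probs(dosage1, ploidy), gamete_probs(dosage2, ploidy)
    probs = [0.0] * (ploidy + 1)
    # Offspring dosage is the sum of the two gamete dosages.
    for a, b in product(range(len(g1)), range(len(g2))):
        probs[a + b] += g1[a] * g2[b]
    return probs


if __name__ == "__main__":
    for parents in [(1, 0), (2, 0), (2, 2)]:
        freqs = f1_dosage_probs(*parents)
        print(parents, [round(f, 3) for f in freqs])
```

For example, a simplex x nulliplex cross produces only dosages 0 and 1 in equal proportions, while a duplex x nulliplex cross gives the familiar 1:4:1 ratio over dosages 0, 1, and 2. Knowing how many clusters to expect, and in what proportions, constrains how signal-intensity clusters from each population can be labeled.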

Publications

  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Schmitz Carley CA, Coombs JJ, Douches DS, Bethke PC, Palta JP, Novy RG, Endelman JB. 2017. Automated tetraploid genotype calling by hierarchical clustering. Theor Appl Genet 130:717-726. doi: 10.1007/s00122-016-2845-5
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schmitz Carley C, Endelman JB (2017) Genomic selection in tetraploid potato. Annual Meeting of the National Association of Plant Breeders, Aug. 8-9, Davis, CA.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schmitz Carley C, Endelman JB (2017) Genome-wide selection accuracy in tetraploid potato F1 populations. Potato Association of America Annual Meeting, July 24-26, Fargo, ND.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Endelman J, Schmitz Carley C (2017) Combining marker and pedigree information for genome-wide prediction in potato. 20th Triennial Conference of the European Association for Potato Research. July 9-14, Versailles, France.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Oliveira de Bem I, Cellon C, Amadeu R, Resende M, Endelman J, Olmstead J, and Munoz P (2017). Improving breeding efficiency in autotetraploids with genome-wide prediction. Annual Meeting of the National Association of Plant Breeders, Aug. 7-10, Davis, CA.


Progress 09/01/15 to 08/31/16

Outputs
Target Audience: Plant breeders, plant geneticists, and graduate students were reached through poster and seminar presentations, as well as through the published scientific manuscripts.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project has provided multiple opportunities for training: designing experiments, establishing field experiments, phenotyping commercially important traits, and analyzing phenotypic and genomic data. Catherine Cellon has been trained in advanced methods to perform selection in blueberries using REML/BLUP analysis. Mehul Bhakta, a postdoc, is being trained in the bioinformatics of autotetraploid species. Schuyler Smith began his PhD training in Plant Breeding and Plant Genetics at UW-Madison in June 2015. We successfully held the 2015 edition of the training workshop at the University of Florida, where PI P. Munoz summarized the project results to date.
How have the results been disseminated to communities of interest? We successfully held the 2015 edition of the training workshop at the University of Florida. The workshop was streamed online to more than 500 people in more than 50 countries, with more than 80 people attending in person. PI P. Munoz summarized the project results to date at the workshop. We have also presented six posters at different venues and published two scientific papers.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period we will continue working as scheduled. We will continue analyzing data to report results of SNP calling and imputation, as well as of genomic selection models with diploid and tetraploid genotypes. We will also validate the tetraploid calls using HRM, as proposed. We will continue training breeders and geneticists in the use of the software developed in this project.

Impacts
What was accomplished under these goals?
AIM I
I-1. Genotyping (Year 2)
POTATO: Genotyping and data curation for the three potato F1 populations were completed using the potato Infinium array, with population sizes of N = 65, 70, and 70.
ALFALFA: DNA of the parental lines was extracted and sent for genotyping in 2016 using the sequence-capture method.
BLUEBERRY: Genotyping of the blueberry training population was finished in the first quarter of 2016. In addition, a biparental population of 96 individuals from an interspecific hybridization between blueberry species was genotyped with both GBS and the same sequence-capture technology for comparison. Neither of these genotyping projects used funds from this project, but the data have been made available for research use. The strategy of using fewer probes at higher depth worked well, with an average read depth of over 50X for the 30,000 probes used.
I-2. Dosage Calling (Year 2)
- The ClusterCall package for autotetraploid genotype calling with Infinium array data has been completed and is available online (http://potatobreeding.cals.wisc.edu/software). A manuscript describing the software and results with potato has been submitted.
- For NGS marker data, a bioinformatics pipeline was developed to produce tetraploid genotype calls in Variant Call Format (VCF) from FASTQ input, using bwa for alignment and GATK for variant calling. The software was tested using 91 potato samples, and results were presented at the 2016 Annual Meeting of the National Association of Plant Breeders.
I-3. Marker Imputation on Tetraploid NGS Data (Year 2)
Using tetraploid potato SNP array data, three imputation algorithms were compared: k-nearest neighbors (kNN), random forest (RF), and a hidden Markov model (HMM). The lowest imputation error was observed with RF (19%), followed by HMM (46%) and kNN (54%).
AIM II
II-1. Phenotyping Populations (Year 2)
Phenotyping of the blueberry populations was completed for the most important traits. Phenotyping of the alfalfa parental populations was completed; phenotyping of alfalfa selections will start in spring 2017. We completed the second year of phenotyping for the potato populations, focusing on total yield, tuber size distribution, and specific gravity. Phenotyping is now complete for the potato populations.
II-3. GWS with Dosage
Dosage was called for the blueberry population and genomic selection models were fit. No significant improvement was detected when the model used a tetraploid instead of a diploid parameterization. Results will be published in a scientific journal during 2017.
AIM III
III-1. Software Development
Two R packages were released in FY16:
- AGHmatrix: constructs autotetraploid and diploid relationship matrices from pedigree information, accounting for double reduction.
- ClusterCall: uses bi-parental populations to train tetraploid SNP dosage calling.
III-2. Training of Breeders
Catherine Cellon has been trained in advanced methods to perform selection in blueberries using REML/BLUP analysis. Ms. Cellon completed her MS and was hired as a vegetable breeder. Mehul Bhakta, a postdoc, was trained in bioinformatics and breeding of autotetraploid species. Dr. Bhakta completed his training and was hired as a crop breeder. Schuyler Smith began his PhD training in Plant Breeding and Plant Genetics at UW-Madison in June 2015.
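The II-3 comparison contrasts tetraploid and diploid parameterizations of the marker data. A minimal sketch of what that contrast can mean for a genomic relationship matrix follows. It assumes a VanRaden-style matrix generalized to ploidy (centered dosages, scaled by ploidy times the sum of p(1 - p) over markers), and, for the diploid version, collapses tetraploid dosages so that 0 stays 0, 4 becomes 2, and dosages 1 to 3 become a single heterozygous class coded 1. These are common choices, not necessarily those used in the project or in AGHmatrix; the function names and simulated data are hypothetical.

```python
import numpy as np


def vanraden_g(dosages, ploidy):
    """Genomic relationship matrix from an individuals x markers dosage matrix (0..ploidy)."""
    M = np.asarray(dosages, dtype=float)
    p = M.mean(axis=0) / ploidy           # estimated alternate-allele frequency per marker
    W = M - ploidy * p                    # center each marker at its expected dosage
    scale = ploidy * np.sum(p * (1 - p))  # average diagonal near 1 under Hardy-Weinberg
    return W @ W.T / scale


def collapse_to_diploid(dosages, ploidy=4):
    """Diploid-style coding: homozygotes as 0 and 2, every heterozygous dosage as 1."""
    M = np.asarray(dosages)
    D = np.ones_like(M)
    D[M == 0] = 0
    D[M == ploidy] = 2
    return D


if __name__ == "__main__":
    rng = np.random.default_rng(seed=1)
    freqs = rng.uniform(0.05, 0.95, size=500)
    X = rng.binomial(4, freqs, size=(8, 500))  # simulated tetraploid dosages
    G_tet = vanraden_g(X, ploidy=4)
    G_dip = vanraden_g(collapse_to_diploid(X), ploidy=2)
    upper = np.triu_indices(X.shape[0], k=1)
    r = np.corrcoef(G_tet[upper], G_dip[upper])[0, 1]
    print("correlation of off-diagonal relationships:", round(r, 3))
```

The diploid coding discards the distinction among simplex, duplex, and triplex genotypes, so any gain from the tetraploid matrix depends on how much that extra dosage information changes the estimated relationships.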

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Amadeu R (u), C. Cellon, A. Garcia, J. Olmstead, P. Munoz. 2015. AGHmatrix: R package to compute and analyze relationship matrices for diploid and autotetraploid species. 61st Congresso Brasileiro de Genetica. September 8-11, Sao Paulo, Brazil. Poster Presentation
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Cellon C, R. Amadeu, M. Kirst, P. Munoz and J. Olmstead. 2015. Establishing genome-wide selection for Vaccinium corymbosum. National Association of Plant Breeders (NAPB), July 28-30, 2015, Pullman, Washington State, USA.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Schmitz Carley C, Palta J, Coombs J, Douches DS, Endelman JB (2016) Automated tetraploid genotype calling by hierarchical clustering. Potato Association of America Annual Meeting, July 31-Aug. 4, Grand Rapids, MI.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Smith SD, Endelman JB (2016) Development and application of a bioinformatics pipeline for genotyping-by-sequencing of autotetraploid potato. Annual Meeting of the National Association of Plant Breeders, Aug. 15-18, Raleigh, NC.
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Amadeu R., C. Cellon, J. Olmstead, A. Garcia, M. Resende, P. Munoz. 2016. AGHmatrix: R package to construct relationship matrices for autotetraploid and diploid species, a Blueberry Example. The Plant Genome 9(3). doi:10.3835/plantgenome2016.01.0009
  • Type: Journal Articles Status: Submitted Year Published: 2016 Citation: Schmitz Carley CA, Coombs JJ, Douches DS, Bethke PC, Palta JP, Novy RG, Endelman JB (submitted) Automated tetraploid genotype calling by hierarchical clustering.
  • Type: Theses/Dissertations Status: Published Year Published: 2015 Citation: Cellon C. 2015. Estimation of genetic parameters of economically important traits in southern highbush blueberries. MS thesis. University of Florida. Horticultural Science Department. Gainesville, FL. 111p.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2016 Citation: Munoz P, Endelman J, and Olmstead J. 2016. Improving Breeding Efficiency in Autotetraploids with genome-wide prediction. Plant and Animal Genome XXIV (PAG) January 08-13 2016, San Diego, California, USA.


Progress 09/01/14 to 08/31/15

Outputs
Target Audience: Plant breeders, plant geneticists, and graduate students working in autotetraploid species.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Catherine Cellon has been trained in advanced methods to perform selection in blueberries using REML/BLUP analysis. Mehul Bhakta, a postdoc, is being trained in the bioinformatics of autotetraploid species. Schuyler Smith began his PhD training in Plant Breeding and Plant Genetics at UW-Madison in June 2015. We successfully held the 2015 edition of the training workshop at the University of Florida, where PI P. Munoz summarized the project results to date.
Personnel:
Rodrigo Amadeu, an undergraduate at UF under Munoz's supervision, worked on creating the R package to estimate relationship matrices in autotetraploid species.
Catherine Cellon, an MSc student at UF under Olmstead's supervision, worked on establishing the blueberry baseline during fall 2015.
Mehul Bhakta, a postdoctoral associate at UF under Munoz's supervision, was hired in fall 2015 to work on the bioinformatics for alfalfa and blueberry.
Schuyler Smith, a PhD student at UW under Endelman's supervision, was recruited in summer 2015 to work on the potato populations and dosage calling.
How have the results been disseminated to communities of interest? We successfully held the 2015 edition of the training workshop at the University of Florida. The workshop was streamed online to more than 500 people in more than 50 countries, with more than 80 people attending in person. PI P. Munoz summarized the project results to date at the workshop. We also presented a poster at the Plant and Animal Genome meeting in San Diego in January 2015: Olmstead J, Cellon C, Amadeu R, Munoz P. 2015. Toward Genomic Selection in Blueberry. Poster presented at PAG 2015.
What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period we will continue working as scheduled and release two R packages and two scientific publications. We will test the methods described in the proposal.

Impacts
What was accomplished under these goals?
AIM I
I-1. Genotyping
POTATO: For the three biparental potato populations developed for this project, DNA is being extracted and will be sent for genotyping in October 2015. Using other funding, a single plate of 95 elite tetraploid breeding lines was genotyped by GBS at Cornell at 48-plex. The PhD student hired for this project in June 2015 has begun analyzing these GBS data in preparation for the analysis of the marker data for the biparental populations.
ALFALFA: Alfalfa genotyping will be ready by spring 2016.
BLUEBERRY: Genotyping of the blueberry training population will be finished by September 2015. A biparental population of 96 individuals was previously genotyped with GBS; it was not part of the project but will be used for comparisons.
NOTE: At our first meeting, held at the 2015 Plant and Animal Genome meeting, we discussed whether to increase the average sequencing depth for blueberry to 50X instead of the 40X originally planned. Considering the steep rise in the call error rate below 40X (Figure 1 of the proposal), Olmstead agreed to reduce either the total number of probes or the number of individuals in an attempt to reach an average depth of 50X.
I-2. Dosage Calling
In 2014, Endelman developed software to make diploid SNP calls from NGS read counts in Variant Call Format (VCF). Endelman also developed an R package, ClusterCall, to significantly improve tetraploid dosage calling for autotetraploids using the 12K potato Infinium array. The ability to call dosage in autotetraploids will be implemented in fall 2015 and applied to the NGS data as they emerge.
I-3. Marker Imputation on Tetraploid NGS Data
Using diploid data for exploratory studies, Endelman found that linear discriminant analysis (LDA) performs very well as a new method for marker imputation. The accuracy of this method will be tested on the tetraploid marker data in FY16.
I-4. Validation
No work completed in 2014-2015.
AIM II
II-1. Phenotyping Populations
Phenotype data were collected for the blueberry training population in 2014, and a second year is planned for 2015. The traits evaluated are the primary selection attributes in the current phenotypic recurrent selection program (flower bud density, yield rating, pedicel scar diameter, and fruit color, diameter, firmness, soluble solids, titratable acidity, and weight). The potato populations were phenotyped using an augmented design in 2014, with data collected for vine maturity, total yield, specific gravity, and fry color. Field trials were planted in spring 2015 using larger (12-plant) plots, and phenotyping will continue through fall 2015 with the assistance of a newly hired PhD student. Alfalfa phenotypes for the parents have been collected for flowering, dry matter yield, disease, pests, and forage quality. Crosses to generate the training population are in progress and will be collected during 2016.
II-2. Baseline
We reconstructed the complete blueberry pedigree back to 1908. We created an R package to calculate the numerator relationship matrix for autotetraploids using different levels of double reduction. These relationship matrices were used in models fit in ASReml with the blueberry phenotypic data described above. For most traits, the best-fit model contained no double reduction, while a value of 0.16 was found for fruit firmness. This analysis establishes the baseline for blueberry using phenotypic BLUP. No work yet in potato or alfalfa.
II-3. GWS with Dosage
No work planned or executed this year.
II-4. Effect of Imputation on GWS
No work planned or executed this year.
AIM III
III-1. Software Development
Two R packages have been developed since the beginning of the project and will be released in FY16:
- AGHmatrix: constructs autotetraploid and diploid relationship matrices from pedigree information, accounting for double reduction.
- ClusterCall: uses bi-parental populations to train tetraploid SNP dosage calling.
III-2. Training of Breeders
Catherine Cellon has been trained in advanced methods to perform selection in blueberries using REML/BLUP analysis. Mehul Bhakta, a postdoc, is being trained in the bioinformatics of autotetraploid species. Schuyler Smith began his PhD training in Plant Breeding and Plant Genetics at UW-Madison in June 2015. We successfully held the 2015 edition of the training workshop at the University of Florida, where PI P. Munoz summarized the project results to date.
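The II-2 baseline relies on a pedigree (numerator) relationship matrix for autotetraploids with a double-reduction parameter. The sketch below shows one way to build such a matrix with a tabular recursion. It is a minimal illustration, not AGHmatrix's implementation, and it rests on stated assumptions: Theta is a coancestry matrix (probability that two alleles, one drawn at random from each individual, with replacement when both are the same individual, are identical by descent); relationships are defined as A = 4 * Theta, so the diagonal equals 1 + 3F; each gamete carries two of the parent's four alleles, which are copies of the same allele with double-reduction probability `alpha`; and unknown parents are unrelated, non-inbred base individuals. Function and argument names are hypothetical.

```python
import numpy as np


def tetraploid_relationship(sires, dams, alpha=0.0):
    """Pedigree relationship matrix A = 4 * Theta for an autotetraploid pedigree.

    sires, dams: parent index for each individual, or None if unknown.
                 Parents must appear before their offspring.
    alpha: probability that a gamete carries two copies of the same parental
           allele (double reduction).
    """
    n = len(sires)
    theta = np.zeros((n, n))  # coancestry between individuals
    F = np.zeros(n)           # P(two distinct allele copies within an individual are IBD)

    def gamete_ibd(parent):
        # Unknown parents are treated as non-inbred base individuals (F = 0).
        f_parent = 0.0 if parent is None else F[parent]
        return alpha + (1 - alpha) * f_parent

    for i in range(n):
        s, d = sires[i], dams[i]
        # A random allele of i came from s or d with probability 1/2 each.
        for j in range(i):
            t = 0.0
            if s is not None:
                t += 0.5 * theta[j, s]
            if d is not None:
                t += 0.5 * theta[j, d]
            theta[i, j] = theta[j, i] = t
        if s is None and d is None:
            F[i] = 0.0  # founder in the base population
        else:
            across = theta[s, d] if s is not None and d is not None else 0.0
            # Six allele pairs in a tetraploid: one within each gamete, four across gametes.
            F[i] = (gamete_ibd(s) + gamete_ibd(d) + 4 * across) / 6
        # Drawing the same allele twice has probability 1/4; otherwise IBD with probability F.
        theta[i, i] = 0.25 + 0.75 * F[i]
    return 4 * theta


if __name__ == "__main__":
    # 0 and 1 are founders; 2 and 3 are full sibs; 4 is their offspring.
    sires = [None, None, 0, 0, 2]
    dams = [None, None, 1, 1, 3]
    for a in (0.0, 0.1):
        A = tetraploid_relationship(sires, dams, alpha=a)
        print(f"alpha={a}: diagonal {np.round(np.diag(A), 3)}")
```

With alpha = 0 this example gives a parent-offspring relationship of 0.5 and a diagonal of 1.25 for the full-sib offspring. With alpha = 0.1, even offspring of unrelated founders have a diagonal of 1.1, which is one way a double-reduction level enters the fitted model.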

Publications

  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2015 Citation: Olmstead J, Cellon C, Amadeu R, Munoz P. 2015. Toward Genomic Selection in Blueberry. Poster presented at PAG 2015.