Construction of a high density genetic map of pineapple for genome sequencing and marker-assisted selection

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

TERMINATED

Funding Source

SPECIAL GRANT

Reporting Frequency

Annual

Accession No.

0219176

Grant No.

2009-34135-20097

Project No.

HAW01825-09G

Proposal No.

2009-04965

Multistate No.

(N/A)

Program Code

Project Start Date

Sep 1, 2009

Project End Date

Aug 31, 2012

Grant Year

2009

Project Director
Paull, R. E.

Recipient Organization
UNIV OF HAWAII
3190 MAILE WAY
HONOLULU,HI 96822

Performing Department
Tropical Plant & Soil Science

Non Technical Summary
Pineapple is the No. 1 fruit crop in Hawaii and the third most important tropical fruit in world production, after banana and citrus. However, genetic and genomic resources for pineapple improvement are very limited. The only available genetic map was constructed using a combination of three marker types: RAPDs, AFLPs, and inter-simple sequence repeats (ISSRs). This map consists of 30 linkage groups and 157 markers. Because of the limited number of markers and the small segregating population used for mapping, it covers only about 31% of the Ananas comosus genome. We propose to construct a high-density genetic map of pineapple as a tool for pineapple improvement.

Animal Health Component

(N/A)

Research Effort Categories

Basic

50%

Applied

25%

Developmental

25%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1020	1040	100%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1020 - Pineapple;

Field Of Science
1040 - Molecular biology;

Keywords

Goals / Objectives
Our objectives are to: (1) develop 5,000 microsatellite markers by mining pineapple genomic sequences generated with next-generation sequencing technology and the published EST database; (2) map 1,000 microsatellite markers to construct a high-density genetic map of pineapple, with about two markers per Mb; and (3) select a core set of evenly distributed microsatellite markers to fine map the agronomically important trait of leaf margin spines. The genome sequence generated by this project will significantly advance our understanding of pineapple genome organization, simplify the isolation of homologous genes of interest, and contribute to a better understanding of plant evolution. The high-density genetic map will have a profound impact on pineapple improvement through better understanding of the relevant biology and direct application of genetic and genomic tools in breeding programs. Results of this research will significantly advance the development of genomic tools and knowledge for pineapple improvement. The microsatellite markers will be shared with pineapple researchers worldwide, and genetic maps generated from different mapping populations will be linked through a common set of microsatellite markers. Genetic markers associated with leaf margin spines will be applied directly in the pineapple breeding program, and the genetic location of the trait and its associated markers will provide essential information for cloning the major gene controlling leaf margin spines in pineapple. The genetic and genomic tools developed in this project will serve as a basis for future grant proposals on fundamental research into crassulacean acid metabolism (CAM) photosynthesis as it relates to the effects of global climate change on plant productivity.
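The figure of about two markers per Mb in Objective 2 follows from dividing the target marker count by the genome size. The sketch below makes that arithmetic explicit. It is a minimal illustration, assuming the flow-cytometry genome-size estimates of Arumuganathan and Earle (1991) quoted in the 2009-2010 progress report and perfectly uniform marker spacing (real markers cluster unevenly); the function names are hypothetical.

```python
# Hypothetical helpers; genome sizes (Mbp) are the Arumuganathan and Earle (1991)
# estimates quoted in this report. Uniform marker spacing is an idealization.
GENOME_SIZE_MBP = {
    "A. comosus var. comosus": 526.0,
    "A. comosus var. bracteatus": 444.0,
}

def markers_per_mb(n_markers: int, genome_size_mbp: float) -> float:
    """Average number of mapped markers per megabase of genome."""
    if genome_size_mbp <= 0:
        raise ValueError("genome size must be positive")
    return n_markers / genome_size_mbp

def mean_spacing_kb(n_markers: int, genome_size_mbp: float) -> float:
    """Average physical distance between adjacent markers, in kilobases."""
    if n_markers <= 0:
        raise ValueError("need at least one marker")
    return genome_size_mbp * 1000.0 / n_markers

TARGET_MARKERS = 1000  # Objective 2
for taxon, size in GENOME_SIZE_MBP.items():
    print(f"{taxon}: {markers_per_mb(TARGET_MARKERS, size):.2f} markers/Mb, "
          f"about one marker every {mean_spacing_kb(TARGET_MARKERS, size):.0f} kb")
```

With 1,000 markers the density works out to roughly 1.9 to 2.3 markers per Mb, depending on which genome-size estimate is used.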

Project Methods
We chose Smooth Cayenne cultivar F153 for genome sequencing. This cultivar was also used to construct the pineapple BAC library. We will isolate nuclei from young pineapple leaf tissue and then extract DNA from the purified nuclei to minimize contamination with organelle genome DNA. The nuclear DNA will be sequenced using 454 FLX pyrosequencing technology; one 454 run will generate about 1x coverage of the pineapple genome. The reads will be assembled using Newbler software, and the resulting contigs will be used in BLAST analysis to identify sequences containing genes of interest. The SSR Finder tool will be used to locate SSRs, design primers flanking the SSR sequences, and remove redundant primers. We will target only hypervariable class II SSRs (≥20 bp). The repeat region and its flanking sequences (about 150 bases on each side) will be extracted and used in Primer3 (Whitehead Institute, Cambridge, MA) for primer design. A segregating pseudo-testcross (true F1) population has been made by reciprocal crosses between F153 (A. comosus, 2n=2x=50) and HANA64 (A. bracteatus, 2n=2x=50). Newly designed SSR markers will be tested for polymorphism using the two parents (F153 and HANA64) and three F1 individuals from each mapping population. Polymorphic markers will be selected for high-density genetic mapping. PCR will be performed on an ABI PCR System 9700 thermocycler. PCR products will be separated on the Super 120 system (www.6mgel.com) in 4% super fine resolution (SFR) agarose gels (Amresco) and stained with ethidium bromide. The high-quality polymorphic SSRs identified in the survey will then be used to genotype the entire mapping population. The linkage map will be constructed using JoinMap (version 3.0). The F1 population derived from F153 and HANA64 is ideal for studying the genetic control of leaf margin spines in pineapple.
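The SSR search and flank-extraction step can be sketched in plain Python. This is not the SSR Finder tool: it is a minimal regex-based stand-in that assumes perfect (uninterrupted) repeats of 1 to 6 bp motifs, interprets the ≥20 bp criterion as total repeat-tract length, and returns about 150 bases on each side for primer design. The names `find_ssrs` and `SSR` are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class SSR:
    start: int        # 0-based start of the repeat tract
    end: int          # exclusive end of the repeat tract
    motif: str        # repeat unit, e.g. "AC"
    copies: int       # number of complete motif copies
    left_flank: str   # up to `flank` bases upstream, for primer design
    right_flank: str  # up to `flank` bases downstream

def _is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT' = 'AT' x 2)."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_ssrs(seq: str, min_len: int = 20, max_motif: int = 6, flank: int = 150) -> list[SSR]:
    """Find perfect SSRs with 1..max_motif bp motifs whose tract length is >= min_len."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        min_copies = -(-min_len // k)  # ceiling division: copies needed to reach min_len
        # Group 2 captures one k-mer; the backreference requires it to repeat in tandem.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if not _is_primitive(motif):
                continue  # e.g. skip 'ACAC'; the same tract is reported with motif 'AC'
            start, end = m.start(), m.end()
            hits.append(SSR(start, end, motif, (end - start) // k,
                            seq[max(0, start - flank):start], seq[end:end + flank]))
    return sorted(hits, key=lambda s: s.start)

# Usage with a made-up contig: a 24 bp (AC)12 tract between two 150 bp flanks.
flank = "GATTACAGGCTTAGT" * 10
contig = flank + "AC" * 12 + flank
for ssr in find_ssrs(contig):
    print(ssr.motif, ssr.copies, ssr.start, ssr.end, len(ssr.left_flank), len(ssr.right_flank))
```

Skipping non-primitive motifs keeps a single (AC)n tract from being reported again as (ACAC)n or (ACACAC)n. The flanking strings are what would be passed to a primer-design program such as Primer3.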
The established Smooth Cayenne cultivar F153 has smooth leaves, while HANA64 has large, completely spiny leaves. Leaf margin spines segregate as a qualitative trait in this F1 population, so we will be able to map the major gene directly on the high-density SSR linkage map. Additional SSR markers around this region will then be used to fine map the target gene in a larger population of 300 plants. A physical map of the targeted region will be constructed, and the BAC clones in this region will be end-sequenced. The BAC end sequences will be searched against the whole-genome sequence generated by 454 FLX pyrosequencing. Fine mapping will then narrow down the region containing the gene controlling leaf margin spines in pineapple.
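Placing the spine gene relative to a flanking marker comes down to counting recombinants and converting the recombination fraction into map distance. A minimal sketch, assuming each scored plant contributes one informative meiosis (as for a marker heterozygous in only one parent of the pseudo-testcross F1); the counts are hypothetical, and linkage software such as JoinMap estimates these quantities jointly across many markers rather than one pair at a time.

```python
import math

def recombination_fraction(recombinants: int, scored: int) -> float:
    """Fraction of scored progeny that are recombinant between a marker and the trait locus."""
    if scored <= 0 or not 0 <= recombinants <= scored:
        raise ValueError("need 0 <= recombinants <= scored and scored > 0")
    return recombinants / scored

def kosambi_cm(r: float) -> float:
    """Kosambi map distance (cM) for recombination fraction r, 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))

def haldane_cm(r: float) -> float:
    """Haldane map distance (cM), which assumes no crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)

# Hypothetical fine-mapping result: 6 recombinants among 300 plants.
r = recombination_fraction(6, 300)
print(f"r = {r:.3f}, Kosambi = {kosambi_cm(r):.2f} cM, Haldane = {haldane_cm(r):.2f} cM")
```

At short distances the two mapping functions nearly agree. With 300 plants, a single recombinant corresponds to r of about 0.0033, which sets the finest resolution this population size can give.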

Progress 09/01/09 to 08/31/12

Outputs
OUTPUTS: As proposed, we sequenced the two parental genomes, F153 (A. comosus, 2n=2x=50) and HANA64 (A. bracteatus, 2n=2x=50), to develop microsatellite markers and to obtain genomic sequence for studies of pineapple genome structure and for comparative genomic analysis. The sequence reads were assembled into contigs, and the completeness of each assembly was assessed by estimating its coverage of ultra-conserved eukaryotic genes. From the assembled contigs, we designed 8,542 pairs of SSR primers. After excluding redundant primers and primers duplicating those we designed from EST sequences, we obtained 7,967 unique primer pairs, about 3,000 more than we proposed. Novel pineapple-specific repeats were identified and compiled into a customized library of repeat elements. Comprehensive repeat analysis found that repeats cover 34.8% and 26.63% of the assembled F153 and HANA64 genomes, respectively. The most abundant repeats in both genomes were unclassified, indicating that these novel repeats are specific to the pineapple genome. Retrotransposons were the major class of repeat elements, with LTR elements being the most abundant. Gypsy-type LTRs occupied 4.78% and 2.62%, and copia-type LTRs 2.32% and 1.69%, of the F153 and HANA64 genomes, respectively. We also identified tandem repeats in the F153 and HANA64 genomes. We used a new technology, RAD-seq, to construct a high-density genetic map of pineapple. Two genetic maps were constructed, one for each parental genome. The F153 linkage map was constructed using 973 RAD-seq markers; it consisted of 29 linkage groups spanning a total length of approximately 1630 cM, with an average interval of 1.68 cM. The HANA64 linkage map comprised 2048 RAD-seq markers in 28 linkage groups and covered a total length of 1373.9 cM. More markers were detected in HANA64 than in F153, suggesting higher heterozygosity in the HANA64 genome.
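The 1.68 cM average interval reported for the F153 map equals total map length divided by marker count. The short sketch below makes that arithmetic explicit and compares it with a stricter definition, the mean gap between adjacent markers (a linkage group with m markers has m - 1 gaps). The report does not say which definition was used, and the class name is illustrative only.

```python
from dataclasses import dataclass

@dataclass
class LinkageMapSummary:
    name: str
    n_markers: int
    n_groups: int
    length_cm: float

    def cm_per_marker(self) -> float:
        """Total map length divided by marker count (the simpler definition)."""
        return self.length_cm / self.n_markers

    def mean_gap_cm(self) -> float:
        """Mean distance between adjacent markers: a group of m markers has m - 1 gaps."""
        return self.length_cm / (self.n_markers - self.n_groups)

# Figures as reported; the F153 length is approximate ("approximately 1630 cM").
maps = [
    LinkageMapSummary("F153", 973, 29, 1630.0),
    LinkageMapSummary("HANA64", 2048, 28, 1373.9),
]
for m in maps:
    print(f"{m.name}: {m.cm_per_marker():.2f} cM per marker, "
          f"{m.mean_gap_cm():.2f} cM mean gap between adjacent markers")
```

By either definition, the HANA64 map is roughly 2.5 times denser than the F153 map (about 0.67 to 0.68 cM versus 1.68 to 1.73 cM).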
To map the leaf margin spine trait, we created an F2 mapping population of 492 individuals, which was used to map the loci of the major genes controlling leaf margin spines. PARTICIPANTS: Not relevant to this project. TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: Not relevant to this project.
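Before mapping a qualitative trait in an F2 population, a common first check is whether the phenotype counts fit a simple Mendelian model, such as the 3:1 ratio expected for a single dominant gene. A minimal standard-library sketch follows. The counts are hypothetical, because the report gives only the population size of 492, and the 3:1 model is an illustrative assumption, not the project's stated hypothesis.

```python
import math

def chi_square_gof(observed: list[int], ratio: list[float]) -> tuple[float, float]:
    """Chi-square goodness of fit of two observed class counts to an expected ratio.

    Returns (statistic, p-value). With two classes there is 1 degree of freedom,
    for which the upper-tail probability is P(X > x) = erfc(sqrt(x / 2)).
    """
    if len(observed) != 2 or len(ratio) != 2:
        raise ValueError("this sketch handles two phenotype classes only")
    n = sum(observed)
    expected = [n * w / sum(ratio) for w in ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical scores for 492 F2 plants (class 1, class 2), tested against 3:1.
stat, p = chi_square_gof([360, 132], [3, 1])
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A large p-value means the counts are consistent with a single-gene model. A small one (for example, p < 0.05, corresponding to a statistic above 3.841) would point to two or more genes or to segregation distortion, which matters when interpreting the mapped loci.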

Impacts
Pineapple is the No. 1 fruit crop in Hawaii and the third most important commercial tropical fruit crop in world production, after banana and citrus. Besides the commercial value of its fruit, its leaves are the most promising source of nanocellulose materials that may be used as a plastic alternative in the future. As a crassulacean acid metabolism (CAM) species, pineapple is the best representative of this under-explored node of the angiosperms. However, very little molecular genetics and genomics research has been carried out on this crop. The sequence data we have generated constitute the largest data set for pineapple, and the genetic maps we have constructed are the most saturated pineapple genetic maps to date. The genome sequence generated by this project will significantly advance our understanding of pineapple genome organization, simplify the isolation of homologous genes of interest, and contribute to a better understanding of plant evolution. The high-density genetic map will have a profound impact on pineapple improvement through better understanding of the relevant biology and direct application of genetic and genomic tools in breeding programs. Results of this research will significantly advance the development of genomic tools and knowledge for pineapple improvement.

Publications

No publications reported this period

Progress 09/01/10 to 08/31/11

Outputs
OUTPUTS: As proposed, we sequenced the two parental genomes, F153 (A. comosus, 2n=2x=50) and HANA64 (A. bracteatus, 2n=2x=50), using the Roche 454 1 kb beta test kit, to develop microsatellite markers and to obtain genomic sequence for studies of pineapple genome structure and for comparative genomic analysis. The average read lengths were 453 bp for F153 and 497 bp for HANA64, with 800.2 Mbp and 721.7 Mbp of sequence obtained, respectively. From the assembled contigs, we designed 8,542 pairs of SSR primers. After excluding redundant primers and primers duplicating those we designed from EST sequences, we obtained 7,967 unique primer pairs, about 3,000 more than we proposed. In addition to the genomic sequence, we generated EST sequences for pineapple using Illumina RNA-seq. The average read length was 115 bp, and we obtained 10.9 Mbp of sequence for F153 and 25 Mbp for HANA64. We are annotating the genomic sequences, and comparative genomic analysis and a heterozygosity study of the pineapple genome will be carried out soon. Restriction site-associated DNA sequencing (RAD-seq), a recent method based on next-generation DNA sequencing, allows thousands of sequence-based markers to be detected in both reference and non-reference genomes at reasonable cost. We are using RAD-seq to construct a high-density genetic map of pineapple with 1,000 to 2,000 sequence-based markers (SNPs, InDels, and SSRs). RAD-seq libraries for the two parents and 54 F1 individuals are currently under construction. To map the leaf margin spine trait, we created an F2 mapping population of 492 individuals, which will be genotyped using RAD-seq to map the loci of the major genes controlling leaf margin spines. PARTICIPANTS: Not relevant to this project. TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: Not relevant to this project.
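The sequence yields above translate into sequencing depth when divided by genome size. A minimal sketch, assuming F153 corresponds to the 526 Mbp estimate for A. comosus var. comosus and HANA64 to the 444 Mbp estimate for A. comosus var. bracteatus (Arumuganathan and Earle, 1991, as cited in the 2009-2010 report). The Lander-Waterman term is the textbook idealization of uniformly random read placement, not a quantity measured in this project.

```python
import math

SAMPLES = {
    # name: (sequence yield in Mbp, mean read length in bp, assumed genome size in Mbp)
    "F153": (800.2, 453, 526.0),
    "HANA64": (721.7, 497, 444.0),
}

def depth(yield_mbp: float, genome_mbp: float) -> float:
    """Average sequencing depth (fold coverage) = total bases / genome size."""
    return yield_mbp / genome_mbp

def approx_reads(yield_mbp: float, mean_len_bp: float) -> int:
    """Approximate number of reads implied by the yield and mean read length."""
    return round(yield_mbp * 1e6 / mean_len_bp)

def lander_waterman_covered(c: float) -> float:
    """Expected fraction of the genome hit by at least one read, 1 - exp(-c),
    under the idealized random-sampling (Poisson) assumption."""
    return 1.0 - math.exp(-c)

for name, (yield_mbp, read_len, genome_mbp) in SAMPLES.items():
    c = depth(yield_mbp, genome_mbp)
    print(f"{name}: ~{approx_reads(yield_mbp, read_len):,} reads, {c:.2f}x depth, "
          f"~{lander_waterman_covered(c):.0%} of bases expected to be covered")
```

At roughly 1.5x to 1.6x depth, even ideal sampling would leave about a fifth of each genome unsequenced. This is consistent with the partial assembly coverage reported for Year 1.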

Impacts
Pineapple is the No. 1 fruit crop in Hawaii and the third most important commercial tropical fruit crop in world production, after banana and citrus. Besides the commercial value of its fruit, its leaves are the most promising source of nanocellulose materials that may be used as a plastic alternative in the future. As a crassulacean acid metabolism (CAM) species, pineapple is the best representative of this under-explored node of the angiosperms. However, very little molecular genetics and genomics research has been carried out on this crop. The sequence data we have generated constitute the largest data set for pineapple. The genome sequence generated by this project will significantly advance our understanding of pineapple genome organization, simplify the isolation of homologous genes of interest, and contribute to a better understanding of plant evolution. The high-density genetic map will have a profound impact on pineapple improvement through better understanding of the relevant biology and direct application of genetic and genomic tools in breeding programs. Results of this research will significantly advance the development of genomic tools and knowledge for pineapple improvement.

Publications

No publications reported this period

Progress 09/01/09 to 08/31/10

Outputs
OUTPUTS: We proposed three objectives for this project and planned to complete Objective 1 and part of Objective 2 in Year 1. We have generated about 800 Mb of sequence from the genome of A. comosus cultivar F153 using the Roche 454 1 kb beta test kit, which produces longer reads than the Titanium kit at the same cost. The reads were assembled using CLC Genomics Workbench. The total length of the assembled contigs is 147,685,460 bp, and the average contig length is 860 bp. A total of 197 contigs are longer than 10 kb, and the longest contig is 74 kb. Based on the genome sizes of A. comosus var. comosus and A. comosus var. bracteatus, estimated by Arumuganathan and Earle (1991) at 526 and 444 Mbp, respectively, the assembled contigs would cover 28% to 33% of the pineapple genome. The total length of singletons is 93,398,241 bp; including singletons, the coverage would be 46% to 54% of the pineapple genome. From the assembled contigs, we designed 8,542 pairs of SSR primers. After excluding redundant primers and primers duplicating those we designed from EST sequences, we obtained 7,967 unique primer pairs, about 3,000 more than we proposed. In addition to the genomic sequence, we generated EST sequences for pineapple using Illumina Solexa 115 bp paired-end reads. We assembled the F153 ESTs using Velvet, Edena, and SOAPdenovo, obtaining a total of 3,776 contigs with an average length of 883 bp and a total length of 3,295,921 bp. The longest contig is 5,119 bp and the shortest is 500 bp. From the assembled EST contigs, we designed 83 pairs of SSR primers. We have finished genomic DNA isolation for 34 individuals of the mapping population. The two parents and six F1 individuals were selected for the SSR polymorphism survey, which has been completed for all 83 primer pairs; 10 of them were polymorphic. PARTICIPANTS: Not relevant to this project. TARGET AUDIENCES: Not relevant to this project. 
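The coverage ranges in the Year 1 outputs can be reproduced from the reported totals: dividing by the larger genome estimate gives the lower bound, and dividing by the smaller gives the upper bound. The short check below assumes the reported base-pair totals are exact. Here, coverage means summed sequence length relative to genome size, which can overstate true unique coverage if contigs and singletons overlap or include organelle or redundant sequence.

```python
CONTIG_BP = 147_685_460      # total length of assembled contigs
SINGLETON_BP = 93_398_241    # total length of unassembled singletons
GENOME_MBP = {"var. comosus": 526, "var. bracteatus": 444}  # Arumuganathan and Earle (1991)

def genome_fraction(total_bp: int, genome_mbp: float) -> float:
    """Summed sequence length as a fraction of the estimated genome size."""
    return total_bp / (genome_mbp * 1_000_000)

for label, total in [("contigs", CONTIG_BP), ("contigs + singletons", CONTIG_BP + SINGLETON_BP)]:
    fractions = [genome_fraction(total, size) for size in GENOME_MBP.values()]
    print(f"{label}: {min(fractions):.0%} to {max(fractions):.0%} of the genome")

# Polymorphism rate in the first SSR survey: 10 of 83 EST-derived primer pairs.
print(f"polymorphic SSR primer pairs: {10 / 83:.1%}")
```

The survey result corresponds to about 12% of EST-derived primer pairs being polymorphic between the parents.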
PROJECT MODIFICATIONS: Not relevant to this project.

Impacts
Pineapple is one of the most important tropical fruit crops: it is the No. 1 fruit crop in Hawaii and the third most important commercial tropical fruit crop in world production, after banana and citrus. However, very little molecular genetics and genomics research has been conducted on this crop. The sequence data we have generated constitute the largest data set for pineapple. The genome sequence generated by this project will significantly advance our understanding of pineapple genome organization, simplify the isolation of homologous genes of interest, and contribute to a better understanding of plant evolution. The high-density genetic map will have a profound impact on pineapple improvement through better understanding of the relevant biology and direct application of genetic and genomic tools in breeding programs. Results of this research will significantly advance the development of genomic tools and knowledge for pineapple improvement.

Publications

No publications reported this period