Source: AGRICULTURAL RESEARCH SERVICE submitted to
MAIZE GENOME DATABASE
Sponsoring Institution
Agricultural Research Service/USDA
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0409149
Grant No.
(N/A)
Project No.
3622-21000-026-00D
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Oct 1, 2004
Project End Date
Nov 2, 2007
Grant Year
(N/A)
Project Director
SCHAEFFER M L
Recipient Organization
AGRICULTURAL RESEARCH SERVICE
(N/A)
COLUMBIA,MO 65211
Performing Department
(N/A)
Non Technical Summary
(N/A)
Animal Health Component
(N/A)
Research Effort Categories
Basic
50%
Applied
30%
Developmental
20%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    1510                              1080                      100%
Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1510 - Corn;

Field Of Science
1080 - Genetics;
Goals / Objectives
Provide bioinformatics for users in support of MaizeGDB from literature and genome projects. Document genetic and physical map data; genes; gene products; clones; and traits. Include new data types, e.g., physical maps, BACs, SNPs, and ESTs. Keep data current and authoritative by curation (assembly, compilation, integration, and amendment) and by delivery tools. Offer on the World Wide Web. Develop interoperability with gene expression databases, and with other species for comparative maps, DNA and protein functions, and traits.
Project Methods
Integrate data from genome projects and the scientific literature on genes; genome segments; gene products; phenotypes; trait analyses and inheritance; and data that document genetic and integrated maps. Coact with projects to facilitate accurate delivery and integration of BAC, EST, SNP, map, QTL, and marker data. Prepare for new datatypes anticipated with new initiatives. Design and present BAC contigs, physical maps, EST assemblies, microarray gene expression, and proteomics. Attribute all data to source authority. Link sequence records from external databases (GenBank, TIGR, Gramene, and SwissProt) by automatic updates. Continue links to MedLine for abstracts and full-text journal articles. Update reciprocal links with GRIN for germplasm. Link cross-species markers used in maize to relevant genome databases, such as GrainGenes, Gramene, and the International Rice Genome Project database.

Progress 10/01/04 to 11/02/07

Outputs
Progress Report Objectives (from AD-416) Provide bioinformatics for users in support of MaizeGDB from literature and genome projects. Document genetic and physical map data; genes; gene products; clones; and traits. Include new data types, e.g., physical maps, BACs, SNPs, and ESTs. Keep data current and authoritative by curation (assembly, compilation, integration, and amendment) and by delivery tools. Offer on the World Wide Web. Develop interoperability with gene expression databases, and with other species for comparative maps, DNA and protein functions, and traits.
Approach (from AD-416) Integrate data from genome projects and the scientific literature on genes; genome segments; gene products; phenotypes; trait analyses and inheritance; and data that document genetic and integrated maps. Coact with projects to facilitate accurate delivery and integration of BAC, EST, SNP, map, QTL, and marker data. Prepare for new datatypes anticipated with new initiatives. Design and present BAC contigs, physical maps, EST assemblies, microarray gene expression, and proteomics. Attribute all data to source authority. Link sequence records from external databases (GenBank, TIGR, Gramene, and SwissProt) by automatic updates. Continue links to MedLine for abstracts and full-text journal articles. Update reciprocal links with GRIN for germplasm. Link cross-species markers used in maize to relevant genome databases, such as GrainGenes, Gramene, and the International Rice Genome Project database.
Accomplishments Accomplishment: The initial SNP diversity maps from the Maize Diversity Project were integrated into MaizeGDB. The documentation and characterization of maize diversity at the sequence and phenotypic level is underway by researchers at several institutions in the US and involves new methodology, in particular high-throughput SNP genotyping. A large sequence dataset is anticipated, and is currently being released at GenBank and a project World Wide Web site.
We have integrated into MaizeGDB a first release of the maps, called IBM SNP 2007, together with alleles for some 26 lines. Systematic integration of the data into MaizeGDB will provide a single portal for these and other genetic mapping information for maize, including agronomic trait phenotypes. Integration of these data at a single location facilitates technology transfer to basic researchers, and to plant breeders seeking favorable genetic variations in diverse germplasm for improving US crop performance, yield and quality. This accomplishment addresses both National Program 301 Component 2. Crop Information, Genomics and Genetic Analysis, Problem Statement 2A and supports NP301 component 3. Genetic Improvement of Crops, Problem Statement 3B. Accomplishment. All Plant Ontology Terms are now represented in MaizeGDB. A Consortium has developed Plant Ontology descriptor terms and database identifiers for anatomy, growth and developmental stages for higher plants, with an initial focus on maize, rice and Arabidopsis. The goal is to facilitate functional and genome comparisons among all higher plants, especially at the level of database interoperability. In prior years, only the subset required to support current phenotype annotation was mapped to maize terms, and integrated into MaizeGDB. This year all terms were integrated into MaizeGDB and mapping files provided to the Plant Ontology website maintained at Cold Spring Harbor, NY. If used by major plant genome databases, these ontologies will aid in integrating knowledge gained in all plant species, including model species, such as Arabidopsis, and thereby aid hypothesis-based research into US crop improvement. This accomplishment addresses National Program 301 Component 2. Crop Information, Genomics and Genetic Analysis, Problem Statement 2A. Accomplishment. New 2006 public map data on the IBM population have been computed onto a single frame called cIBM2006 and integrated into MaizeGDB.
A high resolution public mapping population, inter-mated B73 x Mo17 maize lines, together with raw data accessible at MaizeGDB, is being used to map both large groups of genes, and also single/few genes, thereby creating a small number of maps each year for each research group. The new community map, cIBM2006, adds 1198 loci, complete with links to full documentation for the maps. This product provides a single map frame, consistent with the higher resolution IBM2 maps, and thereby facilitates lookups of loci and access to ordered mapscore data. Integrating the IBM map also keeps MaizeGDB current as a primary repository for maize mapping data. This product is being maintained in response to stakeholder requests from US plant breeders and researchers, and it supports rapid computation of the IBM Neighbors map (see below). This accomplishment addresses National Program 301 Component 2. Crop Information, Genomics and Genetic Analysis, Problem Statement 2A and supports NP301 component 3. Genetic Improvement of Crops, Problem Statement 3B. Accomplishment. A new Consensus IBM Neighbors Map has been computed and integrated into MaizeGDB. Numerous genetic maps exist, and each year more have been created, each with distinct coordinates and loci. Finding all nearby loci and probes for a given chromosome region requires tedious, map-by-map lookups. A few years ago, this project developed an algorithm that orders all mapped loci onto a single coordinate system, taking into account the statistical support on the cooperator-provided maps. Using this algorithm, this year some 1800 new loci have been added to the IBM2 neighbors map. The Neighbors map is used by several external databases (Panzea, Gramene, GenBank) and various research projects as a primary map for maize map information. It provides a genetic platform for candidate gene discovery towards improvement of the US corn crop, both in yield and value-added traits.
This accomplishment addresses both National Program 301 Component 2. Crop Information, Genomics and Genetic Analysis, Problem Statement 2A and supports NP301 component 3. Genetic Improvement of Crops, Problem Statement 3B. Technology Transfer Number of Web Sites managed: 2
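The report does not describe how the Neighbors map algorithm works internally. As a rough illustration of the general idea of placing loci from one map onto another map's coordinate system, the sketch below uses plain linear interpolation between loci shared by both maps. This is a generic technique substituted here for illustration, not the project's method: it ignores statistical support, assumes the shared loci appear in the same order on both maps, and the function name `project_loci` and the locus names are hypothetical.

```python
from bisect import bisect_right


def project_loci(frame, source):
    """Project loci from `source` onto the coordinates of `frame`.

    Both arguments map locus name -> position (e.g. cM) on one chromosome.
    Returns {locus: interpolated frame position} for every locus that is
    on `source` but not on `frame`.

    Assumption (not from the report): shared loci are colinear on both
    maps, so simple linear interpolation between flanking anchors is valid.
    """
    # Anchors are loci present on both maps, sorted by source position.
    shared = sorted((source[n], frame[n]) for n in source if n in frame)
    if len(shared) < 2:
        raise ValueError("need at least two shared loci to interpolate")
    src_positions = [s for s, _ in shared]

    projected = {}
    for name, pos in source.items():
        if name in frame:
            continue  # already has a frame coordinate
        # Pick the flanking anchor interval; loci beyond the outermost
        # anchors are extrapolated from the nearest end interval.
        i = bisect_right(src_positions, pos)
        i = min(max(i, 1), len(shared) - 1)
        (s0, f0), (s1, f1) = shared[i - 1], shared[i]
        if s1 == s0:
            projected[name] = f0  # anchors co-located on the source map
        else:
            projected[name] = f0 + (pos - s0) * (f1 - f0) / (s1 - s0)
    return projected


# Hypothetical example: the frame map is more expanded than the source map.
frame_map = {"a": 0.0, "b": 100.0, "c": 300.0}
source_map = {"a": 0.0, "x": 5.0, "b": 10.0, "y": 15.0, "c": 20.0}
print(project_loci(frame_map, source_map))
```

In this toy case locus `x` sits halfway between anchors `a` and `b` on the source map, so it lands halfway between them on the frame. A production approach would also need to handle order conflicts between maps and weight placements by mapping confidence.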

Impacts
(N/A)

Publications

  • Coe, Jr, E.H., Schaeffer, M.L. 2005. Genetic, Physical, Maps, and Database Resources for Maize. Maydica. 50:285-303.
  • Coe, Jr, E.H., Schaeffer, M.L. 2006. Uncaging Mutants: Moving From Menageries to Menages. Maydica. 51(2):263-267.
  • Lawrence, C.J., Schaeffer, M.L., Seigfried, T.E., Campbell, D.A., Harper, L.C. 2007. MaizeGDB's New Data Types, Resources, and Activities. Nucleic Acids Research. 35:D895-D900.
  • Pujar, A., Jaiswal, P., Kellogg, E., Ilic, K., Vincent, L., Avraham, S., Stevens, P., Zapata, F., Reiser, L., Rhee, S., Sachs, M.M., Schaeffer, M.L. , Stein, L., Ware, D., McCouch, S. 2006. Whole-plant growth stage ontology for angiosperms and its application in plant biology. Plant Physiology. 142:414-428.
  • Ilic, K., Kellogg, E., Jaiswal, P., Zapata, F., Stevens, P., Vincent, L., Avraham, S., Reiser, L., Pujar, A., Sachs, M.M., Whitman, N., Mccouch, S., Schaeffer, M.L., Ware, D., Stein, L., Rhee, S. 2007. Plant Structure Ontology, Unified Vocabulary of Anatomy and Morphology of a Flowering Plant. Plant Physiology. 143:587-599.


Progress 10/01/05 to 09/30/06

Outputs
Progress Report 1. What major problem or issue is being resolved and how are you resolving it (summarize project aims and objectives)? How serious is the problem? Why does it matter? This project is aligned with NP 301, Plant Genetic Resources, Genomics and Genetic Improvement. Limits are placed on obtaining optimum crop efficiency, productivity, and stability, or on deriving maize with new, valuable properties, until great depth of genetic knowledge and molecular-genetic tools for maize is made available, as well as for other crops. The tools and insights from genetics must yet fulfill the promise of new technologies to join fully into plant breeding and agronomic practices. For the scientist conducting research, or the student of today, information overload and retrospective amnesia plague the process. Information about genetics, molecular biology, biochemistry, biotechnology, growth, and development for maize is widely distributed in published materials, exceeding 1,300 research papers per year and increasing rapidly. This problem is magnified by the accumulation of scattered data sites that lack systematic form or informatics tools for linking to other resources of information. Much information is only available in electronic form, and may be scattered in various project databases, or laboratory notebooks. It includes nucleotide sequences and their polymorphisms; physical map components; physical and genetic marker details; and raw mapping data. These invaluable current and past resources of scientific information, combined with scientist-to-scientist communication in scientific meetings and other means of communication, are the engine of research advance. This project is directed toward assembling scientific information, currently and retrospectively, in systematic form into the Maize Genome Database. If this is not done, research on maize will be, in some cases, unnecessarily redundant or fail to encompass critical areas of knowledge.
Often research has been reported or evaluated by scientists and other users without knowledge of past or present information. The result is a generally uncoordinated and uneven advance in knowledge and techniques, an inefficient expenditure of resources, and a burdening of the research scientist with ad hoc systematizations that are only heuristic and useless to others in the field. Integrating systematized data into MaizeGDB, with interfaces that permit creative data mining, is critical to maximize the translation of the maize genome sequence into improvements in the crop. 2. List by year the currently approved milestones (indicators of research progress) A Project Plan for MaizeGDB at Columbia has not been submitted to OSQR, pending establishment of a new CRIS at Ames, Iowa to support the database. The objectives and milestones are based on the 48 month, FY 2003 proposed Project Plan that was submitted to the MWA 9/05/03. Objective 1: Integrate into MaizeGDB systematized experimental genetic information to include genetic maps, with attributes and documentation; allele diversity; and quantitative trait locus (QTL) characterization. Milestone 1. Integrate new genetic maps. FY2004: Automate and enter data from community IBM and related IBM maps. FY2004-05: Compile a high resolution gene map. FY2005-07: Update the community IBM map, and enter all other critical genetic maps. Milestone 2. Integrate molecular diversity data. FY2004: Backlog of SSR allele data entered. FY2005-07: Develop automated data entry strategies from CIMMYT and Gramene. Milestone 3. Refine current data. FY2004-07: Continuous. Milestone 4. Community data entry, and the published literature. FY2004: Engage community. FY2005-06: Implement new or refined strategies. FY2006: Assess literature curation priority needs, with community inputs. FY2004-07: Database staff curate at a constant, moderate level. Milestone 5. QTL documentation, a community curation activity.
FY2004: Evaluate current schema and data curation priorities. FY2006-07: Implement data curation forms for QTL. Automate load of large datasets from CIMMYT, Ed Buckler and other researchers. Milestone 6. Continuous collaboration with community on priorities and strategies. FY2004-07: Continuous. Objective 2: Design and maintain complex syntheses of MaizeGDB map and gene data critical to research. Include consensus genetic and QTL maps. Milestone 1. Maintain updated consensus genetic maps, including bins maps and a high-resolution IBM based genetic map. FY2004-06: Each year, update consensus genetic maps: IBM Neighbors and bins. FY2004-06: Analyze recombination events in different populations. Milestone 2. Consensus QTL maps and other tools. FY2004: Harmonize trait descriptor and QTL symbols. FY2004-05: Collaborate with community regarding QTL consensus map tools. FY2006-07: Design and implement QTL analyses tools, working closely with the research community who are developing these tools. Objective 3: Improve interoperability of germplasm, genetic, and genome sequence. Milestone 1. Plant Ontology. FY2004-05: Develop and refine ontologies for maize anatomy and development. Milestone 2. Links. FY2004-07: Update record to record links to major databases. Milestone 3. Complex queries, defined as queries that involve both computation and multiple databases. FY2004: Simplify current prototype, develop graphical interface. FY2005: Test prototype with other genome databases; refine graphical interface. FY2006: Revise prototype. FY2007: Retest Prototype. 4a List the single most significant research accomplishment during FY 2006. High Density Consensus Genetic Map of Maize. This accomplishment aligns with National Program 301 Component "Crop Informatics, Genomics and Genetic Analyses." 
A consensus genetic map, with over 33,000 gene loci, has been computationally assembled by the Plant Genetics Research Unit at Columbia, MO and combines several thousand loci placed onto 100 documented genetic maps with over 27,000 gene loci, placed on 417 genome islands of the physical map provided by the University of Arizona. This product provides a single framework at MaizeGDB to expedite creative mining for candidate genes, which might otherwise be lost. Candidate gene discovery is important to breeding maize for enhanced productivity and value-added traits. Because maize is a major US crop, this result has high impact on the food and fuel security of the US. 4b List other significant research accomplishment(s), if any. Community Quantitative Trait Locus Curation Module. This accomplishment aligns with National Program 301 Component "Crop Informatics, Genomics and Genetic Analyses." It is very difficult to find all public data about the inheritance of a trait or group of traits; and the raw data that document the experiment need to be requested from individual researchers. A curation tool was developed to provide high quality integration of trait data into MaizeGDB, both by professional curators, and community experts generating the data. This tool re-creates and improves on the curation tool associated with the legacy MaizeDB and lost with the recent move to Ames, IA. MaizeGDB staff at Columbia, MO designed and tested the utility; staff at Ames, IA performed the programming and deployment. Improved access to information about genome regions and superior germplasms that contribute to a trait will result from use of this tool and is important to plant breeders, both of maize and other cereals. Because maize is a major US crop, facilitating breeding for enhanced productivity and quality will have high impact on the food and fuel security of the US.
This accomplishment aligns with National Program 301 components "Genome Databases and Bioinformatics" and "Genomic Characterization and Genetic Improvement." 5. Describe the major accomplishments to date and their predicted or actual impact. MaizeDB/MaizeGDB now provides in one site the integration of maize sequence, genetic and physical map information, with (i) documentation such as references and other sources, raw map data, availability of experimental tools such as genetic stocks and probes; (ii) gene function annotation for gene products and agronomic traits, phenotypes; (iii) interactive functions with key external databases; and (iv) public access over the Internet. It is the primary repository for electronic information related to genetic maps and experimental tools referred to above. In addition to integration of primary data, it assembles secondary products, such as the consensus genetic maps currently used in candidate gene discovery and assembly of anchored genome islands for the maize genome sequencing that is now under way. MaizeGDB protocols have proven robust and readily accommodate new data types with minimal programming, for example, they have been successfully used to add gene silencing constructs and plant ontology annotations. Customers include private and public sector researchers engaged in maize and cereal genomics and plant breeding; also teachers of high school and university classes; and other public databases, such as GenBank, Gramene, UniProt and those associated with high-throughput genomics projects. This accomplishment aligns with National Program 301 components "Genome Databases and Bioinformatics" and "Genomic Characterization and Genetic Improvement." 6. What science and/or technologies have been transferred and to whom? When is the science and/or technology likely to become available to the end- user (industry, farmer, other scientists)? 
What are the constraints, if known, to the adoption and durability of the technology products? Over 30,000 selectable, public chromosome markers have been computationally assembled onto a consensus chromosome map framework and transferred to the public sector, with full documentation, via MaizeGDB and thence to other databases, including Gramene, Panzea and GenBank. End-users have current access; end-users include industry and other scientists who require this information in the discovery of candidate genes and superior alleles for agronomic traits of importance to the farmer. There are no constraints on adoption or durability of the information for plant breeding and research in the academic, industrial or government sectors.

Impacts
(N/A)

Publications

  • Schaeffer, M.L., Baran, S.B., Lawrence, C.J. 2006. QTL data at maizeGDB: curation and "then some" [abstract]. 48th Annual Maize Genetics Conference Program and Abstracts. p. 231.


Progress 10/01/04 to 09/30/05

Outputs
1. What major problem or issue is being resolved and how are you resolving it (summarize project aims and objectives)? How serious is the problem? Why does it matter? Limits are placed on obtaining optimum crop efficiency, productivity, and stability, or on deriving maize with new, valuable properties, until great depth of genetic knowledge and molecular-genetic tools is made available for maize as well as for other crops. The tools and insights from genetics must yet fulfill the promise of new technologies to join fully into plant breeding and agronomic practices. For the scientist conducting research, or the student of today, information overload and retrospective amnesia plague the process. Information about genetics, molecular biology, biochemistry, biotechnology, growth, and development for maize is widely distributed in published materials, exceeding 1,300 research papers per year and increasing rapidly. Other information is only available in electronic form and includes nucleotide sequences and their polymorphisms; physical map components; physical and genetic marker details; and raw mapping data. These invaluable current and past resources of scientific information, combined with scientist-to-scientist communication in scientific meetings and other means of communication, are the engine of research advance. This project is directed toward assembling scientific information, currently and retrospectively, in systematic form in the Maize Genome Database. Current research on maize is often unnecessarily redundant or fails to encompass critical areas of needed knowledge. Often research is reported or evaluated by scientists and other users without knowledge of past or present information. The result is a generally uncoordinated and uneven advance in knowledge and techniques, an inefficient expenditure of resources, and a burdening of the research scientist with ad hoc systematizations that are only heuristic and are not useful to others in the field.
This problem is magnified by the accumulation of scattered data sites that lack systematic form or informatics tools for linking to other resources of information. 2. List the milestones (indicators of progress) from your Project Plan. A Project Plan for MaizeGDB at Columbia has not been submitted to OSQR. A 48-month duration Project Plan was submitted to the MWA 9/05/03, but not forwarded to NPS because of the pending establishment of a new CRIS at Ames, Iowa, to support the database. A formal request for postponement of review by OSQR was submitted 3/29/04 to the Associate Administrator for Crop Production, Product Value and Safety at the request of MWA. Objectives and milestones following are based on the FY2003 proposed Project Plan. Objective 1: Integrate into MaizeGDB systematized experimental genetic information to include genetic maps, with attributes and documentation; allele diversity; and quantitative trait locus (QTL) characterization. Milestone 1. Integrate new genetic maps. FY2004: Automate and enter data from community IBM and related IBM maps. FY2004-05: Compile a high resolution gene map. FY2005-07: Update the community IBM map, and enter all other critical genetic maps. Milestone 2. Integrate molecular diversity data. FY2004: Backlog of SSR allele data entered. FY2005-07: Develop automated data entry strategies from CIMMYT and Gramene. Milestone 3. Refine current data. FY2004-07: Continuous. Milestone 4. Community data entry, and the published literature. FY2004: Engage community. FY2005-06: Implement new or refined strategies. FY2006: Assess literature curation priority needs, with community inputs. FY2004-07: Database staff curate at a constant, moderate level. Milestone 5. QTL documentation, a community curation activity. FY2004: Evaluate current schema and data curation priorities. FY2006-07: Implement data curation forms for QTL. Automate load of large datasets from CIMMYT, Ed Buckler and other researchers. Milestone 6.
Continuous collaboration with community on priorities and strategies. FY2004-07: Continuous. Objective 2: Design and maintain complex syntheses of MaizeGDB map and gene data critical to research. Include consensus genetic and QTL maps. Milestone 1. Maintain updated consensus genetic maps, including bins maps and a high-resolution IBM based genetic map. FY2004-06: Each year, update consensus genetic maps: IBM Neighbors and bins. FY2004-06: Analyze recombination events in different populations. Milestone 2. Consensus QTL maps and other tools. FY2004: Harmonize trait descriptor and QTL symbols. FY2004-05: Collaborate with community regarding QTL consensus map tools. FY2006-07: Design and implement QTL analyses tools, working closely with the research community who are developing these tools. Objective 3: Improve interoperability of germplasm, genetic, and genome sequence. Milestone 1. Plant Ontology. FY2004-05: Develop and refine ontologies for maize anatomy and development. Milestone 2. Links. FY2004-07: Update record to record links to major databases. Milestone 3. Complex queries, defined as queries that involve both computation and multiple databases. FY2004: Simplify current prototype, develop graphical interface. FY2005: Test prototype with other genome databases; refine graphical interface. FY2006: Revise prototype. FY2007: Retest Prototype. 3a List the milestones that were scheduled to be addressed in FY 2005. For each milestone, indicate the status: fully met, substantially met, or not met. If not met, why. 1. Finish high-resolution gene map. Milestone Fully Met 2. Entry of new community IBM mapped loci and all other maps. Milestone Fully Met 3. Backlog of SSR allele data entered (a FY2004 milestone). Milestone Fully Met 4. Import maize SNP data from Gramene and CIMMYT using automated procedures. Milestone Fully Met 5. Implement new strategies, if warranted for entry of data from literature. Milestone Substantially Met 6. 
Collaboration with community on priorities and strategies. Milestone Fully Met 7. Update consensus genetics maps: IBM Neighbors and bins. Milestone Fully Met 8. Begin to analyze recombination events in different populations. Milestone Substantially Met 9. Collaborate with community to develop QTL consensus map tool. Milestone Fully Met 10. Develop and refine ontologies for maize anatomy and development and traits. Milestone Fully Met 11. Update record-to-record links to major databases. Milestone Substantially Met 12. Design and implement as funding permits. Milestone Substantially Met 3b List the milestones that you expect to address over the next 3 years (FY 2006, 2007, and 2008). What do you expect to accomplish, year by year, over the next 3 years under each milestone?
FY 2006
Objective 1: Integrate into MaizeGDB systematized experimental genetic information to include genetic maps, with attributes and documentation; allele diversity; and quantitative trait locus (QTL) characterization. 1. Integrate new genetic maps. Milestone - Entry of new community IBM mapped loci and all other public maps. These will include a map set provided by Genoplante, a French consortium. There will be nearly 1500 map points that were selected to refine the sequence-ready physical map. MaizeGDB is the primary repository and intake site for genetic maps and provides them to other genome databases such as Gramene and NCBI (National Center for Biotechnology Information). These data are critical to researchers engaged in candidate gene discovery, providing facile access at one location to gene discovery tools and documentation. These data are also used to address Objective 2. If the sequenced human and mouse genomes are precedents, we anticipate that a fully sequenced genome for maize will further the interest in high resolution genetic mapping, with a focus on inherited traits, both agronomic and of basic biological interest. 2.
Molecular diversity data (SNP, single nucleotide polymorphism). Milestone - Import new data from Gramene and CIMMYT using automated procedures (continuous with year 2005). These data provide gateways from genomic regions of interest to repositories of sequences and germplasm data (GENBANK, GRIN), providing information about polymorphisms required to develop tools, such as SNP assays, for high resolution mapping of agronomic traits. 3. Community data entry and the published literature. Milestone - Assess and meet literature curation priority needs with community inputs. The main objective is to be sure that the top priority objects in the literature are professionally assessed and integrated into MaizeGDB. The peer-reviewed literature reports careful, empirically determined gene functions, which include the molecular basis for agronomic traits. Having this information integrated into a central database undergirds electronic genome annotation efforts, in particular for plant species. The argument that literature curation does not scale is bogus; the level of this sort of research is growing, but not exponentially. Top priority items include sequenced genes, characterized gene products, maps, including QTL maps and mapping probes. 4. QTL documentation, a community curation activity. Milestone - Implement data curation forms for quantitative trait loci (QTLs); Automate load of data sets from CIMMYT, Dr. Buckler, and others; QTLs identify chromosomal regions that affect key agronomic traits. This information is typically summarized in the published literature without the raw data. Making this data available in a systematic manner, linked to mapping tools and sequences, supports development of consensus QTL maps (objective 2); and provides documentation useful to new analysis of the data by researchers. Milestone - Community collaboration on priorities and strategies. This ensures that emerging datatypes are included in MaizeGDB. 
In addition to the steering committee meetings, it is also useful to meet with curators working with genome databases for other species, such as mouse, fly, Arabidopsis or rice, where there is currently a genome sequence, so as to prepare better for the current and future needs of our stakeholders and customers. One desired result is for the community to assist in curation of their project data prior to submission to MaizeGDB. Objective 2: Design and maintain complex syntheses of MaizeGDB map and gene data critical to research. Include consensus genetic and QTL maps. 1. Maintain updated consensus genetic maps, including bins maps and a high resolution IBM based genetic map. Milestone - Update consensus genetics maps: IBM Neighbors and bins. These consensus maps are used to provide rapid access to all the genetic objects that can be mapped to a region, along with tools used to map to that region; they have been used to provide the foundation for alignment of clones along the chromosome, preparatory to sequencing the maize genome. Based on the experiences of other genomes (mouse, human, etc.), there will continue to be new genetic mapping data, in particular high resolution ordering of markers near a candidate gene for an important trait or other biological function. Milestone - Increase analysis of recombination events in different populations. This information, stored in a database, will impact deciphering the molecular basis for a trait. Different mapping populations have better resolution in different regions of the genome. For discovery of genes underlying a trait, some populations will be better for defining candidate genes in particular chromosomal regions. 2. Consensus QTL maps and other tools. Milestone - Design and implement QTL analyses tools working closely with the research community. The retrieval of all QTL data in a chromosomal region, based on either category of trait, e.g. all insect response data points, or trait, e.g.
all grain yield data points, has been defined as a primary need of researchers interested in defining the genetic and molecular basis for a QTL, and also those chromosomal regions with high phenotypic variation for a trait. This understanding will drive modern plant breeding, using either transgenic applications, or more traditional marker assisted selections. Objective 3: Improve interoperability of germplasm, genetic, and genome sequence. 1. Plant ontology. Milestone - Develop and refine and integrate ontologies for plant anatomy and development and traits using the plant ontology model; test ontologies with modest level of curation. Databases that share ontologies, will be able to cross-talk so that data can been more readily accessed by customers interested in both the transfer of traits, e.g. from sorghum to maize, or maize to rice, and also in understanding the biological basis of a trait, where it may have been well-studied in one species, but not in others. In addition, these ontologies are required for consensus QTL maps (Objective 2). 2. Links. Milestone - Update record-to-record links to major databases. These direct MaizeGDB users to relevant data where MaizeGDB is not the primary repository. Currently key external databases include GenBank, Gramene (Cold Spring Harbor), AGI (Arizona), GRIN (germplasm), SwissProt (protein functions). Curating these links has proven useful for multi- species databases, as well as larger genome projects in maize and rice. 3. Complex queries. Milestone - Continue to design. This is a cutting edge research activity towards designing the infra- structure for queries that perform user-specified computations, such as sequence similarity searches coupled to comparative genomics, and which are based on data stored in more than one database. This obviates the need to store and update data from other databases that is the current practice. 
In addition, this approach expedites transfer of information to customers of multiple databases, reducing the need to spend substantial research time or to acquire computer science and systems administration skills. This work is in collaboration with Toni Kazic and William Lawrence at the University of Missouri-Columbia.
_________________________________________________________________
FY2007
Objective 1:
1. Integrate new genetic maps. Milestone - Entry of new community IBM mapped loci and all other maps. Continuous.
2. Molecular diversity data. Milestone - Import data from Gramene and CIMMYT using automated procedures. Continuous.
3. Community data entry and the published literature. Milestone - Database staff curate at a constant, moderate level. Continuous.
4. QTL documentation, a community curation activity. Milestone - Implement data curation forms for QTL; automate load of data sets from CIMMYT, Dr. Buckler, and others; assess strategy. The goal here is to make data entry as easy as possible. After the year 2006 experience, we will address feedback and solicit follow-up feedback to ensure that the tools are fully functional, making changes if warranted.
Milestone - Community collaboration on priorities and strategies. Continuous.
Objective 2:
Milestone - Define new consensus outputs based on customer needs and the datatypes available.
1. Maintain updated consensus genetic maps, including bin maps and a high resolution IBM-based genetic map. Milestone - Update consensus genetic maps: IBM Neighbors and bins.
2. Consensus QTL maps and other tools. Milestone - Update consensus QTL maps with new data, in particular from higher resolution mapping populations. QTL data from the inter-mated B73 x Mo17 (IBM) population should be available in MaizeGDB during FY2006 and are expected to permit higher definition consensus maps than were previously possible. Currently the resolution is to a region encompassing, on average, several hundred genes.
The higher resolution population will narrow the resolution some 10-fold, to fewer than a few dozen genes.
Objective 3:
1. Ontologies. Milestone - Continuous.
2. Links. Milestone - Continuous.
3. Complex queries. Milestone - Continuous.
________________________________________________________________
FY2008
Objective 1:
1. Integrate new genetic maps. Milestone - Entry of new community IBM mapped loci and all other maps. Continuous.
2. Molecular diversity data. Milestone - Import data from Gramene and CIMMYT using automated procedures. Continuous.
3. Community data entry and the published literature. Milestone - Database staff curate at a constant, moderate level. Continuous.
4. QTL documentation, a community curation activity. Milestone - Automate load of data sets from CIMMYT, Dr. Buckler, and others. Continuous.
Milestone - Community collaboration on priorities and strategies. Continuous.
Objective 2:
1. Maintain updated consensus genetic maps, including bin maps and a high resolution IBM-based genetic map. Milestone - Update consensus genetic maps: IBM Neighbors and bins.
Milestone - Analyze recombination events in different populations. Continuous, but also design a representation in MaizeGDB useful for customers.
2. Consensus QTL maps and other tools. Milestone - Update consensus QTL maps with new data, in particular from high resolution mapping populations.
Objective 3:
1. Ontologies. Milestone - Continuous.
2. Links. Milestone - Continuous.
3. Complex queries. Milestone - Implement and test a prototype per FY2007 results.
4a. What was the single most significant accomplishment this past year?
The main product of this research is a web-based integrated genetic map that is part of the Maize Genome Database (MaizeGDB), along with documentation for the map, maintained at Iowa State University.
The Gramene website, maintained at Cold Spring Harbor Laboratory, routinely includes this map for comparison with rice and other cereal genomes, linking it to several other genomics projects. This integration will have a major impact on the improvement of maize breeding by providing (1) the finer resolution mapping required to define and isolate genes that are favorable for improvement of an agronomic trait, and (2) marker-assisted selection of traits in plant breeding.
4b. List other significant accomplishments, if any.
Genetic (mutant) 2005 maps: Known genes have been placed onto a comprehensive, integrated framework under a collaboration with the University of Missouri (Ed Coe). This map is compatible with the IBM Neighbors computation and improves customer utility in that it includes known genes on the IBM Neighbors consensus map.
Software design for agronomic trait genomics: A module for community submission of QTL data was designed, tested, and deployed at MaizeGDB. Chromosomal regions that have been recognized as containing key agronomic traits are currently summarized in the peer-reviewed literature without any of the detailed raw data needed by the community. Our newly developed software addresses this need. In addition, our software facilitates the collection of QTL map information in a single location.
5. Describe the major accomplishments over the life of the project, including their predicted or actual impact.
MaizeDB/MaizeGDB now provides in one site the integration of maize sequence, genetic, and physical map information, with (i) documentation such as references and other sources, raw map data, and availability of experimental tools such as genetic stocks and probes; (ii) gene function annotation for gene products, agronomic traits, and phenotypes; (iii) interactive functions with key external databases; and (iv) public access over the Internet.
It is the primary repository for electronic information related to the genetic maps and experimental tools referred to above. Easy access to these data was critical to assembling the sequence-ready BAC contig map for maize in the maize genome sequencing effort. MaizeGDB protocols have proven robust and readily accommodate new data types with minimal programming; for example, they have been successfully used to add gene silencing constructs. Customers include private and public sector researchers engaged in maize and cereal genomics; teachers of high school and university classes; and other public databases, including those for high-throughput projects.
6. What science and/or technologies have been transferred and to whom? When is the science and/or technology likely to become available to the end-user (industry, farmer, other scientists)? What are the constraints, if known, to the adoption and durability of the technology products?
Chromosome walking to clone candidate genes for agronomic traits is now a reality as a result of map-based information that is carefully managed at MaizeGDB. Easy access to tools for basic biological research drives better basic science in maize. Citations of MaizeGDB as a source of information, both in the peer-reviewed literature and at the annual Maize Genetics Meetings, reflect that the database is providing maize genome data to both private and public sector researchers. MaizeGDB provides information required for conventional breeding using marker-assisted selection. The largest constraint on the adoption of downstream technology products is sociological and based on fears about genetically modified organisms. MaizeGDB provides a teaching resource that allows for a factual and rational analysis of such public issues.

Impacts
(N/A)

Publications