Source: UNIVERSITY OF CALIFORNIA, BERKELEY submitted to
COGE: AN ON-LINE PLATFORM FOR COMPARATIVE GENOMICS EXPERIMENTS IN SUPPORT OF CROP PLANT RESEARCH
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0223317
Grant No.
(N/A)
Project No.
CA-B-PMB-0037-H
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Oct 1, 2010
Project End Date
Sep 30, 2015
Grant Year
(N/A)
Project Director
Freeling, M.
Recipient Organization
UNIVERSITY OF CALIFORNIA, BERKELEY
(N/A)
BERKELEY,CA 94720
Performing Department
Plant Biology, Berkeley
Non Technical Summary
This five-year project will further develop CoGe, an on-line platform or workbench to support on-the-fly comparative genomic experiments. With all the new genome sequences, most for crop plants, and the new omic and seq data, individual crop plant biologists require help in comparing annotated genomes on-line. Specifically, this proposal is to provide help to researchers who do not use, or do not want to use, the command line.
Animal Health Component
(N/A)
Research Effort Categories
Basic
80%
Applied
20%
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
201                   1599                             1080                     70%
201                   2499                             1080                     30%
Goals / Objectives
1. All years. To load each plant genome version. We create customized datasets with added features that greatly enhance analyses: masked repeats, conserved noncoding sequences, motifs, methyl groups, histone positions, modified histone positions, ChIP-SEQ binding positions...
2. All years. We will integrate CoGe into other community resources, as already done with PlantGDB and MaizeGDB.
3. Years 3-5. We will prepare CoGe for migration from U. of CA servers. Eric Lyons, CoGe's lead, will take CoGe with him to his next job. We are presently installing a mirror of CoGe at the U. of GA and iPLANT.
Specific:
1. Year 1. We will complete a new CoGe installation, involving bigger, faster, more expensive servers. A new version of CoGe will be installed. Software improves. We always have 4 versions of CoGe: the production copy for public use, the development copy that is the improved version in waiting, and backups.
2. Years 1-3. A 6th CoGe application will be added soon: MotifView. The beginnings of this application can be seen on our Development Server: http://toxic.berkeley.edu/CoGe/MotifView.pl This application takes any number of plant sequences to be compared on-the-fly and graphically represents all experimentally-known product-binding DNA motifs as regular expressions. By combining these motifs with our custom databases, research advances. When ChIP-SEQ locations are added, this application will become more powerful. Finally, we will develop a gene feature pattern matching algorithm which will find similar regulatory patterns irrespective of sequence.
3. Years 2-5. BranchView, the 7th application, is just an idea. It is possible to build robust gene lineage trees in CoGe via http://www.phylogeny.fr/: http://www.youtube.com/watchv=qdkgrtXySlI&feature=related. At present we manually color code branches with data of interest, and cut and paste genes from the tree into other applications. We will automate this. Clicked genes will immediately be transported to GEvo or MotifView for analyses. Branches are those shared lineages that are inferred to have happened in the evolutionary past. We plan to automate a reconstruction of these branches, of how genes, including their cis control regions, evolved.
4. Years 1-4. Specific algorithm development. Inside useful web applications are the algorithms that only programmers can love.
5. Outreach. We help high school students, and especially underserved students, SEE evolution by graphic representation of the output of comparing genomes. Here is a YouTube video that uses GEvo to compare the genome of a chimp with that of a man: http://www.youtube.com/watchv=I-dUsMuIkMg&feature=player_embedded
6. Supporting CoGe as an agronomic resource. If CoGe usage continues to rise, we'll need more funding. AES would be a natural source. At present, the only non-NSF investment in anything involving my lab is some percent of my salary. This grant proposal justifies that proportion of my salary by leveraging far larger investments made by others.
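The MotifView goal above describes representing DNA-binding motifs as regular expressions. The following is only a minimal sketch of that idea, not MotifView's actual code, which is not described in this report. The function names are hypothetical, the IUPAC ambiguity table is standard, and the G-box core (CACGTG) is used purely as an illustrative motif. Only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_motifs(sequence, motifs):
    """Return (motif name, 0-based start, matched text) for every hit.

    `motifs` maps a name to an IUPAC consensus string (hypothetical input
    format). A lookahead is used so that overlapping hits are all reported.
    """
    seq = sequence.upper()
    hits = []
    for name, consensus in motifs.items():
        pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Illustrative sequence and motif only; not data from this project.
    for hit in find_motifs("AACACGTGTT", {"G-box": "CACGTG"}):
        print(hit)
```

Running the same motif table over several sequences gives hit positions that could then be plotted side by side, which is the kind of comparison the goal describes.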
Project Methods
METHODS: The procedures that will be used to attain the objectives are indicated in the description of the aims. The vocabulary of developers is arcane, and the code itself cannot be read except by individuals fluent in programming languages such as Python, BioPerl and PostScript, and in mark-up languages such as XML and HTML. The procedures we use are documented in our wiki under documentation: http://synteny.cnr.berkeley.edu/wiki/index.php/Documentation, and the code itself is sometimes shared on websites such as github social coding; here is the gobe page: http://github.com/brentp/gobe. A github submission IS publishing for a creative programmer. JUSTIFICATION: My laboratory, in the process of pursuing Federally funded research on gene regulation and evolution, produces CoGe. My lab has two PIs, Damon Lisch and myself, and we aim to support the agronomic community. Understanding plants, a prerequisite to harnessing plants for food, fiber and energy, is greatly augmented by whole genome sequences. Since the release of the first Arabidopsis sequence 10 years ago, there have been about a dozen higher plant genomes added to the public databases, with many ongoing. My group has been a part of the papaya and sorghum user groups, and is currently an official banana, foxtail millet, sacred lotus, and Cleome pre-release user. CoGe is at http://synteny.cnr.berkeley.edu/CoGe/ and its wiki at http://synteny.cnr.berkeley.edu/wiki/index.php/Main_Page. We also run a summer intern program to train underserved, local high school students, and also their teachers, in comparative genomics. All aspects of biology involving genes are advanced by comparative genome analyses. 
Breeding is routinely mapping- and marker-assisted, commodity quality control uses DNA fingerprinting, diseases are diagnosed similarly, and ecological health is monitored by sequencing every DNA available. Our future on earth could well be dictated by our ability to re-design our domesticates to yield in environments that have changed to be radically different from the environment in which the plants were domesticated. Biotechnology as a field depends on the comparison of whole genomes.

Progress 10/01/10 to 09/30/15

Outputs
Target Audience: There have been no changes. My target audience is all researchers worldwide-- especially crop plant researchers-- who use whole genome sequences in their research. Such researchers often use the comparative genomics toolbox CoGe. CoGe is a website powered by CyVerse (aka iPlant; NSF-funded project at the University of Arizona and others) containing a group of applications that make comparative genomics easier to do, and without a command line. Comparative genomics supports crop plant research, and all research on organisms that have fully sequenced genomes. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Nothing Reported How have the results been disseminated to communities of interest? Each of the many publications includes code deposited in GitHub, and/or data in the Supplemental Information or at FigShare if the data are too large for a journal's Supplemental Information. CoGe itself permits downloads of all data a user can see via the CyVerse authorization portal. The Freeling lab has no private data; inquire via freeling@berkeley.edu if you want something you can't find. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? All aims accomplished, or a replacement aim accomplished.

Publications


    Progress 10/01/13 to 09/30/14

    Outputs
    Target Audience: My audience has not changed: CoGe is a website powered by iPlant (NSF-funded project at the University of Arizona) containing a group of applications that make comparative genomics easier to do. Comparative genomics supports crop plant research, and all research on organisms that have fully sequenced genomes. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? During this reporting period, my last graduate student, Shabari Subramaniam, got his Ph.D., continued on as a postdoc for a while and then got a job at iPlant (University of Arizona), where he continues to advance the goals of this project. How have the results been disseminated to communities of interest? CoGe and qTeller are both public web applications. My papers are published with their mandatory datasets, which are publicly available "forever". What do you plan to do during the next reporting period to accomplish the goals? Using CoGe and qTeller tools, I intend to conduct research on 1) solving the c-value paradox and finding a function for "junk" DNA, and 2) exploring the language of gene regulation using polyploids to perform "fractionation mutagenesis". By using the public tools my lab has helped create and enhance, and publishing in higher impact journals, my lab accomplishes the most important goal of this SAES project.

    Impacts
    What was accomplished under these goals? We are on track this fourth year to accomplish all of our original goals. The goals specifically for this 4th year of the 5 year CoGe-enhancement project were to tie up loose ends of our original proposal, and to use CoGe tools to advance our understanding of polyploidy in the crops maize (corn) and Brassica rapa (Chinese cabbage). Additionally, a web application currently running on one of my lab servers-- qTeller (created and maintained by Dr. James Schnable, ex student and now an Asst. Prof., University of Nebraska)-- has been implemented within CoGe at iPlant. That implementation was during this project period. The general idea is that individual labs from around the world measure RNA levels in specific cellular preparations using direct sequencing of mRNA-templated DNA, called RNAseq. The reads are generally put into a small reads archive along with their descriptive data (metadata). qTeller goes to the small reads archive, renders all of the reads into FPKM in the same way, and then allows researchers to graphically compare their results with the results of all others who are working on the same genetic accession (line). qTeller is a powerful tool for comparative genomics research.
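To make the FPKM unit mentioned above concrete, here is a small sketch using the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments). It is not qTeller's pipeline, whose internals are not described in this report. The function names, gene names and counts are all hypothetical.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of gene per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def fpkm_table(counts, lengths_bp):
    """Normalize one library. `counts` maps gene -> fragments, `lengths_bp` maps gene -> length.

    The library size is taken as the sum of the supplied counts (an assumption;
    a real pipeline might use all mapped fragments instead).
    """
    library_size = sum(counts.values())
    return {gene: fpkm(n, lengths_bp[gene], library_size) for gene, n in counts.items()}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(fpkm(500, 2000, 10_000_000))
    print(fpkm_table({"geneA": 300, "geneB": 700}, {"geneA": 1500, "geneB": 3500}))
```

Because the value is scaled by both gene length and library size, FPKMs computed the same way from different labs' reads can be placed on a common axis, which is the point of the standardized rendering described above.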

    Publications

    • Type: Journal Articles Status: Published Year Published: 2014 Citation: Woodhouse, M.R., Cheng, F., Pires, J.C., Lisch, D., Freeling, M., and Wang, X. (2014). Origin, inheritance, and gene regulatory consequences of genome dominance in polyploids. Proc Natl Acad Sci U S A 111:5283-5288.
    • Type: Journal Articles Status: Published Year Published: 2014 Citation: Bolduc, N., Tyers, R.G., Freeling, M., and Hake, S. (2014). Unequal redundancy in maize knotted1 homeobox genes. Plant Physiol 164:229-238.
    • Type: Journal Articles Status: Published Year Published: 2014 Citation: Burgess, D., and Freeling, M. (2014). The most deeply conserved noncoding sequences in plants serve similar functions to those in vertebrates despite large differences in evolutionary rates. The Plant Cell 26:1-16.
    • Type: Journal Articles Status: Published Year Published: 2014 Citation: Garsmeur, O., Schnable, J.C., Almeida, A., Jourda, C., D'Hont, A., and Freeling, M. (2014). Two evolutionarily distinct classes of paleopolyploidy. Molecular biology and evolution 31:448-454.


    Progress 01/01/13 to 09/30/13

    Outputs
    Target Audience: No changes from year to year. My audience is that group of peer researchers worldwide, their funding teams, and all worldwide who are interested in our results and in CoGe, a public comparative genomic toolbox especially designed to support crop plant genomic research. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? 30% of my efforts as a professor are funded by NIFA in the form of 30% of my 11 month salary. During this period, 4 graduate students (2 completed Ph.D.s) were trained by me; two were mentored by me 100% and two 50%. Using CoGe and qTeller was part of their thesis research, as was published in their theses and was or will be published later. These theses were filed after this 9 month report period. I was also part of training a visiting graduate student from France, and a postdoc from China. I trained an NSF-funded postdoc during this period as well. However, the core of my lab consists of two Ph.D. professional researchers. All of us use CoGe and qT, and publish on our findings. During these 9 months I taught 300 undergraduates each year and wrote a short course (as yet untaught; published in late 2013) with CoGe being used to help students perform their own on-the-fly comparative genomics experiments. Undergrads and at least one high school student (underserved, sent to us by project SEED) were doing research in my lab this period. In general, my ex students go on to the appropriate next step in their careers. The Freeling lab continues to be a useful step in the career of younger talent. The limiting step to meaningful research continues to be in the minds of creative biologists, usually distilled into profound hypotheses coupled with perfect controls. 
Perhaps in the future, every crop scientist will be comfortable with the command line and trained in the bioinformatics and statistics necessary to take reads and similar from the Small Reads Archive and render them as useful, believable ENCODE-like data. Next-gen sequencing is simply the norm, and CoGe helps "the little guy" with great ideas stay in the race. I have no question that my efforts, especially-- 30% funded by NIFA-- back during the development days, but now as well, have been for the public good; they enhance scientific discovery and-- in so doing-- "professional development". Dr. Damon Lisch, an ex graduate student from the 1990's, rejoined my lab in 1999 as a professional researcher, originally paid by my grants, and later was given temporary PI status at UC-Berkeley. As of October, 2014, about 15 years later, Dr. Lisch will join Purdue as an Associate Professor with a very big raise. Damon, like myself, uses CoGe and qT in his publications, grant proposals, and teaching, and should probably provide my last, and most definitive, bit of evidence that the Freeling Lab is still a proven stepping stone going up. How have the results been disseminated to communities of interest? My communities go to the Supplemental Information of published papers, or to permanent repositories such as FigShare, for our data. The websites CoGe and qTeller are, themselves, designed to disseminate results. In CoGePedia, the support docs of CoGe, there are several tutorials, several on YouTube (as described as Progress in previous reports). Some of these were even produced by high school kids or the general public. My lab is just not expert at communicating with the uninformed or those whose motivations are largely not scientific. We count on our agencies to talk to these important people, and will help when asked. What do you plan to do during the next reporting period to accomplish the goals? 
Most of the goals to be reported during the next project period were published already, but outside of these 9 months. That's just by chance. 1. Student Subramaniam published MotifView in his Ph.D. thesis (December, 2013) and is now submitting for real publication. It is already operative within CoGe. 2. 5 publications are already published or in press as of 2-18-2014. All involve CoGe and qTeller. 3. My lab will help with quality control as qT is absorbed by CoGe. Next year, they will be one, all at iPlant. 4. My lab just received a small three year NIFA research grant (Fowler, Scanlon and Freeling: fractionation mutagenesis and tip growth in root hairs and pollen tubes). My part of this collaborative work is to use CoGe and qTeller to help bioinformatics find cis acting modules with the meaning "tip growth" or "tip growth in hairs" or "tip growth in pollen tubes". With luck, we will know something about the gene regulatory language involving "tip growth".

    Impacts
    What was accomplished under these goals? The primary aim of this grant is to help promote and improve CoGe, a toolbox for comparative genomics with an emphasis on higher plants (crops). The key word is "help". While CoGe was originally developed in my laboratory by programmers Thomas, Pedersen and-- most importantly-- graduate student Eric Lyons (I was the original PI), CoGe is now a fully independent service project powered by iPlant, and led by Eric Lyons, Assistant Professor, University of Arizona and iPlant researcher. It remains funded by NSF to Lyons. My job, and the primary aim of this grant, is to be a primary quality controller of CoGe, to develop applications that fit into CoGe, and to generally inform the world that CoGe is a toolbox that has fully morphed from my lab's toolbox to a toolbox for world comparative genome biologists. My contribution is no longer essential, but everyone helps. Once again, I must describe a CoGe usage graph because I can't find attachments. Not counting bots, CoGe launched in early 2008, and by early 2010 had 2,000 visits per month, increasing to 6,000, 6,700 and 7,500 in early 2011-13, and about 7,000 visits/month for the end date of this progress report. The trend is continuously up. Over half are repeat visits, indicating research usage. So, these CoGe usage data testify to some success. 1. During these 9 months, an application for CoGe was coded by a graduate student who got his degree at the end of 2013; I'll report on this next year, but it is a significant application that visually compares motif positions around genes of choice and helps search for patterns along with other useful features that can be mapped to the genome. 2. 
Ex graduate student James Schnable, soon to be a new Assistant Professor at the University of Nebraska, coded a useful application called qTeller while a graduate student with me; it is not yet published, but a number of his websites are present on my lab's servers and are proving useful to the plant research community. The mother of these web applications is qTeller maize: http://qteller.com/qteller3/. The purpose of qTeller is to make available a standardized pipeline that renders RNA-seq DNA reads into FPKMs, and provides graphic tools so these FPKMs can be compared to those derived from the same genotype but by a different lab measuring a different biological endpoint (like control s7cm leaf blades versus drought induced for 6 hrs). qT-maize now has 37 different gene expression endpoints, and that is expected to grow. My lab has contributed to qT-Brassica and qT-arabidopsis during this project period. The web applications are being used, but are not yet published, so it is difficult to really prove Progress. If you go to the CoGe homepage, you can read that the developers Lyons (CoGe) and Schnable (qTeller) have agreed to make qT part of CoGe. One stop shopping is probably a very good idea.

    Publications

    • Type: Journal Articles Status: Published Year Published: 2013 Citation: Subramaniam S, Wang X, Freeling M, Pires JC (2013) The Fate of Arabidopsis thaliana Homeologous CNSs and Their Motifs in the Paleohexaploid Brassica rapa. Genome biology and evolution 5: 646-660
    • Type: Journal Articles Status: Published Year Published: 2013 Citation: Turco G, Schnable J, Pedersen B, Freeling M (2013) Automated conserved noncoding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Frontiers in Plant Genetics and Genomics 4: 170-180
    • Type: Book Chapters Status: Published Year Published: 2013 Citation: Freeling M (2013) A short course on the impact of gene duplications on the evolution of novelty. In Advances in Botanical Research, Paterson AH (ed), Vol. 69, Chapter 13, 27pp. Elsevier
    • Type: Book Chapters Status: Published Year Published: 2013 Citation: Schnable J, Freeling M (2013) Maize (Zea mays) as a model for studying the impact of gene and regulatory sequence loss following whole-genome duplication. In Polyploidy and Gene Evolution. Eds.: Soltis, P.S. and D.E. p137-145. Springer. ISBN: 978-3-642-31441-4 (Print) 978-3-642-31442-1 (Online)


    Progress 01/01/12 to 12/31/12

    Outputs
    OUTPUTS: CoGe, or comparative genomics, is a public web-toolbox where genomes may be usefully compared for research purposes. CoGe originated in, and continues to be improved upon by, the Freeling lab. Thousands of new genomes were added this year to a total of 19,000 representing 13,000 different whole genome assemblies. Worldwide usage of CoGe continues to rise. We are almost complete in our migration of CoGe from UC-Berkeley to iPlant and the University of Arizona, the site of the CoGe production server, and the location of the CoGe developer, Eric Lyons. The CoGe development server was deactivated on the last day of 2012, and was subsequently moved to U. Arizona. The migration is on schedule. James C. Schnable in the Freeling lab has finished a pipeline that is to stand as a sister to CoGe: qTeller. This public tool is 1) a pipeline that acquires reads from small reads archives deposited by the community and renders them into FPKM, units of RNA abundance, or gene expression. 2) These FPKM values are displayed in two different graphical formats to enhance research. The largest user of both GEvo and qTeller is the maize genetics research community, and MaizeGDB links to and from both of our webtools. qTeller naturally expands over time, and there are any number of instances of qTeller possible. At present, there are instances of qTeller for maize, sorghum, arabidopsis and Brassica rapa. qTeller as a single pipeline is maintained by its developer James Schnable, and the Freeling lab maintains qT-arabidopsis and qT-Brassica. PARTICIPANTS: Freeling is the only participant by virtue of part of his UC-Berkeley appointment. TARGET AUDIENCES: CoGe usage worldwide grows each year, almost entirely in the plant breeding, genetics and research community. The SynMap application in CoGe is an excellent way for students to compare human and chimp genomes, for example. CoGe is on-the-fly, not some static browser. PROJECT MODIFICATIONS: none. PARTICIPANTS: Not relevant to this project. 
TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: Not relevant to this project.

    Impacts
    Impacts unique to this year's research in the Freeling lab are evidenced in our 11 publications, listed below. Google Scholar automatically tabulates citations, and these quantify-- at least to some extent-- immediate impact to the scientific research community. My best picks for impact are the contribution we made to the Musa (banana) release paper in Nature (CoGe was the featured tool), our PNAS publication characterizing the 2 subgenomes within the maize genome, and, finally, our Current Opinion article, which adds a new, testable hypothesis to the grand competition to solve heterosis. We think that heterosis is an ephemeral burst of gene activity because small RNA control (epigenetic fine tuning) in the hybrid soma has been disrupted temporarily. The impact of CoGe and the expected impact of qTeller on world-wide research is more significant yet, but I intend to wait for the final report on this project to attempt to quantify this. In the meantime, please visit CoGe and qTeller by Googling these words.

    Publications

    • D'Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M et al. 2012. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488(7410): 213-217.
    • Freeling M. 2012. Response to Birchler: heterosis is partly a sub-problem of quantitative genetics, but its solution may depend on understanding the mysterious genetics of quantity. Maydica 57: 96-97.
    • Freeling M, Woodhouse MR, Subramaniam S, Turco G, Lisch D, Schnable JC. 2012. Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Current Opinion in Plant Biology 15(2): 131-139.
    • Reneker J, Lyons E, Conant GC, Pires JC, Freeling M, Shyu CR, Korkin D. 2012. Long identical multispecies elements in plant and animal genomes. Proceedings of the National Academy of Sciences of the United States of America 109: E1183-91.
    • Schnable JC, Freeling M, Lyons E. 2012a. Genome-wide analysis of syntenic gene deletion in the grasses. Genome biology and evolution 4(3): 265-277.
    • Schnable JC, Wang X, Freeling M, Pires JC. 2012c. Escape from preferential retention following repeated polyploidies. Frontiers in Plant Genetics and Genomics accepted, June, 2012.
    • Spangler J, Ficklin S, Luo F, Freeling M, Feltus F. 2012a. Conserved noncoding regulatory signatures in arabidopsis co-expressed gene modules. PloS one In press August 2012.
    • Spangler JB, Subramaniam S, Freeling M, Feltus FA. 2012b. Evidence of function for conserved noncoding sequences in Arabidopsis thaliana. The New phytologist 193(1): 241-252.
    • Subramaniam S, Freeling M. 2012. Conserved noncoding sequences in plant genomes. In Plant Genome Diversity, ISBN 978-3-7091-1129-1, Volume 1: Plant genomes, their residents, and their evolutionary dynamics (ed. JF Wendel). Springer.
    • Tang H, Woodhouse MR, Cheng F, Schnable JC, Pedersen BS, Conant G, Wang X, Freeling M, Pires JC. 2012a. Altered patterns of fractionation and exon deletions in Brassica rapa support a two-step model of paleohexaploidy. Genetics 190(4): 1563-1574.
    • Zhang W, Wu Y, Schnable JC, Zeng Z, Freeling M, Crawford GE, Jiang J. 2012. High-resolution mapping of open chromatin in the rice genome. Genome Res 22(1): 151-162.


    Progress 01/01/11 to 12/31/11

    Outputs
    OUTPUTS: CoGe, or comparative genomics, is a bundle of webtools, designed originally by Eric Lyons while he was in my laboratory, in order to help researchers compare genes and genomes without hiring their own programmers and without using the command line. CoGe is also my lab's toolbox. All outputs from my work are available online at http://genomevolution.org/CoGe/ and, in addition, in the massive files associated with published papers, and housed on the publishers' servers. During 2011, public usage of CoGe increased from approximately 5000 significant research visits/month to approximately 6000 visits/month, and we expect usage to continue its increase. Usage is now at a level where systems administration has become a burden on my lab. Fortunately, the iPlant Collaborative at the University of Arizona has agreed to power CoGe; the first phase of migration happened in early 2012, and migration will continue over the next few years. PARTICIPANTS: Freeling, the PI, is the only member of his lab supported by the agency, this by virtue of a component of his salary. TARGET AUDIENCES: The target audience of CoGe is the community that visits our website for research purposes. That audience grew approximately 20% in 2011 from the previous year. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

    Impacts
    Our impacts are published in one of the 13 publications below, in CoGe itself, or at http://biocon.berkeley.edu/athaliana. CoGe applications have been significantly enhanced during 2011, as they were in 2010. Pedersen et al., 2011 contributed to the graphic algorithm beneath the application GEvo. Tang et al., 2011, contributed to the QuotaAlign algorithm implemented within SynMap. This algorithm is of particular use for crop plant researchers, because polyploidy, both ancient and recent, is widespread among our crops, and plant biologists are routinely comparing genomes of essentially different ploidies; SynMap makes this possible by organizing comparative data in line with ploidy quota expectations. For example, comparing the genome of the papaya with Arabidopsis is 1:4, and papaya with Brassica rapa (Chinese cabbage) is 1:12. Without QuotaAlign, much complexity ensues. Most of the 13 publications from my laboratory in 2011 advance or use CoGe in some way. An example most related to crop improvement is Schnable et al. 2011 PNAS. Here we show, by using CoGe tools to divide the newly published genome of maize into two subgenomes, that these subgenomes are not equal. One has many more genes than the other, and the genes it has are expressed to higher levels of RNA. In other words, within maize are two genomes that are about 12 million years old: one ancestral, relatively intact and well-expressed; the other deleted, under-expressed and "all beat up." We hypothesize that the genes on the "beat up" genome, like the victims of playground bullying, become the novel genes in evolution and domestication. We are testing this hypothesis. Similarly, the genome of Brassica rapa is a paleohexaploid, and one genome is dominant just as in maize. We are trying to find out what makes the genomes of new polyploids different from one another.
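The quota idea above (for example 1:4 for papaya versus Arabidopsis) can be illustrated with a toy filter. This is not QuotaAlign itself: Tang et al. 2011, cited below, describe an integer-programming approach, whereas this sketch swaps in a simple greedy pass that only keeps the best-scoring blocks within each region's quota. The function name, region labels and scores are all hypothetical.

```python
from collections import Counter


def quota_filter(blocks, max_per_a, max_per_b):
    """Greedy quota screen over candidate syntenic blocks.

    blocks: iterable of (region_in_genome_A, region_in_genome_B, score).
    max_per_a: how many blocks each genome-A region may keep.
    max_per_b: how many blocks each genome-B region may keep.
    For a 1:4 papaya:Arabidopsis comparison, with papaya as genome A,
    max_per_a would be 4 and max_per_b would be 1.
    """
    used_a, used_b = Counter(), Counter()
    kept = []
    # Visit the highest-scoring blocks first so weaker ones are dropped.
    for a, b, score in sorted(blocks, key=lambda blk: blk[2], reverse=True):
        if used_a[a] < max_per_a and used_b[b] < max_per_b:
            kept.append((a, b, score))
            used_a[a] += 1
            used_b[b] += 1
    return kept


if __name__ == "__main__":
    # Made-up candidate blocks: one papaya region with five Arabidopsis matches.
    candidates = [
        ("p1", "a1", 90), ("p1", "a2", 80), ("p1", "a3", 70),
        ("p1", "a4", 60), ("p1", "a5", 10), ("p2", "a1", 50),
    ]
    for block in quota_filter(candidates, max_per_a=4, max_per_b=1):
        print(block)
```

A greedy pass can miss the globally best assignment, which is one reason an optimization formulation is attractive for the real tool; the sketch is only meant to show what a ploidy quota constrains.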

    Publications

    • Woodhouse M, Tang H, Freeling M: Gene positional history in Arabidopsis is correlated with the whole genome duplication events in the Brassicales. The Plant Cell 2011, 12: 4241-53.
    • The Brassica rapa Genome Sequencing Project Consortium (incl. Freeling)...Wang: The genome of the mesohexaploid crop species Brassica rapa. Nature Genetics 2011, 43:1035-1039.
    • Tang H, Lyons E, Pedersen B, Paterson A, Freeling M: Guided synteny alignment between duplicated genomes through integer programming. BMC Bioinformatics 2011, 12, 102.
    • Spangler JB, Subramaniam S, Freeling M, Feltus FA: Evidence of function for conserved noncoding sequences in Arabidopsis thaliana. New Phytol 193, 241-252 2011.
    • Schnable JC, Springer NM, Freeling M: Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A 2011, 108:4069-4074.
    • Schnable JC, Pedersen BS, Subramaniam S, Freeling M: Dose-sensitivity, conserved noncoding sequences and duplicate gene retention through multiple tetraploidies in the grasses. Frontiers in Plant Genetics and Genomics 2011, 2:1-7.
    • Schnable JC, Freeling M: Genes Identified by Visible Mutant Phenotypes Show Increased Bias toward One of Two Subgenomes of Maize. PLoS One 2011, 6:e17855.
    • Schnable J, Freeling M, Lyons E: Genome-wide analysis of syntenic gene deletion in the grasses. Genome Biology and Evolution 2011, accepted.
    • Schnable J, Freeling M: Polyploidy research in maize. Edited by Soltis P. and D: Springer; 2011.
    • Pedersen BS, Tang H, Freeling M: Gobe: an interactive, web-based tool for comparative genomic visualization. Bioinformatics 2011, 27:1015-1016.
    • Lyons E, Freeling M, Kustu S, Inwood W: Using genomic sequencing for classical genetics in E. coli K12. PLoS One 2011, 6:e16717.
    • Eichten SR, Swanson-Wagner RA, Schnable JC, Water AJ, Hermanson PJ, Liu S, Yeh C-TE, Jia Y, Freeling M, Schnable PS, et al.: Heritable epigenetic variation among maize inbreds. PLoS Genetics 2011, 7.
    • Cande WZ, Freeling M: Inna Golubovskaya: the life of a geneticist studying meiosis. Genetics 2011, 188:491-498.


    Progress 01/01/10 to 12/31/10

    Outputs
    OUTPUTS: CoGe, or comparative genomics, is a bundle of webtools specifically designed to compare whole genomes to whole genomes; there are over 10,000 genomes loaded into CoGe at present. The units of comparison are dots in dotplots, hits resulting from local or global alignment algorithms, co-plots of gene models with motifs or other features, graphic whole genome blast tools and tools to discover expected syntenic regions whether or not an expected genomic feature lies in it. CoGe facilitates on-the-fly research and is not a browser, as is Gramene, for example. URL: http://synteny.cnr.berkeley.edu/CoGe/index.pl; there are several tutorials and YouTube videos in CoGePedia. CoGe went online in 2008, was up to 2,000 visits/month by the end of 2009 and 4,000 visits per month at the end of 2010. The usage trend is accelerating and is world-wide. CoGe's servers are presently housed in the DataCenter, UC-Berkeley. However, a mirror at iPlant Collaborative, U. Arizona (Eric Lyons, lead developer) is in progress, and the PI role for CoGe will pass from Freeling to Lyons eventually. Freeling will remain as the lead geneticist. Since the Freeling lab, the home of CoGe, studies plant genetics and genomics, this software is especially popular with crop biologists who want to take advantage of the complete genome sequences that are being released monthly, but do not have adequate command line expertise, access to graphic programmers or funding to hire that expertise. PARTICIPANTS: Michel Freeling is the only researcher who, by virtue of his salary, is funded by the agency. TARGET AUDIENCES: All researchers, especially those working on plants and microbes, who use whole genome sequences have used, or should have used, CoGe tools over the past year. Underserved high school student research projects in the Freeling lab's summer program have spawned several tutorials and YouTube videos introducing or using CoGe. 
One favorite is YouTube: SynMap.human.chimp.mov PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

    Impacts
    There are two categories of outcomes. First, the CoGe platform was enhanced during 2010. The Flash graphics for viewing gene alignments have been upgraded with code from programmer Brent Pedersen and are now enabled in GEvo. His publication in Bioinformatics has a 2011 date, so it will be reported next year. A particularly valuable enhancement of our dotplotting tool, SynMap, permits the user to compare two genomes that have had different numbers of whole genome duplication events by specifying the expected genomic relationships. Postdoc Haibao Tang coded this enhancement, called QuotaAlign (manuscript submitted). For example, a comparison of the maize genome to the rice genome has an expected 2:1 relationship; Brassica rapa to papaya is 7:1. That means that for each gene in papaya there are up to 7 syntenic orthologous genes in Brassica rapa. Second, all of our own work, cited in the segment following, exemplifies the intelligent use of CoGe software. For example, CoGe software has been applied to problems in sorghum, rice, Brachypodium, banana, Brassica rapa, Arabidopsis, papaya, poplar and grape. The Freeling lab was a part of the official annotation teams of papaya and sorghum (released) and is a part of the Brassica rapa and banana teams (genomes yet to be released). The overall agricultural impact is, I trust, the advancement of many research projects dedicated to crop and feedstock improvement. All results are publicly available through CoGe and through the Freeling lab website. CoGe has reciprocal links with MaizeGDB (USDA funded) and PlantGDB, and links out to several public genomic support websites.
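
    The expected-relationship idea described above can be illustrated with a minimal sketch. It is not the QuotaAlign code and does not reproduce its algorithm; it only checks whether a list of syntenic gene pairs stays within a user-specified A:B quota such as the 2:1 maize-to-rice expectation. The function name, its parameters and the gene identifiers are all hypothetical.

```python
from collections import Counter


def quota_violations(pairs, quota_a, quota_b):
    """Find genes whose number of syntenic partners exceeds an A:B quota.

    pairs:   iterable of (gene_a, gene_b) matches between genomes A and B
    quota_a: maximum number of genes in A matched to any one gene in B
    quota_b: maximum number of genes in B matched to any one gene in A
    """
    pairs = set(pairs)  # count each distinct match only once
    partners_of_a = Counter(a for a, _ in pairs)  # B partners per A gene
    partners_of_b = Counter(b for _, b in pairs)  # A partners per B gene
    over_a = {g: n for g, n in partners_of_a.items() if n > quota_b}
    over_b = {g: n for g, n in partners_of_b.items() if n > quota_a}
    return over_a, over_b


# Hypothetical gene names: maize (A) against rice (B), expected 2:1.
pairs = [("Zm1", "Os1"), ("Zm2", "Os1"), ("Zm3", "Os1"), ("Zm4", "Os2")]
over_maize, over_rice = quota_violations(pairs, quota_a=2, quota_b=1)
print(over_maize, over_rice)
```

    In this made-up input, the rice gene Os1 has three maize partners, one more than a 2:1 quota allows, so it is the only gene reported.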

    Publications

    • PATERSON, A., M. FREELING, H. TANG and X. WANG, 2010 Insights from the comparison of plant genome sequences. Annu. Rev. Plant Biol. 61: 349-372.
    • WOODHOUSE, M., and M. FREELING, 2010 Tandem duplications and gene transposition in plants. Maydica 54: 463-471.
    • WOODHOUSE, M., B. PEDERSEN and M. FREELING, 2010a Transposed genes in Arabidopsis are often associated with flanking repeats. PLoS Genetics 6: 10.
    • KANE, J., M. FREELING and E. LYONS, 2010 The evolution of a high copy gene array in Arabidopsis. Journal of Molecular Evolution 70: 531-544.
    • WOODHOUSE, M., J. SCHNABLE, B. PEDERSEN, E. LYONS, D. LISCH et al., 2010b Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homeologs. PLoS Biology 8: 15.
    • WOODWARD, J. B., N. D. ABEYDEERA, D. PAUL, K. PHILLIPS, M. RAPALA-KOZIK et al., 2010 A maize thiamine auxotroph is defective in shoot meristem maintenance. Plant Cell 22: 3305-3317.
    • LI, H., M. FREELING and D. LISCH, 2010 Epigenetic reprogramming during vegetative phase change in maize. Proc Natl Acad Sci U S A 107: 22184-22189.