A large gap remains in our understanding of the function of a very significant portion of Arabidopsis gene products. This project, if funded, will begin filling this gap by systematically analyzing a large number of the functionally unassigned genes by Fluorescent Tagging of their Full-Length Protein products (FTFLP). The proposed research will generate important information and tools to characterize the Arabidopsis proteome by pursuing three specific aims:
1. Selection and subcloning of approximately 4000 Arabidopsis genes of unknown function with their potential native regulatory sequences.
2. Fluorescent tagging of the tested gene products and their insertion into Arabidopsis plants.
3. Analysis of the expression patterns and intracellular localization of the YFP-tagged proteins in planta.
Based on the most recent data on the Arabidopsis genome sequence and its annotations, we have identified 8,923 genes annotated as "unknown protein", "putative protein", or "waiting for functional annotation".
We applied a series of filters to this list to identify the most suitable candidate genes for characterization by our FTFLP approach (see Table 1).
We then sought to select a short list of 4,000 genes that are maximally diverse and therefore representative of most of the unassigned Arabidopsis sequences (see Table 2).
UPDATE July 5, 2002 Upon recommendation from the grant reviewers, we are scaling down to work on ca. 800 genes as a pilot study. From the 4000 genes we chose in December 2001, we selected 800 on the following criteria: the gene must (1) have a full-length cDNA and (2) have no Gene Ontology annotations. To maximize the diversity of the genes in the set, we preferentially chose single-copy genes. As a result, the chosen list of 855 (Table 2) contains more single-copy genes and slightly more plant-specific genes than the list of 4000, but is proportional in all other characteristics (e.g. predicted location and protein domains).
UPDATE August 28, 2002 The list of 855 was split into three files of ~286 genes each. In addition, the total number of associated ESTs and the mean intensity (along with the standard deviation and coefficient of variation) across all the AFGC microarray experiments (ca. 560 hybridizations) are included to provide a rough idea of the level of expression.
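As an illustration of the expression summary mentioned above, the following minimal sketch computes the mean, standard deviation, and coefficient of variation for one gene across a set of hybridizations. The function name and the input values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; the page does not state which estimator was used.

```python
from statistics import mean, stdev


def expression_summary(intensities):
    """Return (mean, standard deviation, coefficient of variation).

    Assumption: the sample (n - 1) standard deviation is used; the
    source does not say which estimator the project applied.
    """
    m = mean(intensities)
    sd = stdev(intensities)
    # The coefficient of variation is the standard deviation relative to the mean.
    cv = sd / m if m else float("nan")
    return m, sd, cv


# Invented intensities for a single gene, for illustration only.
example = expression_summary([100.0, 200.0, 300.0])
```

A low mean with a high coefficient of variation would suggest a gene that is expressed weakly and only under some conditions, which is the kind of rough reading these columns are meant to support.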
UPDATE January 10, 2004 A comparison with the data annotated by TAIR and released by TIGR (TIGR4.0) showed that 190 genes in the original unknown gene list (855) were no longer unknown genes. The unknown gene list has been updated by replacing these 190 genes with another 190 unknown genes that are not represented in gene families. The updated unknown gene list can be downloaded from Table 2. The current status of each gene can be browsed from the project summary.
Table 1. Identification of the "long list" of candidate genes for FTFLP characterization
Filter | Filter description | Number of "filtered-out" genes | Number of candidate genes |
1 | Genes annotated as unknown or putative proteins | 16,331 | 8,923 |
2 | Genes with transcript size > 6 kb (which, including flanking regulatory sequences, may be beyond the reliable limits of PCR amplification) | 283 | 8,640 |
3 | Genes whose intergenic sequence in either 5' or 3' end is less than 100 bp (most of them likely from incorrect annotations) | 241 | 8,400 |
4 | Genes with more than one annotated gene model (likely from overlapping BAC clones with potential sequence discrepancy) | 800 | 7,600 |
5 | Genes that are made obsolete by TIGR | 14 | 7,586 |
6 | Genes containing the FseI and SfiI sites (identified as the rarest cutters in the genome, to be used for insertion of the YFP tag) | 257 | 7,329 |
Download the "long" list of 7,329 genes as: tab-delimited text file, Microsoft Excel file
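The filter cascade in Table 1 can be sketched as a sequence of predicates applied in order, each one logging how many genes it removes and how many remain. This is only an illustration under stated assumptions, not the project's actual pipeline: the `Gene` record, its field names, and the toy genes are all hypothetical, and filter 6 is interpreted here as excluding a gene if its sequence contains either an FseI site (GGCCGGCC) or an SfiI site (GGCCNNNNNGGCC).

```python
import re
from dataclasses import dataclass

# Recognition sequences of the two rare cutters (N = any base).
# Both are palindromic, so scanning one strand is sufficient.
FSEI = re.compile(r"GGCCGGCC")
SFII = re.compile(r"GGCC[ACGT]{5}GGCC")

UNKNOWN_LABELS = (
    "unknown protein",
    "putative protein",
    "waiting for functional annotation",
)


@dataclass
class Gene:
    # Hypothetical record; the fields mirror the filter descriptions in Table 1.
    locus: str
    annotation: str
    transcript_bp: int
    intergenic_5p_bp: int
    intergenic_3p_bp: int
    n_gene_models: int
    obsolete: bool
    sequence: str  # assumed: the fragment to be amplified, upper-case DNA


# Each entry is (label, predicate); a gene is kept when the predicate is true.
FILTERS = [
    ("1 unknown or putative", lambda g: g.annotation in UNKNOWN_LABELS),
    ("2 transcript <= 6 kb", lambda g: g.transcript_bp <= 6000),
    ("3 intergenic >= 100 bp",
     lambda g: min(g.intergenic_5p_bp, g.intergenic_3p_bp) >= 100),
    ("4 single gene model", lambda g: g.n_gene_models == 1),
    ("5 not obsolete", lambda g: not g.obsolete),
    ("6 no FseI or SfiI site",
     lambda g: not (FSEI.search(g.sequence) or SFII.search(g.sequence))),
]


def apply_filters(genes):
    """Apply the filters in order; return survivors and a (label, removed, kept) log."""
    log = []
    for label, keep in FILTERS:
        kept = [g for g in genes if keep(g)]
        log.append((label, len(genes) - len(kept), len(kept)))
        genes = kept
    return genes, log


# Invented toy genes, one per outcome; none corresponds to a real locus.
TOY_GENES = [
    Gene("GENE_A", "unknown protein", 1500, 500, 400, 1, False, "ATGAAACCC"),
    Gene("GENE_B", "protein kinase", 1500, 500, 400, 1, False, "ATGAAACCC"),
    Gene("GENE_C", "putative protein", 7000, 500, 400, 1, False, "ATGAAACCC"),
    Gene("GENE_D", "unknown protein", 1200, 50, 400, 1, False, "ATGAAACCC"),
    Gene("GENE_E", "unknown protein", 1200, 500, 400, 2, False, "ATGAAACCC"),
    Gene("GENE_F", "unknown protein", 1200, 500, 400, 1, False, "AAGGCCGGCCTT"),
    Gene("GENE_G", "unknown protein", 1200, 500, 400, 1, False, "GGCCATGCAGGCC"),
]

survivors, log = apply_filters(TOY_GENES)
```

Because the filters run sequentially, the "removed" count for each step depends on the order in which they are applied, just as the "filtered-out" column in Table 1 counts only genes that survived the preceding filters.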
Table 2. Identification of the "short list" of candidate genes for FTFLP characterization