A large gap remains in our understanding of the function of a substantial portion of Arabidopsis gene products. This project, if funded, will begin filling this gap by systematically analyzing a large number of functionally unassigned genes by Fluorescent Tagging of their Full-Length Protein products (FTFLP). The proposed research will generate important information and tools to characterize the Arabidopsis proteome by pursuing three specific aims:
1. Selection and subcloning of approximately 4000 Arabidopsis genes of unknown function with their potential native regulatory sequences.
2. Fluorescent tagging of the tested gene products and their insertion into Arabidopsis plants.
3. Analysis of the expression patterns and intracellular localization of the YFP-tagged proteins in planta.

Based on the most recent data on the Arabidopsis genome sequence and its annotations, we have identified 8,923 genes annotated as "unknown protein", "putative protein", or "waiting for functional annotation". We applied a series of filters to this list to identify the most suitable candidate genes for characterization by our FTFLP approach (see Table 1).

We sought to select a short list of 4,000 genes that are maximally diverse and therefore representative of most of the unassigned Arabidopsis sequences (see Table 2).

UPDATE July 5, 2002 Upon recommendation from the grant reviewers, we are scaling down to ca. 800 genes as a pilot study. From the 4000 genes we chose in December 2001, we selected the pilot set using the following criteria: (1) the gene must have a full-length cDNA and (2) it must not have any Gene Ontology annotations. To maximize the diversity of the set, we preferentially chose single-copy genes. Therefore, the chosen list of 855 (Table 2) contains more single-copy genes and slightly more plant-specific genes than the list of 4000, but is proportional in all other characteristics (e.g. predicted location and protein domains).
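The pilot selection described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used: the `Gene` record, its field names, the `select_pilot` function, and the toy loci are all hypothetical, and "preferentially" is interpreted here as simply ranking single-copy genes first.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    # Hypothetical record; field names are assumptions for illustration.
    locus: str
    has_full_length_cdna: bool
    go_annotations: list = field(default_factory=list)
    family_size: int = 1  # 1 means single-copy


def select_pilot(genes, target):
    """Keep genes with a full-length cDNA and no GO annotations,
    then take up to `target` genes, single-copy genes first."""
    eligible = [
        g for g in genes
        if g.has_full_length_cdna and not g.go_annotations
    ]
    # sorted() is stable, so the original order is kept within each group;
    # the key is False (sorts first) for single-copy genes.
    ranked = sorted(eligible, key=lambda g: g.family_size != 1)
    return ranked[:target]


# Toy input with placeholder loci (not real project data).
genes = [
    Gene("geneA", True, [], 1),
    Gene("geneB", True, ["GO:0005634"], 1),  # has a GO annotation: excluded
    Gene("geneC", False, [], 1),             # no full-length cDNA: excluded
    Gene("geneD", True, [], 4),              # eligible, but in a gene family
    Gene("geneE", True, [], 1),
]
pilot = select_pilot(genes, target=2)
print([g.locus for g in pilot])
```

With the toy input, the two single-copy eligible genes are chosen ahead of the gene-family member.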

UPDATE August 28, 2002 The list of 855 was split into three files of ~286 genes each. In addition, the total number of associated ESTs and the mean intensity (along with standard deviation and coefficient of variation) across all the AFGC microarray experiments (ca. 560 hybridizations) are included to provide a rough idea of the level of expression.
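As a rough illustration of the per-gene expression summary mentioned above, the sketch below computes the mean, standard deviation, and coefficient of variation (standard deviation divided by mean) for a list of intensities. The function name `expression_summary` and the sample values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; the source does not say which estimator was used.

```python
import statistics


def expression_summary(intensities):
    """Summarize one gene's intensities across hybridizations."""
    mean = statistics.mean(intensities)
    # Sample standard deviation (n - 1 denominator); an assumption here.
    sd = statistics.stdev(intensities)
    # Coefficient of variation is undefined when the mean is zero.
    cv = sd / mean if mean else float("nan")
    return {"mean": mean, "sd": sd, "cv": cv}


# Made-up intensities for a single gene, not AFGC data.
summary = expression_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(summary)
```

A low mean with a high coefficient of variation would suggest a gene that is expressed weakly or only under some conditions, which is the kind of rough reading these columns are meant to support.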

UPDATE January 10, 2004 A comparison with the data annotated by TAIR and released by TIGR (TIGR 4.0) showed that 190 genes in the original unknown gene list (855) were no longer unknown. The unknown gene list has been updated by replacing these 190 genes with another 190 unknown genes that are not represented in gene families. The updated unknown gene list can be downloaded from Table 2. The current status of each gene can be browsed from the project summary.

Table 1. Identification of the "long list" of candidate genes for FTFLP characterization

Filter | Filter description | Number of "filtered-out" genes | Number of candidate genes
1 | Genes annotated as unknown or putative proteins | 16,331 | 8,923
2 | Genes with transcript size > 6 kb (which, including flanking regulatory sequences, may be beyond the reliable limits of PCR amplification) | 283 | 8,640
3 | Genes whose intergenic sequence in either 5' or 3' end is less than 100 bp (most of them likely from incorrect annotations) | 241 | 8,400
4 | Genes with more than one annotated gene model (likely from overlapping BAC clones with potential sequence discrepancy) | 800 | 7,600
5 | Genes that are made obsolete by TIGR | 14 | 7,586
6 | Genes containing the FseI and SfiI sites (identified as the most rare cutters in the genome, to be used for insertion of the YFP tag) | 257 | 7,329
Download "long" list of 7329 genes as:
  • tab-delimited text file
  • Microsoft Excel file
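
The filter cascade in Table 1 can be sketched as a sequence of keep/discard tests, each reporting how many genes it removed and how many remain. This is a minimal sketch under stated assumptions: the record fields (`unknown`, `transcript_bp`, `intergenic_5p_bp`, `intergenic_3p_bp`, `n_gene_models`, `obsolete`, `has_rare_cutter_site`), the helper names, and the toy genes are all hypothetical and do not reflect the actual annotation files or scripts used.

```python
def run_filter_cascade(genes, filters):
    """Apply filters in order; return survivors and a per-filter report
    of (description, number filtered out, number remaining)."""
    remaining = list(genes)
    report = []
    for description, keep in filters:
        kept = [g for g in remaining if keep(g)]
        report.append((description, len(remaining) - len(kept), len(kept)))
        remaining = kept
    return remaining, report


# Each entry pairs a label with a predicate that is True for genes to KEEP.
FILTERS = [
    ("annotated as unknown or putative", lambda g: g["unknown"]),
    ("transcript size <= 6 kb", lambda g: g["transcript_bp"] <= 6000),
    ("intergenic sequence >= 100 bp at both ends",
     lambda g: min(g["intergenic_5p_bp"], g["intergenic_3p_bp"]) >= 100),
    ("single annotated gene model", lambda g: g["n_gene_models"] == 1),
    ("not obsolete", lambda g: not g["obsolete"]),
    ("no FseI/SfiI site", lambda g: not g["has_rare_cutter_site"]),
]


def gene(locus, **overrides):
    """Build a toy gene record that passes every filter unless overridden."""
    record = {
        "locus": locus, "unknown": True, "transcript_bp": 2000,
        "intergenic_5p_bp": 500, "intergenic_3p_bp": 500,
        "n_gene_models": 1, "obsolete": False,
        "has_rare_cutter_site": False,
    }
    record.update(overrides)
    return record


# Toy data only; each non-default gene fails exactly one filter.
toy = [
    gene("g1"),
    gene("g2", unknown=False),
    gene("g3", transcript_bp=7500),
    gene("g4", intergenic_3p_bp=40),
    gene("g5", n_gene_models=2),
    gene("g6", has_rare_cutter_site=True),
    gene("g7"),
]
selected, report = run_filter_cascade(toy, FILTERS)
for description, removed, left in report:
    print(f"{description}: removed {removed}, remaining {left}")
```

Keeping the filters as an ordered list of (label, predicate) pairs makes the report mirror the layout of Table 1, where each row depends on the genes that survived the previous row.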

Table 2. Identification of the "short list" of candidate genes for FTFLP characterization

Category | Category description | Total genes | Gene families: # of members = 1 | Gene families: # of members > 1 (# gene families) | Not represented in gene families | BLAST only against plant proteins | Proteins have domain matches (InterPro) | Transmembrane domains | Full length cDNA
1 | Long list (see Table 1) | 7329 | 1538 | 5029 (1207) | 762 | 4374 | 4295 | 1565 | 1720
2 | Final, short list | 4000 | 1538 | 1700 (1207) | 762 | 2206 | 2184 | 888 | 1410
3 | Pilot list | 855 | 696 | 91 | 72 | 552 | 352 | 253 | 855

Download the lists as:
  • Category 2, "short" list of 4000 genes: tab-delimited text file, Microsoft Excel file
  • Category 3, "pilot" list of 855 genes: tab-delimited text file, Microsoft Excel file
  • Category 4, Vitaly's list ("pilot" list of 286 genes): tab-delimited text file, Microsoft Excel file
  • Category 5, Natasha's list ("pilot" list of 286 genes): tab-delimited text file, Microsoft Excel file
  • Category 6, updated 823 unknown genes: tab-delimited text file, Microsoft Excel file