Introduction to PEDB:

The Prostate Expression Database (PEDB) is a curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions. Detailed library information, including tissue source, library construction methods, sequence diversity, and abundance, is available in a library archive. Prostate ESTs are assembled into distinct species groups using the sequence assembly program Phrap and are annotated with information from the GenBank, dbEST, and UniGene public sequence databases. Annotated sequences in PEDB can be searched using the BLAST algorithm or a gene description keyword. The differential expression of each EST species can be viewed across all libraries using the Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons.
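
The idea behind comparing EST species across libraries can be illustrated with a minimal sketch. It is not the VEAT implementation. It assumes each EST has already been assigned to a library and to an assembled species group, and it normalizes counts to ESTs per 10,000 sequenced so that libraries of different sizes can be compared. The function name, the library and cluster identifiers, and the toy data are all hypothetical.

```python
from collections import Counter


def relative_abundance(est_assignments, per=10_000):
    """Compute normalized EST counts for each species group in each library.

    est_assignments: iterable of (library_id, cluster_id) pairs, one per EST.
    Returns {library_id: {cluster_id: ESTs per `per` sequenced in that library}}.
    """
    counts = {}
    for library_id, cluster_id in est_assignments:
        counts.setdefault(library_id, Counter())[cluster_id] += 1

    result = {}
    for library_id, cluster_counts in counts.items():
        total = sum(cluster_counts.values())  # library size in ESTs
        result[library_id] = {
            cluster: n * per / total for cluster, n in cluster_counts.items()
        }
    return result


# Hypothetical toy data: two libraries, two species groups.
ests = (
    [("LIB_A", "cluster_1")] * 3
    + [("LIB_A", "cluster_2")]
    + [("LIB_B", "cluster_1")]
    + [("LIB_B", "cluster_2")] * 3
)
profile = relative_abundance(ests)
for library_id in sorted(profile):
    print(library_id, profile[library_id])
```

Because EST counts are small samples of the underlying mRNA population, differences for low-abundance species should be interpreted with caution; a real comparison would normally add a statistical test.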

Analyses of the human genome sequence indicate that the genome is composed of approximately 40,000 genes. However, alternatively spliced transcripts and modifications to protein structure may increase the number of distinct gene products severalfold. To confer developmental and functional specificity, only a fraction of this total is active in a given cell type at a given time. The messenger RNA (mRNA) molecules in a cell or tissue reflect the portion of the genome that is utilized or expressed. For any given cell or tissue this represents between 10,000 and 30,000 different mRNA species (85). A convenient term describing this repertoire of expressed genes in a cell or tissue is the transcriptome. Converting the entire population of cellular mRNAs into a library of cDNAs, followed by identifying the cDNAs by sequence analysis (producing an Expressed Sequence Tag or EST), provides a "snapshot" of transcriptional activity that can be correlated with a cellular phenotype or function.

The systematic categorization of ESTs by clustering and annotation provides one way of defining a tissue or cellular transcriptome. Another approach involves transcript profiling by microarray. A thorough understanding of which genes are expressed, to what extent, and under what conditions can provide insights into the processes of homeostasis and disease. In addition, unique sets or anthologies of expressed genes may serve to distinguish one cell type from another. The Cancer Genome Anatomy Project (CGAP) and NCBI have established databases of ESTs derived from normal and diseased human tissues for this specific purpose. ESTs have been assembled into the UniGene set, a database comprising more than 80,000 clusters of sequences, each representing the transcription product(s) of an individual human gene.

Importantly, although the sequence of the human genome is complete, the identification of genes within the genome is ongoing. ESTs allow for the localization of genes. Studies of cell types with highly specialized function allow for the identification of novel cell type-defining genes that are not identified in large sequencing projects.

The foundation of our genomics approach has centered on the characterization of a prostate transcriptome. These transcripts represent the portion of the human genome actively used or transcribed in the prostate. We have constructed 24 cDNA libraries from diverse prostate tissue types and generated ESTs representing the repertoire of genes expressed in these tissues. Several libraries have been generated from specific prostate cell types (e.g. secretory epithelium, basal epithelium). An extensive characterization of a subset of the prostate transcriptome derived from a normal prostate cDNA library (designated PN001) indicates that the prostate transcriptome is extremely diverse, with a small number of highly expressed genes (e.g. PSA, hK2, PAP), a moderate number of genes expressed at an intermediate level, and a large number of genes expressed at a low level. This analysis also identified many novel genes without any homology to known sequences present in the public databases.

To date we have used EST transcriptome analyses to: