HOME | MOTIF LEXICON | KREBS CYCLE | FAVORITE GENE | OVERVIEW | PUBLICATIONS
 ABOUT US | HISTORY | EDUCATIONAL MATERIALS | LINKS | LEXOMICS

 

Goals of Our Research

To create a comprehensive searchable database (a lexicon or concordance) of all DNA sequences in a given genome.

To specialize particularly in SHORT, INTERGENIC sequences.

To provide extensive annotations for each sequence, including its location, the nearest (or neighboring) sequences such as upstream and downstream genes, PubMed references, statistical analyses, and more.

To include (in fact, rely upon) undergraduate programmers in all aspects of the project.

To focus on user-friendliness by working closely with biologist-users (including students) who are interested in deciphering short intergenic sequences.

To enjoy many of the aesthetic aspects of repeat sequences (direct, mirror, inverted, and versatile repeats).

To slice off a piece of the research pie that represents a relatively unexamined part of the genome (the short, intergenic sequences) and to use a "linguistic" working metaphor to try to decode some of their functions. See LEXOMICS.
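
The first goals above (a searchable lexicon of short sequences with their locations, and an interest in direct, mirror, and inverted repeats) can be illustrated with a minimal Python sketch. It is not the project's actual database. It assumes the genome is an uppercase string over A, C, G, and T, treats every length-k window as a "word," and uses hypothetical names (`build_concordance`, `reverse_complement`, `repeat_partners`). Repeat partners of a word are looked up as the word itself (direct), the word read backwards (mirror), and its reverse complement (inverted); versatile repeats are not covered.

```python
from collections import defaultdict

# Assumption: sequences are uppercase strings over A, C, G, T only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the opposite-strand sequence, read in the same 5'-to-3' sense."""
    return word.translate(_COMPLEMENT)[::-1]


def build_concordance(genome: str, k: int) -> dict:
    """Map every length-k word to the sorted 0-based positions where it starts."""
    concordance = defaultdict(list)
    for i in range(len(genome) - k + 1):
        concordance[genome[i:i + k]].append(i)
    return dict(concordance)


def repeat_partners(concordance: dict, word: str) -> dict:
    """Positions of a word's repeat partners on the same strand.

    direct:   the word itself (all occurrences, including the query's own)
    mirror:   the word read backwards (e.g. ATGC -> CGTA)
    inverted: the reverse complement (e.g. ATGC -> GCAT)
    """
    return {
        "direct": concordance.get(word, []),
        "mirror": concordance.get(word[::-1], []),
        "inverted": concordance.get(reverse_complement(word), []),
    }


if __name__ == "__main__":
    toy = build_concordance("ATGCTTGCATCGTAATGC", 4)
    print(repeat_partners(toy, "ATGC"))
```

For the toy string, `repeat_partners(toy, "ATGC")` returns `{"direct": [0, 14], "mirror": [10], "inverted": [6]}`. A dictionary keeps the idea visible; a genome-scale lexicon would more likely need a compact index such as a suffix array, plus the location and gene annotations described above.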

"The dark side of the moon" is how Brendan Maher (The Scientist, May 19, 2003, p. 31) described the sequences between the genes, the intergenic sequences. When genomic research finds its way into the popular press, "Genes" make the headlines. Readers and reporters are often unaware of the majority of the sequence in most genomes: the vast areas between genes. Most research, and most sequence analysis, focuses on genes, not only because of their commercial promise but also because they are more accessible, both physically and conceptually.

In contrast, on the dark side, intergenic sequences are loaded with repeats that are especially difficult to sequence and reconstruct, and these regions almost always make up the last 1-5% of nearly any genome that remains unfinished. Furthermore, intergenic areas are often full of transposable (jumping) sequences and other movable parts, and thus do not yield stable sequence data.

The functions (or putative functions) of intergenic sequences are the other dark aspect of the problem. Much of the subtle regulation of genes is known to reside in intergenic (e.g., promoter and enhancer) regions, but it is difficult to decipher which sequences are essential and which are redundant or even "junk," the random artifacts of a lively evolution. Many intergenic regulatory sequences are short, which makes them especially difficult to search for and analyze: they fade into a statistical background of noise. Yet without regulatory sequences, we (multicellular organisms) would be piles of disorganized proteins, all expressed at once. The essential information for organizing gene expression lies between the genes. We believe that aspects of that information can be approached linguistically, since it shares some features of a "natural language." See LEXOMICS.
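
Why short sequences "fade into a statistical background of noise" can be made concrete with a back-of-the-envelope calculation. The sketch below assumes the simplest null model: every position is A, C, G, or T independently with probability 1/4, and only one strand is counted. Real genomes have biased base composition and many repeats, so the numbers are illustrative only, and the function name is hypothetical. Under this model, a specific word of length k is expected to occur about (N - k + 1) / 4^k times by chance in a sequence of length N.

```python
def expected_chance_occurrences(genome_length: int, k: int) -> float:
    """Expected count of one specific k-letter word on one strand, assuming each
    position is independently A, C, G or T with probability 1/4 (a toy null model)."""
    windows = max(genome_length - k + 1, 0)  # number of length-k windows
    return windows / 4 ** k


if __name__ == "__main__":
    # Assumption: N = 3 billion letters, roughly a human haploid genome.
    for k in (6, 8, 12, 16, 20):
        print(k, expected_chance_occurrences(3_000_000_000, k))
```

With N = 3 × 10^9, a specific 6-letter word is expected roughly 730,000 times by chance and a 12-letter word roughly 180 times; the expected count only drops below one at about 16 letters.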


SITE MAINTAINED BY: MARK LEBLANC | LAST MODIFIED: