

Goals of Our Research

To create a comprehensive searchable database (a lexicon or concordance) of all DNA sequences in a given genome.
To specialize particularly in SHORT, INTERGENIC sequences.
To provide extensive annotations for each sequence, including its location, its nearest (neighboring) sequences such as upstream and downstream genes, PubMed references, statistical analyses, and more.
To include (in fact, rely upon) undergraduate programmers in all aspects of the project.
To focus on user-friendliness by working closely with biologist-users (including students) who are interested in deciphering short intergenic sequences.
To enjoy many of the aesthetic aspects of repeat sequences (direct, mirror, inverted, and versatile repeats).
To slice off a piece of the research pie that represents a relatively unexamined part of the genome (the short, intergenic sequences) and to use a "linguistic" working metaphor to try to decode some of their functionality. See LEXOMICS.
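The concordance named in the goals above, together with the repeat categories, can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the project's actual implementation: it indexes every k-letter word on the forward strand only, skips words containing ambiguity codes such as N, and reports 0-based positions. The helper names (`build_concordance`, `reverse_complement`, `repeat_types`) are hypothetical. Repeats are classified with simplified, perfect, spacer-free definitions: a direct repeat is a word whose two halves are identical (GATCGATC), a mirror repeat reads the same backward on the same strand (GATCCTAG), and an inverted repeat equals its own reverse complement (GAATTC). Versatile repeats are not modeled, since their definition is not given here.

```python
from collections import defaultdict

VALID_BASES = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_concordance(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-letter word in `genome` to its 0-based start positions.

    Hypothetical sketch: forward strand only; words containing N or other
    ambiguity codes are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    genome = genome.upper()
    index: dict[str, list[int]] = defaultdict(list)
    for start in range(len(genome) - k + 1):
        word = genome[start:start + k]
        if set(word) <= VALID_BASES:  # keep only unambiguous A/C/G/T words
            index[word].append(start)
    return dict(index)


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_types(word: str) -> list[str]:
    """Classify a word as a perfect, spacer-free direct, mirror, and/or inverted repeat."""
    word = word.upper()
    if not word:
        return []
    types = []
    half = len(word) // 2
    if len(word) % 2 == 0 and word[:half] == word[half:]:
        types.append("direct")    # e.g. GATC|GATC: second half copies the first
    if word == word[::-1]:
        types.append("mirror")    # e.g. GATCCTAG: same-strand palindrome
    if word == reverse_complement(word):
        types.append("inverted")  # e.g. GAATTC: equals its reverse complement
    return types


if __name__ == "__main__":
    concordance = build_concordance("TTGAATTCAAGAATTCGG", 6)
    print(concordance["GAATTC"])     # [2, 10]
    print(repeat_types("GAATTC"))    # ['inverted']
    print(repeat_types("GATCGATC"))  # ['direct', 'inverted']
    print(repeat_types("GATCCTAG"))  # ['mirror']
```

A genome-scale lexicon along these lines would also need to index the reverse strand and attach the annotations listed above (location, neighboring genes, references) to each entry.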
"The dark side of the moon", is how Brendan Maher (The Scientist, May 19,
2003, p31) described the sequences between the genes,
the intergenic sequences. When genomic research finds its way into the
popular press, "Genes" make the headlines.
Readers and reporters often are unaware of the majority of sequences in
most genomes, the vast areas between genes.
Most research focuses on genes and most sequence analysis is focused on
genes not only because
of their commercial promise but because they are more accessible, both
physically and conceptually.
In contrast, on the dark side, intergenic sequences are loaded with repeats that are especially difficult to sequence and reconstruct, and they almost always account for the final 1-5% of nearly any genome that remains unfinished. Furthermore, intergenic areas are often full of transposable (jumping) sequences and other movable parts, and thus do not yield stable sequence data.

The functions (or putative functions) of intergenic sequences are the other dark aspect of the problem.
Much of the subtle regulation of genes is known to reside in intergenic regions (e.g., promoters and enhancers), but it is difficult to decipher which sequences are essential and which are redundant or even "junk," the random artifacts of a lively evolution. Many intergenic regulatory sequences are short, which makes them especially difficult to search for and analyze: they fade into a statistical background of noise. Yet without regulatory sequences, we (multicellular organisms) would be piles of disorganized proteins, all expressed at once. The essential information for organizing gene expression lies between the genes. We believe that parts of that information can be approached linguistically, since it shares features of a "natural language." See LEXOMICS.
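The point that short regulatory sequences fade into a statistical background of noise can be made concrete with a back-of-envelope estimate. Assuming, as a deliberate simplification, that each base is drawn independently with probability 1/4, a specific k-letter word is expected to occur by chance about (N - k + 1) / 4^k times on one strand of a sequence of length N, and roughly twice that when both strands are counted. The sketch below shows how quickly that expectation falls as k grows; the function name `expected_chance_hits` and the 10 Mb genome size are hypothetical, and real genomes have biased base composition, so actual counts will differ.

```python
def expected_chance_hits(genome_length: int, k: int, both_strands: bool = True) -> float:
    """Expected chance occurrences of one specific k-letter DNA word.

    Assumes a uniform, independent base model (each of A, C, G, T has
    probability 1/4), which real genomes only approximate.
    """
    positions = max(genome_length - k + 1, 0)  # number of places a k-mer can start
    per_strand = positions / 4 ** k            # chance of a match at each start is 4**-k
    # Treats the two strands as independent; this double-counts sites for words
    # that are their own reverse complement (e.g. GAATTC).
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    genome_length = 10_000_000  # hypothetical 10 Mb genome
    for k in (4, 6, 8, 10, 12):
        print(f"k={k:2d}: ~{expected_chance_hits(genome_length, k):,.1f} chance hits")
```

Under these assumptions, a given 6-letter word is expected about 4,900 times by chance in 10 Mb, while a 12-letter word is expected only about once, which is why the functional copies of a short motif are hard to distinguish from chance matches.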