ECR Browser :: Introduction and Instructions on usage |
|
Please note that these instructions are outdated and do not describe all available ECR Browser features... |
| Contents |
|
Introduction
What kinds of sequence features correspond to ECRs?
Why compare higher vertebrates with fishes?
Genomes compared
How to start using the ECR Browser
Selecting the base genome
Browser settings
List of ECRs in the locus
Underlying DNA sequence
Navigating by synteny links, accessing alignments, dot-plots and annotation of conserved transcription factor binding sites
Re-centering at a given location
Zooming and shifting
Moving within a chromosome and from one chromosome to another
'Grab ECR' feature
Dynamic link to the UCSC Browser
Pip-plot vs Smooth-graph
Changing the ECR Browser image width
Genome alignment
ECR Browser legend
Questions or comments? |
| Introduction |
ECR Browser is a dynamic graphical interface to Evolutionary
Conserved Regions (ECRs) in the genomes of sequenced species, including human,
mouse, rat and Fugu. The conservation profile created by aligning one sequence
(the "base") pairwise with each of the others is displayed graphically for any
locus in a genome. ECRs are identified as regions of high sequence identity
against a neutrally evolving background. By scanning an alignment, the browser
detects sequence elements of significant length that are conserved above a
specified level of sequence identity between the two genomes (as set by
user-defined parameters) and highlights them as ECRs. Visually, ECRs are
represented as colored peaks on a graph, with the x-axis representing positions
in the base genome and the y-axis representing % identity between the base and
aligned genomes at that position. Below is an example of the ECR Browser
visualization of the human/mouse conservation profile of the human
APMCF1 gene.
This display shows a pip-type conservation plot in which human DNA (the base sequence) is represented on the horizontal axis, while multiple ungapped blastz alignments are displayed in the graph as short horizontal black lines. The length of each alignment line corresponds to the alignment length in the base sequence, and its vertical position corresponds to the level of nucleotide identity in that alignment. The vertical axis spans 50% to 100% identity, so that only significant alignments are shown. ECRs are capped by a track of dark red rectangles at the top of the plot. To place ECRs in relation to protein-coding features, annotated genes are depicted as a horizontal blue line above the graph, with strand (transcriptional orientation) indicated by the small inclined lines along it. Blue boxes along the line mark coding exons, and yellow boxes mark UTRs. Peaks in the conservation profile that correspond to these exons are colored the same way within the plot. Peaks that do not correspond to transcribed sequences are highlighted in red if they are intergenic or pink if they lie within an intron. Green bars on the bottom axis of the plot show the positions of repetitive elements in the base genome, and this annotation is shaded in gray up to the top of the plot. |
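As an illustration of the color scheme described above, the following minimal Python sketch assigns a display category to one ECR from interval annotations. It is not the ECR Browser's own code: the `Interval` and `classify_ecr` names, the half-open coordinates, and the precedence rule (coding exon over UTR over intron over intergenic) for an ECR that overlaps several features are assumptions made for this example, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end) in base-genome coordinates (assumed)."""
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        return self.start < other.end and other.start < self.end


def classify_ecr(ecr, coding_exons, utrs, gene_spans):
    """Return the display category of one ECR.

    Precedence is an assumption: coding exon > UTR > intron > intergenic.
    `gene_spans` cover whole transcripts, so an ECR inside a span that
    misses every exon and UTR is treated as intronic.
    """
    if any(ecr.overlaps(e) for e in coding_exons):
        return "coding exon (blue)"
    if any(ecr.overlaps(u) for u in utrs):
        return "UTR (yellow)"
    if any(ecr.overlaps(g) for g in gene_spans):
        return "intronic (pink)"
    return "intergenic (red)"


# Hypothetical single-gene annotation.
gene_spans = [Interval(900, 5000)]
utrs = [Interval(900, 1000)]
coding_exons = [Interval(1000, 1200), Interval(4800, 5000)]

print(classify_ecr(Interval(2000, 2100), coding_exons, utrs, gene_spans))
# prints: intronic (pink)
```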
| What kinds of sequence features correspond to ECRs? |
|
Coding exons.
A large fraction of the ECRs identified in any genome alignment
correspond to conserved protein-coding exons (blue bar in the conservation plot).
Because protein-coding sequences are functionally important, coding exons
are generally under strong selective pressure to remain unchanged. Therefore,
while the neutral background diverges and 'disappears' from the conservation
plot as the evolutionary distance between two genomes increases, coding
exons often remain as prominent conserved-sequence peaks. Coding exons
typically appear as a single, unbroken horizontal alignment line, since
insertions and deletions (gaps in the alignment) that would shift the
translation frame are generally not tolerated.
Novel genes. Despite recent advances in annotation of the human genome, many genes still remain unknown. The ECR Browser displays gene predictions above the track of ECRs so that the conservation levels of predicted exons can be scrutinized. Since most coding exons are conserved in vertebrate alignments, the ECR Browser is a useful tool for finding and evaluating novel genes and unannotated alternative exons. In some cases the conservation profile partially or completely mirrors a predicted transcript, which provides additional evidence that the predicted gene is a real, functional gene. In the example below, the exons of the Twinscan gene model 'chr5.11.006.a', which has no known-gene counterpart, correspond perfectly to a cluster of ECRs within the region, providing extra evidence that this prediction corresponds to a functional gene. Additional ECRs conserved between the Twinscan exons could represent candidate exons of an alternatively spliced transcript or, potentially, regulatory elements within the gene.
Promoters and enhancers. For some genes, transcription is driven partly or even primarily by enhancer elements located immediately upstream of the promoter. If these elements have remained conserved in the two compared species, they can easily be identified as 'red peaks' located near and upstream of the 5' end of the gene. In the example below, the IL4 gene promoter/proximal enhancer region is visualized.
Distant regulatory elements. Additional function is hidden in elements that lie far away from genes and regulate the spatial and temporal transcription patterns of neighboring genes. Regulatory ECRs are often conserved through evolution, and strong candidates for such distant regulatory elements can be identified in the ECR Browser as well-conserved elements located between annotated genes. The example plot above displays an experimentally verified regulatory element of the IL4 cytokine gene (Loots GG et al., Science. 2000 Apr 7;288(5463):136-40) that is located ~10 kb upstream of the IL4 transcription start site. Notably, this distant regulatory element also drives expression of a second cytokine gene, IL5, located ~120 kb away. Active regulatory elements can in fact lie hundreds of kb away from the genes they control, especially in regions of low gene density (see below). |
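The promoter and distant-element cases above both come down to where an ECR lies relative to the nearest transcription start site (TSS). The minimal sketch below measures that distance in a strand-aware way; the `Gene` class, the use of an ECR midpoint as its position, and the example coordinates are hypothetical and are not taken from the ECR Browser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # lowest base-genome coordinate of the transcript
    end: int     # highest base-genome coordinate of the transcript
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: `start` on the plus strand, `end` on the minus strand.
        return self.start if self.strand == "+" else self.end


def upstream_distance(position: int, gene: Gene) -> int:
    """Signed distance from the gene's TSS; positive means upstream (5')."""
    if gene.strand == "+":
        return gene.tss - position
    return position - gene.tss


def nearest_gene(position: int, genes):
    """Return (gene, signed upstream distance) for the gene with the closest TSS."""
    best = min(genes, key=lambda g: abs(g.tss - position))
    return best, upstream_distance(position, best)


# Hypothetical genes and ECR midpoint.
genes = [Gene("geneA", 50_000, 60_000, "+"), Gene("geneB", 100_000, 120_000, "-")]
gene, dist = nearest_gene(40_000, genes)
print(gene.name, dist)  # prints: geneA 10000 (i.e. 10 kb upstream)
```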
| Why compare higher vertebrates with fishes? |
|
In the examples above, ECRs were identified as conserved
peaks in comparisons of human and mouse DNA. Indeed, human/mouse comparative
sequence alignment has provided an invaluable tool for functional-element
annotation in both genomes. However, because different regions of vertebrate
genomes appear to be diverging at very different evolutionary rates, no single
type of two-way comparison can be applied with guaranteed success to all
genomic loci. In certain regions, human-mouse conservation is too high
overall for alignments to usefully single out specific conserved elements
for further study. For example, in a recent study of human gene deserts,
Nobrega M, Ovcharenko I, et al. (Science. 2003 Oct 17;302(5644):413) found
that in such regions, human/mouse comparative alignments often yield thousands
of non-coding ECRs per gene. The authors found that distant evolutionary
comparisons, in this case between human and pufferfish, provided a highly
efficient way to sift through this multitude of ECRs to find those with the
highest probability of function. Nine ECRs identified in human/fish comparisons
were shown to be functional enhancers of the DACH locus, with the potential
to recapitulate the complex developmental expression pattern of the gene.
The most distant enhancer was located as far as 1 megabase from
the transcription start site of DACH. In other regions,
human-fish comparisons may yield no conserved elements at all, even near
genes with deeply conserved function. As these examples illustrate,
there is no ideal pairwise comparison, and no single rule about the
evolutionary distance between aligned genomes, that will permit all functional
elements in a region of interest to be identified. For that reason, the ECR
Browser provides access to multiple pairwise genomic comparisons so that the
user can choose the most suitable combination for the analysis of each
particular locus. |
| Genomes compared |
The present version of the ECR Browser (04/04/2004)
contains comparative alignments of 10 different genomes: human, mouse, rat,
chicken, frog, three fishes (Fugu, Tetraodon, and zebrafish), and two fruit flies.
The chart below represents all the available genome comparisons. An arrow
starting at a genome indicates that this genome can be used as a base genome
in the browser; the genome at the end of the arrow was aligned with that base
genome. For example, the human genome was aligned with all the other genomes,
while the Fugu genome was aligned with the human, mouse, and zebrafish
genomes only.
|
| How to start using the ECR Browser |
The fastest way to start navigating the ECR Browser is to type a gene name
into the 'Jump to' location form. Alternatively, you can enter an absolute
chromosomal location in the same form; for example, chr17:1000-2000 describes
chromosome 17 between positions 1,000 and 2,000 bp.
If you type a partial gene name, all genes with names similar to that
search pattern are retrieved and listed, with their full names and locations
in the base genome linked directly to their conservation profiles in the
ECR Browser. For example, if you type 'GATA' into the location form while
using the human genome as the base genome and click the 'Submit' button,
a new page appears listing the GATA1, GATA2, ... GATA6 genes, their
descriptions and links to the browser.
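The two kinds of 'Jump to' input described above can be told apart with a small parser. The following sketch is an assumption about how such a form could be handled, not the ECR Browser's implementation: `parse_query` and `match_genes` are hypothetical names, commas in coordinates are tolerated, and "similarly named" genes are approximated by a case-insensitive prefix match.

```python
import re

# Hypothetical pattern for "chrN:start-end"; commas inside numbers are allowed.
LOCATION = re.compile(r"^(chr\w+):(\d[\d,]*)-(\d[\d,]*)$")


def parse_query(query: str):
    """Classify a 'Jump to' entry as a location or a gene-name pattern."""
    text = query.strip()
    m = LOCATION.match(text)
    if m:
        start = int(m.group(2).replace(",", ""))
        end = int(m.group(3).replace(",", ""))
        if start > end:
            raise ValueError(f"start {start} is after end {end}")
        return ("location", m.group(1), start, end)
    return ("gene", text)


def match_genes(pattern: str, gene_names):
    """Case-insensitive prefix match, standing in for 'similar names'."""
    prefix = pattern.strip().upper()
    return sorted(n for n in gene_names if n.upper().startswith(prefix))


print(parse_query("chr17:1000-2000"))  # prints: ('location', 'chr17', 1000, 2000)
print(match_genes("GATA", ["GATA2", "TAL1", "GATA1"]))  # prints: ['GATA1', 'GATA2']
```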
|
| Selecting the base genome |
The top bar of the ECR Browser contains several
links that provide access to the underlying data and let you modify the
parameters that control how the ECR Browser works. The left-most option,
'Base Genome', changes the base genome used by the browser.
When the base genome is switched, all chromosome coordinates displayed by
the browser correspond to the selected base genome, the list of available
chromosomes reflects the assembly of that genome, and the graphical display
of genes and other features follows the annotation of that genome. The user
can flip back and forth between base genomes to view the structure, gene
content, annotation and other features of each species.
|
| Browser settings |
|
The second option from the left on the top bar, 'Browser Settings',
provides the flexibility to change the parameters used to generate
conservation profiles dynamically. The user can visualize conservation using
any selection of the available genomes that have been aligned to the base
genome. For example, it is possible to visualize conservation of the human
sequence with only rodents, only fishes, one rodent and one fish, or all of
the species vs. human. The Browser Settings option also allows the user to
change the style of the conservation graph, to view either a "smooth-graph"
(peaks) or a "Pip-plot" (bars) display (the two types of conservation graphs
are described in the Pip-plot vs Smooth-graph section below). Several gene
prediction tracks are available and can be selected in addition to the main
RefSeq gene annotation; the availability of these tracks depends on the
availability of the corresponding data at the UCSC Genome Browser.
To provide an effective 'zoom-in' while still showing a long genomic locus in a single window, the ECR Browser can split the conservation profile into several layers. Each layer represents a part of the visualized genomic locus; its length and relative position within the viewed locus are marked by the numbers under each track. The user sets the total number of layers, and the 'Layer height' setting defines the height of a single layer. That value multiplied by the 'Number of layers' defines the total height of the ECR Browser image.
The ECR Browser detects Evolutionary Conserved Regions (ECRs) dynamically. By default, an alignment must be at least 100 bp long with at least 70% identity to be called an ECR, but the user can change both thresholds. This makes it possible, for example, to detect only very long ECRs or only highly conserved ones. Non-default ECR detection parameters are important for properly analyzing alignments between highly similar or very divergent sequences, such as mouse-rat or human-fish alignments, and for regions that have been subjected to very different kinds of evolutionary pressure.
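The minimal-length and minimal-identity thresholds described above can be illustrated with a sliding-window scan. In this hypothetical sketch (not the ECR Browser's actual algorithm), a pairwise alignment is reduced to one boolean per base-genome position (True where the aligned bases are identical), every `min_len` window reaching `min_identity_pct` is marked, and overlapping marked windows are merged into one ECR. The defaults mirror the 100 bp / 70% settings.

```python
def detect_ecrs(matches, min_len=100, min_identity_pct=70):
    """Return ECRs as half-open (start, end) intervals in base coordinates.

    `matches` holds one boolean per base-genome position (True = identical
    aligned base; gaps and unaligned positions count as False). Every window
    of `min_len` positions with identity >= `min_identity_pct` is marked, and
    overlapping or touching marked windows are merged.
    """
    n = len(matches)
    if n < min_len:
        return []
    ecrs = []
    count = sum(matches[:min_len])  # matches in the first window
    for start in range(n - min_len + 1):
        if start > 0:
            # Slide the window one position: add the new right base,
            # drop the old left base.
            count += matches[start + min_len - 1] - matches[start - 1]
        # Integer comparison avoids floating-point rounding at the threshold.
        if count * 100 >= min_identity_pct * min_len:
            end = start + min_len
            if ecrs and start <= ecrs[-1][1]:
                ecrs[-1] = (ecrs[-1][0], end)
            else:
                ecrs.append((start, end))
    return ecrs


# 50 unaligned bases, 120 identical bases, 50 unaligned bases.
profile = [False] * 50 + [True] * 120 + [False] * 50
print(detect_ecrs(profile))  # prints: [(20, 200)]
```

Because qualifying windows are merged, the identity of a merged region can end up slightly below `min_identity_pct` (here 120 of 180 positions); a fuller implementation might trim or re-score the merged intervals.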
|
| List of ECRs in the locus |
|
The absolute genomic positions of all ECRs detected
in the visualized locus, along with details for each ECR, are available from
the 'ECRs' link on the top bar of the ECR Browser. ECRs are grouped by the
species that the user selected for the ECR Browser conservation plots. If
multiple loci in one of the species contain significant homology to a position
in the base genome, a list of ECRs for every homologous locus is presented. In
the following example, the human (hg16) 'chr17:48,543,517-485,547,000' locus
was compared with the mouse (mm3) and Fugu (fu3) genomes. Six human-mouse ECRs
were detected, all originating from the mouse chr11 sequence. Two Fugu loci were
also found to match this human region: scaffold_965 and scaffold_1247 from the
JGI Fugu assembly version 3 (fu3). Scaffold_965 appears to be the orthologous
counterpart of this human locus in Fugu, not only because it has more ECRs than
scaffold_1247, but also because its ECRs are longer and show a higher level
of sequence identity. The genomic position listed for every ECR corresponds to the position of the match in the base genome and is linked directly to the underlying base-genome sequence.
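The reasoning used above to pick scaffold_965 (more ECRs, longer ECRs, higher identity) can be written as a ranking heuristic. The sketch below is an assumption, not how the ECR Browser decides orthology: it ranks loci by ECR count, then total ECR length, then length-weighted mean identity, and the loci and values in the usage example are hypothetical.

```python
from collections import defaultdict


def rank_loci(ecrs):
    """Rank homologous loci by ECR evidence.

    `ecrs` is an iterable of (locus, length_bp, identity_pct) tuples.
    Returns (locus, n_ecrs, total_length, weighted_identity) tuples sorted
    best first: by number of ECRs, then total length, then length-weighted
    mean identity (a hypothetical ordering).
    """
    stats = defaultdict(lambda: [0, 0, 0.0])
    for locus, length, identity in ecrs:
        s = stats[locus]
        s[0] += 1
        s[1] += length
        s[2] += identity * length
    ranked = [(locus, n, total, weighted / total)
              for locus, (n, total, weighted) in stats.items()]
    ranked.sort(key=lambda r: (r[1], r[2], r[3]), reverse=True)
    return ranked


# Hypothetical ECRs from two candidate loci in the aligned genome.
ecrs = [("scaffold_A", 250, 82.0), ("scaffold_A", 180, 78.0),
        ("scaffold_A", 140, 75.0), ("scaffold_B", 110, 71.0)]
for row in rank_loci(ecrs):
    print(row)
```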
|
| Underlying DNA sequence |
The 'DNA' link on the top bar of the ECR Browser provides
access to the genomic sequence of the base genome that underlies the
conservation plot. The sequence obtained by following this link is in
upper case, except for regions that correspond to repetitive elements,
which are in lower case. (Repeat annotation is currently not available
for the Fugu genome.)
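Because repeats are marked by lower-case letters (soft-masking), they can be recovered directly from the downloaded sequence. A minimal Python sketch, assuming the sequence has already been read into a single string without FASTA headers or line breaks; the function names are illustrative only:

```python
import re


def repeat_intervals(seq: str):
    """Half-open (start, end) intervals of lower-case (repeat) runs."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def repeat_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked as repeats."""
    if not seq:
        return 0.0
    masked = sum(end - start for start, end in repeat_intervals(seq))
    return masked / len(seq)


seq = "ACGTacgtACGTTTaa"
print(repeat_intervals(seq))  # prints: [(4, 8), (14, 16)]
print(repeat_fraction(seq))   # prints: 0.375
```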
|
| Navigating by synteny links, accessing alignments, dot-plots and annotation of conserved transcription factor binding sites |
|
Synteny relationships underlying ECR Browser conservation
plots can be displayed using the 'Synteny/Alignments' link on the top bar.
In the example below, a 35 kb region of chr17 in the human (hg16) genome
is analyzed for matches in other genomes, which are listed one after another.
Every homologous region from a matching genome has a 'blue-gray' homology
profile, where the blue color marks the region of synteny in the
base genome. Only two thirds of this human locus has a clearly homologous
region in the rodent genomes. The synteny relationships with fishes are even
more localized, limited to a small central segment of the human locus. The
'Position' column provides the precise coordinates of the matching region in
the aligned genome; it is linked to a version of the ECR Browser that uses the
sequence of the selected species as the base genome. By following the
'Position' link it is therefore possible to visualize the same functional
region in multiple species. For example, if you select the 'GATA3' gene
visualization using human as the base genome, you can then visualize the same
gene in another species (even if the homologous gene has not yet been annotated in that species!). The 'Length' column displays the size of the homologous region and can be used to detect expansion or shrinkage of specific regions in different evolutionary lineages.
The 'Alignments/Graphs' link forwards a homologous alignment to the zPicture tool (http://zpicture.dcode.org/), which provides several additional options for manipulating the sequences and visualizing alignments. One zPicture option creates a similarity dot-plot for an alignment, which visualizes differences in the order and spacing of regions within the two sequences being compared; it can detect reshuffling and reverse-complementation events, for example. The 'Binding sites' link connects the ECR Browser with the rVista tool (http://rvista.dcode.org/), which is designed to identify and annotate evolutionarily conserved transcription factor binding sites in the alignments. rVista excludes up to 95% of false-positive transcription factor binding site (TFBS) predictions while maintaining high search sensitivity.
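To show how a dot-plot reveals reshuffling and reverse-complementation, here is a generic k-mer dot-plot sketch. It is a textbook technique, not zPicture's algorithm; the word size `k` and the function names are arbitrary choices for illustration. Forward hits fall on diagonals for collinear regions, while reverse-complement hits form anti-diagonals for inverted regions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int):
    """Map every k-mer (upper-cased) to the list of positions where it starts."""
    index = {}
    for j in range(len(seq) - k + 1):
        index.setdefault(seq[j:j + k].upper(), []).append(j)
    return index


def dot_plot(x: str, y: str, k: int = 8):
    """Return (forward, reverse) lists of (i, j) dot positions.

    forward: x[i:i+k] equals y[j:j+k] (collinear similarity).
    reverse: x[i:i+k] is the reverse complement of y[j:j+k] (inversion).
    """
    index = kmer_index(y, k)
    forward, reverse = [], []
    for i in range(len(x) - k + 1):
        word = x[i:i + k].upper()
        forward.extend((i, j) for j in index.get(word, []))
        reverse.extend((i, j) for j in index.get(revcomp(word), []))
    return forward, reverse


fwd, rev = dot_plot("GATTACA", revcomp("GATTACA"), k=4)
print(fwd)  # prints: []
print(rev)  # prints: [(0, 3), (1, 2), (2, 1), (3, 0)], an anti-diagonal
```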
|
| Re-centering at a given location |
|
A mouse click at any position on the ECR Browser plot
re-centers the conservation plot at that location.
(This feature is similar to mapquest.com and other map manipulation tools.)
For example, if you are interested in a detailed visualization of the GATA3
promoter region, you can visualize the GATA3 gene first, click on the 5' end
of the gene (which re-centers the plot on the transcription start site of the
gene) and then zoom in as far as needed. |
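Re-centering amounts to converting a click position in the image to a base-genome coordinate and moving the window there while keeping its width. The sketch below assumes a linear mapping across an image of known pixel width; the function names and arguments are hypothetical, and clamping to chromosome ends is omitted.

```python
def pixel_to_position(x_px: float, image_width_px: int,
                      view_start: int, view_end: int) -> int:
    """Map a horizontal click offset (pixels) to a base-genome coordinate,
    assuming the plot spans the whole image width linearly."""
    span = view_end - view_start
    return view_start + round(x_px / image_width_px * span)


def recenter(view_start: int, view_end: int, new_center: int):
    """Move the window so it is centered on new_center, keeping its width."""
    width = view_end - view_start
    new_start = new_center - width // 2
    return new_start, new_start + width


click = pixel_to_position(250, 1000, 10_000, 20_000)
print(click)                            # prints: 12500
print(recenter(10_000, 20_000, click))  # prints: (7500, 17500)
```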
| Zooming and shifting |
Several rounded buttons at the bottom of the ECR Browser
plot provide the zooming and shifting functions.
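The zoom and shift buttons can be thought of as simple window arithmetic. In this hypothetical sketch, zooming keeps the window center fixed and divides the width by a factor, and shifting moves the window by a fraction of its width; the factors used by the actual buttons are not stated here, so the values below are examples only.

```python
def zoom(start: int, end: int, factor: float):
    """Zoom around the window center; factor > 1 zooms in, < 1 zooms out."""
    center = (start + end) / 2
    half_width = (end - start) / (2 * factor)
    return round(center - half_width), round(center + half_width)


def shift(start: int, end: int, fraction: float):
    """Move the window by a fraction of its width (negative = left)."""
    delta = round((end - start) * fraction)
    return start + delta, end + delta


print(zoom(10_000, 20_000, 2))       # prints: (12500, 17500)
print(zoom(10_000, 20_000, 0.5))     # prints: (5000, 25000)
print(shift(10_000, 20_000, -0.25))  # prints: (7500, 17500)
```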
|
| Moving within a chromosome and from one chromosome to another |
|
At the top of the ECR Browser graphical display, all
the chromosomes of the selected base genome are listed as hotlinked numbers.
The active chromosome is highlighted by a different background color. You can
jump from one chromosome to another by clicking on the corresponding
chromosome symbol. In addition, a clickable karyotype image of the selected
chromosome is shown to the left of the ECR Browser graphical display. The
position of the displayed genomic locus within the chromosome is marked by a
red bar on the chromosome image. Unsequenced, heterochromatic regions
(corresponding to centromeres and telomeres) are shown as thinner regions of
the chromosome image. The chromosome scale, in Mb, is shown immediately to the
right of the chromosome image. A mouse click on the active chromosome image
moves the ECR display to the location corresponding to the mouse click. |
| 'Grab ECR' feature |
|
Sequence and alignment details for every highlighted
ECR on the ECR Browser conservation plot can be obtained with the 'Grab
ECR' feature. A mouse click on the 'Grab ECR' button (the button changes
color after the browser reloads), followed by a second mouse click on any
colored peak (ECR) on the plot, opens a new web page describing the ECR that
corresponds to that peak. The page gives the chromosomal location, length,
percent identity of the pairwise alignment, and GC content of the ECR. In
addition, the full alignment is visualized and the sequences corresponding to
that ECR in both the base and the aligned species are shown. Sequences and
alignments from other species can be obtained by using 'Grab ECR' on a peak
from the conservation plot that depicts alignments with the genome of that
species. An additional link forwards the ECR alignment to rVista
(http://rvista.dcode.org), a tool designed to detect evolutionarily conserved
transcription factor binding sites in that ECR. Links to an oligo/primer
design tool are also provided for both the base and the second sequence.
Please note that pop-ups must be allowed in your web browser for the 'ecrbrowser.dcode.org' web site in order for the 'Grab ECR' function to work properly; otherwise, the new window with the detailed ECR description will not appear after you click on a conserved element.
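The statistics reported on the 'Grab ECR' page can be computed from a pairwise alignment. The following sketch assumes gapped alignment strings of equal length with '-' for gaps, computes percent identity over all alignment columns, and computes GC content on the ungapped base sequence; the ECR Browser may use different conventions, and `ecr_stats` is a hypothetical name.

```python
def ecr_stats(base_aln: str, other_aln: str):
    """Length, percent identity and GC content for one aligned ECR.

    Both strings come from a pairwise alignment, have equal length, and use
    '-' for gaps. Identity is computed over all alignment columns and GC
    content over the ungapped base sequence (assumed conventions).
    """
    if len(base_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    base_seq = base_aln.replace("-", "").upper()
    if not base_seq:
        raise ValueError("empty base sequence")
    matches = sum(1 for a, b in zip(base_aln.upper(), other_aln.upper())
                  if a == b and a != "-")
    return {
        "length": len(base_seq),
        "identity_pct": 100.0 * matches / len(base_aln),
        "gc_pct": 100.0 * (base_seq.count("G") + base_seq.count("C")) / len(base_seq),
    }


print(ecr_stats("ACGTGC-A", "ACGAGCTA"))
```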
|
| Dynamic link to the UCSC Browser |
|
The <UCSC Browser> button, located on the right side
of the conservation plot, provides dynamic access to the detailed annotation
of the genomic locus provided by the UCSC Genome Browser. The base genome,
its assembly freeze and the exact location are forwarded to the UCSC Genome
Browser so that it displays exactly the same region as the one viewed in the
ECR Browser. With this function it is possible to study the many additional
functional annotation layers available for any locus in the UCSC Genome
Browser, including mRNA, EST and SNP annotation, detailed gene annotation,
etc. Please note that this link is not functional for base genomes that were
not obtained from UCSC (at present, e.g., the fish genomes). |
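A link of this kind can be reproduced by passing the assembly name and position to the UCSC `hgTracks` page through its `db` and `position` URL parameters. The sketch below is an assumption about how such a link might be built, not the ECR Browser's code; the set of UCSC-hosted assemblies is supplied by the caller, and whether a given archived assembly (such as hg16) is still served by UCSC should be checked.

```python
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_link(db: str, chrom: str, start: int, end: int, ucsc_dbs) -> str:
    """Build a UCSC Genome Browser link showing the same region.

    `ucsc_dbs` is the caller's set of UCSC-hosted assembly names; other
    base genomes (e.g. a JGI Fugu assembly) have no UCSC counterpart.
    """
    if db not in ucsc_dbs:
        raise ValueError(f"{db} is not a UCSC-hosted assembly")
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"{UCSC_HGTRACKS}?{query}"


print(ucsc_link("hg16", "chr17", 1000, 2000, {"hg16", "mm3"}))
# prints: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg16&position=chr17%3A1000-2000
```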
| Pip-plot vs Smooth-graph |
Any ECR Browser profile can be visualized as either
a Pip-plot or a Smooth-graph. The main difference between these two types of
visualization lies in how the black conservation graph is constructed.
In a Pip-plot, every ungapped alignment is visualized as a separate
horizontal line. The length of the line corresponds to the length of
the alignment, while its height corresponds to the percent identity of the
alignment. Smooth-graphs are constructed by sliding a 100 bp window through
the alignment. A window centered at every nucleotide of the base sequence is
used to count the matches inside that window, which gives the percent identity
in a window centered at that position. These percent identity values determine
the height of the smooth conservation graph at each point. Essentially, a
smooth-graph is a smoothed average of the Pip-plot. Smooth-graphs present a
simpler and clearer view of the conservation profile but lose information
about the distribution of gaps in the alignment.
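Both graph types can be derived from the same pairwise alignment. The sketch below shows one way to do it under stated assumptions: `pip_segments` splits gapped alignment strings at gap columns into ungapped blocks with their identity, and `smooth_identity` computes percent identity in a window centered at each base position, truncating the window at the sequence ends (how the ECR Browser handles the ends is not described here). Both function names are hypothetical.

```python
def pip_segments(base_aln: str, other_aln: str):
    """Split a pairwise alignment at gap columns into ungapped blocks.

    Returns (start, end, identity_pct) tuples in 0-based, half-open
    base-sequence coordinates: one horizontal line per block in a Pip-plot.
    """
    segments, pos, start, same = [], 0, None, 0
    for a, b in zip(base_aln.upper(), other_aln.upper()):
        if a == "-" or b == "-":
            if start is not None:  # a gap closes the current block
                segments.append((start, pos, 100.0 * same / (pos - start)))
                start, same = None, 0
            if a != "-":           # base-sequence letters still advance the coordinate
                pos += 1
            continue
        if start is None:
            start = pos
        same += a == b
        pos += 1
    if start is not None:
        segments.append((start, pos, 100.0 * same / (pos - start)))
    return segments


def smooth_identity(matches, window=100):
    """Percent identity in a window centered on each base position.

    `matches` has one boolean per base position (True = identical base).
    Near the sequence ends the window is truncated (an assumption).
    """
    n = len(matches)
    prefix = [0]  # prefix[i] = number of matches in matches[:i]
    for m in matches:
        prefix.append(prefix[-1] + int(m))
    half = window // 2
    values = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + window)
        values.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return values


print(pip_segments("ACGTACGT--AC", "ACGAACGTTTAC"))
# prints: [(0, 8, 87.5), (8, 10, 100.0)]
```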
|
| Changing the ECR Browser image width |
|
It is very easy to change the ECR Browser image width:
just resize your browser window, and the ECR Browser will follow the change
in window width by resizing its image to fill the space in the browser
window. |
| Genome alignment |
The 'Genome alignment' feature of the ECR Browser lets the user map a
submitted sequence to one of the base genomes (human, mouse, rat or Fugu)
and align it with the homologous region from the chosen genome. The input
sequence can be submitted in FASTA format or downloaded automatically from
GenBank by its accession number.
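For reference, a FASTA record is a '>' header line followed by one or more sequence lines. The minimal parser below illustrates that format; it is not part of the ECR Browser, `read_fasta` is a hypothetical name, and retrieval by GenBank accession is not shown.

```python
def read_fasta(text: str):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


print(read_fasta(">query1 example\nACGTAC\nGGTT\n"))
# prints: [('query1 example', 'ACGTACGGTT')]
```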
|
| ECR Browser legend |
|
A detailed description of the features annotated in the
ECR Browser, as well as of its functional buttons, is available in the ECR Browser Legend plot. |
| Questions or comments? |
|
dcode@ncbi.nlm.nih.gov. |