ECR Browser :: Introduction and Instructions on usage


Please note that these instructions are outdated and do not describe all available ECR Browser features...


Contents
   Introduction
   What kinds of sequence features correspond to ECRs?
   Why compare higher vertebrates with fishes?
   Genomes compared

   How to start using the ECR Browser
   Selecting the base genome
   Browser settings
   List of ECRs in the locus
   Underlying DNA sequence

   Navigating by synteny links, accessing alignments, dot-plots and annotation of conserved transcription factor binding sites
   Re-centering at a given location
   Zooming and shifting
   Moving within a chromosome or from one chromosome to another
   Grab ECR feature
   Dynamic link to the UCSC Browser
   Pip-plot vs Smooth-graph
   Changing the ECR Browser image width

   Genome alignment

   ECR Browser legend
   Questions or comments?

Introduction
    ECR Browser is a dynamic graphical interface to Evolutionary Conserved Regions (ECRs) in the genomes of sequenced species including human, mouse, rat and Fugu. The conservation profile created by aligning one sequence (the "base") with all others in a pairwise fashion is graphically displayed for any locus in a genome.  ECRs are identified as regions of high sequence identity against a neutrally evolving background. By scanning an alignment, the browser detects sequence elements of significant length that are conserved above a specified level of sequence identity between the two genomes (both thresholds are user-defined parameters) and highlights them as ECRs. Visually, ECRs are represented as colored peaks on a graph, with the x-axis representing positions in the base genome and the y-axis representing % identity between the base and aligned genomes at that position.   Below is an example of the ECR Browser visualization of the human/mouse conservation profile of the human APMCF1 gene.
pip-plot

This display shows a pip-type conservation plot in which human DNA (base sequence) is represented on the horizontal axis, while multiple ungapped blastz alignments are displayed in the graph as short horizontal black lines. The length of each horizontal alignment line corresponds to the alignment length in the base sequence, while its vertical position corresponds to the level of nucleotide identity in this alignment. A vertical axis range of 50% to 100% identity is used so that only the significant alignments are visualized. ECRs are capped by a track of dark red rectangles at the top of the plot.

To demarcate ECRs in relation to protein coding features, annotated genes are depicted as a horizontal blue line above the graph, with strand/transcriptional orientation indicated by the inclined vertical lines. Blue boxes along the line correspond to positions of coding exons, while yellow boxes correspond to UTRs.  Peaks within the conservation profile that correspond to these exons are similarly colored within the plot.  Peaks within the conservation profile that do not correspond to transcribed sequences are highlighted in red if they are intergenic or pink if they lie within an intron. Green bars on the bottom axis of the plot show the positions of repetitive elements in the base genome, and this annotation is shaded to the top of the plot in gray.


What kinds of sequence features correspond to ECRs?
    Coding exons.  A large fraction of the ECRs identified in any genome alignment correspond to conserved protein coding exons (blue bar in the conservation plot).   Due to the functional significance of protein-coding sequences, coding exons are generally under strong selection pressure to stay unchanged. Therefore, while the neutral background diverges and 'disappears' from the conservation plot as the evolutionary distance between two genomes is increased, coding exons often remain as prominent conserved-sequence peaks. It is usual to observe a single, unbroken horizontal alignment line corresponding to coding exons since insertions and deletions (gaps in the alignment) that will change the translation frame are not tolerated.

    Novel genes. Despite recent advances in annotation of the human genome, there are still many genes that remain unknown.  The ECR Browser provides graphical annotation of gene predictions above the track of ECRs so that conservation levels of predicted exons can be scrutinized.  Since most coding exons are conserved in vertebrate alignments, the ECR Browser is a tool for finding and evaluating novel genes and unannotated alternative exons.  In some cases the conservation profile mimics, either partially or completely, a predicted gene transcript and provides additional evidence that the predicted gene is a real, functional gene. In the example below, the exons of the Twinscan gene model 'chr5.11.006.a', which has no known-gene counterpart, correspond perfectly to a cluster of ECRs within the region, providing extra evidence that this prediction corresponds to a functional gene. Additional ECRs conserved in between Twinscan exons could represent candidate exons for an alternatively spliced transcript or, potentially, regulatory elements within the gene.
novel genes

    Promoters and enhancers.  For some genes, transcription is driven partly or even primarily by enhancer elements located immediately upstream of the promoter. These elements, if they have remained conserved in two compared species, can be easily identified as 'red peaks' located near and upstream of the 5' end of the gene. In the example below, the IL4 gene promoter / proximal enhancer region is visualized.

regulatory elements

    Distant regulatory elements. Additional function is hidden in elements that lie far away from genes and regulate the spatial and temporal  transcription patterns of neighboring genes.  Regulatory ECRs are often conserved through evolution, and excellent candidates for such distant regulatory elements can be identified in  ECR Browser as well conserved elements located  between the annotated genes.  The example plot above displays an experimentally verified regulatory element of IL4 cytokine (Loots GG et al., Science. 2000 Apr 7;288(5463):136-40) that is located ~10kb upstream of the transcription start site of IL4. It is worth mentioning that this distant regulatory element also drives expression of a second cytokine gene, called IL5, that is located ~120kb away.  Active regulatory elements can in fact be located hundreds of kb away from the genes that they control, especially in regions of low gene density (see below).

Why compare higher vertebrates with fishes?
    In the examples above, ECRs were identified as conserved peaks in comparison of human and mouse DNA. Indeed, human/mouse comparative sequence alignment has provided an invaluable tool for functional-element annotation in both genomes. However, because different regions of vertebrate genomes appear to be diverging at very different evolutionary rates, no single type of two-way comparison can be applied with guaranteed success to all genomic loci.  In certain regions, human-mouse conservation is too high overall for alignments to usefully single out specific conserved elements for further study.  For example, in a recent study of human gene deserts, Nobrega M, Ovcharenko I, et al. (Science. 2003 Oct 17;302(5644):413) found that in such regions, human/mouse comparative alignments often yield thousands of non-coding ECRs per gene. These authors discovered that distant evolutionary comparisons, in this case between human and pufferfish, provided a highly efficient way to sift through this multitude of ECRs to find those with the highest probability of function.  Nine ECRs identified by human/fish comparison were shown to be functional enhancers of the DACH locus, with the potential to recapitulate the complex developmental expression pattern of the gene.   The most distant enhancer was found as far away as 1 megabase from the transcriptional start point of DACH.  In other regions, human-fish comparisons may yield no conserved elements at all, even near genes with deeply conserved function.  As these examples illustrate, there is no ideal pairwise comparison or single set of rules regarding evolutionary distance between aligned genomes that will permit all functional elements to be identified in a region of interest. For that reason, the ECR Browser provides access to multiple pairwise genomic comparisons so the user can choose the most suitable combination for analysis of each particular locus.

Genomes compared
    The present version of the ECR Browser (04/04/2004) contains comparative alignments of 10 different genomes - human, mouse, rat, chicken, frog, 3 fishes (Fugu, Tetraodon, and Zebrafish), and 2 fruitflies. The chart below represents all the available genome comparisons. An arrow originating from a genome indicates that this genome can be used as a base genome in the browser. The head of the arrow points to a genome that was aligned with that base genome. For example, the human genome was aligned with all the other genomes, while the Fugu genome was aligned with the human, mouse, and zebrafish genomes only.
compared genomes

How to start using the ECR Browser
   The fastest way to start navigating the ECR Browser is to type a gene name into the 'Jump to' location form. Alternatively, you can enter an absolute chromosomal location in the same form using the following format (here, chromosome 17 between positions 1000 and 2000 bp):   chr17:1000-2000.   If you type in a partial gene name, all genes whose names match that search pattern will be retrieved and listed, with full names and locations of these genes in the base genome linked directly to their visualization profiles in the ECR Browser. For example, if you type 'GATA' into the location form while using the human genome as the base genome and click on the 'Submit' button, a new page will appear with a list of the GATA1, GATA2, ... GATA6 genes, their descriptions and links to the browser.
ecr browser navigation
The location form also contains the information on the base genome that is being used currently (in this example it is the hg16 freeze of the human genome in the notation adopted at UCSC Genome Browser) as well as the total length of the genomic locus being visualized (that is 89,980 bps in this case).
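
    As a rough illustration of the location format described above, the following Python sketch tells a chromosomal location apart from a gene-name query. The function name `parse_location` and the accepted pattern (sequence name, colon, start, dash, end, with optional thousands separators) are assumptions inferred from the 'chr17:1000-2000' example; they are not taken from the ECR Browser itself.

```python
import re

# Assumed pattern, inferred from the 'chr17:1000-2000' example in the text.
# Commas are tolerated because the browser displays positions such as 48,543,517.
_LOCATION = re.compile(r"^([\w.]+):([\d,]+)-([\d,]+)$")


def parse_location(text):
    """Return (name, start, end) for a location string, or None otherwise.

    A None result means the input does not look like a location and could
    instead be treated as a (partial) gene name such as 'GATA'.
    """
    match = _LOCATION.match(text.strip())
    if match is None:
        return None
    name = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start exceeds end in location: {text!r}")
    return name, start, end
```

    With this sketch, `parse_location("chr17:1000-2000")` gives the chromosome name and the two positions, while `parse_location("GATA")` gives `None`.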

Selecting the base genome
    The top bar of the ECR Browser contains several links that provide access to the underlying data and permit modification of the parameters that control how the ECR Browser operates. The left-most option, 'Base Genome', allows the user to change the base genome used by the browser. When the base genome is switched, all chromosome coordinates displayed by the browser will correspond to the selected base genome; the list of available chromosomes will be limited by the sequenced structure of the genome, and the graphical display of genes and other features will correspond to annotation in that genome.  The user can flip back and forth between different base genomes to view structure and gene content, annotation and other features attributed to each species.
base genome

Browser settings
    The second option from the left on the top bar of the ECR Browser, called 'Browser Settings', provides the flexibility to dynamically change the parameters used to generate conservation profiles. The user has an option to visualize conservation using any selection of available genomes that have been aligned to the base genome.   For example, it is possible to visualize conservation of the human sequence with only rodents, only fishes, one rodent and one fish, or all of the species vs the human.   The Browser Settings option also allows the user to change the style of the conservation graph, to view a "Smooth-graph" (peaks) or "Pip-plot" (bars) display (the different types of conservation graphs are described in a later section). Several gene prediction tracks are available and can be selected in addition to the main RefSeq gene annotation; the availability of these tracks depends on the availability of corresponding data at the UCSC Genome Browser.
    To provide an effective 'zoom in' effect that will also allow for a visualization of a long genomic locus in a single window at the same time, ECR Browser permits the conservation profile to be split into several layers. Each layer represents a part of the visualized genomic locus, the length of which and relative position within the viewed locus are marked by numbers under each track.  The total number of layers is defined by the user and the 'Layer height' setting defines the height of a single layer. That value multiplied by the 'Number of layers' will define the total height of the ECR Browser image.
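
    The layer arithmetic can be sketched in Python as follows. The text only states that each layer shows a part of the locus and that the image height is the layer height times the number of layers; splitting the locus into near-equal consecutive parts is an assumption, and both function names are hypothetical.

```python
def split_into_layers(start, end, n_layers):
    """Split the inclusive interval [start, end] into consecutive layers.

    Assumption: layers are of (nearly) equal length; any remainder is
    spread over the first layers, one extra base each.
    """
    total = end - start + 1
    if n_layers < 1 or total < n_layers:
        raise ValueError("need at least one base per layer")
    base, extra = divmod(total, n_layers)
    layers = []
    pos = start
    for i in range(n_layers):
        size = base + (1 if i < extra else 0)
        layers.append((pos, pos + size - 1))
        pos += size
    return layers


def image_height(layer_height, n_layers):
    """Total image height as stated in the text: layer height x layer count."""
    return layer_height * n_layers
```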
    The ECR Browser detects Evolutionary Conserved Regions (ECRs) in a dynamic manner. While 100bps and 70% identity thresholds define a default setting for the minimal length and minimal identity for an alignment to be called an ECR, the user also has an option to change these parameters. This way, the detection of only very long ECRs or only highly conserved ECRs, for example, can be selected. It is important to use non-default ECR detection parameters to properly analyze alignments between highly similar or very divergent sequences, such as mouse-rat or human-fish alignments, or in regions that have been subjected to very different kinds of evolutionary pressures.
browser settings
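
    The thresholding described above can be illustrated with a short Python sketch. It assumes each alignment is already available as a start position, an end position and a percent identity; the `Alignment` type and the `find_ecrs` function are hypothetical names, and the browser's actual detection procedure may differ in detail.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    start: int        # first aligned position in the base genome
    end: int          # last aligned position in the base genome (inclusive)
    identity: float   # percent identity of the alignment


def find_ecrs(alignments, min_length=100, min_identity=70.0):
    """Keep alignments that meet both thresholds.

    The defaults mirror the 100 bp / 70% identity settings named in the text.
    """
    return [
        a for a in alignments
        if (a.end - a.start + 1) >= min_length and a.identity >= min_identity
    ]
```

    Raising `min_length` selects only very long ECRs, while raising `min_identity` selects only highly conserved ones, as suggested above for mouse-rat or human-fish comparisons.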

List of ECRs in the locus
    The absolute genome positions of all the ECRs detected in the visualized locus, together with the ECR details, are available through the 'ECRs' link on the top bar of the ECR Browser. ECRs are grouped by the species that the user selected for the ECR Browser conservation plots. If multiple loci in one of the species show significant homology to a position in the base genome, a list of ECRs corresponding to all homologous loci is presented. In the following example, the human (hg16) 'chr17:48,543,517-485,547,000' locus was compared with the mouse (mm3) and Fugu (fu3) genomes. Six human-mouse ECRs were detected, all originating from the mouse chr11 sequence. In addition, two Fugu loci were found to match this human region; they are from scaffold_965 and scaffold_1247 in the Fugu (fu3 or version 3) assembly from the JGI. Scaffold_965 seems to be the orthologous counterpart in Fugu for this human locus, not only because it has more ECRs than scaffold_1247, but also because the ECRs produced by the comparison with scaffold_965 are longer and show a higher level of sequence identity.
    The genomic position listed for every ECR corresponds to the position of the match in the base genome. It is linked directly to the underlying sequence from the base genome.
list of ECRs

Underlying DNA sequence
    The 'DNA' link on the top bar of the ECR Browser provides access to the genomic sequence of the base genome that underlies the conservation plot. The sequence obtained by following this link is in upper case except for the regions that correspond to repetitive elements, which are in lower case. (Repeat annotation is not currently available for the Fugu genome.)
DNA sequence
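
    Because repeats are marked by lower-case letters, their positions can be recovered from the downloaded sequence directly. The Python sketch below is one way to do this; the function name is hypothetical, and the intervals are reported as 0-based, half-open offsets into the sequence string rather than genome coordinates.

```python
import re


def repeat_intervals(sequence):
    """Return (start, end) offsets of every lower-case run in the sequence.

    Offsets are 0-based and half-open, relative to the start of the string.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]
```

    To convert an offset to a genome position, add the start coordinate of the downloaded locus.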

Navigating by synteny links, accessing alignments, dot-plots and annotation of conserved transcription factor binding sites
    Synteny relationships underlying ECR Browser conservation plots can be displayed using the 'Synteny/Alignments' link on the top bar. In the example below, a 35kb region from chr17 in the human (hg16) genome is analyzed for matches from other genomes, which are listed one after another. Every homologous region from a matching genome contains a 'blue-gray' homology profile, where the blue color corresponds to the region of synteny in the base genome. In this example, only two thirds of the human locus has a clearly homologous region in rodent genomes. The synteny relationships with fishes are even more localized, limited to a small central segment in the human locus.  The 'Position' column provides the precise coordinates of the matching region in the aligned genome; it is linked to a version of the ECR Browser that uses the sequence of the selected species as a base genome.  By following the 'Position' link it is therefore possible to visualize the same functional region in multiple species. For example, if you select the 'GATA3' gene visualization using the human as a base genome, then you can visualize the same gene in another species (even if the homologous gene has not yet been annotated in that species!).
    The 'Length' column displays the size of the homologous region and can be effectively used to detect expansion/shrinkage of specific regions in different evolutionary lineages.
    The 'Alignments/Graphs' link will forward a homologous alignment to the zPicture tool (http://zpicture.dcode.org/), which provides access to several additional options to manipulate the sequence and to visualize alignments. One of the zPicture options creates a similarity dot-plot for an alignment, which visualizes differences in the order and spacing of regions within the two sequences being compared. It is capable of detecting reshuffling and reverse-complementation events, for example.
    The 'Binding sites' link connects the ECR Browser with the rVista tool (http://rvista.dcode.org/), which is designed to identify and annotate evolutionarily conserved transcription factor binding sites in the alignments. rVista will exclude up to 95% false positive transcription factor binding sites (TFBS) predictions while maintaining high sensitivity of the search.  
synteny links

Re-centering at a given location
    A mouse click at any position of the ECR Browser plot will result in a re-centering of the conservation plot at a given location. (This is a feature similar to mapquest.com and other map manipulation tools). For example, if you are interested in a detailed visualization of the GATA3 promoter regions, you can visualize GATA3 gene first, click on the 5' end of the gene (that will re-center on the transcription start site of the gene) and then zoom in X times.

Zooming and shifting
    Several rounded buttons at the bottom of the ECR Browser plot are responsible for zooming and shifting functions.
zooming and shifting
Blue buttons labeled with < and > symbols shift the view to the left and to the right, respectively, preserving 1/3 of the original interval and keeping the total length of the visualized locus unchanged. Yellow buttons zoom in, while the green ones zoom out. It is possible to zoom by 1.5x, 3x and 10x.  Note: zooming out to very long intervals can result in significant delays in the response time because the browser will need to scan many different alignments to create a conservation profile.
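
    The interval arithmetic behind these buttons can be sketched in Python. Shifting by two thirds of the locus length follows from the statement that 1/3 of the original interval is preserved; zooming around the midpoint of the current view is an assumption, the function names are hypothetical, and clamping to the chromosome ends is left out.

```python
def shift(start, end, direction):
    """Shift the view left ('<') or right ('>'), keeping its length.

    Moving by 2/3 of the length leaves 1/3 of the old interval in view.
    """
    if direction not in ("<", ">"):
        raise ValueError("direction must be '<' or '>'")
    step = (2 * (end - start)) // 3
    offset = step if direction == ">" else -step
    return start + offset, end + offset


def zoom_in(start, end, factor):
    """Shrink the view by `factor` (e.g. 1.5, 3 or 10) around its midpoint."""
    center = (start + end) / 2
    half = (end - start) / (2 * factor)
    return round(center - half), round(center + half)


def zoom_out(start, end, factor):
    """Expand the view by `factor` around its midpoint (assumed behaviour)."""
    center = (start + end) / 2
    half = (end - start) * factor / 2
    return round(center - half), round(center + half)
```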

Moving within a chromosome or from one chromosome to another
    At the top of the ECR Browser graphical display, all the chromosomes of the species represented by the selected base genome are listed as hotlinked numbers. The active chromosome is highlighted by a different background color. It is possible to jump from one chromosome to another by clicking on the corresponding chromosome symbol. Also, at the left of the ECR Browser graphical display is shown an actively linked karyotypic image of the selected chromosome. The position of the genomic locus being displayed within the chromosome is depicted as a red bar on the chromosome image. Unsequenced, heterochromatic regions (corresponding to centromeres and telomeres) are shown as thinner regions in the chromosome image. The chromosome scale, in Mb, is shown immediately to the right of the chromosome image. A mouse click on the active chromosome image will result in moving the ECR display to the location corresponding to the mouse click.

'Grab ECR' feature
    Sequence and alignment details for every highlighted ECR on the ECR Browser conservation plot can be obtained using the 'Grab ECR' feature of the browser. A mouse click on the 'Grab ECR' button (which changes the color of the button after the browser reloads), followed by a second mouse click on any colored peak (ECR) on the plot, opens a new web page describing the ECR corresponding to that peak. Chromosomal location, length, percent identity of the pairwise alignment, and GC content of the ECR are given.   In addition, the full alignment is visualized and sequences corresponding to that ECR in both base and aligned species are shown.  Sequences and alignments from other species can be obtained by using the 'Grab ECR' feature to retrieve a peak from the conservation plot depicting alignments with the genome of that species.  An additional link can be used to forward the ECR alignment to rVista (http://rvista.dcode.org), a tool designed for detection of evolutionarily conserved transcription factor binding sites in that ECR. Links to the oligo/primer design tool are also provided for the base and the second sequences.
    Please note that the 'popups' have to be allowed in your browser for the 'ecrbrowser.dcode.org' web-site in order for the 'Grab ECR' function to work properly. Otherwise the new window with the detailed ECR description will not show up after you click on a conserved element.
GrabEcr feature
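
    Two of the values reported on the 'Grab ECR' page, percent identity and GC content, are simple to compute from the sequences shown there. The Python sketch below uses common definitions (matching columns over all alignment columns, and G+C over all A/C/G/T bases); these definitions and the function names are assumptions, so the results may not agree exactly with the numbers the browser reports.

```python
def percent_identity(aligned_base, aligned_other):
    """Percent of alignment columns with identical bases ('-' marks a gap)."""
    if len(aligned_base) != len(aligned_other) or not aligned_base:
        raise ValueError("aligned sequences must be non-empty and equal length")
    matches = sum(
        1
        for a, b in zip(aligned_base.upper(), aligned_other.upper())
        if a != "-" and a == b
    )
    return 100.0 * matches / len(aligned_base)


def gc_content(sequence):
    """Percent of G and C among the A/C/G/T bases; case is ignored."""
    bases = [c for c in sequence.upper() if c in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * sum(1 for c in bases if c in "GC") / len(bases)
```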

Dynamic link to the UCSC Browser
    The <UCSC Browser> button, located on the right side of the conservation plot, provides dynamic access to the detailed annotation of the genomic locus that is available in the UCSC Genome Browser.  The base genome selection, its freeze and the exact location are forwarded to the UCSC Genome Browser to display exactly the same region as that being viewed in the ECR Browser. Using this function it is possible to study many additional functional annotation layers that are available for any locus in the UCSC Genome Browser. These include annotation of mRNAs, ESTs, SNPs, detailed gene annotation, etc. Please note that this link is not functional for base genomes that were not obtained from UCSC (at present, this includes the fish genomes).

Pip-plot vs Smooth-graph
    Any ECR Browser profile can be visualized as either a Pip-plot or a Smooth-graph. The main difference between these two types of visualization is the method of constructing the black conservation graph. In a Pip-plot, every ungapped alignment is visualized as a separate horizontal line. The length of the line corresponds to the length of the alignment, while its height corresponds to the percent identity of the alignment. Smooth-graphs are constructed by sliding a window of 100 bp through the alignment. A window centered at each nucleotide in the base sequence is used to count the number of matches inside the window, and this count gives the percent identity at that position. These percent identity values determine the height of the smooth conservation graph at each point. In essence, a Smooth-graph is a smoothed average of the Pip-plot. Smooth-graphs present a simplified and clearer view of the conservation profile but lose information about the gap distribution in the alignment.
pipPlot vs smoothGraph
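
    The sliding-window calculation described above can be sketched in Python. The input is assumed to be one flag per base-genome position, true where that position is aligned and matches the second genome. Shortening the window at the ends of the locus is an assumption, as is the function name; the text does not say how the browser treats the edges.

```python
from itertools import accumulate


def smooth_identity(matches, window=100):
    """Percent identity in a window centered at each position.

    `matches` holds one true/false value per position of the base sequence.
    Near the ends the window is truncated to the available positions.
    """
    n = len(matches)
    # prefix[i] is the number of matches among the first i positions.
    prefix = [0] + list(accumulate(int(m) for m in matches))
    half = window // 2
    profile = []
    for i in range(n):
        lo = i - half
        hi = lo + window
        lo, hi = max(lo, 0), min(hi, n)
        profile.append(100.0 * (prefix[hi] - prefix[lo]) / (hi - lo))
    return profile
```

    The prefix sums keep the cost linear in the locus length, which matters when a long interval is displayed.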

Changing the ECR Browser image width
    It is very easy to change the ECR Browser image width. Just resize your browser window and the ECR Browser will adjust the width of its image to fill the available space in the window.

Genome alignment
    The 'Genome alignment' feature of the ECR Browser allows the user to map a user-submitted sequence to one of the base genomes (human, mouse, rat or Fugu) and to align the submitted sequence with the homologous region from the chosen genome. The input sequence can be submitted as a FASTA file or downloaded automatically from GenBank by its accession number.
genomeAlignment

ECR Browser legend
    A detailed description of the features displayed in the ECR Browser, as well as of the functional buttons, is available in the ECR Browser Legend plot.

Questions or comments?
    dcode@ncbi.nlm.nih.gov.