University of Chicago Cancer Research Center - DNA Sequencing Facility

Interpreting ABI 377 Chromatograms

Last updated on May 28, 1996

Introduction

The chromatograms produced by automated sequencing machines typically contain sequence that begins very close to the primer and extends well beyond the limit of accurate basecalling. Both ends of the sequence derived from a chromatogram therefore need to be trimmed to remove erroneous and ambiguous bases. In addition, some bases within the 'accurate' range are called incorrectly or are called as 'N', often as a result of the chemistry and enzyme used for the sequencing reaction. Many of these errors and ambiguities can be resolved by inspecting the traces. The excerpts of chromatograms listed below serve as examples of how to trim sequences and resolve ambiguous basecalls. Also available are a high-quality chromatogram image (GIF format), an example chromatogram file generated at the UCCRC-DSF, and a Microsoft Word 5 file of examples, compiled at Iowa State University, showing some of the more common basecalling problems.
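As a rough illustration of end trimming, the sketch below uses the density of 'N' calls as a stand-in for trace quality. The function name, window size and threshold are assumptions chosen for the example and are not facility settings. It only suggests where to start looking: miscalled bases that are not 'N' (for instance under a dye-blob) are invisible to it, so the traces still have to be inspected.

```python
def trim_by_n_density(seq, window=20, max_n=1):
    """Suggest (start, end) slice bounds for the usable part of a read.

    Hypothetical helper: a window is treated as 'good' when it holds at
    most `max_n` 'N' calls. The read is cut back to the first and last
    good windows, then any 'N' left at either edge is stripped.
    """
    seq = seq.upper()
    n = len(seq)
    if n < window:
        # Read shorter than one window: keep all of it or none of it.
        return (0, n) if seq.count("N") <= max_n else (0, 0)

    good = [i for i in range(n - window + 1)
            if seq[i:i + window].count("N") <= max_n]
    if not good:
        return (0, 0)

    start, end = good[0], good[-1] + window
    while start < end and seq[start] == "N":
        start += 1
    while end > start and seq[end - 1] == "N":
        end -= 1
    return (start, end)


# Made-up read: noisy start, clean middle, degraded end.
read = "NNANNCNN" + "ACGT" * 10 + "NNANNNCNNN"
start, end = trim_by_n_density(read, window=8, max_n=1)
print(start, end)        # 8 48
print(read[start:end])   # the 40 clean bases
```

The bounds are 0-based Python slice indices, so add 1 to `start` when comparing with base positions shown on a chromatogram.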

Examples

This is an example of a good chromatogram showing well-resolved peaks and no ambiguities. Generally the first several hundred bases of a chromatogram will look like this.
This is the start of a chromatogram showing peaks corresponding to unincorporated dye-terminators (dye-blobs) superimposed over and partially obscuring the real peaks. In particular, notice the prominent double 'T' blobs (red) from positions 4 to 9, and the paired 'G' and 'C' blobs (black and blue) covering positions 20 to 23. Depending on incorporation and washing efficiency, dye-blobs can range from entirely absent to major peaks covering several real peaks. These dye-blobs appear at specific positions in the chromatogram, mostly interfering with the sequence within 30 to 40 bases of the primer (typically vector sequence), but occasionally appearing up to several hundred bases from the primer.
This is a region of a chromatogram fairly far along the sequence where some bases in runs of 2 or more are no longer visible as single peaks. Many peaks are beginning to broaden and smear into one another, which makes them more difficult to interpret, and the basecalling software has begun to call 'N's.
This is a region of a chromatogram where the traces have become too ambiguous for accurate basecalling. While some parts of this region can be useful for linking to existing sequences following manual editing, the sequence here should not be considered accurate. Note that some editing changes have been made to the chromatogram and appear in the upper line in magenta.
These are examples from several chromatograms showing weak 'G' peaks that have been called incorrectly or called as 'N's. Most often weak 'G' peaks follow multiple 'A' peaks, as seen in frames 1, 2, 3 and 5 (corrected bases appear in the upper row). However, they can also appear after single 'A' peaks (frames 3 through 6) and occasionally after single 'C' peaks (frame 7). Frames 3 and 5 each show two weak 'G' peaks after single or multiple 'A' residues.
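The pattern described above can be turned into a simple checklist of places to look. The sketch below is only an illustration: the function name is hypothetical, it finds 'N' calls immediately following an 'A' or a 'C', and it cannot detect a weak 'G' that was miscalled as another base. Each flagged position is a candidate to check against the trace, not a correction.

```python
def candidate_weak_g_positions(seq):
    """List 'N' calls that directly follow an 'A' or a 'C'.

    Returns (position, preceding_base) pairs with 1-based positions,
    matching the way bases are numbered on a chromatogram.
    """
    seq = seq.upper()
    hits = []
    for i in range(1, len(seq)):
        if seq[i] == "N" and seq[i - 1] in ("A", "C"):
            hits.append((i + 1, seq[i - 1]))
    return hits


# Made-up basecalls, not taken from the frames described above.
calls = "TTAANCTANGCNT"
for position, previous in candidate_weak_g_positions(calls):
    print(f"check position {position}: 'N' after '{previous}'")
```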
This is an example of a chromatogram with several dye-blobs, compared with the raw gel image of its own lane and the two adjacent lanes. Note that the dye-blobs range in intensity and can even partially obscure bands in adjacent lanes. In this particular example, an identical sample run two lanes down has virtually no dye-blobs and can be used to correct the sample affected by dye-blobs. The corrections are shown below the original base calls. Note that the strongest 'C' dye-blob obscures two actual 'C' residues. In some of the other cases the dye-blob can be recognized and the true bases identified under the broad dye-blob peak. (Colors from the raw image bands correspond to the following bases: blue-G, red-C, green-A, and yellow-T.)
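When a cleaner read of the same sample is available, the positions where the two sets of basecalls disagree can be listed automatically before going back to the traces. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name and the two short reads are made up for illustration, and it assumes both reads cover roughly the same stretch of sequence.

```python
from difflib import SequenceMatcher


def propose_corrections(suspect, reference):
    """List the differences between a suspect read and a cleaner read.

    Returns (tag, start, suspect_bases, reference_bases) tuples, where
    `tag` is 'replace', 'delete' or 'insert' and `start` is the 0-based
    offset in the suspect read.
    """
    matcher = SequenceMatcher(None, suspect, reference, autojunk=False)
    return [
        (tag, i1, suspect[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical reads: two bases hidden under a dye-blob were called 'N'.
blob_read = "ACGTNNGTAC"
clean_read = "ACGTCCGTAC"
print(propose_corrections(blob_read, clean_read))
# [('replace', 4, 'NN', 'CC')]
```

Each reported difference is a suggestion only. The cleaner read can contain its own miscalls, so a proposed change should be accepted only if the trace supports it.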

More specific examples will be coming...