Molecular Biology Protocols

 
Marine DNA Sequencing and Analysis Center
Mount Desert Island Biological Laboratory


Last update May 25, 2005. 
Send comments to dtowle@mdibl.org

 

10.  Analysis of DNA Sequences

 

Editing with Chromas

 

The sequence files that you receive from the Sequencing Center must be edited using Chromas software, available online at http://www.technelysium.com.au/index.html.  ChromasPro can be downloaded as a 60-day trial version and registered (and purchased) at a later date.  Open each sequence chromatogram, designated as an *.ab1 file, individually in Chromas for editing.

 

The ABI Prism automated sequencer does its best to identify evenly spaced peaks in the electrophoretic pattern, and where the pattern is clean the job is straightforward.

 

 

However, at the beginning and end of the electrophoresis run, the pattern of peaks usually becomes difficult to interpret, and these sections must be clipped away.

 

 

 

Chromas allows you to do just that by selecting a “left cutoff” and a “right cutoff”.  It also lets you assign identifications to peaks that the ABI software cannot identify.  Enter any such manual base calls in lower case, so that you can recognize your editorial changes later.  You may “delete cutoff sequences” before you save the resulting file.
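The clipping and lower-case conventions described above can be illustrated with a minimal Python sketch.  It is not how Chromas works internally; the function names and the meaning given to the two cutoffs (number of bases discarded from each end) are assumptions made for illustration only.

```python
def trim_read(bases: str, left_cutoff: int, right_cutoff: int) -> str:
    """Clip unreliable bases from both ends of a read.

    Assumption (hypothetical convention, not taken from Chromas):
    left_cutoff is the number of bases discarded from the start and
    right_cutoff is the number of bases discarded from the end.
    """
    if left_cutoff < 0 or right_cutoff < 0:
        raise ValueError("cutoffs must be non-negative")
    if left_cutoff + right_cutoff > len(bases):
        raise ValueError("cutoffs remove more bases than the read contains")
    return bases[left_cutoff:len(bases) - right_cutoff]


def manual_edits(bases: str) -> list:
    """List (1-based position, base) for every lower-case base call."""
    return [(i + 1, b) for i, b in enumerate(bases) if b.islower()]


if __name__ == "__main__":
    raw = "NNNACGTaCGTNN"           # invented read with one manual call ('a')
    clean = trim_read(raw, 3, 2)
    print(clean)
    print(manual_edits(clean))
```

Because the manually assigned bases stay in lower case, they can be located again at any later stage of the analysis.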

 

Once you are satisfied with the validity of the sequence, copy the sequence in FASTA format and paste it into any other software package, including Word.  You will see that it is simply a header line beginning with “>” followed by the sequence of nucleotides.
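For readers who prefer to prepare the FASTA text themselves, the format can be produced with a few lines of Python.  This is a minimal sketch; the record name "clone1" and the 60-character line width are arbitrary choices for illustration, not requirements of any particular program.

```python
def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "clone1" and the repeated ACGT sequence are invented example data.
    print(to_fasta("clone1", "ACGT" * 20), end="")
```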

 

Finding an Open Reading Frame

 

At http://www.ncbi.nlm.nih.gov/gorf/gorf.html you can paste your FASTA sequence to identify potential open reading frames (ORFs) that may code for a protein.  If your sequence is accurate and codes for a protein, you should see one long open region in a single reading frame.  The other reading frames are unlikely to encode amino acid sequence because they contain frequent stop codons.

 

Alternatively, you can use EditSeq or SeqBuilder, components of Lasergene 6, to search for open reading frames.
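The idea behind an ORF search can be sketched in Python.  The example below is not the algorithm used by the NCBI tool or by Lasergene; it simply reports, for each of the six reading frames, the longest run of codons uninterrupted by a stop codon, which is the pattern described above.  All function names are hypothetical, and the standard stop codons (TAA, TAG, TGA) are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}            # assumes the standard genetic code
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq: str, offset: int) -> tuple:
    """Return (start, end) of the longest stop-free codon run in one frame.

    Coordinates are 0-based, end-exclusive, and exclude the stop codon.
    """
    best = (offset, offset)
    run_start = offset
    last = offset
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = i + 3
        last = i + 3
    if last - run_start > best[1] - best[0]:   # run that reaches the end
        best = (run_start, last)
    return best


def six_frame_summary(seq: str) -> list:
    """Return (strand, frame, start, end, length) rows, longest first."""
    seq = seq.upper()
    rows = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start, end = longest_open_stretch(s, offset)
            rows.append((strand, offset + 1, start, end, end - start))
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    for row in six_frame_summary("ATGAAACCCGGGTTTTAA"):   # invented example
        print(row)
```

On a real protein-coding fragment, one frame should stand out with a much longer open stretch than the other five.  Coordinates on the "-" strand refer to the reverse-complemented sequence.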

 

BLASTing GenBank

 

Using the same FASTA sequence, you can search GenBank for similar sequences using BLAST at http://www.ncbi.nlm.nih.gov/BLAST/.  There are several versions of the BLAST search.  The most useful may be BLASTX, in which the nucleotide sequence is translated in all six reading frames and the resulting putative amino acid sequences are compared against the protein sequences in the database.  This is the point at which you may be able to propose a likely identification of your sequence, based on evolutionary relationships to sequences of other species that may already be in GenBank.  BLASTX will also indicate the orientation of the reading frame.

 

Alternatively, you can use EditSeq or SeqBuilder, components of Lasergene 6, to automate BLAST searches.

 

Translation

 

Once you are confident of the proper reading frame of a cDNA sequence, you can translate the appropriate region with the Translation Machine at http://www2.ebi.ac.uk/translate/.
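The translation step itself is simple enough to sketch in Python.  The example below assumes the standard genetic code and is not the implementation behind the Translation Machine; the function name and the choice of "X" for codons containing ambiguous bases are illustrative assumptions.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str, frame: int = 1) -> str:
    """Translate DNA in forward frame 1, 2 or 3; '*' marks a stop codon.

    Codons containing anything other than A, C, G or T become 'X'.
    A trailing partial codon is ignored.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAACCCGGGTTTTAA"))   # invented example sequence
```

To translate a reverse-strand frame, reverse-complement the sequence first and then translate it in the appropriate forward frame.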

 

Sequence Assembly

 

If you have two or more sequences of the same amplification product, you can assemble them using SeqMan, a component of Lasergene 6.  SeqMan will also automate the trimming of the original chromatograms, although you should examine the results carefully before accepting them.  Because SeqMan has no information about whether a particular sequence was generated with a forward or a reverse primer, you should check that the resulting assembly makes sense with regard to open reading frame and orientation.  You may request a reverse complement to correct the assembly if necessary.
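For reference, the reverse complement operation mentioned above can be sketched in Python as follows.  This is an illustration, not SeqMan's implementation; the function name is hypothetical, and only A, C, G, T and N are handled.  Case is preserved so that lower-case manual base calls remain recognizable.

```python
# Complement table for A, C, G, T and N in both cases (other symbols pass through).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    reverse_read = "ATGCn"                    # invented example read
    print(reverse_complement(reverse_read))
```

Applying the function twice returns the original sequence, which is a quick sanity check when reorienting reads generated with a reverse primer.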

 

Multiple Alignment

 

As you find related sequences through BLAST, you may wish to generate a multiple alignment to compare similarities with your own sequence.  Refer back to Protocol 1 for instructions!

 

Amino Acid Sequence Analysis

 

α-Helical regions and transmembrane domains in your newly translated protein can be predicted using a variety of tools available at http://iubio.bio.indiana.edu/soft/molbio/ibmpc/antheprot-readme.html.  Alternatively, you may use Protean, a component of the Lasergene 6 package.
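One classic approach to spotting candidate transmembrane segments is a sliding-window hydropathy profile.  The Python sketch below uses the Kyte-Doolittle hydropathy scale; it is offered only as an illustration of the general idea and is not claimed to be the method used by the tools named above.  The 19-residue window and the 1.6 threshold are conventional choices assumed here, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; unknown residues score 0.0."""
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list:
    """0-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value > threshold]


if __name__ == "__main__":
    toy = "L" * 19 + "D" * 19                 # invented toy sequence
    print(hydrophobic_windows(toy))
```

Windows that exceed the threshold are only candidates; any prediction of this kind should be checked against a dedicated tool.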