Before you begin
Learn how to get started and sign up with Benchling here. Use your free academic account to get the most out of this worksheet. DNA sequences can be copied directly into your Benchling account. For notebook entries, you can manually copy and paste the content into a new blank entry.
Content and materials for this module were co-developed with Dr. Philip Leftwich, Biology Lecturer at the University of East Anglia (Norwich, UK).
DNA sequencing is a laboratory technique used to determine the exact sequence of bases (A, C, G, and T) in a DNA molecule. This analytical technique was first developed almost 50 years ago, and advances in the field since then have significantly increased its processing power and ease of use. DNA sequence information is indispensable in modern biology and is widely used to support workflows in biology, medicine, and forensics. Comparing healthy and mutated DNA sequences can provide us with insights into the functions of genes. The power of DNA sequencing is that it can answer both simple and complex questions in science. We’ll explore how DNA sequencing allows you to create consensus sequences in Benchling that can verify the identity of a molecular construct.
You should have a solid foundation in the biochemical and structural properties of DNA and a general understanding of DNA synthesis and replication. You should also understand how DNA sequencing samples are prepared experimentally and typically sent for downstream analysis. Consider watching this animation that illustrates the fundamental principles behind DNA sequencing.
Align multiple sequences to produce a consensus alignment
Comment on and share alignment results
Cross-reference your sequence with the alignment search tool - BLAST
A DNA consensus sequence is a calculation of the most frequent nucleotide residue at each position within a sequence alignment. It can summarize multiple sequencing reads from one DNA source, letting you reconstruct the identity of a construct from overlapping DNA fragments.
In this example, we’ll be working to create a consensus sequence from multiple reads generated with the Sanger method for DNA sequencing.
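To make the definition of a consensus sequence concrete, here is a minimal Python sketch. It assumes the reads have already been aligned to equal length, with "-" marking positions a read does not cover. The function name and the example reads are hypothetical, and this is not a description of how Benchling computes its alignments.

```python
from collections import Counter

def consensus(aligned_reads):
    """Return the most frequent base at each column of equal-length aligned reads.

    Gap characters ('-') are ignored; a column containing only gaps yields 'N'.
    Ties are broken in favour of the base seen first in that column.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    result = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)

# Hypothetical, already-aligned reads (not real sequencing data).
reads = [
    "ATGCC-----",
    "-TGCCGTA--",
    "---CAGTAAC",
]
print(consensus(reads))  # ATGCCGTAAC
```

At the fifth position two reads say C and one says A, so the consensus takes C. Real tools also weigh base-call quality, which this sketch ignores.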
For Sanger sequencing we need to have:
A region of known DNA adjacent to our region of unknown sequence
Sequencing primers which have been designed to bind to our region of known DNA, with the 5’ to 3’ orientation pointing towards the region to be sequenced. Sequencing primers are designed using the same criteria we would use for PCR primers.
Identify appropriate sequencing primers from empty vector
In order to send your sample off for DNA sequencing, you will want to find appropriate sequencing primer(s) that flank your unknown insert and will give you the readout you need to create a consensus sequence alignment.
Let’s imagine that you have the following information about some plasmid cloning:
“We cloned the insert into a pET-24a vector using BamHI and EcoRI restriction sites.”
From this, we have no way of knowing what the cloned insert is supposed to be, but we do have the information required to design a sequencing experiment - we have a named plasmid vector (pET-24a). With this known DNA sequence and information on where in the vector our new DNA is supposed to have been inserted, we can design appropriate sequencing primers.
Open this empty pET-24a plasmid, which was used as the vector backbone for cloning. Don’t forget to create a copy of this sequence so you can exit the read-only version and edit it yourself!
With the “Digests” icon, search for “BamHI” and “EcoRI” and click on each to highlight them on the sequence. Remember, these are the cut sites that were used to digest this empty vector and ligate the unknown insert into it.
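Outside Benchling, the same search can be sketched in a few lines of Python. The recognition sequences GGATCC (BamHI) and GAATTC (EcoRI) are the standard ones for these enzymes. The function name and the short example sequence are hypothetical stand-ins, not the real pET-24a sequence.

```python
RECOGNITION_SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

def find_sites(sequence, site):
    """Return the 0-based start position of every occurrence of site in sequence."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

# Hypothetical stretch of vector sequence, for illustration only.
vector = "TTTAAGGATCCACGTACGTGAATTCGGCC"
for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, find_sites(vector, site))
# BamHI [5]
# EcoRI [19]
```

Both recognition sequences are palindromic, so searching one strand is enough. The sketch treats the sequence as linear, so it would miss a site that spans the origin of a circular plasmid.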
Now that you’ve found the insertion site, you would typically need to identify and design appropriate primers upstream and downstream from the insertion. In this example, we’ve already designed these primers for you.
With the “Primers” icon, inspect the primers “Fwd Seq Primer” and “Rev Seq Primer” to assess whether they would be appropriate for DNA sequencing. As a rule of thumb for Sanger DNA sequencing, your primer should start 50-60 bp upstream of the sequence of interest you want to analyze.
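The rule of thumb above can be checked with a small calculation. The sketch below measures the distance from the 3’ end of a forward primer to the start of the region of interest; measuring from the 3’ end is an assumption made here, and the primer and template sequences are invented for illustration.

```python
def upstream_distance(template, primer, region_start):
    """Bases between a forward primer's 3' end and the start of the region of interest.

    Assumes the primer matches the top strand of the template exactly.
    Returns None if the primer is absent or does not lie upstream of the region.
    """
    template, primer = template.upper(), primer.upper()
    start = template.find(primer)
    if start == -1:
        return None
    primer_end = start + len(primer)  # index just past the primer's 3' end
    if primer_end > region_start:
        return None
    return region_start - primer_end

# Hypothetical primer and template (not the real Fwd Seq Primer or pET-24a).
primer = "ACGTTGCAACGTTGCAAGTC"
template = "CC" + primer + "AT" * 27 + "GGATCC" + "TTT"
region_start = template.find("GGATCC")

distance = upstream_distance(template, primer, region_start)
print(distance, 50 <= distance <= 60)  # 54 True
```

A reverse primer could be checked the same way by first taking the reverse complement of the template.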
Create a DNA consensus sequence alignment from sequencing results
Once you have those sequencing primers, you can perform DNA sequencing on the unverified plasmid construct and obtain sequencing results that you can use to generate a consensus sequence alignment. This is usually performed by dedicated sequencing facilities, which require you to provide the sequencing primers and a clean DNA template (the unverified plasmid DNA).
Assume that you’ve sent off your sequencing reactions and have now received your sequencing results. Download and unzip these files to your local computer. Benchling can create alignments from a variety of sequencing file formats (e.g. .FASTA, .txt, .ab1).
From the global menu, navigate to “Create” -> “DNA Sequence” -> “New Alignment”. This will open a new modal to create a consensus sequence alignment.
Click on “Choose File(s)” and select both your sequencing results and press “Enter”. Specify a folder to save your consensus sequence and click “Create Alignment” to finish. Consensus sequence alignments will generate a new DNA sequence (“Untitled Consensus”) from the results.
You can rename your consensus DNA sequence if you choose. Look at your sequence map and click+drag to highlight the region within the BamHI and EcoRI restriction sites. Annotate this region by right-clicking, selecting “Create Annotation”, and naming it “Unknown Insert”. This is the identity of your unknown insert from DNA sequencing!
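The highlighted region is simply the stretch of consensus sequence lying between the two recognition sites. A minimal Python sketch of that extraction follows; the function name and the short consensus string are hypothetical.

```python
def region_between(sequence, left_site, right_site):
    """Return the bases between the first left_site and the next right_site.

    The recognition sites themselves are excluded.
    Returns None if either site cannot be found in that order.
    """
    sequence = sequence.upper()
    left = sequence.find(left_site)
    if left == -1:
        return None
    inner_start = left + len(left_site)
    right = sequence.find(right_site, inner_start)
    if right == -1:
        return None
    return sequence[inner_start:right]

# Hypothetical consensus sequence: flank + BamHI + insert + EcoRI + flank.
consensus_seq = "CATTAGGATCCATGAAACGTTAAGAATTCTTGC"
print(region_between(consensus_seq, "GGATCC", "GAATTC"))  # ATGAAACGTTAA
```

The sketch assumes the consensus is in the same orientation as the vector, with BamHI appearing before EcoRI.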
Copy the Unknown Insert region and navigate back to your “pET-24a Empty Vector”. Clone a version of your current plasmid and rename it to “pET-24a Unknown Insert”. See Molecular Cloning Methods for a reminder on how to clone.
On the new sequence, navigate to the region between the BamHI and EcoRI sites and paste in (Ctrl / Cmd + V) the bases you copied. Congrats! You’ve reconstructed the plasmid map for the construct and verified the unknown insert.
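In string terms, the paste step swaps whatever lies between the two sites in the empty vector for the sequenced insert. Here is a minimal Python sketch under that assumption, using invented sequences rather than the real pET-24a.

```python
def replace_between(vector, left_site, right_site, insert):
    """Replace the bases between left_site and right_site in vector with insert.

    Both recognition sites are kept; only the stretch between them is swapped.
    """
    vector = vector.upper()
    left = vector.find(left_site)
    right = vector.find(right_site, left + len(left_site)) if left != -1 else -1
    if left == -1 or right == -1:
        raise ValueError("both sites must be present, with left_site first")
    return vector[: left + len(left_site)] + insert.upper() + vector[right:]

# Hypothetical empty vector: flank + BamHI + spacer + EcoRI + flank.
empty_vector = "CATTAGGATCCGAGCTCGAATTCTTGC"
unknown_insert = "ATGAAACGTTAA"  # hypothetical insert taken from a consensus

reconstructed = replace_between(empty_vector, "GGATCC", "GAATTC", unknown_insert)
print(reconstructed)  # CATTAGGATCCATGAAACGTTAAGAATTCTTGC
```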
1. What was the name of the known DNA sequence in this example?
2. Where should the cut sites on the plasmid sequence be relative to the primers? 5’ or 3’?
3. Sanger sequencing reads are typically no longer than about 800 bp. With this in mind, how would we sequence an insert that is larger than 1600 bp, so that our forward and reverse reads still overlap to produce a consensus?
Try to answer these question(s) on your own and check the "Solution" at the bottom.
Given the context below, try reconstructing another experimental plasmid with an unknown insert through Sanger sequencing and a consensus sequence alignment.
“We cloned the insert using this empty pBAD vector and restriction cloning with EcoRI and HindIII”
Like the previous example, sequencing primers will already be attached to the pBAD vector and you can align these sequencing results to create a consensus sequence.
Important: Consensus sequence alignments may not always generate a DNA sequence in your desired orientation, but if you know where your restriction sites are, you can reconstruct the sequence very easily.
Remember to make your own copy of the pBAD vector in order to edit it.
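The orientation note above can be illustrated with a short Python sketch: if the consensus comes out with the two restriction sites in the opposite order from the vector, taking the reverse complement puts it back. The recognition sequences GAATTC (EcoRI) and AAGCTT (HindIII) are the standard ones; the function names and example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]

def orient(sequence, first_site, second_site):
    """Return sequence, flipped if needed, so that first_site precedes second_site."""
    sequence = sequence.upper()
    first, second = sequence.find(first_site), sequence.find(second_site)
    if first == -1 or second == -1:
        raise ValueError("both sites must be present")
    return sequence if first < second else reverse_complement(sequence)

# Hypothetical consensus that came out in the reverse orientation.
flipped = "AAGCTTGGCCATGAATTC"
print(orient(flipped, "GAATTC", "AAGCTT"))  # GAATTCATGGCCAAGCTT
```

Because both recognition sequences are palindromic, they read the same on either strand, so only their order changes when the sequence is flipped.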
Align sequencing results to a DNA template
In previous examples we used multiple sequencing reads to produce a consensus sequence alignment for a region of unknown DNA sequence. However, we can also compare a sample sequence against a known template. This can be useful when verifying your cloning results (see "Bacterial Transformation" module) or comparing alleles or mutations against a reference genome. In this example we will import a template sequence and align sequencing results against it.
The fruit fly Drosophila melanogaster is frequently used as a model organism for the study of genetics due to its ease of handling, characterised genome, and wealth of visible mutant phenotypes. Drosophila with the white eye mutation have white eyes instead of the usual dark brick red compound eyes. In this example we have sequencing reads from mutant fruit flies in the locus for the white gene, and want to compare these mutant sequences against the characterised wildtype genome.
Import a new DNA sequence from an external database and set this as your template sequence. Click on “Create” > “DNA” > “Import DNA Sequences”, and type in FBgn0003996.
Assume that you’ve sent off your sequencing reactions and have now received your sequencing results. Download and unzip these files to your local computer.
From the newly created FBgn0003996 template sequence on Benchling, navigate to “Exon 1”, highlight the amino acid translation, then right-click and select “Create Translation -> Forward”. This region is important because variations in the DNA sequence here indicate mutations in this gene.
Navigate to the Alignments tab and click “Create New Alignment” and you should see that the FBgn0003996 sequence is already marked as the “Template”.
Click on “Choose File(s)” and select your sequencing results and press “Enter”. Then click “Create Alignment” to finish. The results will be attached as an alignment on the template sequence.
Navigate the alignment to where the AA translation is. On the 2nd sequence (from your sequencing results), highlight the same reading frame, right-click and select “Create Translation -> Forward”. You should now be able to compare where this mutant allele diverges from the wild-type.
You can repeat this process with additional sequencing results and continue to observe any mutant variants from your reference template.
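The comparison of translations can also be sketched in Python. The example below translates two in-frame DNA sequences with the standard genetic code and lists the amino acid positions where they differ. The function names and the two short sequences are hypothetical and are not taken from the white gene.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate an in-frame DNA sequence; unknown codons become 'X'.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

def protein_differences(reference, sample):
    """List (position, reference_aa, sample_aa) for each mismatch, 1-based."""
    pairs = zip(translate(reference), translate(sample))
    return [(i, r, s) for i, (r, s) in enumerate(pairs, start=1) if r != s]

# Hypothetical wild-type and mutant sequences differing by one base.
wild_type = "ATGGCTCAAGAG"
mutant = "ATGGCTTAAGAG"
print(translate(wild_type), translate(mutant))  # MAQE MA*E
print(protein_differences(wild_type, mutant))   # [(3, 'Q', '*')]
```

Here a single C-to-T change turns a glutamine codon (CAA) into a stop codon (TAA). The sketch assumes both sequences are already aligned and in the same reading frame, so it cannot handle insertions or deletions.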
2. 3’. Sequencing proceeds in a 5’ to 3’ direction, so the cut sites must be 3’ relative to the sequencing primer.
3. Many cloned genes are too large for reads from each end of the insert to overlap. In this case, we would analyze our existing sequencing files, use them to design new sequencing primers within the part of the insert already sequenced, and do another round of DNA sequencing. This technique is known as “primer walking” and was used heavily in the Human Genome Project.
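The arithmetic behind primer walking can be sketched as follows. The 800 bp read length comes from the question above. The 100 bp overlap between neighbouring reads is an assumed value, and the function name is hypothetical. The sketch walks in one direction only and ignores practical details such as poor base quality near the primer.

```python
def walking_read_starts(insert_length, read_length=800, overlap=100):
    """Start positions (0-based) of successive reads needed to cover an insert.

    Each new read begins `overlap` bases before the previous read ends, so that
    neighbouring reads share enough sequence to be merged into a consensus.
    """
    if overlap >= read_length:
        raise ValueError("overlap must be smaller than read_length")
    starts = [0]
    while starts[-1] + read_length < insert_length:
        starts.append(starts[-1] + read_length - overlap)
    return starts

print(walking_read_starts(2500))  # [0, 700, 1400, 2100]
print(walking_read_starts(800))   # [0]
```

For a 2500 bp insert this gives four reads, so three additional primers would be designed from sequence obtained in the earlier rounds.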
Congrats! You've finished the learning module: DNA Sequencing.