Sequence Annotation

Johnny

Before you begin

Learn how to get started and sign up with Benchling here. To get the most out of this worksheet, work through it using your free academic account. DNA sequences can be copied directly into your Benchling account. For notebook entries, copy and paste the content manually into a new blank entry.

Content and materials for this module were co-developed with Dr. Philip Leftwich, Biology Lecturer at the University of East Anglia (Norwich, UK).


Overview

This worksheet teaches you how to use Benchling to view and understand sequence annotations. With sequence annotation, we can mark out important regions in DNA sequences, such as mutations, cut sites, coding regions, transcription factor binding sites or primer binding sites. Benchling generates a graphical map of your annotated sequences as you go, giving you an organized overview of the important parts of DNA in your sequence.

Prior knowledge

Annotation is simple and requires only a basic familiarity with the types of DNA features you might wish to label on your sequences. In this module, you will learn about some common features, such as promoters, mutations, primer binding sites, and coding regions.

Learning outcomes

  • Understand the sequence, linear and plasmid tabs on Benchling

  • Identify and generate common annotation features

  • Import DNA sequences from external sources

Worked Example

In this example, we'll use the pBAD vector, a reliable and controllable system for expressing recombinant proteins in bacteria. This system is based on the araBAD operon, which controls E. coli L-arabinose metabolism. You can clone a gene of interest into the pBAD vector downstream of the araBAD promoter, which drives expression of that gene in response to L-arabinose, and is inhibited by glucose. Precise control of expression levels makes this system ideal for producing problematic proteins, such as proteins with toxicity or insolubility issues.

Having context on what the pBAD vector contains will be helpful when we visualize the sequence of the pBAD vector on Benchling and discuss several ways you can analyze and/or annotate the sequence.

Copy this pBAD Vector into your own Project on Benchling so you can edit as you please. Afterwards, navigate to your pBAD Vector sequence and you will see there are three different ways to view this sequence:

  • Sequence map

  • Linear map

  • Plasmid map

All three views show annotation features, but they present the sequence differently: only the sequence map shows the actual nucleotide sequence, the linear map draws the plasmid as though it were a linear sequence, and the plasmid map shows the plasmid in its native circular form.

There is no single correct way to view this sequence. You may find you have a particular preference, or that the best view depends on your experimental purpose.

At a glance, you will notice that the pBAD Vector has several annotations visible, each of which marks an important element of the DNA sequence.

| Annotation | Description |
| --- | --- |
| araBAD promoter | Drives transcription of the gene of interest when L-arabinose is present and glucose is absent. This promoter also controls AraC expression. |
| RBS | The ribosome-binding site and translation initiation element from T7 bacteriophage. This allows for efficient production of the protein of interest. |
| ORF | The open reading frame of your gene of interest is placed here. |
| rrnB terminator | Signal sequence to terminate the transcript made from the gene of interest, preventing run-on transcription. |
| Ampicillin | Ampicillin resistance gene. It allows the plasmid to be maintained by ampicillin selection in E. coli. |
| pBR322 ori | pBR322 origin of replication. Plasmids carrying this origin exist in medium copy numbers in E. coli. |
| araC | Encodes the regulatory protein of the E. coli araBAD operon. AraC inhibits expression from the araBAD promoter in the absence of L-arabinose or the presence of glucose, and activates transcription in the presence of L-arabinose and the absence of glucose. |

You can scroll through the map, or click the “Annotations” button to see a list of all annotated features with their location, length, and color. From this list you can toggle the visibility of each feature or edit each annotation manually.

Question(s)

1. What are the “Locations” (range in the sequence) and Lengths (number of basepairs) for the following elements?

Ampicillin

AraC

pBR322_origin

Try to answer these question(s) on your own and check the "Solution" at the bottom.

Practice

The pBAD vector is designed for expressing recombinant proteins in bacteria. To make this possible, it contains a multiple cloning site (MCS): a segment of DNA with up to 20 restriction sites that makes it easy to clone foreign DNA into the vector and place it under the control of the araBAD promoter.

In the pBAD vector, the MCS is found at positions 421-470, but it is not included as an annotation feature. To annotate it:

  • Highlight the region 421-470 (inclusive)

  • Right-click and select “Create Annotation”

  • Name the new annotation “Multiple Cloning Site”

  • For Annotation Type, label this as a “feature”

  • Select whatever color you like

  • Finish by clicking on the “Save annotation” button
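
Conceptually, the steps above attach a small record (name, type, color, and a 1-based inclusive range) to the sequence. The Python sketch below illustrates that idea and the range arithmetic behind it. It is not Benchling's data model or API: the `Annotation` class, its field names, the color value, and the placeholder sequence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; positions are 1-based and inclusive."""
    name: str
    start: int
    end: int
    annotation_type: str = "feature"
    color: str = "#85DAE9"  # arbitrary illustrative color

    def __post_init__(self):
        if not 1 <= self.start <= self.end:
            raise ValueError("expected 1 <= start <= end")

    @property
    def length(self) -> int:
        # Inclusive range: both end points count.
        return self.end - self.start + 1

    def extract(self, sequence: str) -> str:
        # Convert 1-based inclusive positions to a 0-based half-open slice.
        if self.end > len(sequence):
            raise ValueError("annotation extends past the end of the sequence")
        return sequence[self.start - 1:self.end]


mcs = Annotation(name="Multiple Cloning Site", start=421, end=470)

# Placeholder sequence for illustration only; this is NOT the pBAD sequence.
placeholder = "ACGT" * 200  # 800 bases

print(mcs.length)                     # 50
print(len(mcs.extract(placeholder)))  # 50
```

Highlighting 421-470 "inclusive" therefore selects 50 bases, which is a useful check against the length shown for the new annotation in the “Annotations” list.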

Stretch Yourself

Import a DNA sequence from an external database

While having existing sequences on Benchling is useful, in many instances you may need to import a DNA sequence from an external database. Benchling connects to a number of research databases, which lets you easily import sequences that have been vetted or used by other scientists. You can find more information on the databases that we support and the type of data they store below:

Addgene - Addgene is a non-profit plasmid repository where members of the scientific community can deposit and order reagents, such as plasmids used in the research literature. Benchling can import any deposited sequence map that appears on Addgene.

NCBI - The National Center for Biotechnology Information (NCBI) provides centralized access to biomedical and genomic information. Benchling uses NCBI accession numbers, which correspond to specific DNA sequences or reagents used in primary research, to import those sequences.

Ensembl - Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Benchling can look up Ensembl IDs, which correspond to the genome of a specific organism, and may be able to import the matching sequence. Note: Extremely large genomes cannot be imported into Benchling due to their size.

Registry of Standard Biological Parts - Also known as the iGEM Parts Registry, this public repository of genetic parts is used to build synthetic biology devices and systems. Benchling can import any deposited genetic part as a new sequence.

JBEI Public Registry - The Joint BioEnergy Institute's public Inventory of Composable Elements (ICE) is an open source registry software and platform for managing information about biological parts. While this registry contains a wealth of information, Benchling will only import items such as plasmid or gene sequences.

Let’s imagine a scenario where you order this sequence from Addgene’s plasmid repository but you would like to analyze the sequence on Benchling. Here’s how you could import it:

  • From the global menu, find the Create icon, then select “DNA Sequence” -> “Import New Sequences”. A new menu should appear giving you several options to obtain your sequence.

  • Go to the “Search External Databases” tab. You will see examples of the types of information and the formats that Benchling uses to search each database.

  • For Addgene, paste this link: https://www.addgene.org/browse/sequence/251928/ and click “Search”.

  • You will see the sequence populate and then you can rename the file or change the location you’re saving it in. Afterwards, choose “Import” to create your new sequence.

  • Benchling will also import any attached annotations from the external database and pull in any relevant information about this construct in the “Description” tab.

Solution(s)

| Feature | Location | Length |
| --- | --- | --- |
| Ampicillin | 989-1849 | 861 |
| AraC | 3198-4076 | 879 |
| pBR322_origin | 2004-2623 | 620 |
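
Each length above follows from its location: for a 1-based inclusive range, length = end - start + 1. As a quick check of the arithmetic, here is a minimal Python sketch; the helper name `feature_length` is hypothetical and not part of Benchling.

```python
def feature_length(start: int, end: int) -> int:
    """Length in base pairs of a 1-based, inclusive range on a sequence."""
    return end - start + 1


# Locations copied from the solution above.
solutions = {
    "Ampicillin": (989, 1849),
    "AraC": (3198, 4076),
    "pBR322_origin": (2004, 2623),
}

for name, (start, end) in solutions.items():
    print(f"{name}: {start}-{end} -> {feature_length(start, end)} bp")
```

Running this prints 861, 879, and 620 bp, matching the lengths reported in the “Annotations” list.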

Congrats! You've finished the learning module: Sequence Annotation.
