Searching for sequence similarities using BLAST

Achala

The Basic Local Alignment Search Tool (BLAST) finds regions of similarity between sequences, comparing nucleotide and protein sequences to calculate the statistical significance of matches. Use BLAST to infer functional and evolutionary relationships between sequences and identify members of gene families.
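BLAST usually summarizes statistical significance as an E value: the number of alignments with at least a given score that would be expected by chance. The sketch below uses the standard relationship between bit score and E value, E = m × n × 2^(-S'), where m is the query length and n is the total length of the searched sequences. The function name and the example lengths are illustrative assumptions. The sketch also ignores the effective-length corrections that BLAST applies, so its numbers will not exactly match reported values.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) are raw lengths; real BLAST uses
    effective lengths, so treat the result as an approximation.
    """
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


# Same search space, two different bit scores (illustrative values).
weak = expect_value(30.0, query_len=250, db_len=1_000_000)
strong = expect_value(60.0, query_len=250, db_len=1_000_000)
print(f"bit score 30: E = {weak:.3g}")
print(f"bit score 60: E = {strong:.3g}")
```

A higher bit score gives a smaller E value, and searching a larger inventory with the same bit score gives a larger E value.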

Search sequences using BLAST

You can use BLAST algorithms to find DNA and AA sequences in your Benchling inventory based on their similarity to an input sequence. The search returns matching sequences and their similarity scores based on the selected algorithm.

To search for a sequence:

1. Right-click a sequence selection and select Run Benchling BLAST.

Alternatively, you can click the lens icon in the left-side menu to open global search, then click the DNA helix in the search bar to open the BLAST modal.

global-blast.png

2. In the BLAST modal, select the BLAST algorithm you're using, then select the input sequence using one of these methods:

    • Select the entities in Benchling
    • Enter or paste the sequences in the text box
    • Upload a file of the sequences, using an approved format

3. [Optional] You can repeat step 2 to add multiple sequences to query at the same time.

4. [Optional] Select the parameter values to customize your query. View the list of parameters and their effects at the National Library of Medicine website.

  • The modal automatically selects the recommended values based on the input sequences. For example, when you run blastp on an amino acid sequence shorter than 30 residues, the program is automatically set to blastp-short, which is optimized for short sequences. The recommended parameters match those on NCBI BLAST.
  • You can try different general and scoring parameters, and click Reset to algorithm defaults to restore them at any time. Note that the reset does not change the selected program.

Screenshot 2024-05-30 at 17.48.26.png

Auto program selection based on the query sequences

 

5. Click Run BLAST. The modal closes and the search launches in the global search panel. Note that some search features, such as filters, are disabled when using BLAST.
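The automatic program selection described in step 4 can be pictured as a simple length check. The sketch below covers only the blastp example given in this article. The function and constant names are hypothetical, and Benchling's actual selection logic may consider more than sequence length.

```python
# Threshold taken from the blastp example in step 4 (shorter than 30 residues).
BLASTP_SHORT_THRESHOLD = 30


def recommended_blastp_program(query: str) -> str:
    """Return the blastp program suggested for an amino acid query.

    Hypothetical helper: whitespace is ignored when counting residues.
    """
    residues = "".join(query.split())
    if len(residues) < BLASTP_SHORT_THRESHOLD:
        return "blastp-short"
    return "blastp"


print(recommended_blastp_program("MKTAYIAKQR"))        # short peptide
print(recommended_blastp_program("MKTAYIAKQR" * 5))    # 50 residues
```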

Table View

The BLAST search results have two views for most algorithms. The table view is similar to other search result views in Benchling. It lists all the hit sequences in a table with BLAST result attributes (e.g. query coverage, bit score) and attributes of the hits (e.g. ID, modified timestamp).

Here is an example:

Screenshot 2024-05-30 at 17.54.42.png

In this view, you can choose which columns to display, change display preferences such as sort order, and select multiple hit sequences to run bulk operations on them.

3397e4d3-d8e1-4931-965c-5a5226e7d333.png

Filter columns by clicking on the leftmost icon in the table header

 

Screenshot 2024-05-30 at 18.03.00.png

Change sort order, display density, and results per page

 

3136c7cd-bb20-49de-9e14-131ccf1eeae2.png

Select multiple hit sequences and perform bulk operations

 

 

Alignment View

The alignment view provides details about each hit sequence. It lists each high-scoring segment pair (HSP) separately. This view combines both the graphic summary and the flat query-anchored alignment view from NCBI BLAST.

Screenshot 2024-06-19 at 10.17.01.png

Switch to alignment view by clicking the “Alignment view” button

 

BLAST result attributes

In the alignment view, BLAST result attributes such as hit range, percent identity, and E value are shown on the left of the table. You can filter these attribute columns the same way as in the table view.

38c46266-5bf7-4e98-b8df-53110b4368c9.png

Filter BLAST result attributes
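To make attributes such as percent identity and query coverage concrete, the sketch below computes them with their usual BLAST definitions. Percent identity is the number of identical positions divided by the alignment length, gaps included. Query coverage is the share of query positions covered by at least one HSP. The function names and example data are illustrative assumptions, not Benchling's implementation.

```python
def percent_identity(aligned_query: str, aligned_hit: str) -> float:
    """Identical positions / alignment length * 100 for one HSP.

    Both strings are the gapped alignment rows ('-' marks a gap).
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_coverage(hsp_ranges: list[tuple[int, int]], query_len: int) -> float:
    """Percent of query positions covered by at least one HSP.

    hsp_ranges holds 1-based, inclusive (start, end) query coordinates.
    """
    covered: set[int] = set()
    for start, end in hsp_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_len


print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
print(query_coverage([(1, 50), (41, 80)], query_len=100))
```

Overlapping HSPs are counted once in the coverage calculation, which is why the two ranges in the example cover 80 positions rather than 90.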

 

Bar Graph vs Pairwise Alignment

The query and hit sequences are listed in the Sequence column on the right of the alignment view table. The width of the Sequence column changes according to the screen size and the width of the attribute columns on the left. If you need more space in the Sequence column, you can shrink the attribute columns or filter them out.

 

If the space left on the screen after the attribute columns is too narrow, the Sequence column disappears. You can resize or filter out the attribute columns to bring the Sequence column back.

 

You can toggle between two visualization modes for the Sequence column: Bar graph alignment vs Pairwise alignment.

380c9297-e18e-4a5d-a22c-f7a7dcdd4693 (1).png

Bar graph alignment view

18ec714e-4d2b-4283-9140-ee49d1491b03.png

Pairwise alignment view

You can zoom in and out on the sequences by clicking the + and - buttons on the zoom level slider, or by dragging the slider directly. When the zoom level is more than halfway along the slider, the bar graph view automatically switches to the pairwise view.

Select and compare multiple results

Under the alignment view, you can select one or multiple hit sequences and just focus on them. Whenever any hit sequences are selected, a Compare N results button will show up above the table. Clicking this button will hide any other results. All HSPs for the selected hits will be included.

Screenshot 2024-05-30 at 21.30.33.png

Select and focus on specific hit sequences

 

Bar Graph Alignment Score

The bar graphs are colored according to each HSP’s alignment score. You can hover over each segment to see the actual score. The alignment score reflects the similarity between the query sequence and the HSP sequence: a higher score indicates greater similarity, while a lower score indicates more differences between the sequences. The alignment score legend appears only in the bar graph view.

Screenshot 2024-05-30 at 20.53.20.png

Legend for the alignment score
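Coloring by score amounts to placing each HSP's alignment score into a range. The sketch below uses the score ranges from the NCBI BLAST graphic summary color key (<40, 40-50, 50-80, 80-200, >=200) as an assumption. The legend shown in Benchling is the authoritative source for the ranges and colors actually used, and the names below are hypothetical.

```python
from bisect import bisect_right

# Assumed score boundaries, taken from the NCBI BLAST graphic summary key.
SCORE_BREAKS = [40, 50, 80, 200]
SCORE_LABELS = ["<40", "40-50", "50-80", "80-200", ">=200"]


def score_bin(alignment_score: float) -> str:
    """Return the label of the score range that contains alignment_score."""
    return SCORE_LABELS[bisect_right(SCORE_BREAKS, alignment_score)]


for score in (35, 40, 79.5, 200):
    print(score, "->", score_bin(score))
```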

 

Pairwise Alignment Coloring Scheme

By default, the pairwise alignment view highlights mismatched nucleotides or residues with a light red background. Currently this is the only coloring scheme for the nucleotide sequences. For amino acid sequences, we provide more coloring schemes. When you are under the pairwise alignment view for amino acids, a View menu will show up with a dropdown menu of four different coloring schemes: the default Mismatch, Rasmol, Polarity, and Hydrophobicity.

4d00f78c-eba7-4a3d-afb0-7c7f0fecc1a3.png

View menu with different coloring schemes for amino acid sequences
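The default Mismatch scheme highlights positions where the query and hit differ. A minimal sketch of that comparison is shown below. It marks differing columns with a caret instead of a background color, and the function name is hypothetical.

```python
def mismatch_mask(aligned_query: str, aligned_hit: str) -> str:
    """Return a string with '^' under every column where the rows differ."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        " " if q == h else "^" for q, h in zip(aligned_query, aligned_hit)
    )


query = "ACGTTGCA"
hit = "ACCTTGAA"
print(query)
print(hit)
print(mismatch_mask(query, hit))
```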

In the alignment view, clicking between any two nucleotides or amino acids adds a vertical guideline across the sequences.

You can also select a specific region on one sequence or across multiple sequences. Zooming is then centered on that region.

 

There are two ways to select a region. The first is to click the start position, hold the mouse button down, and drag horizontally toward the end position. Drag vertically as well to include multiple sequences. Release the button to complete the selection. All the sequences covered by the selected region are highlighted with a yellow background. Sequences that fall within the selected range but are not part of the selection have a gray background.

 

The second way is to first click on the start position and release the button. A black vertical line will show up to mark the start position. Then press Shift and click on the end position. Under this mode, you can still drag vertically and horizontally to adjust the selected region. Release Shift and the button to complete the selection.

select-with-shift.mov

Multi-Query BLAST

When multiple sequences are queried at the same time, an extra dropdown list of sequences will show up above the result table. You can click on the dropdown to select and see the results for each queried sequence.

Screenshot 2024-05-31 at 18.14.28.png

A dropdown list will show up for multi-query BLAST.

Screenshot 2024-05-31 at 18.14.41.png

Select a sequence in the dropdown to see the relevant results.
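When preparing several queries for a multi-query search, it can help to put them in one file with a distinct name for each sequence, so that the entries in the dropdown are easy to tell apart. The sketch below writes named sequences as multi-record FASTA text. It assumes FASTA is among the accepted upload formats, which this article does not confirm. The function name and sequence names are hypothetical.

```python
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} pairs as multi-record FASTA text.

    Whitespace inside sequences is removed and letters are upper-cased.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    blocks = []
    for name, seq in records.items():
        cleaned = "".join(seq.split()).upper()
        if not cleaned:
            raise ValueError(f"sequence {name!r} is empty")
        # Wrap the sequence into fixed-width lines.
        lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
        blocks.append(f">{name}\n" + "\n".join(lines))
    return "\n".join(blocks) + "\n"


queries = {
    "promoter_candidate_1": "ttgacaattaatcatcggctcgtataatgtgtgga",
    "promoter_candidate_2": "TTGACGGCTAGCTCAGTCCTAGGTACAGTGCTAGC",
}
print(to_fasta(queries), end="")
```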

Known Limitations

  • BLAST queries for long sequences (e.g. over 20K) may time out and fail.
  • BLAST queries for highly repetitive sequences may fail or require careful search parameter tuning.
  • The alignment view is unavailable for tblastx.

FAQ

What BLAST algorithms are available?

Currently, Benchling supports these BLAST algorithms:

  • blastn
  • blastp
  • blastx
  • tblastn
  • tblastx

 

Why does a blastp query return nothing, while there are matches in my Benchling inventory?

For complicated BLAST searches, like an amino acid sequence with high repetition, the results can be sensitive to the parameter values. For blastp queries with no result, please try a different scoring matrix.

 

Why does my BLAST search say that it times out?

Currently, the general Benchling search system has a one-minute time limit. If a BLAST query cannot be completed within one minute, it times out. Timeouts can happen for long query sequences or for complicated BLAST algorithms like tblastx. We plan to support longer sequences and more complicated queries in the future. In the interim, you can run BLAST with shorter sequences.
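One way to work with shorter sequences is to split a long query into overlapping pieces and search each piece separately. The sketch below is a workaround under stated assumptions: the chunk length and overlap are arbitrary example values rather than Benchling recommendations, and the function name is hypothetical. A match that spans a chunk boundary may be reported in two parts, so each chunk's 1-based start offset is kept to map hit positions back to the original sequence.

```python
def split_query(
    sequence: str, chunk_len: int = 5000, overlap: int = 200
) -> list[tuple[int, str]]:
    """Split a sequence into overlapping chunks.

    Returns (start, chunk) pairs, where start is the 1-based position of
    the chunk's first character in the original sequence.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    if not sequence:
        return []
    step = chunk_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start + 1, sequence[start:start + chunk_len]))
        if start + chunk_len >= len(sequence):
            break
        start += step
    return chunks


long_query = "ACGT" * 6000  # 24,000 characters, for illustration only
for offset, chunk in split_query(long_query):
    print(offset, len(chunk))
```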

 

Why is Alignment View not available for tblastx?

Screenshot 2024-05-30 at 21.04.55.png

Disabled alignment view toggle for tblastx

 

Currently, the alignment view is unavailable for tblastx queries. The toggle is disabled for that algorithm.

This is because each tblastx query runs multiple searches, one per reading frame. The results are more complicated than those of other BLAST algorithms and require a more specialized user experience to present them in an easily digestible way.
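To illustrate why reading frames multiply the results, the sketch below translates a nucleotide sequence in all six reading frames (three on each strand) using the standard genetic code. It is an illustration of the concept only, not of how Benchling or BLAST implements tblastx, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict[str, str]:
    """Return translations for frames +1..+3 and -1..-3."""
    forward = "".join(dna.split()).upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCATTGTAATGGGCCGC").items():
    print(frame, protein)
```

Each frame yields a different protein sequence, and every one of them can produce its own set of hits.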

 

Can I use BLAST to find sequences in external databases?

No. Currently, BLAST only searches sequences in your Benchling inventory.
