Background
All sequences (natural and modified) are primarily defined and stored as HELM strings. However, other notations can be used to define nucleotide strings for a variety of purposes. In Benchling these are defined as Custom Notations, and they require that customers set up their Monomer Library first.
Customers can configure a custom notation when an alternative language is needed (for example, codes for ordering sequences or for input into synthesizers) by mapping custom tokens to HELM triplets. This lets HELM symbols be expressed in a syntax that is more familiar within a company. A custom notation can be used to generate sequences in Benchling, and its translation can be populated from HELM when using computed fields.
To enable this feature, contact your Benchling representative or support@benchling.com. Please note that we do not offer pre-configured Custom Notations, including IDT Syntax, at this time. Customers are responsible for setting up their custom notation following the instructions below.
HELM to custom notation tokens
For nucleotides, HELM defines the three component monomers as individual symbols. These symbols are stored in Benchling as the primary definition along with the SMILES string, structure, and other metadata. Monomers are then assembled into triplets (a combination of sugar, base, and phosphate) when a sequence and its modifications are defined. Each monomer triplet is then mapped to the equivalent custom syntax token.
In HELM:
r = Ribose monomer
(N) = Base monomer, where N is A, G, C, T, or U, or a degenerate base
p = Phosphate monomer
HELM notation uses the following rules to assemble monomers into a human-readable HELM string:
1. Square brackets to enclose monomers with multi-letter IDs, which generally represent synthetic or non-natural analogues of other monomers
2. Parentheses to enclose branch monomers
3. Period symbol to separate monomers into logical groups
For example, the HELM sequence for a natural RNA sequence of AGCU would be:
r(A)p.r(G)p.r(C)p.r(U)p
The same RNA sequence in HELM with modifications at various positions would be:
m(A)p.[fl2r](G)p.r(C)[sp].r(U)p
The same modified RNA sequence mapped to IDT syntax would be:
mA/i2FG/rC*rU
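The triplet-by-triplet mapping in this example can be sketched in a few lines of Python. This is only an illustration, not how Benchling implements the translation: the `HELM_TO_CUSTOM` table and the function names are hypothetical, the split of the IDT string into one token per triplet is inferred from the example above, and the code assumes a simple linear HELM string in which periods only separate triplets.

```python
import re

# Hypothetical mapping from HELM triplets to custom tokens, inferred from the
# example above. A real custom notation would define its own table.
HELM_TO_CUSTOM = {
    "m(A)p": "mA",
    "[fl2r](G)p": "/i2FG/",
    "r(C)[sp]": "rC*",
    "r(U)p": "rU",
}

# A monomer ID is either a single letter or a bracketed multi-letter ID.
MONOMER = r"(?:\[[^\]]+\]|[A-Za-z])"
# A triplet is sugar, (base), then an optional phosphate.
TRIPLET = re.compile(rf"({MONOMER})\(({MONOMER})\)({MONOMER})?")


def split_triplets(helm: str) -> list[str]:
    """Split a simple linear HELM nucleotide string into triplets."""
    triplets = helm.split(".")
    for triplet in triplets:
        if TRIPLET.fullmatch(triplet) is None:
            raise ValueError(f"not a sugar(base)phosphate triplet: {triplet!r}")
    return triplets


def helm_to_custom(helm: str, mapping: dict[str, str]) -> str:
    """Translate each HELM triplet to its custom token and concatenate them."""
    return "".join(mapping[triplet] for triplet in split_triplets(helm))


print(helm_to_custom("m(A)p.[fl2r](G)p.r(C)[sp].r(U)p", HELM_TO_CUSTOM))
```

With the table above, the modified sequence translates to `mA/i2FG/rC*rU`. A triplet that is missing from the table raises a `KeyError`, which reflects the requirement that every triplet used in a sequence has a mapped token.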
For additional information on HELM we recommend reviewing the original paper.
Permissions
From the Monomer Library, tenant admins will be able to access the Custom Notation page, create and name a new custom notation and select the appropriate configurations. See the sections below for additional information about configuring these attributes.
Any user with Molecular Biology application access is able to download and view the Custom Syntax as a reference.
Create a Custom Notation
Customers can configure a new Custom Notation type via the Monomer Library.
The creation modal requires a user-facing name for the new syntax, and offers four configuration options:
- Case sensitive: If selected, a token must exactly match the case it was created with.
- Use shared delimiter: By default, the system assumes that every character in a custom notation string belongs to exactly one token. Check this option to instead specify a single delimiter to be "shared" by all adjacent tokens. This should rarely be applicable.
- Extract terminals to schema fields: Designates whether imports using this syntax should support special treatment for 5’ and 3’ terminal tokens, extracting them to schema fields. Enabling this requires defining in advance a single set of acceptable delimiters that will be used to recognize the terminal tokens. See examples in modal image.
- Always set last phosphate to None: Designates whether to always set the ending phosphate monomer to null when creating sequences using this syntax, regardless of how the corresponding alias is configured.
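To make the default tokenization rule and the case-sensitivity option more concrete, here is a minimal Python sketch. It assumes a greedy longest-match strategy, which is a choice made for illustration and not a description of Benchling's actual parser. The `tokenize` function and the token set are hypothetical, tokens are assumed to be ASCII, and the shared-delimiter option is not covered.

```python
def tokenize(notation: str, tokens: set[str], case_sensitive: bool = True) -> list[str]:
    """Split a notation string so that every character belongs to exactly one token.

    Assumption: the longest known token is tried first at each position.
    """
    # For case-insensitive matching, compare upper-cased forms but return
    # each token as it was originally defined.
    norm = (lambda s: s) if case_sensitive else str.upper
    lookup = {norm(token): token for token in tokens}
    max_len = max(len(token) for token in lookup)

    result = []
    i = 0
    while i < len(notation):
        for size in range(min(max_len, len(notation) - i), 0, -1):
            candidate = norm(notation[i:i + size])
            if candidate in lookup:
                result.append(lookup[candidate])
                i += size
                break
        else:
            # No token starts here, so the string is not valid in this notation.
            raise ValueError(f"no token matches at position {i}: {notation[i:]!r}")
    return result


# Hypothetical token set for illustration only.
TOKENS = {"mA", "/i2FG/", "rC*", "rC", "rU"}

print(tokenize("mA/i2FG/rC*rU", TOKENS))
print(tokenize("ma/i2fg/rc*ru", TOKENS, case_sensitive=False))
```

In this sketch, the lower-cased input is only accepted when `case_sensitive=False`; with the default setting it raises a `ValueError` because `ma` does not exactly match `mA`.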
After creating the new Custom Notation type, select the type and populate the text box with a comma-separated list of notations mapped to their corresponding HELM tokens, with the fields in the following order:
Custom Notation token,HELM token,Custom Notation 5' variant token (optional),Custom Notation 3' variant token (optional)
If 5’ and 3’ variants are used (for example, when setting up a custom notation for IDT Syntax), make sure to include the variant tokens as well.
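Before pasting a mapping list into the text box, it can help to check it locally. The Python sketch below reads entries in the field order given above. It assumes one mapping per line and that tokens contain no commas, neither of which is confirmed by this article. The `TokenMapping` class, the `parse_mappings` function, and the sample tokens are hypothetical placeholders.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenMapping:
    custom: str                        # Custom Notation token
    helm: str                          # HELM token (triplet)
    five_prime: Optional[str] = None   # optional 5' variant token
    three_prime: Optional[str] = None  # optional 3' variant token


def parse_mappings(text: str) -> list[TokenMapping]:
    """Parse lines of 'custom,HELM[,5' variant[,3' variant]]' (assumed layout)."""
    mappings = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        if not 2 <= len(row) <= 4:
            raise ValueError(f"line {line_no}: expected 2 to 4 fields, got {len(row)}")
        # Pad missing optional fields and treat empty fields as absent.
        fields = [field.strip() or None for field in row] + [None] * (4 - len(row))
        custom, helm, five_prime, three_prime = fields
        if custom is None or helm is None:
            raise ValueError(f"line {line_no}: custom and HELM tokens are required")
        mappings.append(TokenMapping(custom, helm, five_prime, three_prime))
    return mappings


# Placeholder entries for illustration; the variant tokens are not taken
# from any vendor documentation.
SAMPLE = """\
rA,r(A)p
/i2FG/,[fl2r](G)p,/52FG/,/32FG/
"""

for mapping in parse_mappings(SAMPLE):
    print(mapping)
```

The first sample line has no variants, so its `five_prime` and `three_prime` fields stay `None`; the second supplies both variant tokens.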
Using custom syntax
Spreadsheet Imports
You can create entities using a custom syntax via bulk spreadsheet import (from the global create or registry create menus). If you have at least one custom syntax configured, a new Custom Notation submenu will be available in the column mapping dropdown. Simply select the desired syntax and proceed with the import as you would normally. Note that if a Custom Notation is mapped, no other column can be mapped to Bases, HELM, or (legacy) IDT.
Registration Tables
You can also use a custom syntax when creating entities via a registration table. When inserting a new registration table for an eligible schema (i.e., RNA Oligo, DNA Oligo, or RNA Sequence), simply select the desired syntax from the Notation dropdown.
Computed Fields
Computed fields can be configured to translate sequences into each desired custom syntax. After the custom syntax has been configured, reach out to your Benchling representative or support@benchling.com to add these computed fields, specifying the schema, fields, and syntax that should be used.