Why are Benchling-generated canonical SMILES different from other SMILES?

Issue

The canonical SMILES string that Benchling generates for a molecule differs from the SMILES string that other tools generate for the same molecule.

Environment

Molecular Biology application

Cause

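Benchling stores three representations of each molecule: the original, user-supplied molecule format; a molfile for rendering chemical structures; and a canonicalized SMILES string generated with RDKit-provided algorithms. The canonicalized SMILES string is generated from either the original SMILES string or the original molfile provided by the user. Because a single molecule can be written as many different valid SMILES strings, and because the canonical form depends on the toolkit and the processing steps applied, the string Benchling stores can differ from SMILES generated elsewhere.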
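As a rough illustration of why one molecule can have several SMILES spellings, the sketch below calls RDKit's Python API directly. The function name canonical_smiles is hypothetical. This is plain RDKit canonicalization only, not Benchling's pipeline, which applies additional steps, so the string Benchling stores may differ from what this produces.

```python
from rdkit import Chem


def canonical_smiles(smiles: str) -> str:
    """Return RDKit's canonical SMILES for the given input SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit returns None when the input cannot be parsed.
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


# Three different spellings of ethanol collapse to one canonical string.
spellings = ["OCC", "C(O)C", "CCO"]
print({canonical_smiles(s) for s in spellings})
```

Comparing canonical strings produced by the same toolkit, rather than the raw input strings, is the usual way to check whether two SMILES describe the same molecule.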

When canonicalizing the SMILES string, the following steps are considered:

Add enhanced stereochemistry - This step adds explicit stereochemistry to the canonicalized SMILES string.
Remove explicit hydrogens - This step removes any explicit hydrogens and converts them to implicit hydrogens (hydrogens that do not appear directly in the structure or are written with the connecting atom).
Strip salts - This step only accounts for salts that are stripped from the canonicalized SMILES string.
Dearomatize/Kekulize - This step places alternating single and double bonds instead of aromatic bonds.

Other canonicalization steps that may or may not apply to SMILES input include:

Remove atom maps
Remove atom label properties
Remove conformers
Remove atom valence properties
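To make two of the steps above concrete, here is a minimal sketch of explicit-hydrogen removal and Kekulization using RDKit's Python API. The function names are hypothetical, and the order and options Benchling actually uses are not documented here, so treat this as an approximation rather than the exact procedure.

```python
from rdkit import Chem


def without_explicit_hydrogens(smiles: str) -> str:
    """Parse a SMILES while keeping its hydrogens, then strip them explicitly."""
    params = Chem.SmilesParserParams()
    params.removeHs = False  # keep [H] atoms on parse so removal is a visible step
    mol = Chem.MolFromSmiles(smiles, params)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    mol = Chem.RemoveHs(mol)  # explicit hydrogens become implicit
    return Chem.MolToSmiles(mol)


def kekulized(smiles: str) -> str:
    """Write aromatic rings with alternating single and double bonds."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    Chem.Kekulize(mol, clearAromaticFlags=True)
    return Chem.MolToSmiles(mol, kekuleSmiles=True)


print(without_explicit_hydrogens("[H]C([H])([H])O"))  # methanol without [H] atoms
print(kekulized("c1ccccc1"))  # benzene written with explicit double bonds
```

Either step alone is enough to make the stored string look different from the input, even though the molecule is unchanged.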

Any salts included in a chemical structure are currently stripped during canonicalization.

Note: Benchling currently keeps the largest fragment (based on atom count) of the original molecule as the parent molecule and treats all other fragments as salts. Salts detected during bulk import can be saved in schema and custom fields.
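The largest-fragment rule can be sketched in plain Python. This is an illustration only, not Benchling's implementation: the names ATOM, atom_count, and split_parent_and_salts are hypothetical, atoms are counted as written in the SMILES string (implicit hydrogens are not counted), and a tie goes to the first fragment, which is an assumption.

```python
import re

# One atom token: a bracket atom such as [Na+], or an organic-subset atom.
# Two-letter symbols (Br, Cl) are listed first so they are not split in two.
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_count(fragment: str) -> int:
    """Count the atoms written in one SMILES fragment."""
    return len(ATOM.findall(fragment))


def split_parent_and_salts(smiles: str) -> tuple[str, list[str]]:
    """Keep the largest fragment as the parent; treat the rest as salts."""
    fragments = smiles.split(".")  # "." separates disconnected fragments
    parent = max(fragments, key=atom_count)  # first fragment wins a tie
    salts = list(fragments)
    salts.remove(parent)
    return parent, salts


# Sodium acetate: the acetate ion is kept, the sodium ion is treated as a salt.
print(split_parent_and_salts("CC(=O)[O-].[Na+]"))
```

For sodium acetate this keeps the acetate fragment, which is why a SMILES string exported from Benchling can be missing a counterion that appears in the SMILES from another source.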
