Why are my entities with a comma in the name getting recognized as two separate names in spreadsheet update?

Sewon
Sewon
  • Updated

Issue:

I have an entity with a comma in the name e.g. "1,2-compound" but when using the spreadsheet update functionality, this name gets recognized as a comma-separated list e.g. "1" and "2-compound" and throws an error that these entities are not found.

Environment:

Registry

Cause:

A comma within a tabular cell is used to denote a multi-link value e.g. if you wanted to link to entities 'ABC' and 'DEF', you could have 'ABC, DEF' in the tabular cell. Benchling spreadsheet import allows you to put quotes around an entity name to indicate that the quoted string to be treated as a single name whereby commas are treated literally. So if you have an entity '1,2-compound' and '3,4-compound', the tabular cell needs to look like "1,2-compound","3,4-compound".

In a CSV, a literal comma in a cell requires the whole cell value to wrapped in quotes and literal quotes in a cell need to be replaced with doubled quote. This means the cell in a CSV would need to look like e.g. """1,2-compound"", ""3,4-compound""" .

Resolution Steps:

There are a few options to choose from in order to avoid this issue:

1. If a CSV is being used, each entity name in a cell should be CSV-escaped (e.g. """1,2-compound""")

2. If a TSV / XLSX / CSV with semicolons spreadsheet is being used, then the entity name can simply contain quotes (e.g. "1,2-compound") - no double-escaping is necessary here.
3. Avoid using the entity name and use Registry IDs instead, which shouldn't contain commas.

Was this article helpful?

Have more questions? Submit a request