Data from multi-select fields are stored in JSONB arrays in the Benchling warehouse, which in Insights automatically get displayed as the blue clickable chips. There are specific JSONB operators and functions to work with these arrays in your postgreSQL query. This article will show you some examples.
Combine tables with JOIN
If you have two tables with matching text columns, you can easily combine these tables using:
SELECT * FROM table1
JOIN table2 ON table1.column1 = table2.column2
However, the same JOIN ... ON column1 = column2
syntax does not work when one of the columns is a JSONB array, and you will get an error: operator does not exist: jsonb = character varying
.
We will use an example to show how to JOIN
tables based on JSONB arrays. We will join data from a Cell Line schema with data from the Plasmid schema. These schemas are linked via a multi-select entity link, and thus is the link from the cell_line
to the plasmid
table stored in a JSONB array. Here is our example cell_line
data:
id | name$ | plasmids |
bfi_6mMswng0 | Cell-001 | ["seq_D3fZa648", "seq_bVWwI73e"] |
bfi_jyMlI3nF | Cell-002 | NULL |
Option 1: use the ? operator
JSONB arrays have specific operators to compare arrays and values. The ?
operator checks if a value is part of the array, which you can use to JOIN
tables ON jsonb_column1 ? column2
.
SELECT
cell_line.name$ AS cell_line_name,
plasmid.name$ AS plasmid_name
FROM cell_line
LEFT JOIN plasmid ON cell_line.plasmids ? plasmid.id
# Output
cell_line_name | plasmid_name
Cell-001 | pBR322
Cell-001 | pX330
Cell-002 | NULL
Note that since Cell-001
has 2 plasmids associated, it has 2 rows, one for each plasmid, in the output.
To aggregate the multiple rows per cell line back into one row, you can use GROUP BY cell_line.id
to indicate that all values per Cell Line should be combined into one row. For columns that are not part of the GROUP BY
clause, you have to use aggregation functions to specify how these values should be combined into a single row:
STRING_AGG(column, separator)
combines text values with a separatorJSONB_AGG(column)
combines IDs or text values into a new JSONB array
(in which entity and inventory item IDs will be rendered as clickable chips in Insights).
# SQL query
SELECT
cell_line.name$ AS cell_line_name,
STRING_AGG(plasmid.name$, ', ') AS plasmid_names,
JSONB_AGG(plasmid.gene) AS linked_genes,
FROM cell_line
LEFT JOIN plasmid ON cell_line.plasmids ? plasmid.id
GROUP BY cell_line.id, cell_line_name
# Output
cell_line_name | plasmid_names | linked_genes
Cell-001 | pBR322, pX330 | ["seq_R9aBs84a", "seq_OvWx49a"]
Cell-002 | NULL | []
Option 2: unpack the JSONB array
Alternatively, you can unpack the JSONB array into single values, each on a separate row, using jsonb_array_elements_text(jsonb_column)
. After you've unpacked the array, you can use these unpacked text values to join other tables with JOIN ... ON unpacked_column1 = column2
. You can also use this in the SELECT
clause to unpack elements into separate rows without joining any other tables.
SELECT
cell_line.name$ AS cell_line_name,
plasmid.id AS plasmid_id,
plasmid.name$ AS plasmid_name
FROM cell_line
LEFT JOIN jsonb_array_elements_text(cell_line.plasmids) AS cell_line_plasmid_id
ON TRUE
LEFT JOIN plasmid ON cell_line_plasmid_id = plasmid.id
# Output
cell_line_name | plasmid_id | plasmid_name
Cell-001 | seq_D3fZa648 | pBR322
Cell-001 | seq_bVWwI73e | pX330
Cell-002 | NULL | NULL
In this example, jsonb_array_elements_text
thus converts the plasmid
column into a new table that has 1 row per element (seq_D3fZa648
and seq_bVWwI73e
).
Next, LEFT JOIN ... ON TRUE
joins that table to our existing cell_line
table. We specifically recommend a LEFT JOIN
instead of (INNER) JOIN
so cell lines that don’t have any plasmids, like Cell-002
, are still included in the output. Otherwise, they would be filtered out.
Note that since Cell-001
has 2 plasmids associated, it has 2 rows, one for each plasmid, in the output. See Option 1 above on how to use GROUP BY
to combine these rows into a single row per cell line again.
Select a single JSONB element
You can select a single element from a JSONB array using column->>[index, starting at 0]
. For example, you can get the first element of the array with column->>0
, selecting seq_D3fZa648
from the array ["seq_D3fZa648", "seq_bVWwI73e"]
.
Computed fields are often multi-select, even if your data model only permits it to contain a single value. You can use ->>0
in SELECT column->>0 FROM ...
statements to select the first text value, or use it to join tables on this single value: JOIN ... ON jsonb_column1->>0 = column2
.
Count the number of JSONB elements
You can usejsonb_array_length(jsonb_column)
to count how many elements are present in a JSONB array:
# Example cell_line table
id | name$ | plasmids
bfi_6mMswng0 | Cell-001 | ["seq_D3fZa648", "seq_bVWwI73e"]
bfi_jyMlI3nF | Cell-002 | []
# SQL query
SELECT
cell_line.name$ AS cell_line_name,
jsonb_array_length(cell_line.plasmids) AS plasmid_count
FROM cell_line
# Output
cell_line_name | plasmid_count
Cell-001 | 2
Cell-002 | 0
Additional resources
https://www.postgresql.org/docs/12/functions-json.html
https://riptutorial.com/postgresql/example/5195/using-jsonb-operators
https://www.postgresqltutorial.com/postgresql-json/