Let’s start with a non-JSONB example to set the groundwork:
# cell_line table
id | name$ | plasmid
bfi_6mMswng0 | Cell-001 | seq_D3fZa648
bfi_jyMlI3nF | Cell-002 | NULL
# SQL query
SELECT cell_line.name$ AS cell_line_name, plasmid.name$ AS plasmid_name
FROM cell_line
LEFT JOIN plasmid ON cell_line.plasmid = plasmid.id
# Output
cell_line_name | plasmid_name
Cell-001 | pBR322
Cell-002 | NULL
What if the plasmid
column is a JSONB array of 1 element, because it’s configured to be a multiple-value link (aka isMulti: true
) field? For example, it’s a computed field.
# cell_line table
id | name$ | plasmid
bfi_6mMswng0 | Cell-001 | ["seq_D3fZa648"]
bfi_jyMlI3nF | Cell-002 | []
# SQL query
SELECT cell_line.name$ AS cell_line_name, plasmid.name$ AS plasmid_name
FROM cell_line
LEFT JOIN plasmid ON cell_line.plasmid->>0 = plasmid.id
# Output
cell_line_name | plasmid_name
Cell-001 | pBR322
Cell-002 | NULL
Note that ->>0
converts the JSONB array of 1 element (["seq_D3fZa648"]
) to a simple string (seq_D3fZa648
) as in the non-JSONB example.
Now, what if the cell_line
table actually has a plasmids
column containing 2 or more elements (a true multiple-value link field)?
# cell_line table
id | name$ | plasmids
bfi_6mMswng0 | Cell-001 | ["seq_D3fZa648", "seq_bVWwI73e"]
bfi_jyMlI3nF | Cell-002 | []
# SQL query 1
SELECT cell_line.name$ AS cell_line_name, plasmid.name$ AS plasmid_name
FROM cell_line
LEFT JOIN jsonb_array_elements_text(cell_line.plasmids) AS cell_line_plasmid_id
ON TRUE
LEFT JOIN plasmid ON cell_line_plasmid_id = plasmid.id
# Output 1
cell_line_name | plasmid_name
Cell-001 | pBR322
Cell-001 | pX330
Cell-002 | NULL
# SQL query 2
SELECT cell_line.name$ AS cell_line_name,
string_agg(plasmid.name$, ', ') AS plasmid_names
FROM cell_line
LEFT JOIN jsonb_array_elements_text(cell_line.plasmids) AS cell_line_plasmid_id
ON TRUE
LEFT JOIN plasmid ON cell_line_plasmid_id = plasmid.id
GROUP BY cell_line.id, cell_line_name
# Output 2
cell_line_name | plasmid_names
Cell-001 | pBR322, pX330
Cell-002 | NULL
From the 1st query, jsonb_array_elements_text
converts the plasmid
column into a table that has 1 row per element (seq_D3fZa648
and seq_bVWwI73e
).
Next, LEFT JOIN ... ON TRUE
joins that table to our query. We specifically recommend a left (outer) join instead of (inner) join so that rows that don’t have any plasmids (“Cell-002” has []
) still come through. Otherwise, they’d be filtered out.
Notice that the 1st output has 1 row per <cell line, plasmid> combo. Since “Cell-001” has 2 plasmids, it has 2 rows in the output.
The 2nd query aims to combine the rows so there’s only 1 row per cell line in the output. It adds GROUP BY cell_line ...
to do so.
Additional resources:
https://www.postgresql.org/docs/9.5/functions-json.html
https://riptutorial.com/postgresql/example/5195/using-jsonb-operators
https://www.postgresqltutorial.com/postgresql-json/