Set up your data warehouse and connect to third party tools

Accessing the data warehouse allows your team to perform advanced analytics, reporting, and data integrations using raw structured data from your Benchling tenant. With direct access, you can connect tools like Jupyter, Power BI, and ETL pipelines to support scientific workflows, business intelligence, and regulatory reporting.

Create warehouse credentials

In order to access your Benchling data warehouse, you will need to create credentials for the warehouse.

Click on your Avatar in the Navigation bar, then navigate to Settings
On the profile information page, find the Warehouse Credentials section and click Create Credentials
Give your credentials a label to indicate what you will use them for
Click Create
Copy the credentials down and save them in a safe place as the password isn’t stored and can’t be retrieved later
Exit the window by clicking Done

To delete a credential, click the trash icon to the right of the credential.

Once you have created the credentials, you can navigate to the Warehouse Credentials section of the Settings page to retrieve connection information. This includes:

Host URL
- [your_tenant_name].benchling.com
Port
- 5432 (the default PostgreSQL port)
Database name
- warehouse
Username and password
- The username and password you generated when you created the credentials in the Settings page

Note: Access to the warehouse must be enabled for your tenant. Contact Benchling if this option is not visible.

Connect with a tool that supports PostgreSQL

See below for instructions on configuring:

The psql command-line tool and other libpq-based code libraries
JDBC / Java
Third party tools and SaaS

For general help with psql, visit the official documentation.

Note: it is very important to configure SSL/TLS correctly since psql and libpq-based clients like Python’s psycopg2 won’t use it by default.

The instructions here set the per-user default and will also work with Python's psycopg2 in addition to psql. If the instructions below don't work, or you need to connect to many PostgreSQL servers, you will need to refer to the official psql and libpq documentation.

Mac OS and Linux

Set up the ~/.postgresql/root.crt file. Open Terminal.app and run the following commands (please note this assumes you're using bash as your shell):

# Download the Aurora RDS root certificate to the default location.
curl https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem \
 --create-dirs -o ~/.postgresql/root.crt

# Set environment variables to require full verification by default
# and make these settings apply to new shells. Use .zshrc for ZSH.
export PGSSLMODE=verify-ca
echo 'export PGSSLMODE=verify-ca' >> ~/.bashrc

Windows

Set up the %APPDATA%\postgresql\root.crt file:

Download the CA certificate file to your Downloads folder and open that folder. You should see a new file named named global-bundle.pem
Open a second windows file explorer window (shortcut: Ctrl+N or go File > New Window) and type %APPDATA% in its address bar
If you see a folder called postgresql here, great! But if not, create a new folder named exactly that. Open this folder.
1. If you can't see or make this folder, contact your IT team
Drag and drop the downloaded global-bundle.pem from the Downloads folder to the postgresql folder
Rename the global-bundle.pem file to root.crt to set it as the default (recommended)

Connect to your Warehouse

To connect to your warehouse, run:

psql 
'postgresql://*yourusername*@postgres-warehouse.*yourdeployname*.benchling.com:5432/warehouse?sslmode=verify-ca'

This should prompt for your password.

Note: replace *yourusername* and *yourdeployname* with your individual settings.

Connect Using JDBC

The connection string for psql will also work for Java JDBC connections:

postgresql://yourusername@postgres-warehouse.yourdeployname.benchling.com:5432/warehouse?sslmode=verify-ca

Note: replace yourusername and yourdeployname with your individual settings.

See the PostgreSQL JDBC Connector SslMode documentation for more info on the SSL / TLS settings. You will need to set up the correct RDS root certificates, just like with the other tools above, possibly by using Java's keytool. You will want to use the verify-ca mode. You may need to import or otherwise specify the location of the global-bundle.pem file if it isn't in the default location.

Connect Using a Third-Party Tool or SaaS

Note: many tools require you to enable SSL to TLS to connect. It is important and highly recommended that you enable it. If you need a Root Certificate Authority (“Root CA”) file, you should supply this one to your tool.

To connect to the Warehouse, use the following parameters in your data visualization tool:

Host Name: postgres-warehouse.yourdeployname.benchling.com. Be sure to replace yourdeployname with your enterprise specific domain (that means the part before your enterprise's benchling.com URL)
- Certain domains may use the form of postgres-warehouse.yourdeployname.vc-postgres-warehouse.benchling.com instead. Check the URL shown in the Create Credentials modal.
Port: 5432 (this is the default)
Database: warehouse
Username: the username you generated above
Password: the password you generated above
Enable SSL or TLS mode - this is very important for your data security
- If possible, you should use the verify-ca SSL / TLS connection mode (it might be called "check hostname" or "check server certificate").
- This will sometimes require you to download then upload this CA certificate file. If that doesn't work, please ask support to provide you a certificate specific to your Warehouse instance.
- If the tool also asks for a client certificate, either generate your own or you can use these pre-generated .key and .crt files.
Root Certificate Authority / Trusted Certificate: Download / upload the global RDS certificate bundle

Spotfire example:

Reading data from the Warehouse

Connect your tool of choice to the Postgres warehouse (see above). For each run/result, you'll see:

Raw tables (e.g. my_assay_result$raw): contain both reviewed and unreviewed data
View (e.g. my_assay_result): contains only valid, reviewed data

Generally, you'll want to query the view when you want reviewed data. Query the raw table if you want to see all data, including unreviewed data. Note that each run/result is uploaded to a project, and you'll only be able to see runs/results in projects you can access. See our Getting Started doc for more information.

You will need permissions to view and edit the data you are trying to access. This means that a user will only see data in the warehouse that that user has permission to see in the Benchling UI. This will mean that different users may see a different number of results for certain queries and may not be able to view a linked item (shown as [No access]) in a linked field.
Permission to view runs and results is derived from the project in which they are placed. Project for a result will be specified through:

If manually entered, the permissions for the assay data is the same as the permissions of the entry in which it is recorded
If uploaded via a script, the project id can be specified at time of upload.
When a run is included for review in a Notebook entry, we move the runs and results to the project for the runs/results

Connect Jupyter/Python to the warehouse

Connecting Jupyter or Python to the Benchling data warehouse allows data scientists and analysts to programmatically query live Benchling data and incorporate it into custom models, pipelines, or notebooks for visualization, transformation, or machine learning workflows.

You can connect Jupyter or any Python environment to the Benchling data warehouse using the psycopg2 PostgreSQL adapter.

Steps:

Install psycopg2 and other necessary packages:

$ brew install postgresql
$ pip install jupyter-lab
$ pip install ipython-sql
$ pip install psycopg2

Start a new notebook
```
$ jupyter-lab
```
Configure the connection to the warehouse by going to the Settings page in Benchling and generating a new set of warehouse credentials

Use the following code snippet in Jupyter to connect:

%load_ext sql
username = 'INSERT_USERNAME_HERE'
password = 'INSERT_PASSWORD_HERE'
host = 'postgres-warehouse.INSERT_TENANT.benchling.com:5432'
%sql postgresql://$username:$password@$host/warehouse

If you don't have any errors, the connection should have worked! You can test it by running an example query like so:

Ingesting data from the warehouse for ETL

Integrating Benchling’s data warehouse with your ETL tools enables teams to extract, transform, and load structured scientific data into downstream systems. This supports central data lakes, long-term storage, and analytics aggregation alongside other enterprise data sources. Benchling supports ETL ingestion workflows using tools that connect to PostgreSQL.

There are two ways to programmatically pull data from Benchling.

Pull data via the Benchling REST APIs
Pull data via the Benchling data warehouse

The Benchling REST API has a few key benefits:

Data from the REST API is live
The REST API allows for integrations to "writeback" into Benchling

However, some customers prefer to query with more transformation and query flexibility. In that case, the warehouse has the following properties:

Data from the warehouse is sync'ed from the live Benchling application. Usually that's within a few minutes, but can be up to a few hours
Data from the warehouse can only be read, not written

Once you've set up your warehouse connection, you can use a tool like FiveTran or Debezium to pull data into your downstream system. There are two possible approaches:

Set up SQL-based ingestion with regular snapshots

In cases where the amount of data in Benchling isn't that large, one can simply run queries for all of the data in the warehouse and pull it regularly (say, once a week). These snapshots practically look like this:

SELECT * FROM registry_entity;
SELECT * FROM entry$raw;
SELECT * FROM container$raw;

Though configuring them in your ingestion tool of choice may look different. The main advantage of snapshotting is that the logic to ingest from the warehouse is fairly straightforward - you're pulling all the data, at some increment.

Set up SQL-based ingestion with incremental pulls

In cases where the amount of data in Benchling is large enough (or the ETL is too slow to snapshot - like queries with lots of joins) you can run queries based on what has changed since the last time a pull was made. Unfortunately, there is not a universal column that can be used for such incremental calculations, but most of our tables have a modified_at timestamp which can be used for this purpose.

Some data ingestion tools prefer to just be pointed at the tables of interest and "autocalculate" what changes have been made to propagate to the downstream system. Unfortunately, those systems often don't work well with Benchling, primarily because of the way we make updates (we don't update the row directly, we insert a new row with the same id, and delete the previous row). As a consequence, tools like Fivetran (using something like xmin replication) will often duplicate data into downstream systems.

As a consequence, tools like Fivetran (using something like xmin replication) will often duplicate data into downstream systems. For example, if data looks like this in Benchling's warehouse:

id	_sync_key	name
bfi_3k3jdkfs	basic_folder_item_224103	Plasmid 1
bfi_ckvjwe4	basic_folder_item_30020	Plasmid 2

The downstream system might contain this:

id	_sync_key	name	insert_time (tool produced)
bfi_3k3jdkfs	basic_folder_item_224082	Plasmid 1	12:00 AM
bfi_3k3jdkfs	basic_folder_item_224103	Plasmid 1	12:03 AM
bfi_ckvjwe4	basic_folder_item_30019	Plasmid 2	12:00 AM
bfi_ckvjwe4	basic_folder_item_30020	Plasmid 2	12:03 AM

The key is to deduplicate the rows based on id - there should only ever be one row per value of id. Most ingestion tools bring along a "insert_time" which can be used to filter the old rows out.

What about write-ahead-log based replication?

Unfortunately, WAL replication is not supported in the Benchling warehouse today - access to the WAL requires the ability to write to specific tables in Postgres, and we don't support write use cases. However, if you have a need for ingestion from the warehouse, please get in touch! We'd love to learn more about the use case, and add appropriate features to this service where relevant.

Connect to Google Data Studio

Connecting Benchling data to Google Data Studio allows teams to visualize and share structured Benchling data using an accessible, web-based dashboarding tool. This enables non-technical stakeholders to explore scientific or operational trends in real time, using reports built directly from live warehouse data.

To connect Benchling data to Google Data Studio:

In Google Data Studio, click Create and select Data Source
Choose the PostgreSQL connector
Enter the following connection parameters under the Basic connection type:
- Host: from warehouse settings
- Port: 5432 (or as specified)
- Database: name from settings
- Username and Password: your warehouse credentials
Under SSL, select Enable SSL
Download the client configuration files below which are needed in the authentication fields
- private client key.key
- client certificate.crt
Click Authenticate and then Connect

If connection fails, verify firewall or network settings to allow outbound access to Benchling's host.

Connect to Microsoft PowerBI

Connecting Benchling data to PowerBI enables teams to create powerful, interactive dashboards and reports using enterprise-grade analytics tooling. It helps visualize trends across experiments, operations, or inventory systems, while integrating Benchling data with other business sources.

You can connect PowerBI Desktop to the warehouse using its built-in PostgreSQL connector.

Once Postgres is installed you should see a Postgres folder within C:/Program Files. Open C:\Program Files\PostgreSQL\13\data\pg_hba.conf in a text editor

Add an additional line in pg_hba.conf like so:

#Added for Benchling warehouse
hostssl     all         all             all               md5

Set up the %APPDATA%\postgresql\root.crt file:
1. Download the CA certificate file to your Downloads folder and open that folder. You should see a new file named named “global-bundle.pem”
2. Open a second windows file explorer window (shortcut: Ctrl+N or go File > New Window) and type %APPDATA% in its address bar
3. If you see a folder called “postgresql” here, great! But if not, create a new folder named exactly that. Open this folder. (If you can't see or make this folder, contact your IT team)
4. Drag and drop the downloaded global-bundle.pem from the Downloads folder to the postgresql folder
5. Rename the global-bundle.pem file to root.crt to set it as the default (recommended)
Install the driver
Configure an ODBC connection in PowerBI (open up PowerBI and set up a new data connection). From here, you can find ODBC
Configuring the ODBC connection: This configuration can be very tricky - be sure to paste in exactly the correct values, otherwise it will not work. In particular, you should be setting the driver, server, port, database, and sslmode in the connection string
```
Driver={PostgreSQL ANSI}; 
Server=postgres-warehouse.*your_tenant_name*.benchling.com; 
Port=5432; Database=warehouse; sslmode=verify-ca
```
Be sure to replace *your_tenant_name* with the correct value. Lastly, you'll be asked to fill in your username and password. These can be retrieved from the credential creation page in your Benchling setting.

SI unit calculations in the warehouse

Benchling’s data warehouse supports SI unit calculations for result values. Each result with a numeric value and unit is stored in both its original unit and a normalized SI unit. SI unit values appear in the column value_si. This allows for standardized queries across entries with varying units (e.g., converting µg/mL to g/L automatically). Benchling stores this data in the warehouse per the following SI units:

kg for mass
mol for moles
L for volume
kg/L for mass concentration
mol/L (M) for molarity

Some warehouse columns that this pertains to include:

container.volume_si
container_content.concentration_si
container_transfer.volume_si

Note: volume_si is correct for Warehouse only. Generally, quantity is preferred, but volume_si is preferred for warehouse. Existing volume columns must be maintained in the warehouse to guarantee stability.

Example Query

To retrieve both original and SI values:

SELECT value, value_si, unit FROM results WHERE unit = 'µg/mL';

Only supported for result schemas where numeric units are defined.

Frequently Asked Questions

Q: Do I need Benchling’s help to enable warehouse access?
A: Yes. If you don’t see the warehouse settings, contact Benchling to enable access.

Q: Can I use my own ETL tool?
A: Yes. Any tool that connects to PostgreSQL using JDBC, ODBC, or native drivers is supported.

Q: Does Benchling standardize units in the warehouse?
A: Yes. SI units are calculated and stored alongside original values in the value_si column.

Q: Is SSL required for third-party connections?
A: Yes. SSL must be enabled when connecting to the warehouse from Google Data Studio, Power BI, or any external tool.