Access your data warehouse

Aarthi
Aarthi
  • Updated

The Benchling Warehouse is a database solution that allows you to view structured data, like entity metadata, assay results and workflow tasks using SQL queries. The warehouse is based on PostgreSQL and is read-only.

External warehouse access allows you to access your Benchling warehouse using third party query tools such as Spotfire and Tableau, or through your preferred programming language. For an overview of the tables in the warehouse, check out the Warehouse Overview or Warehouse Schemas ERD. You can also access the warehouse within the Benchling platform through the Insights application.

This article does not cover the API, which can be used to write data to Benchling. To access the API, visit Write Data via the API.

Note: External Warehouse access is an add-on Benchling package. Please contact your Customer Success representative to determine if access is already included in your package, or hear more about this option.

Set up warehouse credentials

To create warehouse credentials:

  1. Click on your Avatar > Settings.
    mceclip6.png
  2. Click Create Credentials in the Warehouse Credentials section.
    If you do not see this section on your settings page, please reach out to your Customer Success Manager to determine if access is included in your package.
    mceclip3.png
  3. You can give your credentials a label, for example to indicate what you use them for.
  4. Be sure to copy the password and store it in a safe location (like your password manager) when generating credentials, since you will not be able to access this page again. Benchling does not recommend hard-coding these credentials into scripts.

For connection instructions and important security considerations, visit our developer documentation.

Credentials can be revoked at any time by clicking the trash can.
mceclip4.png

Connect with a tool that supports PostgreSQL

See below for instructions on configuring:

  1. The psql command-line tool and other libpq-based code libraries

  2. JDBC / Java

  3. Third party tools and SaaS

Connect Using psql

For general help with psql, visit the official documentation.

Note: It is very important to configure SSL / TLS correctly since psql and libpq-based clients like Python's psycopg2 won't use it by default.

The instructions here set the per-user default and will also work with Python's psycopg2 in addition to psql. If the instructions below don't work, or you need to connect to many PostgreSQL servers, you will need to refer to the official psql and libpq documentation.

Mac OS and Linux

Set up the ~/.postgresql/root.crt file. Open Terminal.app and run the following commands (please note this assumes you're using bash as your shell):

# Download the Aurora RDS root certificate to the default location.
curl https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem \
--create-dirs -o ~/.postgresql/root.crt

# Set environment variables to require full verification by default
# and make these settings apply to new shells. Use .zshrc for ZSH.
export PGSSLMODE=verify-ca
echo 'export PGSSLMODE=verify-ca' >> ~/.bashrc

Windows

Set up the %APPDATA%\postgresql\root.crt file:

  1. Download the CA certificate file to your Downloads folder and open that folder. You should see a new file named named rds-ca-2019-root.pem

  2. Open a second windows file explorer window (shortcut: Ctrl+N or go File > New Window) and type %APPDATA% in its address bar

  3. If you see a folder called postgresql here, great! But if not, create a new folder named exactly that. Open this folder.

    1. If you can't see or make this folder, contact your IT team

  4. Drag and drop the downloaded rds-ca-2019-root.pem from the Downloads folder to the postgresql folder

  5. Rename the rds-ca-2019-root.pem file to root.crt to set it as the default (recommended)

Connect to your Warehouse

To connect to your warehouse, run:

psql 'postgresql://*yourusername*@postgres-warehouse.*yourdeployname*.benchling.com:5432/warehouse?sslmode=verify-ca'

This should prompt for your password. Note: replace *yourusername* and *yourdeployname* with your individual settings.

Connect Using JDBC

The connection string for psql will also work for Java JDBC connections:

postgresql://yourusername@postgres-warehouse.yourdeployname.benchling.com:5432/warehouse?sslmode=verify-ca
Note: replace yourusername and yourdeployname with your individual settings.

See the PostgreSQL JDBC Connector SslMode documentation for more info on the SSL / TLS settings. You will need to set up the correct RDS root certificates, just like with the other tools above, possibly by using Java'skeytool. You will want to use the verify-ca mode. You may need to import or otherwise specify the location of the rds-ca-2019-root.pem file if it isn't in the default location.

Connect Using a Third-Party Tool or SaaS

Note: Many tools require you to enable SSL or TLS to connect. It is important and highly recommended you enable it. If you need a Root Certificate Authority ("Root CA") file, you should supply this one to your tool.

To connect to the Warehouse, use the following parameters in your data visualization tool:

  • Host Name: postgres-warehouse.yourdeployname.benchling.com. Be sure to replace yourdeployname with your enterprise specific domain (that means the part before your enterprise's benchling.com URL)

  • Port: 5432 (this is the default)

  • Database: warehouse

  • Username: the username you generated above

  • Password: the password you generated above

  • Enable SSL or TLS mode - this is very important for your data security

    • If possible, you should use the verify-ca SSL / TLS connection mode (it might be called "check hostname" or "check server certificate").

    • This will sometimes require you to download then upload this CA certificate file. If that doesn't work, please ask support to provide you a certificate specific to your Warehouse instance.

    • If the tool also asks for a client certificate, either generate your own or you can use these pre-generated .key and .crt files.

  • Root Certificate Authority / Trusted Certificate: Download / upload the global RDS certificate bundle (or the correct GovCloud bundle).

For example:


Reading data from the Warehouse

Connect your tool of choice to the Postgres warehouse (see above). For each run/result, you'll see:

  1. Raw tables (e.g. my_assay_result$raw): contain both reviewed and unreviewed data

  2. View (e.g. my_assay_result): contains only valid, reviewed data

Generally, you'll want to query the view when you want reviewed data. Query the raw table if you want to see all data, including unreviewed data. Note that each run/result is uploaded to a project, and you'll only be able to see runs/results in projects you can access. See our Getting Started doc for more information.

You will need permissions to view and edit the data you are trying to access.  This means that a user will only see data in the warehouse that that user has permission to see in the Benchling UI.  This will mean that different users may see a different number of results for certain queries and may not be able to view a linked item (shown as [No access]) in a linked field.
Permission to view runs and results is derived from the project in which they are placed. Project for a result will be specified through:

  1. If manually entered, the permissions for the assay data is the same as the permissions of the entry in which it is recorded

  2. If uploaded via a script, the project id can be specified at time of upload.

  3. When a run is included for review in a Notebook entry, we move the runs and results to the project for the runs/results

FAQ

Error trying to connect to database 'warehouse'

Two potential causes:

1) Your client or tool may not be connecting using SSL / TLS.  

To fix this, check that you followed the SSL / TLS setup instructions above. In particular, check that you have sslmode=verify-ca appended to the connection URL.

2) Bash is interpreting the username as a variable.  

To fix, make sure that you're using the correct connection string:

  • Use psql -U 'u$benchlingsupport' (with single quotes) instead of psql -U u$benchlingsupport -- be sure to replace benchlingsupport with your user ID

Error: psql: server certificate for "....rds.amazonaws.com" (and 2 other names) does not match host name postgres-warehouse.yourdeployname.benchling.com

You've tried to connect with the verify-full SSL mode. Either use the verify-ca mode or change the hostname you connect with to the *.rds.amazonaws.com one shown in the error message. The CA verification process ensures that this is a secure connection either way.

Was this article helpful?

Have more questions? Submit a request