Note: Data entry assistant is a part of Benchling Intelligence. See the Benchling Intelligence page for information on how to enable data entry assistant and other AI-based features.
This feature is in beta, so keep in mind that it is a work in progress. Usage limits may apply as we prepare for general availability.
The data entry assistant uses AI to import a wide range of files as Benchling data. For example, it can often interpret PDFs and Excel spreadsheets even when they are not formatted as clean tables. The AI system uses context clues from the document and your notebook entry, as well as any instructions and examples you provide, to determine the most reasonable way to enter data into your Benchling Notebook.
Usage
In any result table, registration table, or create/fill containers table, click the Data entry assistant button.
Upload one or more files. The AI system will automatically make a best guess at the first row based on all files, context, and instructions provided. To help the AI system make the right decision, clarify the task in any of the following ways:
- Edit the example row as necessary to match the format and data you expect. This helps the AI system understand how the data maps to column headings and what format to use.
- Fill in any custom instructions to further clarify the task. For example, this is a good place to clarify the handling of blank values, or to explain any ID lookup steps that may not be obvious from context.
- Any instructions or other context already in the Notebook entry will also be considered, including instructions provided via template.
When the instructions have been specified, click Submit to start processing the data.
[Figure: the Excel file uploaded in this example]
The AI system will typically take 1 to 3 minutes to process the file. When it finishes, it will show a preview of the results. A banner will indicate if the results were verified across repeated runs. If there was a discrepancy between the multiple runs, the differences will be shown.
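As a conceptual illustration only, a discrepancy between two runs can be thought of as a cell-by-cell comparison of the proposed rows. The sketch below is not Benchling's verification mechanism, which this page does not describe; the function name `diff_runs` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def diff_runs(run_a: List[Row], run_b: List[Row]) -> List[Tuple[int, str, str, str]]:
    """Return (row_index, column, value_a, value_b) for every cell that differs."""
    differences = []
    # A differing row count is itself a discrepancy; report it, then compare the overlap.
    if len(run_a) != len(run_b):
        differences.append((-1, "<row count>", str(len(run_a)), str(len(run_b))))
    for index, (row_a, row_b) in enumerate(zip(run_a, run_b)):
        for column in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(column, "")
            value_b = row_b.get(column, "")
            if value_a != value_b:
                differences.append((index, column, value_a, value_b))
    return differences


# Hypothetical sample data: two runs that disagree on one concentration value.
run_a = [
    {"Sample ID": "S-001", "Concentration": "1.25"},
    {"Sample ID": "S-002", "Concentration": "0.80"},
]
run_b = [
    {"Sample ID": "S-001", "Concentration": "1.25"},
    {"Sample ID": "S-002", "Concentration": "0.08"},
]

for row_index, column, value_a, value_b in diff_runs(run_a, run_b):
    print(f"row {row_index}, column {column!r}: {value_a!r} vs {value_b!r}")
```

A flagged cell like the one in this sample is exactly the kind of value worth checking against the source file before approving.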
If the results are unexpected, the Discard and start over button can be used to edit any instructions and try again.
Always check the results for accuracy, even when verification passes; see “Risks from AI mistakes” below for more details. When verification does not pass, the results may still be accurate, but we encourage checking the flagged values.
After confirming accuracy, click Approve and insert rows or Acknowledge verification failure and insert rows to insert the displayed rows as new rows in the table. Any files will be attached to the notebook entry for reference.
Recommended practices
While the data entry assistant is designed to accommodate a wide variety of use cases automatically, there are some ways to improve the reliability and scale of the feature:
- Be aware that the AI system receives all data from your notebook entry and all files that you upload directly. It may be helpful to narrow down the data to a more specific set, e.g. by removing irrelevant Excel sheets.
- The AI system does not receive the contents of previously-attached files in your notebook entry or related items in Benchling. Ensure that any relevant information is directly contained in the notebook entry, uploaded files, or custom instructions.
- Instructions and examples for the AI system may be written either directly when using the tool, or may be contained anywhere in the notebook entry.
- For simple tasks, custom instructions may be unnecessary; the AI system can often figure out the details of the task from context.
- In writing custom instructions, write as if you are explaining the task to a human. Think of any possible clarifications that a human might find helpful. Walking through concrete examples can often be more effective than generic instructions.
- Use the tool in an iterative process, and try different instructions until you find instructions that are most reliable.
- Consider designing your Benchling schema and table to make the data entry process as straightforward as possible. For example, the AI system will have an easier time if the column names match exactly. The primary identifying columns (e.g. sample ID and time) should appear first in order and should uniquely identify each row. Consider including intermediate values like well plate position to guide the AI system to take a step-by-step approach.
- If you are a tenant admin, consider enabling OpenAI as an AI provider by following the steps on the Benchling Intelligence page. Enabling OpenAI gives Benchling access to a wider variety of models to cross-check data entry accuracy.
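The recommendation that the primary identifying columns uniquely identify each row can be checked on your own data before uploading. The following is a minimal sketch, assuming the rows have already been loaded as dictionaries (for example with Python's `csv.DictReader`); the function name `find_duplicate_keys`, the column names, and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def find_duplicate_keys(
    rows: List[Dict[str, str]], key_columns: Sequence[str]
) -> Dict[Tuple[str, ...], int]:
    """Return each key combination that appears more than once, with its count."""
    keys = [tuple(row[column] for column in key_columns) for row in rows]
    counts = Counter(keys)
    return {key: count for key, count in counts.items() if count > 1}


# Hypothetical sample data: the last row repeats an existing (sample, time) pair.
rows = [
    {"Sample ID": "S-001", "Time (h)": "0"},
    {"Sample ID": "S-001", "Time (h)": "24"},
    {"Sample ID": "S-002", "Time (h)": "0"},
    {"Sample ID": "S-001", "Time (h)": "24"},
]

# Sample ID alone does not identify a row; Sample ID plus time nearly does.
print(find_duplicate_keys(rows, ["Sample ID"]))
print(find_duplicate_keys(rows, ["Sample ID", "Time (h)"]))
```

An empty result means the chosen columns identify every row uniquely, which gives the AI system an unambiguous target for each value.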
Risks from AI mistakes
As with any AI tool, the data entry assistant may make mistakes, so it is critical to verify the results to ensure the data is properly entered into the system. While the tool contains a system to cross-check AI responses to avoid data transcription errors, this is not a guarantee of correctness.
Be aware of the different ways that the feature may produce an incorrect result:
- In some cases, the task itself may be ambiguous. For example, an “ID” column might reasonably correspond to several types of IDs present in the source document, or there may be multiple reasonable ways to handle missing or invalid data. The AI system will attempt to make reasonable choices based on the situation, but it may make the wrong assumption in some cases. To mitigate this risk, specify custom instructions, either in the prompt or in the notebook entry, to make the task as clear and unambiguous as possible.
- For some types of files, such as PDFs, the underlying text data may contain errors or ambiguities. For example, numerical superscripts and subscripts may be extracted incorrectly. When possible, prefer document formats like Excel that are more inherently structured. We are working to improve the accuracy in these situations over time.
- The tool has a “planning” phase where it assesses the work and attempts to break it down into smaller tasks, and there may be mistakes in this phase. Ensure that the total number of rows is the number that you expect, and ensure that all major portions of the dataset are present and not duplicated.
- Individual values may be entered incorrectly. While the cross-model verification mechanism significantly reduces the likelihood of such an error, it is not a guarantee, and the results should be checked for accuracy. Cell-level errors typically manifest as a value being taken from the wrong row or column in the input file.
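The checks on row count, missing portions, and duplication described above can be partly automated outside Benchling. The following is a minimal sketch, assuming you can list the identifying values from both the source file and the previewed rows; the function name `compare_keys` and the sample IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Union


def compare_keys(
    source_keys: List[str], inserted_keys: List[str]
) -> Dict[str, Union[bool, List[str]]]:
    """Compare identifying values from the source file with those in the preview."""
    source_counts = Counter(source_keys)
    inserted_counts = Counter(inserted_keys)
    # Counter subtraction keeps only positive counts, so duplicates show up as extras.
    missing = source_counts - inserted_counts
    extra = inserted_counts - source_counts
    return {
        "row_count_matches": len(source_keys) == len(inserted_keys),
        "missing": sorted(missing.elements()),
        "extra_or_duplicated": sorted(extra.elements()),
    }


# Hypothetical sample data: the row count matches, but S-003 was replaced by a
# second copy of S-002.
source_keys = ["S-001", "S-002", "S-003"]
inserted_keys = ["S-001", "S-002", "S-002"]

print(compare_keys(source_keys, inserted_keys))
```

The sample shows why a matching row count alone is not sufficient: a missing row and a duplicated row can cancel out. This check does not replace reviewing individual cell values.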
Limitations
- While we aim to support a wide variety of file types, some files will work better than others. If a file is too large or too complex, the system may give an error message or may produce incorrect results. We are working to improve the capabilities of the system over time to handle larger and more complex files.
- Only some table types are supported: result tables, entity registration tables, create container tables, and fill container tables. More table types may be supported in the future.
- The operation usually takes at least 1 minute to run, even on small files.
- Custom instructions cannot yet be directly saved or shared as a template. However, the AI system will see any instructions contained directly in the notebook entry, which can be provided via template.
- Some information may not be available to the AI system, such as details about linked items, attachments, etc. It may sometimes be necessary to provide this information explicitly in AI instructions.
- Embedded images in the uploaded files cannot yet be transferred to attachment fields in the table.
Security and privacy
For more information about privacy and security for AI-powered features, see Security and Privacy for Benchling Intelligence.