Note: Flexible data entry is a part of Benchling Intelligence. See the Benchling Intelligence page for information on how to enable flexible data entry and other AI-based features.
This feature is in beta, so keep in mind that it is a work in progress. This feature can currently be enabled for smaller documents. Please reach out to ai-ml@benchling.com for larger use cases. Usage limits may apply as we prepare for general availability.
The flexible data entry feature uses AI to allow importing a wide range of files as Benchling data. For example, it can often understand digital PDFs and Excel spreadsheets even if not formatted in a clean tabular form. The AI system uses context clues from the document and your notebook entry, as well as any instructions you provide, to determine the most reasonable way to enter data into your Benchling Notebook.
Usage
In any result table, registration table, or create/fill containers table, click the Flexible data entry button.
Fill in custom instructions (e.g. using dates instead of day numbers in this example). The notebook entry is also sent to the AI system, so custom instructions or additional context (e.g. a table that maps animal numbers to Benchling animal IDs) can also be placed there.
Attach any files, then click the play button to submit the task to the AI system. Digital PDFs, Word documents, Excel workbooks, and PowerPoint presentations are all supported.
In this example, the uploaded Excel file looks like this:
The AI system typically takes 1 to 3 minutes to process the file. Currently, only smaller files (under ~10 pages of text, including the notebook entry) are supported.
When it finishes, it will show a preview of the results. A banner will indicate if the results were verified across repeated runs. If there was a discrepancy between the multiple runs, the differences will be shown.
Always check the results for accuracy, even when verification passed; see “Risks from AI mistakes” below for more details. When verification does not pass, the results may still be accurate, but we encourage checking the flagged values.
After confirming accuracy, click “Approve and insert rows” or “Acknowledge verification failure and insert rows” to insert the displayed rows as new rows in the table. Any uploaded files will be attached to the notebook entry for reference.
Recommended practices
While the flexible data entry feature is designed to accommodate a wide variety of use cases automatically, there are some ways to improve the reliability and scale of the feature:
- Be aware that the AI system receives all data from your notebook entry and all files that you upload directly. It may be helpful to narrow down the data to a more specific set, e.g. by removing irrelevant Excel sheets.
- The AI system does not receive the contents of previously-attached files in your notebook entry or related items in Benchling. Ensure that any relevant information is directly contained in the notebook entry, uploaded files, or custom instructions.
- Instructions for the AI system may be written either directly in the tool or anywhere in the notebook entry.
- For simple tasks, custom instructions may be unnecessary; the AI system can often figure out the details of the task from context.
- In writing custom instructions, write as if you are explaining the task to a human. Think of any possible clarifications that a human might find helpful. Walking through concrete examples can often be more effective than generic instructions.
- Use the tool iteratively, trying different instructions until you find the ones that work most reliably.
- Consider designing your Benchling schema and table to make the data entry process as straightforward as possible. For example, the AI system will have an easier time if the column names in your table exactly match those in the source file. The primary identifying columns (e.g. sample ID and time) should appear first in order and should uniquely identify each row. Consider including intermediate values like well plate position to guide the AI system to take a step-by-step approach.
- If you are a tenant admin, consider enabling OpenAI as an AI provider by following the steps on the Benchling Intelligence page. Enabling OpenAI gives Benchling access to a wider variety of models to cross-check data entry accuracy.
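The recommendation that the primary identifying columns uniquely identify each row can be checked before uploading. The following is a minimal sketch, not part of Benchling: it assumes you have already loaded your source data into a list of row dictionaries, and the function name `find_duplicate_keys` and the column names `sample_id` and `time` are hypothetical.

```python
from collections import Counter

def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical source rows; in practice these might come from csv.DictReader.
rows = [
    {"sample_id": "S1", "time": "0h", "value": "1.2"},
    {"sample_id": "S1", "time": "24h", "value": "1.9"},
    {"sample_id": "S2", "time": "0h", "value": "0.8"},
    {"sample_id": "S1", "time": "24h", "value": "2.0"},
]

duplicates = find_duplicate_keys(rows, ["sample_id", "time"])
print(duplicates)  # [('S1', '24h')]
```

An empty list means the chosen columns identify every row uniquely. A non-empty list suggests adding another identifying column (for example, a replicate number) or resolving the duplicates in the source file first.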
Risks from AI mistakes
As with any AI tool, the flexible data entry tool may make mistakes, so it is critical to verify the results to ensure the data is properly entered into the system. While the tool contains a system to cross-check AI responses to avoid data transcription errors, this is not a guarantee of correctness.
Be aware of the different ways that the feature may produce an incorrect result:
- In some cases, the task itself may be ambiguous. For example, an “ID” column might reasonably correspond to several types of IDs present in the source document, or there may be multiple reasonable ways to handle missing or invalid data. The AI system will attempt to make reasonable choices based on the situation, but it may make the wrong assumption in some cases. To mitigate this risk, specify custom instructions, either in the prompt or in the notebook entry, to make the task as clear and unambiguous as possible.
- For some types of files, such as digital PDFs, the underlying text data may contain errors or ambiguities. For example, numerical superscripts and subscripts may be extracted incorrectly. When possible, prefer document formats like Excel that are more inherently structured. We are working to improve the accuracy in these situations over time.
- The tool has a “planning” phase where it assesses the work and attempts to break it down into smaller tasks, and there may be mistakes in this phase. Ensure that the total number of rows is the number that you expect, and ensure that all major portions of the dataset are present and not duplicated.
- Individual values may be entered incorrectly. While the cross-model verification mechanism significantly reduces the likelihood of such an error, it is not a guarantee, and the results should be checked for accuracy. Cell-level errors typically manifest as a value being taken from the wrong row or column in the input file.
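The checks described above (expected row count, no missing or duplicated portions, and no values taken from the wrong row) can also be run outside Benchling when the source data is already tabular. The following is a rough sketch under stated assumptions: both the source rows and the previewed or inserted rows are available as lists of dictionaries with matching column names, and the function name `check_inserted_rows` and the columns `animal_id`, `day`, and `weight` are hypothetical. It does not use any Benchling API.

```python
from collections import Counter

def check_inserted_rows(source_rows, inserted_rows, key_columns):
    """Compare inserted rows to source rows by key and report problems."""
    def key_of(row):
        return tuple(row[col] for col in key_columns)

    source = {key_of(r): r for r in source_rows}
    inserted_counts = Counter(key_of(r) for r in inserted_rows)

    # Collect cells whose value differs from the source row with the same key.
    mismatched = []
    for row in inserted_rows:
        expected = source.get(key_of(row))
        if expected is None:
            continue
        for col, value in row.items():
            if col in expected and expected[col] != value:
                mismatched.append((key_of(row), col, expected[col], value))

    return {
        "missing": [k for k in source if k not in inserted_counts],
        "unexpected": [k for k in inserted_counts if k not in source],
        "duplicated": [k for k, n in inserted_counts.items() if n > 1],
        "mismatched": mismatched,
    }

# Hypothetical data: A3 was dropped, A1 was duplicated, and A2 received
# the weight that belongs to A3 (a value taken from the wrong row).
source_rows = [
    {"animal_id": "A1", "day": "1", "weight": "20.1"},
    {"animal_id": "A2", "day": "1", "weight": "19.4"},
    {"animal_id": "A3", "day": "1", "weight": "21.0"},
]
inserted_rows = [
    {"animal_id": "A1", "day": "1", "weight": "20.1"},
    {"animal_id": "A2", "day": "1", "weight": "21.0"},
    {"animal_id": "A1", "day": "1", "weight": "20.1"},
]

report = check_inserted_rows(source_rows, inserted_rows, ["animal_id", "day"])
print(report)
```

A check like this only catches structural and transcription errors for data that can be keyed reliably. It does not replace reviewing the preview, especially for ambiguous tasks or PDF sources.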
Limitations
- By default, large files are not supported. The exact cutoff depends on the details of the file, and is approximately 10 pages of text for the file and notebook entry combined. To discuss larger use cases, reach out to ai-ml@benchling.com.
- While we aim to support a wide variety of file types, some files will work better than others. If a file is too large or too complex, the system may give an error message or may produce incorrect results. We are working to improve the capabilities of the system over time to handle larger and more complex files.
- Only some table types are supported: result tables, entity registration tables, create container tables, and fill container tables. More table types may be supported in the future.
- Image-only PDFs and other image-based inputs are not yet supported.
- The operation usually takes at least 1 minute to run, even on small files.
- Custom instructions cannot yet be saved or shared as a template.
- Some information may not be available to the AI system, such as details about linked items, attachments, schema configuration, and dropdown values. It may sometimes be necessary to provide this information explicitly in the custom instructions.
Security and privacy
For more information about privacy and security for AI-powered features, see Security and Privacy for Benchling Intelligence.