The Data Entry Agent uses AI to facilitate importing a wide range of files as Benchling data. For example, it can often interpret PDFs and Excel spreadsheets even when they are not formatted as clean tables. The AI system uses context clues from the document, your Notebook entry, and any instructions and examples you provide to determine the most reasonable way to enter the data into structured tables in your Notebook.
This is a Benchling AI feature. See the AI at Benchling page for information on how to enable this and other AI-based features.
Use the Data Entry Agent
The Data Entry Agent can fill result tables, registration tables, plate creation tables, box creation tables, and create container or fill container tables. You can access the agent in three ways:
- In any supported structured table, select the Data Entry Agent option from the + button in the table header
- Click the Sparkle button in the table header (see above image). If you are processing files, click the attachment button in the pop-up that opens
- Click the Insert data button in the AI tools dropdown in the entry toolbar at the top of the page
Once the Data Entry Agent modal opens, you complete three main steps: selecting tables and files, confirming instructions, and confirming results before the data is added to the table(s).
- Use the search bar to select one or more tables you'd like to fill
- Upload your file(s) and then click Next
- The agent will automatically suggest instructions and populate the first row of data in the table based on all the files you attach and other context in the entry
- You can edit the instructions manually to improve the agent's performance, for example by:
  - Editing the example row as needed to match the format and data you expect. This helps the AI system understand how the data maps to the column headers and which format to use
  - Editing or removing the automatically suggested instructions to clarify the task further. For example, this is a good place to specify how blank values should be handled, or to explain any ID lookup steps that may not be obvious from context
- In this example, the uploaded Excel file looks like this:
- When the instructions have been confirmed, click Submit to start processing the data
- Processing typically takes from a few seconds to 3 minutes. When it finishes, the agent shows a preview of the results. A banner indicates whether the results were verified across repeated runs; if the runs disagree, the differences are shown
- If the results are unexpected, click Discard and go back to edit the instructions and try again
- Always check the results for accuracy, even when verification passes; see “Risks from AI mistakes” below for more details. When verification does not pass, the results may still be accurate, but we encourage checking the flagged values
- After confirming accuracy, click Approve and insert rows or Acknowledge verification failure and insert rows to insert the displayed rows as new rows in the table. Any files will be attached to the notebook entry for reference
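Benchling does not document how results are compared across repeated runs, so the following is only a hypothetical sketch of what a "discrepancy between runs" means at the cell level. It assumes each run produces a list of rows, each a mapping from column name to value; `diff_runs` and the sample data are illustrative and not part of any Benchling API.

```python
from typing import Any

# Hypothetical row shape: column name -> cell value.
Row = dict[str, Any]


def diff_runs(run_a: list[Row], run_b: list[Row]) -> list[tuple[int, str, Any, Any]]:
    """Return (row_index, column, value_a, value_b) for every cell that differs.

    Rows are compared by position. A row present in only one run shows up as a
    difference in each of its columns, with None on the missing side.
    """
    diffs = []
    for i in range(max(len(run_a), len(run_b))):
        a = run_a[i] if i < len(run_a) else {}
        b = run_b[i] if i < len(run_b) else {}
        # Compare every column seen in either run, in a stable order.
        for col in sorted(set(a) | set(b)):
            if a.get(col) != b.get(col):
                diffs.append((i, col, a.get(col), b.get(col)))
    return diffs


# Illustrative data: the two runs disagree on one concentration value.
run_1 = [{"Sample ID": "S1", "Concentration": 1.2}, {"Sample ID": "S2", "Concentration": 3.4}]
run_2 = [{"Sample ID": "S1", "Concentration": 1.2}, {"Sample ID": "S2", "Concentration": 4.3}]
print(diff_runs(run_1, run_2))  # [(1, 'Concentration', 3.4, 4.3)]
```

A flagged cell like this is exactly the kind of value worth checking by hand before inserting rows, even though either run may turn out to be correct.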
Recommended practices
While the Data Entry Agent is designed to accommodate a wide variety of use cases automatically, there are some ways to improve the reliability and scale of the feature:
- The AI system receives all data in your notebook entry and every file you upload directly. Narrowing the data to a more specific set, for example by removing irrelevant Excel sheets, may help
- The AI system does not receive the contents of previously attached files in your notebook entry or of related items in Benchling. Ensure that any relevant information is contained directly in the notebook entry, the uploaded files, or the custom instructions
- Instructions and examples for the AI system can be written directly in the tool or placed anywhere in the notebook entry
- For simple tasks, custom instructions may be unnecessary; the AI system can often figure out the details of the task from context
- When writing custom instructions, write as if you are explaining the task to a human. Think of any clarifications a human might find helpful. Walking through concrete examples is often more effective than giving generic instructions
- Use the tool iteratively: try different instructions until you find the ones that give the most reliable results
- Consider designing your Benchling schema and table to make data entry as straightforward as possible. For example, the AI system has an easier time if the column names match exactly. The primary identifying columns (e.g. sample ID and time) should appear first and should uniquely identify each row. Consider including intermediate values, such as well plate position, to guide the AI system toward a step-by-step approach
Risks from AI mistakes
As with any AI tool, the Data Entry Agent may make mistakes, so it is critical to verify the results to ensure the data is properly entered into the system. While the tool contains a system to cross-check AI responses to avoid data transcription errors, this is not a guarantee of correctness.
Be aware of the different ways that the feature may potentially produce an incorrect result:
- In some cases, the task itself may be ambiguous. For example, an “ID” column might reasonably correspond to several types of IDs present in the source document, or there may be multiple reasonable ways to handle missing or invalid data. The AI system will attempt to make reasonable choices based on the situation, but it may make the wrong assumption in some cases. To mitigate this risk, specify custom instructions, either in the prompt or in the notebook entry, to make the task as clear and unambiguous as possible
- For some types of files, such as PDFs, the underlying text data may contain errors or ambiguities. For example, numerical superscripts and subscripts may be extracted incorrectly. When possible, prefer document formats like Excel that are more inherently structured. We are working to improve the accuracy in these situations over time
- The tool has a “planning” phase where it assesses the work and attempts to break it down into smaller tasks, and mistakes can occur in this phase. Check that the total number of rows matches what you expect, and that all major portions of the dataset are present and not duplicated
- Individual values may be entered incorrectly. While the cross-model verification mechanism significantly reduces the likelihood of such errors, it is not a guarantee, and the results should be checked for accuracy. Cell-level errors typically show up as a value taken from the wrong row or column of the input file
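If you copy the previewed rows into a script, the row-count and duplication checks described above can be automated. The sketch below is hypothetical: `check_rows`, the column names, and the expected count are illustrative assumptions, and it treats the identifying columns (for example, sample ID and time) as a key that should be unique for each row.

```python
from collections import Counter
from typing import Any

# Hypothetical row shape: column name -> cell value.
Row = dict[str, Any]


def check_rows(rows: list[Row], expected_count: int, key_columns: list[str]) -> list[str]:
    """Return human-readable problems found in the extracted rows."""
    problems = []
    # A wrong total often points to a planning-phase mistake (missing or extra chunks).
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    # The identifying columns should uniquely identify each row, so any
    # repeated key suggests a duplicated portion of the dataset.
    keys = Counter(tuple(row.get(col) for col in key_columns) for row in rows)
    for key, count in keys.items():
        if count > 1:
            problems.append(f"key {key} appears {count} times")
    return problems


# Illustrative data: one row is duplicated and one expected row is missing.
rows = [
    {"Sample ID": "S1", "Time": 0},
    {"Sample ID": "S1", "Time": 0},
    {"Sample ID": "S2", "Time": 0},
]
for problem in check_rows(rows, 4, ["Sample ID", "Time"]):
    print(problem)
# expected 4 rows, got 3
# key ('S1', 0) appears 2 times
```

These checks catch structural problems only; a value taken from the wrong cell still has to be caught by reviewing the results.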
Limitations
- While we aim to support a wide variety of file types, some files will work better than others. If a file is too large or too complex, the system may give an error message or may produce incorrect results. We are working to improve the capabilities of the system over time to handle larger and more complex files
- Only some table types are supported: result tables, entity registration tables, create container tables, fill container tables, box creation tables and plate creation tables. More table types may be supported in the future
- Larger operations may take a minute or more to run
- Custom instructions cannot yet be saved or shared directly as a template. However, the AI system sees any instructions contained directly in the notebook entry, and these can be provided via an entry template
- Some information may not be available to the AI system, such as details about linked items, attachments, etc. It may sometimes be necessary to provide this information explicitly in AI instructions
- Embedded images in uploaded files cannot yet be transferred to attachment fields in the table
Security and privacy
For more information about privacy and security for AI-powered features, see Data protection and security for AI at Benchling.