This page provides more detailed documentation on the various mathematical functions available in Analysis. Each section describes a different functional context and links out to the documentation that powers the functionality in Benchling.
Aggregate and Window Functions
The following is a list of the functions available for use in Analysis when you use either "Add window function" or "Aggregate table" transformations. Because Analysis tables are powered by duckdb, we link directly to the DuckDB description of each function.
Aggregate functions operate on multiple rows of data and return a single result, which makes them useful for summarizing large sets of data.
- average: Returns the arithmetic mean of a set of numeric values
- count: Counts the number of non-NULL entries
- count nulls: Counts the number of NULL values in a dataset
- max: Returns the largest value in a dataset
- median: Finds the median value in a set of numbers
- min: Returns the smallest value in a dataset
- mode: Returns the most frequent value in a dataset
- product: Returns the product of all values in a dataset (multiplies them together)
- standard deviation: Measures the amount of variation or dispersion in a set of values
- standard error: Measures the accuracy with which a sample distribution represents a population
- sum: Returns the sum of all non-NULL values
- variance: Measures how far a set of numbers are spread out from their average value
- geometric mean: Calculates the geometric mean, which is useful when dealing with percentages or rates of change
Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, window functions do not group the result into a single row.
- first: Returns the first value in the window of rows
- lag: Returns the value of a preceding row at a given offset
- last: Returns the last value in the window of rows
- lead: Returns the value of the following row at a given offset
- cume dist: Calculates the cumulative distribution of a value within its partition
- dense rank: Assigns ranks to rows without gaps in ranking values
- percent rank: Returns the relative rank of a row as a percentage of the total
- rank: Assigns a rank to each row within a partition, with gaps if there are ties
- row number: Assigns a unique sequential integer to rows within a partition
Computed Column Functions
These are the functions available in the formulas for computed columns.
- countblanks: Counts the number of blank or NULL values in a dataset
- geomean: Computes the geometric mean in specific notebook environments
Date & Time Functions
Date functions help to manipulate and analyze date and time values.
- datebucket: Aggregates date or timestamp values into defined time intervals or "buckets"
- dateconvert: Converts a date to another format or data type
- datedif: Returns the difference between two dates in a specified unit (e.g., days, months)
- datetrunc: Truncates a date or time to a specified precision (e.g., month, day)
String Functions
- concatenate: Concatenates (joins) multiple text values into one
- left: Extract a certain number of characters starting from the beginning of a string
- len: Get length of a string
- mid: Extract a certain number of characters starting from a chosen position in a string
- right: Extract a certain number of characters starting from the end of a string
- split: Splits a string into an array based on a specified delimiter
- substitute: Replace one string with another
- textjoin: combine strings with a delimiter
Search Functions
- extract: Extracts an element from a list
- vlookup: Locates the first row with a value in a specified column that matches a supplied value, then returns a value from a different column in the same located row. Note that VLOOKUP does not currently support using a column as the value to search for. You can only specify a string.
Regression Models
Regressions are available as part of Analysis.
Note: Regressions, Interpolations, and Analysis Templates require an additional purchase. For access to these features, please contact your Benchling account manager.
Linear and Quadratic regressions are powered bythe Statsmodel library. Visit the Statsmodels documentation.
The 4-Parameter Logistic (4PL) regression is commonly used in bioassay analysis. It fits a sigmoidal curve to the data, making it ideal for modeling biological systems with a limited response range. This algorithm fits the 4-Parameter Logistic curve to data by minimizing the residual sum of squares between observed and predicted values. It is typically used for nonlinear curve fitting and is implemented in Analysis using the lmfit package in Python. For more information, visit the lmfit documentation.