Analysis mathematical detail

This page provides more detailed documentation on the various mathematical functions available in Analysis. Each section describes a different functional context and links out to the documentation that powers the functionality in Benchling.

Aggregate and Window Functions

The following is a list of the functions available for use in Analysis when you use either "Add window function" or "Aggregate table" transformations. Because Analysis tables are powered by duckdb, we link directly to the DuckDB description of each function.

Aggregate functions operate on multiple rows of data and return a single result, which makes them useful for summarizing large sets of data.

average: Returns the arithmetic mean of a set of numeric values
count: Counts the number of non-NULL entries
count nulls: Counts the number of NULL values in a dataset
max: Returns the largest value in a dataset
median: Finds the median value in a set of numbers
min: Returns the smallest value in a dataset
mode: Returns the most frequent value in a dataset
product: Returns the product of all values in a dataset (multiplies them together)
standard deviation: Measures the amount of variation or dispersion in a set of values
standard error: Measures the accuracy with which a sample distribution represents a population
sum: Returns the sum of all non-NULL values
variance: Measures how far a set of numbers are spread out from their average value
geometric mean: Calculates the geometric mean, which is useful when dealing with percentages or rates of change

Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, window functions do not group the result into a single row.

first: Returns the first value in the window of rows
lag: Returns the value of a preceding row at a given offset
last: Returns the last value in the window of rows
lead: Returns the value of the following row at a given offset
cume dist: Calculates the cumulative distribution of a value within its partition
dense rank: Assigns ranks to rows without gaps in ranking values
percent rank: Returns the relative rank of a row as a percentage of the total
rank: Assigns a rank to each row within a partition, with gaps if there are ties
row number: Assigns a unique sequential integer to rows within a partition

Computed Column Functions

These are the functions available in the formulas for computed columns.

countblanks: Counts the number of blank or NULL values in a dataset
geomean: Computes the geometric mean in specific notebook environments

Date & Time Functions

Date functions help to manipulate and analyze date and time values.

datebucket: Aggregates date or timestamp values into defined time intervals or "buckets"
dateconvert: Converts a date to another format or data type
datedif: Returns the difference between two dates in a specified unit (e.g., days, months)
datetrunc: Truncates a date or time to a specified precision (e.g., month, day)

String Functions

concatenate: Concatenates (joins) multiple text values into one
left: Extract a certain number of characters starting from the beginning of a string
len: Get length of a string
mid: Extract a certain number of characters starting from a chosen position in a string
right: Extract a certain number of characters starting from the end of a string
split: Splits a string into an array based on a specified delimiter
substitute: Replace one string with another
textjoin: combine strings with a delimiter

Search Functions

extract: Extracts an element from a list
vlookup: Locates the first row with a value in a specified column that matches a supplied value, then returns a value from a different column in the same located row. Note that VLOOKUP does not currently support using a column as the value to search for. You can only specify a string.

Regression Models

Regressions are available as part of Analysis.

Note: Regressions, Interpolations, and Analysis Templates require an additional purchase. For access to these features, please contact your Benchling account manager.

Linear and Quadratic regressions are powered bythe Statsmodel library. Visit the Statsmodels documentation.

The 4-Parameter Logistic (4PL) regression is commonly used in bioassay analysis. It fits a sigmoidal curve to the data, making it ideal for modeling biological systems with a limited response range. This algorithm fits the 4-Parameter Logistic curve to data by minimizing the residual sum of squares between observed and predicted values. It is typically used for nonlinear curve fitting and is implemented in Analysis using the lmfit package in Python. For more information, visit the lmfit documentation.

Analysis mathematical detail

Aggregate and Window Functions

Computed Column Functions

Date & Time Functions

String Functions

Search Functions

Regression Models

Was this article helpful?

<%= previousTitle %>

<%= nextTitle %>

In this article

Still need help?

Categories

Toggle navigation menu

<%= category.name %>