Query optimization is a feature of relational databases that tries to determine the most efficient way to execute a query by considering all possible query plans. Optimization is important because a query's efficiency determines how long it takes the database to generate your query's results.
In Insights, queries automatically time out after one minute. To prevent timeouts, we recommend optimizing your queries.
This article covers recommendations for what to prioritize and avoid when structuring a query and other ways to identify query bottlenecks.
Prioritize
We recommend prioritizing the following when structuring queries:
- Selecting only necessary columns
- Considering row limits
- Filtering early
- Setting a row limit
- Splitting queries
Selecting only necessary columns
Consider specifying only the columns you need, Instead of calling SELECT *
. Selecting more columns than is necessary puts extra load on the database, causing slowness.
Considering row limits
Insights can only display 1000 rows of data at a time within the tool. This applies to both the raw data output, any CTEs, and any data visualized in graphs. If you are working with large quantities of data that you wish to visualize on Insights, consider using group by
clauses and Parameters to reduce the total number of rows.
The group by
clause collapses and aggregates the returned rows into groups, thereby reducing the total number of displayed rows. Parameters are placeholders for variables when executing a query. By specifying a value for your parameters, you can refine your query to a more targeted output and reduce the total number of rows.
In cases where row limits are still exceeded, we recommend either exporting the data in .csv format for visualization on tools like Excel or leveraging third party business intelligence tools through external warehouse connections.
Filtering early
Filter data with a WHERE
clause earlier in the query, like in an earlier CTE, before joining to other tables. This means fewer rows need to be joined to other tables downstream. Don't repeat filters.
Example
Original query
select *
from tableA
join tableB on some_condition
where filter_tableB_on_some_condition
Optimized query
with cte as (
select * from tableB
where filter_tableB_on_some_condition)
select *
from tableA
join cte on some_condition
Setting a row limit
If the query will return a large number of rows, limit the number of rows it returns using a LIMIT
clause. Insights automatically truncates displayed rows at 1000.
Parameters
Splitting queries
If your query is large and joins many tables, reduce the number of views used in the join by splitting it into multiple, smaller queries. The more tables joined in a query, the more time the query optimizer needs.
You can perform filters and expensive operations on subsets of data by including CTEs and subqueries. Testing subsets of your query also helps pinpoint what operations are slowing down execution.
Example
Original query
select a.a, b.b, c.c
from a
left join b on a.a = b.b
left join c on c.c = a.a
Split query
select a.a, b.b
from a
left join b on a.a. = b.b
select a.a., c.c
from a
left join c on c.c. = a.a
Avoid
Avoid the following when structuring queries:
FULL OUTER JOINs
If your output only needs the matching records from one table, try retrieving only the necessary records by calling LEFT JOIN
or INNER JOIN
instead of FULL OUTER JOIN
. Outer joins limit the options for database optimization.
Expensive operators
Sorting is an expensive operation. Calling ORDER BY
earlier in the query on a larger dataset is slower than calling it towards the end on a filtered dataset. Avoid repeating sorts.
DISTINCT
and UNION
operators require sorting. UNION ALL
is more efficient than UNION
, and GROUP BY
is more efficient than DISTINCT
.
Identify query bottlenecks
If your query is optimized and still timing out, you can identify bottle necks by running operators on the query connecting directly to the warehouse:
-
EXPLAIN
provides the execution plan of the query and the estimated amount of time each statement will cost -
EXPLAIN ANALYZE
executes the query and provides the actual total runtime
To learn more about these queries, visit PostgreSQL.
Note: Access to the warehouse is licensed. To directly connect to the warehouse from your command line shell or third-party analytics tool, visit our developer documentation.