Query optimization is a feature of relational databases that tries to determine the most efficient way to execute a query by considering all possible query plans. Optimization is important because a query's efficiency determines how long it takes the database to generate your query's results.
In Insights, queries automatically time out after one minute. To prevent timeouts, we recommend optimizing your queries.
This article covers recommendations for what to prioritize and avoid when structuring a query and other ways to identify query bottlenecks.
We recommend prioritizing the following when structuring queries:
- Selecting only necessary columns
- Considering row limits
- Filtering early
- Setting a row limit
- Splitting queries
Selecting only necessary columns
Consider specifying only the columns you need, Instead of calling
SELECT *. Selecting more columns than is necessary puts extra load on the database, causing slowness.
Considering row limits
Insights can only display 1000 rows of data at a time within the tool. This applies to both the raw data output, any CTEs, and any data visualized in graphs. If you are working with large quantities of data that you wish to visualize on Insights, consider using
group by clauses and Parameters to reduce the total number of rows.
group by clause collapses and aggregates the returned rows into groups, thereby reducing the total number of displayed rows. Parameters are placeholders for variables when executing a query. By specifying a value for your parameters, you can refine your query to a more targeted output and reduce the total number of rows.
In cases where row limits are still exceeded, we recommend either exporting the data in .csv format for visualization on tools like Excel or leveraging third party business intelligence tools through external warehouse connections.
Filter data with a
WHERE clause earlier in the query, like in an earlier CTE, before joining to other tables. This means fewer rows need to be joined to other tables downstream. Don't repeat filters.
join tableB on some_condition
with cte as (
select * from tableB
join cte on some_condition
Setting a row limit
If the query will return a large number of rows, limit the number of rows it returns using a
LIMIT clause. Insights automatically truncates displayed rows at 1000.
If your query is large and joins many tables, reduce the number of views used in the join by splitting it into multiple, smaller queries. The more tables joined in a query, the more time the query optimizer needs.
You can perform filters and expensive operations on subsets of data by including CTEs and subqueries. Testing subsets of your query also helps pinpoint what operations are slowing down execution.
select a.a, b.b, c.c
left join b on a.a = b.b
left join c on c.c = a.a
select a.a, b.b
left join b on a.a. = b.b
select a.a., c.c
left join c on c.c. = a.a
Avoid the following when structuring queries:
FULL OUTER JOINs
If your output only needs the matching records from one table, try retrieving only the necessary records by calling
LEFT JOIN or
INNER JOIN instead of
FULL OUTER JOIN. Outer joins limit the options for database optimization.
Sorting is an expensive operation. Calling
ORDER BY earlier in the query on a larger dataset is slower than calling it towards the end on a filtered dataset. Avoid repeating sorts.
UNION operators require sorting.
UNION ALL is more efficient than
GROUP BY is more efficient than
Identify query bottlenecks
If your query is optimized and still timing out, you can identify bottle necks by running operators on the query connecting directly to the warehouse:
EXPLAINprovides the execution plan of the query and the estimated amount of time each statement will cost
EXPLAIN ANALYZEexecutes the query and provides the actual total runtime
To learn more about these queries, visit PostgreSQL.
Note: Access to the warehouse is licensed. To directly connect to the warehouse from your command line shell or third-party analytics tool, visit our developer documentation.