Optimizing queries

Meghan
Meghan
  • Updated

Query optimization is a feature of relational databases that tries to determine the most efficient way to execute a query by considering all possible query plans. Optimization is important because a query's efficiency determines how long it takes the database to generate your query's results.

In Insights, queries automatically time out after one minute. To prevent timeouts, we recommend optimizing your queries.

This article covers recommendations for what to prioritize and avoid when structuring a query and other ways to identify query bottlenecks.

Prioritize

We recommend prioritizing the following when structuring queries:

Selecting only necessary columns

Consider specifying only the columns you need, Instead of calling SELECT *. Selecting more columns than is necessary puts extra load on the database, causing slowness.

Considering row limits 

Insights can only display 1000 rows of data at a time within the tool. This applies to both the raw data output, any CTEs, and any data visualized in graphs. If you are working with large quantities of data that you wish to visualize on Insights, consider using group by clauses and Parameters to reduce the total number of rows.

The group by clause collapses and aggregates the returned rows into groups, thereby reducing the total number of displayed rows. Parameters are placeholders for variables when executing a query. By specifying a value for your parameters, you can refine your query to a more targeted output and reduce the total number of rows.

In cases where row limits are still exceeded, we recommend either exporting the data in .csv format for visualization on tools like Excel or leveraging third party business intelligence tools through external warehouse connections.

Filtering early

Filter data with a WHERE clause earlier in the query, like in an earlier CTE, before joining to other tables. This means fewer rows need to be joined to other tables downstream. Don't repeat filters. 

Example

Original query

select *
from tableA
join tableB on some_condition
where filter_tableB_on_some_condition

Optimized query

with cte as (
select * from tableB
where filter_tableB_on_some_condition)

select *
from tableA
join cte on some_condition

Setting a row limit

If the query will return a large number of rows, limit the number of rows it returns using a LIMIT clause. Insights automatically truncates displayed rows at 1000.

 

Parameters 

Splitting queries

If your query is large and joins many tables, reduce the number of views used in the join by splitting it into multiple, smaller queries. The more tables joined in a query, the more time the query optimizer needs.

You can perform filters and expensive operations on subsets of data by including CTEs and subqueries. Testing subsets of your query also helps pinpoint what operations are slowing down execution.

Example

Original query

select a.a, b.b, c.c
from a
left join b on a.a = b.b
left join c on c.c = a.a

Split query

select a.a, b.b
from a
left join b on a.a. = b.b

select a.a., c.c
from a
left join c on c.c. = a.a

Avoid

Avoid the following when structuring queries:

FULL OUTER JOINs

If your output only needs the matching records from one table, try retrieving only the necessary records by calling LEFT JOIN or INNER JOIN instead of FULL OUTER JOIN. Outer joins limit the options for database optimization.

blobid0.jpg

Expensive operators

Sorting is an expensive operation. Calling ORDER BY earlier in the query on a larger dataset is slower than calling it towards the end on a filtered dataset. Avoid repeating sorts.

DISTINCT and UNION operators require sorting. UNION ALL is more efficient than UNION, and GROUP BY is more efficient than DISTINCT.

Identify query bottlenecks

If your query is optimized and still timing out, you can identify bottle necks by running operators on the query connecting directly to the warehouse:

  • EXPLAIN provides the execution plan of the query and the estimated amount of time each statement will cost
  • EXPLAIN ANALYZE executes the query and provides the actual total runtime

To learn more about these queries, visit PostgreSQL.

Note: Access to the warehouse is licensed. To directly connect to the warehouse from your command line shell or third-party analytics tool, visit our developer documentation.

 

Was this article helpful?

Have more questions? Submit a request