How the WITH Clause Can Simplify Your SQL Queries
This is a quick post to highlight how you can simplify your SQL using the WITH clause. Many of the posts on this blog use this technique to prepare queries that are easy to understand and work with.
-
Migrated from ken-williams.com
-
Explained recursion and updated the “performance” section.
Benefits of using the WITH Clause
The WITH clause allows you to reduce joins and logically sequence your code. It does this by creating temporary tables (technically they are views) that are usable by your main SQL statement so that you can break your code into smaller and easier to understand snippets.
Syntax of the WITH Clause
NOTE |
---|
I am working with Google Analytics 4 data in BigQuery for this post, but the following will be applicable to many other database systems. |
The WITH clause is a prefix for your main query. You can use it to create one or more CTEs (also known as "common table expressions". Each CTE can be given a name, and then reference in other places in the same query.
As an example, here is how you might pull the unique ID's of users who visited your site yesterday and then came back again today:
# Step 1: Get a list of users who visited the site yesterday WITH yesterday AS ( SELECT user_pseudo_id AS yesterday_user_pseudo_id FROM `<project name>.<dataset name>.events_08012021` ) # Step 2: Get a list of users who also visited the site today # by pulling today's users and then running an inner join # with those who visited yesterday. SELECT user_pseudo_id FROM `<project name>.<dataset name>.events_09012021`,yesterday WHERE yesterday.yesterday_user_pseudo_id = user_pseudo_id
In the example above I'm defining a temporary table with the name "yesterday". You could also put a comma at the end to add other temporary tables if you choose, like this:
WITH yesterday AS ( SELECT user_pseudo_id AS yesterday_user_pseudo_id FROM `<project name>.<dataset name>.events_08012021` ), last_week AS ( SELECT user_pseudo_id AS yesterday_user_pseudo_id FROM `<project name>.<dataset name>.events_02012021` )
Recursive vs. Non-recursive CTEs
The CTEs that you create using the WITH clause can be recursive (meaning that it references itself) or non-recursive. Virtually all of the CTE's created in this blog are non-recursive to keep them simple and easy to read, but recursive CTEs will declare that they are "RECURSIVE" and be given a keyword like this:
# Recursive CTE in a WITH clause # The recursive keyword in this example is "n" WITH RECURSIVE T1 AS ( (SELECT 1 AS n) UNION ALL (SELECT n + 2 FROM T1 WHERE n < 4)) SELECT * FROM T1 ORDER BY n /*---* | n | +---+ | 1 | | 3 | | 5 | *---*/
So why am I teaching you about recursive CTE's if I don't use them in this blog? Well it's so you can understand how the WITH clause impacts query performance...
Performance
Recursive CTEs will be materialized in BigQuery, which means that they will be precomputed and the results will be cached. This is great for performance because they will only be executed once even if they are referenced multiple times.
The results of your non-recursive CTEs will not be materialized however. So if you reference the CTE multiple times it will be executed once for each reference.
So, the tl;dr is that recursive CTEs (which you will not find in this blog) are helpful for performance, but non-recursive CTEs are useful to make your code readable. You can read Google's post on this topic HERE
Other Resources for BigQuery and GA4
Here are a few other posts that I’ve created for using BigQuery with data from Google Analytics 4:
- Cross-Stream First-Touch Attribution with Google Analytics 4 and BigQuery
- How the WITH Clause Can Simplify Your SQL Queries
- BigQuery + GA4: How To Get the First or Most Recent Event for a User
- How to Use the Over clause to Create a Moving Average in BigQuery
- Data Studio + BigQuery + GA4: Viewing a Parameter Across Multiple Events
- Data Studio + BigQuery + GA4: How to View Multiple Parameters on the Same Event
- How to Flatten a BigQuery Table with UNNEST