When working with BigQuery, it’s common to encounter scenarios where you need to join datasets. A unique feature of BigQuery is its ability to handle nested and repeated fields using the UNNEST
function. In this blog, we’ll walk through how to use a LEFT JOIN
with an unnested attribute, explaining each term and concept in detail.
The Problem
Consider the following scenario: you have two tables:
orders
Table:
order_id | customer_id | products |
---|---|---|
1 | 101 | [“apple”, “banana”] |
2 | 102 | [“orange”, “grape”] |
3 | 103 | NULL |
product_details
Table:
product_name | price |
---|---|
apple | 1.5 |
banana | 1.0 |
orange | 2.0 |
grape | 3.0 |
Our goal is to join these tables such that we get a detailed list of all products in each order, along with their respective prices. If an order has no products, it should still appear in the result.
The Query
Here’s how we can achieve this:
SELECT
o.order_id,
o.customer_id,
p.product_name,
pd.price
FROM
`project.dataset.orders` o
LEFT JOIN
UNNEST(o.products) AS p
LEFT JOIN
`project.dataset.product_details` pd
ON
p = pd.product_name;
Understanding the Terminology
LEFT JOIN
:
- A type of SQL join that combines rows from two tables.
- Ensures that all rows from the left table (
orders
) appear in the result, even if there’s no matching row in the right table (product_details
).
UNNEST
:
- A BigQuery-specific function that expands an array into a set of rows.
- In our example, the
products
column in theorders
table is an array (e.g.,["apple", "banana"]
). UNNEST(o.products)
creates one row for each product in the array.
- Alias (
AS p
):
- Assigns a short name (alias) to the result of the
UNNEST
operation. - Here,
p
represents individual product names from theproducts
array.
ON
Clause:
- Specifies the condition for joining tables.
- In this case,
p = pd.product_name
ensures that each product name from theUNNEST
operation is matched with its corresponding entry in theproduct_details
table.
The Result
The query produces the following output:
order_id | customer_id | product_name | price |
---|---|---|---|
1 | 101 | apple | 1.5 |
1 | 101 | banana | 1.0 |
2 | 102 | orange | 2.0 |
2 | 102 | grape | 3.0 |
3 | 103 | NULL | NULL |
Explanation of the Output
- For
order_id = 1
, theproducts
array containsapple
andbanana
, so we see one row per product with the corresponding prices. - For
order_id = 2
, theproducts
array containsorange
andgrape
. - For
order_id = 3
, theproducts
array isNULL
, so theUNNEST
operation produces no rows, but theLEFT JOIN
ensures that the row from theorders
table still appears withNULL
values forproduct_name
andprice
.
Why Use LEFT JOIN with UNNEST?
- Handle Nested Data: BigQuery’s
UNNEST
function allows you to work seamlessly with nested and repeated fields, which are common in JSON or NoSQL datasets. - Preserve Left Table Rows: The
LEFT JOIN
ensures no rows are lost from the primary (left) table, even if there’s no matching data in the joined table. - Scalable Operations: BigQuery’s architecture is optimized for such operations, making it efficient for large datasets.
Key Takeaways
- BigQuery’s
UNNEST
function is essential for working with arrays. LEFT JOIN
ensures data integrity by including all rows from the left table, even if there are no matches.- Combining
LEFT JOIN
withUNNEST
allows you to flatten nested data while enriching it with additional details from other tables.
This combination is a powerful tool in BigQuery for working with structured and semi-structured data. By mastering these concepts, you can unlock the full potential of BigQuery for complex data transformations.