BigQuery Advanced Patterns
๐ฏ Purpose¶
This guidebook expands your BigQuery fluency with advanced features: ARRAYs, STRUCTs, UNNEST, ARRAY_AGG, WITH OFFSET
, nested joins, and multi-level aggregation logic โ all commonly found in GA4, Firebase, and modern analytics tables.
๐งบ 1. Working with ARRAYs¶
๐น UNNEST an array¶
SELECT user_id, item
FROM `project.dataset.orders`, UNNEST(items) AS item;
โ๏ธ Required when querying arrays or GA4-style events
๐น Count array elements¶
SELECT user_id, ARRAY_LENGTH(items) AS item_count
FROM orders;
๐๏ธ 2. STRUCTs and Nested Fields¶
๐น Access nested STRUCT fields¶
SELECT
user_info.name AS name,
user_info.address.city AS city
FROM users;
โ๏ธ Common in event data, especially user/device metadata
๐น STRUCTs inside ARRAYs¶
SELECT
user_id,
product.name AS product_name
FROM orders, UNNEST(products) AS product;
โ๏ธ Use dot notation after UNNEST
๐งฎ 3. ARRAY_AGG + Deduplication¶
๐น Collect items into an array¶
SELECT user_id, ARRAY_AGG(item) AS items
FROM purchases
GROUP BY user_id;
๐น Deduplicate with DISTINCT¶
SELECT user_id, ARRAY_AGG(DISTINCT item) AS unique_items
FROM purchases
GROUP BY user_id;
๐ข 4. WITH OFFSET (Indexing Inside Arrays)¶
SELECT
user_id,
interest,
offset AS interest_rank
FROM users,
UNNEST(interests) WITH OFFSET AS offset;
โ๏ธ Useful for top-N ranking within flattened fields
๐ช 5. Nested Aggregations (2-Level)¶
-- Example: most purchased item per user
WITH item_counts AS (
SELECT user_id, item, COUNT(*) AS cnt
FROM orders
GROUP BY user_id, item
),
ranked AS (
SELECT *,
RANK() OVER (PARTITION BY user_id ORDER BY cnt DESC) AS rnk
FROM item_counts
)
SELECT *
FROM ranked
WHERE rnk = 1;
โ๏ธ Use CTEs + RANK to simulate FIRST_VALUE or top-1 logic
๐ 6. Flattened Event Table Tips¶
Feature | Tip |
---|---|
event_params is an ARRAY |
Use UNNEST(event_params) and filter by key |
user_properties are nested |
Use UNNEST() + dot notation + OFFSET if needed |
Always filter large arrays early | Apply WHERE clause after UNNEST or inside CTE |
โ Patterns Covered¶
- Querying nested and repeated fields (ARRAY + STRUCT)
- Flattening GA4-style event schemas
- Aggregating inside groups
- Simulating top-1 selection with
RANK()
๐ก Tip¶
โIn BigQuery, flat queries break fast. Nested logic + UNNEST is the key to modeling modern event data.โ