Combining Data: Merge, Append & Queries
Learn how to merge tables (like SQL JOIN), append tables (like UNION), handle semi-structured data, and choose between reference and duplicate queries in Power Query.
Bringing data together
Think of two puzzle pieces.
Merge is like connecting two pieces side by side β matching them on a shared edge (a common column). You get a wider table. This is like a SQL JOIN.
Append is like stacking one piece on top of another β both have the same shape (same columns), and you just add more rows. This is like a SQL UNION.
This module covers both operations, plus how to handle messy JSON/XML data and when to use reference vs duplicate queries.
Merge Queries: combining tables side by side
Riley at Coastal Fresh (π) has two tables:
Sales table: OrderID, ProductID, Quantity, Revenue Products table: ProductID, ProductName, Category, Supplier
She needs product names on her sales data. She merges Sales with Products on the ProductID column β this adds product details to each sales row.
The six join types
| Join Type | What It Returns | SQL Equivalent |
|---|---|---|
| Left Outer | All rows from the first table, matched rows from second (nulls where no match) | LEFT JOIN |
| Right Outer | All rows from second table, matched rows from first | RIGHT JOIN |
| Inner | Only rows that match in both tables | INNER JOIN |
| Full Outer | All rows from both tables (nulls where no match on either side) | FULL OUTER JOIN |
| Left Anti | Only rows from first table that have NO match in second | LEFT JOIN WHERE right IS NULL |
| Right Anti | Only rows from second table that have NO match in first | RIGHT JOIN WHERE left IS NULL |
How to Merge:
- Home β Merge Queries (or Merge Queries as New)
- Select the first table and key column
- Select the second table and matching column
- Choose join type (Left Outer is default)
- Expand the resulting nested column to select which fields to include
Exam tip: anti joins
Left Anti and Right Anti joins are exam favourites. They find rows that DONβT have a match.
Example: βFind all products that have never been sold.β Merge the Products table (first) with the Sales table (second) using a Left Anti join on ProductID. The result contains only products with zero sales.
Append Queries: stacking tables on top
Nadia at Prism Agency (π) receives campaign data from Google Ads, Meta Ads, and LinkedIn β three separate tables with identical column structures. She appends them into a single unified campaign table.
Append = same columns, more rows.
How to Append:
- Home β Append Queries (or Append Queries as New)
- Choose βTwo tablesβ or βThree or more tablesβ
- Select the tables to stack
- Columns are matched by name β extra columns become null
Merge vs Append at a glance:
- Merge: Different columns, same key β wider table
- Append: Same columns, different data β taller table
Semi-structured data: JSON and XML
Dr. Ethan Moss at Bayview Medical Centre (π₯) connects to a REST API that returns patient appointment data in JSON format. The raw import shows a single column of nested records.
Converting JSON to a table:
- Power Query imports JSON as a single column of Record or List values
- Click the expand icon (β) on the column header
- Select which fields to include
- Repeat for nested objects
Key M functions for semi-structured data:
Json.Document()β parse JSON text into records/listsTable.FromRecords()β convert a list of records into a tableRecord.FieldValues()β extract values from a record
Real-world: expanding nested JSON
API responses often have nested structures. For example, a patient record might contain a name field and an appointments array, where each appointment has a date and doctor name.
You expand the top-level record first to get the name, then expand the appointments list, which creates new rows β one per appointment per patient. This row multiplication is normal when expanding nested lists.
Reference vs Duplicate queries
When you need to create a variation of an existing query, you have two options:
| Feature | Reference Query | Duplicate Query |
|---|---|---|
| How it works | Points back to the original query's output as its starting point | Creates an independent copy of all steps from the original |
| Dependency | Depends on the original β if original changes, reference updates too | Independent β changes to original don't affect the duplicate |
| Processing | Original runs once, reference builds on its result | Both queries run their full step sequences independently |
| Use when | You want to branch from a common base (e.g., filter the same cleaned data differently) | You want a completely independent copy to modify freely |
| Performance | More efficient β avoids re-processing shared steps | Less efficient β duplicates all processing work |
Example: Kenji at Apex Manufacturing (π) has a clean FactoryData query. He creates:
- A Reference query called
TokyoFactorythat filters to Tokyo - A Reference query called
ShanghaiFactorythat filters to Shanghai
Both reference queries share the same cleaning steps from FactoryData. If he adds a new cleaning step to the original, both references benefit automatically.
Knowledge check
Riley has a Products table and a Sales table. She wants to find all products that have never been sold. Which operation should she use?
Nadia needs to create two filtered versions of a cleaned campaign dataset β one for Google Ads and one for Meta Ads. She wants changes to the shared cleaning steps to apply to both. What should she use?
π¬ Video coming soon
Next up: Fact Tables, Dimension Tables and Keys β design the right table structure for Power BI.