Data Profiling & Quality
Learn to evaluate data quality using Power Query's profiling tools — column distribution, column quality, column profile — and how to resolve nulls, errors, and inconsistencies.
Trust your data before you build on it
Imagine buying a used car. Before you drive it, you check the engine, tyres, and mileage. You don’t just trust the seller — you verify.
Data profiling is the same idea. Before you build dashboards and DAX measures on your data, you check it for problems: missing values, duplicates, wrong data types, unexpected patterns. Power Query gives you three built-in tools to do this.
The three profiling views
Power Query Editor offers three profiling tools under View tab. Each reveals different problems:
| Tool | What It Shows | What to Look For |
|---|---|---|
| Column Quality | Percentage of Valid, Error, and Empty values per column | Any column with errors or high empty percentage needs attention |
| Column Distribution | Count of distinct values and unique values per column | A primary key should have 100% unique values; low distinct count may indicate a category column |
| Column Profile | Detailed statistics: min, max, average, standard deviation, value distribution histogram | Outliers, unexpected ranges, skewed distributions |
Exam tip: first 1,000 rows vs entire dataset
This is a classic exam trick. By default, Power Query profiles only the first 1,000 rows. If your dataset has 500,000 rows, you might miss errors that appear after row 1,000.
Always change to “Column profiling based on entire dataset” (click the link in the Power Query Editor status bar) when profiling production data. The exam will test whether you know this default and how to change it.
Spotting and fixing data quality issues
Nadia Okafor at Prism Agency (📊) loads campaign data from 5 different advertising platforms. Each platform formats data differently. Here’s what she finds using profiling:
Nulls and empty values
Column Quality shows that the CampaignEndDate column has 15% empty values. These are campaigns that are still running — they don’t have an end date yet.
Fix options:
- Replace nulls with a meaningful value (e.g., “Ongoing”) if the column is text
- Fill down/up to inherit values from adjacent rows
- Remove rows if null records aren’t needed for analysis
- Keep nulls if “no value” is meaningful (e.g., active campaigns)
Data type mismatches
The Spend column was imported as Text because one platform formats currency as “$1,234.56”. Power Query can’t do math on text.
Fix: Remove the dollar sign using Replace Values, then change the column type to Decimal Number. If some rows still fail conversion, use Replace Errors to handle them.
Inconsistent values
The Platform column has “Facebook”, “facebook”, “FACEBOOK”, and “Meta Ads” — all meaning the same thing.
Fix: Use Replace Values to standardise, or use Conditional Column to map variations to a single value.
Import errors
Riley at Coastal Fresh (🛒) gets an error importing an Excel file because a column has mixed data types (some cells are numbers, others are text). Power BI imports it as “Error” for the mismatched rows.
Fix options:
- Change the column type in Power Query — rows that can’t convert become Error values (then use Replace Errors to handle them)
- Fix the source data in Excel
- Use Replace Errors to substitute a default value
Real-world: Dr. Ethan's patient data audit
Dr. Ethan Moss at Bayview Medical Centre (🏥) profiles a patient records extract before building a dashboard. Column Distribution reveals that PatientID has 50,000 distinct values but only 49,800 unique values — meaning 200 IDs appear more than once. This signals duplicate records that would inflate patient counts. He uses Remove Duplicates on the PatientID column to fix it.
Common data quality actions in Power Query
| Problem | Power Query Fix | Where to Find It |
|---|---|---|
| Null values | Replace Values, Fill Down/Up, Remove Rows | Right-click column → Replace Values |
| Wrong data type | Change Type | Column header icon or Transform tab |
| Duplicates | Remove Duplicates | Right-click column → Remove Duplicates |
| Extra whitespace | Trim | Transform → Format → Trim |
| Mixed case | Capitalize Each Word, UPPER, lower | Transform → Format |
| Errors in column | Replace Errors, Remove Errors | Right-click column |
| Inconsistent values | Replace Values, Conditional Column | Right-click or Add Column tab |
Knowledge check
Nadia profiles a campaign dataset and notices that Column Quality shows 5% errors in the 'Spend' column. The errors occur because some values contain a dollar sign ('$1,234'). What's the best fix?
Which Power Query profiling view would help Riley identify that her 'ProductID' column has duplicate values?
🎬 Video coming soon
Next up: Power Query Transforms — reshape your data with columns, groups, pivots, and more.