Cleaning and Transforming Data in Power Query


Introduction to Data Cleaning and Transformation in Power Query

Power Query is an essential tool within Power BI that helps users extract, transform, and load (ETL) data from multiple sources. One of the most crucial steps in data preparation is cleaning and transforming raw data to ensure accuracy, consistency, and usability.

In this guide, we will explore how to clean and transform data effectively using Power Query in Power BI.


Steps for Cleaning and Transforming Data in Power Query

Step 1: Load Data into Power Query

  1. Open Power BI Desktop.
  2. Click on Home > Get Data and choose a data source such as:
    • Excel
    • SQL Server
    • SharePoint
    • Web
    • OData Feed, etc.
  3. After selecting the data to import, click Transform Data (rather than Load) to open the Power Query Editor.

Step 2: Understanding the Power Query Editor Interface

Once inside Power Query, you will see:

  • Queries Pane (Left): Lists all data queries loaded.
  • Data Preview Grid (Center): Shows a preview of your dataset.
  • Query Settings Pane (Right): Shows the query name and the Applied Steps list, which records each transformation in order.
  • Ribbon (Top Menu): Contains transformation functions.

Step 3: Removing Unwanted Columns

  1. Identify columns that are not needed.
  2. Click on the column header (Ctrl-click to select several), then choose Remove Columns from the ribbon.
  3. Alternatively, select Choose Columns and manually pick only the necessary ones.
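
The two options above map to two M functions. The following is a minimal sketch that can be pasted into a blank query; the sample table and its column names (OrderID, Customer, Notes, Sales) are made up for illustration.

```powerquery
let
    // Hypothetical sample table standing in for a loaded query
    Source = #table(
        type table [OrderID = Int64.Type, Customer = text, Notes = nullable text, Sales = number],
        {
            {1, "Ana", "call back", 1200},
            {2, "Ben", null, 800}
        }
    ),
    // Remove Columns: drop the named columns, keep everything else
    RemovedColumns = Table.RemoveColumns(Source, {"Notes"}),
    // Choose Columns: keep only the named columns, drop everything else
    ChosenColumns = Table.SelectColumns(Source, {"OrderID", "Sales"})
in
    ChosenColumns
```

Removing by name leaves any column that appears in the source later in place, while selecting by name ignores it. Which behavior is preferable depends on how stable the source schema is.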

Step 4: Removing Duplicates

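  1. Select the column, or Ctrl-click several columns, whose values should be unique.
  2. Click Home > Remove Rows > Remove Duplicates.
  3. Power Query keeps one row per distinct value (or per distinct combination of values when several columns are selected), normally the first one it encounters. The comparison is case-sensitive, so “ABC” and “abc” count as different values.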
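
In M, removing duplicates corresponds to Table.Distinct. The sketch below uses an invented sample table to show the difference between deduplicating on selected columns and on whole rows.

```powerquery
let
    // Hypothetical sample data; the first and third rows share Customer and Region
    Source = #table(
        type table [Customer = text, Region = text, Sales = number],
        {
            {"Ana", "West", 1200},
            {"Ben", "East", 800},
            {"Ana", "West", 450}
        }
    ),
    // Deduplicate on the Customer and Region pair only
    ByKey = Table.Distinct(Source, {"Customer", "Region"}),
    // With no column list, entire rows are compared instead
    ByRow = Table.Distinct(Source)
in
    ByKey
```

Here ByKey contains two rows, while ByRow would keep all three because the Sales values differ.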

Step 5: Handling Null and Missing Values

  1. Click on a column with missing data.
  2. Choose Replace Values, enter null as the value to find, and enter a default (e.g., 0 or “Unknown”) as the replacement.
  3. Alternatively, filter out rows where the value is null using the filter dropdown.
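
Both approaches can be sketched in M as follows. The sample table and its columns are assumptions made for illustration; the functions are Table.ReplaceValue and Table.SelectRows.

```powerquery
let
    // Hypothetical sample data with one missing value in each column
    Source = #table(
        type table [Customer = nullable text, Sales = nullable number],
        {
            {"Ana", 1200},
            {"Ben", null},
            {null, 300}
        }
    ),
    // Replace Values: null is the value to find, 0 is the replacement
    FilledSales = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue, {"Sales"}),
    // Filter dropdown with (null) unchecked: keep rows that have a customer
    NoNullCustomers = Table.SelectRows(FilledSales, each [Customer] <> null)
in
    NoNullCustomers
```

The result keeps the Ana and Ben rows, with Ben's missing Sales replaced by 0.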

Step 6: Changing Data Types

  1. Ensure each column has the correct data type (e.g., numbers, text, date).
  2. Click the data type icon to the left of the column name (e.g., ABC for text, 123 for whole numbers).
  3. Choose the correct type (e.g., Whole Number, Date/Time, Text).
  4. Incorrect data types can cause calculation errors, so ensure they are correctly assigned.
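
Changing types through the icon produces a Table.TransformColumnTypes step. A minimal sketch follows, assuming a made-up table in which every column arrives as text and using the en-US culture to interpret the decimal point.

```powerquery
let
    // Hypothetical sample: all values arrive as text, as is common with CSV files
    Source = #table(
        type table [OrderID = text, OrderDate = text, Sales = text],
        {
            {"1", "2024-01-15", "1200.50"},
            {"2", "2024-02-03", "800"}
        }
    ),
    // Convert each column to its intended type; the culture argument is optional
    ChangedType = Table.TransformColumnTypes(
        Source,
        {{"OrderID", Int64.Type}, {"OrderDate", type date}, {"Sales", type number}},
        "en-US"
    )
in
    ChangedType
```

Values that cannot be converted show up as cell-level errors in the preview instead of failing the whole query, so it is worth scanning the column after a type change.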

Step 7: Splitting Columns

If a column contains multiple values in a single field (e.g., “First Last”), you can split them:

  1. Select the column.
  2. Click Split Column > By Delimiter and pick the delimiter (e.g., space, comma, semicolon).
  3. Choose where to split: Left-most delimiter, Right-most delimiter, or Each occurrence of the delimiter.
  4. Rename the new split columns appropriately.
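
A split at the first space can be sketched in M as shown here. The FullName column and the sample names are invented; the splitter is Splitter.SplitTextByEachDelimiter, whose last argument controls whether it starts from the end of the text.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [FullName = text],
        {
            {"Ana Lopez"},
            {"Ben Van Dyke"}
        }
    ),
    // Split once at the first space; false means "do not start from the end"
    SplitNames = Table.SplitColumn(
        Source,
        "FullName",
        Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false),
        {"FirstName", "LastName"}
    )
in
    SplitNames
```

With this splitter, "Ben Van Dyke" becomes "Ben" and "Van Dyke". Splitting at every space would instead need Splitter.SplitTextByDelimiter and one output column per part.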

Step 8: Merging Queries (Joins)

To combine data from multiple sources:

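  1. Select the first table in the Queries pane and click Home > Merge Queries (or Merge Queries as New to leave the original query unchanged).
  2. Select the second table in the Merge dialog.
  3. Click the common column in each table to join on.
  4. Select the join kind (e.g., Left Outer, Right Outer, Full Outer, Inner, Left Anti, Right Anti).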
  5. Expand the merged column to select the fields you want to include.
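
The merge and expand steps correspond to Table.NestedJoin followed by Table.ExpandTableColumn. The sketch below assumes two small made-up tables, Orders and Customers, that share a CustomerID key.

```powerquery
let
    // Hypothetical tables; order 3 refers to a customer that does not exist
    Orders = #table(
        type table [OrderID = Int64.Type, CustomerID = Int64.Type, Sales = number],
        {{1, 10, 1200}, {2, 11, 800}, {3, 99, 50}}
    ),
    Customers = #table(
        type table [CustomerID = Int64.Type, Name = text],
        {{10, "Ana"}, {11, "Ben"}}
    ),
    // Left outer join: every order is kept, matching customers are nested in a new column
    Merged = Table.NestedJoin(
        Orders, {"CustomerID"},
        Customers, {"CustomerID"},
        "Customers", JoinKind.LeftOuter
    ),
    // Expand only the Name field and call it CustomerName
    Expanded = Table.ExpandTableColumn(Merged, "Customers", {"Name"}, {"CustomerName"})
in
    Expanded
```

Because the join is left outer, order 3 stays in the result with a null CustomerName. An inner join would drop it.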

Step 9: Adding Conditional Columns

  1. Click Add Column > Conditional Column.
  2. Define conditions (e.g., If Sales > 1000, then “High”, else “Low”).
  3. Click OK to create a new column based on conditions.
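
The Conditional Column dialog writes an if ... then ... else expression inside Table.AddColumn. A minimal sketch, with a made-up table and a hypothetical SalesBand column name:

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [OrderID = Int64.Type, Sales = number],
        {{1, 1200}, {2, 800}}
    ),
    // Label each row by comparing Sales with the 1000 threshold
    AddedBand = Table.AddColumn(
        Source,
        "SalesBand",
        each if [Sales] > 1000 then "High" else "Low",
        type text
    )
in
    AddedBand
```

If Sales can be null, the comparison does not yield true or false and the cell shows an error, so missing values are best handled before this step.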

Step 10: Grouping and Aggregating Data

If you need to summarize data:

  1. Click on a column (e.g., Date or Category).
  2. Choose Group By.
  3. Select the column to group by and the operation to perform (e.g., Sum, Count, Average).
  4. Click OK to create a summary table.
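
Group By generates a Table.Group step with one aggregation per output column. The sketch below assumes an invented table with Category and Sales columns and computes a sum and a row count.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Category = text, Sales = number],
        {
            {"Bikes", 1200},
            {"Bikes", 300},
            {"Helmets", 80}
        }
    ),
    // One output row per Category, with a total and a row count
    Grouped = Table.Group(
        Source,
        {"Category"},
        {
            {"TotalSales", each List.Sum([Sales]), type nullable number},
            {"Orders", each Table.RowCount(_), Int64.Type}
        }
    )
in
    Grouped
```

For this sample the result has two rows: Bikes with TotalSales 1500 and Orders 2, and Helmets with TotalSales 80 and Orders 1.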

Step 11: Unpivoting Data

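Unpivoting is useful for wide tables in which one measure is spread across many columns (e.g., one column per month).

  1. Select the columns to unpivot.
  2. Click Transform > Unpivot Columns to turn the column headers into row values.
  3. Rename the two new columns, which are named Attribute and Value by default (e.g., to Month and Sales).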
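
In M, unpivoting can be sketched with Table.UnpivotOtherColumns, which names the columns to keep and unpivots the rest. The sample table and the output names Month and Sales are assumptions made for illustration.

```powerquery
let
    // Hypothetical wide table: one column per month
    Source = #table(
        type table [Product = text, Jan = nullable number, Feb = nullable number],
        {
            {"A", 10, 20},
            {"B", 30, null}
        }
    ),
    // Keep Product; turn every other column into Month/Sales pairs
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Product"}, "Month", "Sales")
in
    Unpivoted
```

The result has three rows (A/Jan/10, A/Feb/20, B/Jan/30). Cells that are null are dropped during unpivoting, so product B has no Feb row.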

Step 12: Sorting and Filtering Data

  1. Click on the column header.
  2. Choose Sort Ascending or Sort Descending.
  3. To filter, click the dropdown in the column header and deselect unwanted values.
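
Sorting and filtering correspond to Table.Sort and Table.SelectRows. The sketch below uses a made-up table and treats the Region value "Test" as the unwanted value.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        type table [Region = text, Sales = number],
        {
            {"West", 800},
            {"Test", 5},
            {"East", 1200}
        }
    ),
    // Filter first so that fewer rows need sorting
    Filtered = Table.SelectRows(Source, each [Region] <> "Test"),
    // Sort Descending on Sales
    Sorted = Table.Sort(Filtered, {{"Sales", Order.Descending}})
in
    Sorted
```

The result lists East (1200) followed by West (800).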

Step 13: Using Advanced Editor for M Code

For advanced transformations, you can modify the M code manually:

  1. Click Advanced Editor on the Home tab (it is also available on the View tab).
  2. Modify the script to customize transformations.

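Example of M Code (the sheet name Sheet1 is an assumption; use the name of your own sheet):

let
    // Excel.Workbook returns a table listing the sheets, tables, and named ranges in the file
    Source = Excel.Workbook(File.Contents("C:\Data.xlsx")),
    // Navigate to one sheet (assumed to be called Sheet1) and use its first row as headers
    Sheet1 = Source{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    PromotedHeaders = Table.PromoteHeaders(Sheet1, [PromoteAllScalars = true]),
    // Keep rows with Sales above 1000, then set the type of the Date column
    FilteredRows = Table.SelectRows(PromotedHeaders, each [Sales] > 1000),
    ChangedType = Table.TransformColumnTypes(FilteredRows, {{"Date", type date}})
in
    ChangedType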

Step 14: Applying and Loading Data

  1. Click Close & Apply in the Power Query Editor.
  2. Data will be loaded into Power BI for further analysis.

Conclusion

Cleaning and transforming data in Power Query is crucial to ensure high-quality reporting in Power BI. By removing duplicates, handling missing values, merging queries, and applying transformations, you can prepare clean and structured data for analysis.
