Data Cardinality and Its Impact on Performance

Data Cardinality and Its Impact on Performance in Power BI – A Comprehensive Guide

Introduction:
Data cardinality plays a crucial role in Power BI performance optimization. Cardinality refers to the uniqueness of values in a column, which directly affects how Power BI compresses data, builds relationships, and processes queries.

This guide will cover:

  • What is Data Cardinality?
  • Types of Cardinality
  • How Cardinality Affects Performance
  • Best Practices to Optimize Cardinality in Power BI
  • Step-by-Step Optimization Techniques


🔹 What is Data Cardinality?

In Power BI, column cardinality refers to the number of unique values in a column (this is distinct from relationship cardinality, such as one-to-many). It affects data storage, relationships, and DAX calculations.

For example:

  • A column with 100% unique values (e.g., OrderID) has high cardinality.
  • A column with few distinct values (e.g., OrderStatus with “Pending,” “Shipped,” “Delivered”) has low cardinality.

Why is Cardinality Important?

Power BI compresses data to improve performance. Columns with low cardinality compress better than those with high cardinality, leading to faster queries and smaller file sizes.
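
As a rough way to see this in your own model, a query like the sketch below compares distinct counts with the row count. It assumes a table named SalesTable that contains both an OrderID and an OrderStatus column (names borrowed from the examples above, not a required schema) and can be run from a tool such as DAX Studio.

```dax
-- Sketch: compare the cardinality of two columns against the table's row count.
-- SalesTable, OrderID and OrderStatus are assumed names; adjust to your model.
EVALUATE
ROW (
    "Row Count", COUNTROWS ( SalesTable ),
    "OrderID Distinct Values", DISTINCTCOUNT ( SalesTable[OrderID] ),
    "OrderStatus Distinct Values", DISTINCTCOUNT ( SalesTable[OrderStatus] )
)
```

A distinct count close to the row count indicates high cardinality; a distinct count of only a handful of values indicates low cardinality.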


🔹 Types of Data Cardinality

Column cardinality is usually described in three rough levels:

| Cardinality Type | Description | Example | Impact on Performance |
|---|---|---|---|
| High Cardinality | A column has a large number of unique values | OrderID (unique for every order) | Poor compression, slower queries |
| Medium Cardinality | A column has a moderate number of distinct values | City or ProductName (e.g., hundreds to thousands of values) | Moderate compression, decent performance |
| Low Cardinality | A column has few distinct values | Order Status (Pending, Shipped, Delivered) | Best compression, fast performance |

🔹 How Data Cardinality Affects Performance in Power BI

1️⃣ High Cardinality Increases Memory Usage

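Power BI's VertiPaq engine stores each column separately and compresses it using techniques such as dictionary (hash) encoding, value encoding, and run-length encoding. The more distinct values a column has, the larger its dictionary and the less effective the compression.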
🔴 Example: A column with 10 million unique customer IDs takes more space than a column with only 5 customer categories.
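
A simplified way to reason about this: under dictionary encoding, each row stores an index into a list of distinct values, and that index needs roughly ceil(log2(number of distinct values)) bits. The query below is only back-of-the-envelope arithmetic under that assumption. It ignores run-length encoding and the size of the dictionary itself, so real VertiPaq sizes will differ.

```dax
-- Back-of-the-envelope estimate of index bits per row under dictionary encoding.
-- This is plain arithmetic, not a measurement of your model.
EVALUATE
ROW (
    "Bits per row for 5 categories", ROUNDUP ( LOG ( 5, 2 ), 0 ),
    "Bits per row for 10 million IDs", ROUNDUP ( LOG ( 10000000, 2 ), 0 )
)
-- Returns 3 and 24
```

On top of the wider index, the dictionary for the ID column has to hold all 10 million entries, while the category dictionary holds only 5.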

2️⃣ High Cardinality Slows Down Relationships

Power BI resolves relationships by matching key values between tables, and the cost of that matching grows with the number of distinct values in the key column. Relationships built on high-cardinality columns therefore tend to use more memory and take longer to traverse.
🔴 Example: A relationship on OrderID (millions of unique values) is more expensive than one on CustomerSegment (10 values).

3️⃣ DAX Queries Slow Down with High-Cardinality Columns

DAX queries are answered by scanning the columns they reference. If a measure scans, groups by, or counts the distinct values of a high-cardinality column, the calculation tends to be slower.
🔴 Example:

Total Sales = SUMX(SalesTable, SalesTable[Revenue])

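If SalesTable has millions of rows and Revenue holds many distinct values, Power BI has more data to scan in order to sum them. (For a single column, this SUMX is equivalent to SUM(SalesTable[Revenue]).)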
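
The effect tends to be most visible with distinct counts. The sketch below assumes SalesTable has exactly one row per order (an assumption for illustration, not something the model guarantees). Under that assumption, counting rows gives the same answer as a distinct count over OrderID without touching the high-cardinality column.

```dax
-- Measure definitions (enter each one separately in Power BI Desktop).

-- Distinct count over a high-cardinality column: typically the expensive option.
Distinct Order Count = DISTINCTCOUNT ( SalesTable[OrderID] )

-- Same result only if every row of SalesTable is exactly one order.
Order Row Count = COUNTROWS ( SalesTable )
```

If the row-count version is valid for your data, OrderID may not need to be loaded into the model at all.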

4️⃣ Filtering and Slicers Perform Better with Low Cardinality

A slicer has to retrieve the list of distinct values it displays, and every selection is applied as a filter on that column. The fewer distinct values the column has, the less work both steps require.
✅ Best Practice: Use ProductCategory instead of ProductID in slicers for better performance.


🔹 Step-by-Step: Optimizing Data Cardinality in Power BI

Step 1: Identify High-Cardinality Columns

Use VertiPaq Analyzer (available in DAX Studio through its View Metrics feature) to check each column's distinct values and compressed size.

📌 How to do it?

  1. Download DAX Studio (a free tool).
  2. Open your model in Power BI Desktop and launch DAX Studio from the External Tools ribbon.
  3. Run: EVALUATE ROW("Unique Count", DISTINCTCOUNT(SalesTable[OrderID]))
  4. Identify columns with millions of unique values.
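
Checking one column at a time gets tedious. As an alternative sketch, recent versions of Power BI Desktop support the COLUMNSTATISTICS function, which returns one row per column in the model, including its cardinality. The 1,000,000 threshold below is an arbitrary assumption; pick one that suits the size of your model.

```dax
-- List columns whose cardinality exceeds an (arbitrary) threshold, largest first.
EVALUATE
FILTER ( COLUMNSTATISTICS (), [Cardinality] > 1000000 )
ORDER BY [Cardinality] DESC
```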

🔴 Solution: Replace or reduce the cardinality of those columns.


Step 2: Reduce High Cardinality in Fact Tables

✅ Best Practices:

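  • Remove unique columns that reports and relationships do not need, such as OrderID, TransactionID, and Timestamp.
  • Bucketize numeric values when the exact value is not needed for calculations (e.g., replace exact sales amounts with sales ranges like “Low,” “Medium,” “High”).
  • Split detailed timestamps into separate Date and Time columns, or round them to coarser time buckets.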
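
The bucketing idea can be sketched as a calculated column. The table and column names (FactSales, SalesAmount) follow the examples used later in this guide, and the thresholds of 100 and 1000 are purely illustrative assumptions.

```dax
-- Calculated column on FactSales (assumed table name).
-- The thresholds are hypothetical; choose ranges that make sense for your data.
SalesRange =
SWITCH (
    TRUE (),
    FactSales[SalesAmount] < 100, "Low",
    FactSales[SalesAmount] < 1000, "Medium",
    "High"
)
```

The new column has at most three distinct values. Keep the original SalesAmount column if measures still need to sum it.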

📌 Example: Convert High-Cardinality Timestamps into Buckets
🔴 Before (High Cardinality)

Timestamp: 2023-01-15 12:45:23
Timestamp: 2023-01-15 12:46:01
Timestamp: 2023-01-15 12:46:45

✅ After (Lower Cardinality)

Time Slot: 12:45 PM
Time Slot: 12:46 PM
Time Slot: 12:46 PM

📌 DAX Formula to Create Time Buckets:

TimeBucket = FORMAT([Timestamp], "hh:mm tt")
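
FORMAT returns text, so the TimeBucket column sorts alphabetically and drops the date. A common alternative, sketched here under the assumption that the table is named FactSales and Timestamp is a datetime column, is to split the value into a date column and a minute-level time column. A minute-level time column can hold at most 1,440 distinct values (24 × 60).

```dax
-- Calculated columns on FactSales (assumed table name); enter each separately.

-- Date part only: at most one distinct value per calendar day.
OrderDate =
DATE (
    YEAR ( FactSales[Timestamp] ),
    MONTH ( FactSales[Timestamp] ),
    DAY ( FactSales[Timestamp] )
)

-- Time part truncated to the minute: at most 1,440 distinct values.
OrderTime =
TIME ( HOUR ( FactSales[Timestamp] ), MINUTE ( FactSales[Timestamp] ), 0 )
```

Once both columns exist, the original Timestamp column can be removed if nothing else depends on it.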

Step 3: Optimize Relationships by Avoiding High-Cardinality Joins

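Use Low-Cardinality or Surrogate Keys Instead of High-Cardinality Joins
🔴 Bad: Relationship on a wide, high-cardinality key such as a text OrderID (millions of unique values).
✅ Good: Relationship on CustomerSegment (few unique values), or on a compact integer surrogate key when a high-cardinality key cannot be avoided.

📌 Example: If reports only slice sales by segment, then instead of linking FactSales[CustomerID] → DimCustomer[CustomerID], link FactSales[CustomerSegment] to the key of a small segment table (for example, a hypothetical DimSegment[Segment]). Note that Segment is not unique in DimCustomer, so FactSales[CustomerSegment] → DimCustomer[Segment] would be a many-to-many relationship.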


Step 4: Use Aggregations to Reduce High-Cardinality Columns

Aggregations precompute results and replace detailed data with summarized data, reducing high-cardinality bottlenecks.

📌 Example: Create Aggregated Table for Faster Queries

AggregatedSales = SUMMARIZECOLUMNS(DimDate[Year], DimProduct[Category], "Total Sales", SUM(FactSales[SalesAmount]))

This aggregates sales at the Year and Category level, so the table has at most one row per Year and Category combination.

Benefits:
✔️ Less memory usage
✔️ Faster queries
✔️ Better model performance
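
To check how much the aggregation actually shrinks the data, a quick query such as the following sketch compares row counts. It assumes the AggregatedSales calculated table from the example above has already been created in the model.

```dax
-- Compare the detail table with the aggregated table.
-- AggregatedSales must exist as a calculated table for this query to run.
EVALUATE
ROW (
    "Detail Rows", COUNTROWS ( FactSales ),
    "Aggregated Rows", COUNTROWS ( AggregatedSales )
)
```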


Step 5: Use Grouping Instead of Individual Unique Values in Slicers

Use categories instead of unique IDs for filtering.
🔴 Bad: Using CustomerID (millions of values) in a slicer.
✅ Good: Using CustomerRegion (5 values) in a slicer.

📌 Example:
Instead of filtering by ProductID, group by ProductCategory (Electronics, Furniture, Clothing).


🔹 Summary of Optimizing Data Cardinality in Power BI

| Step | Optimization | Impact |
|---|---|---|
| Identify high-cardinality columns | Use DAX Studio to check distinct counts | Find performance bottlenecks |
| Reduce unnecessary columns | Remove OrderID, TransactionID | Improves compression |
| Bucketize numerical/timestamp data | Convert detailed timestamps into time slots | Reduces unique values |
| Optimize relationships | Avoid high-cardinality joins, use surrogate keys | Faster model relationships |
| Use aggregations | Precompute results and replace detailed rows | Faster queries, better performance |
| Optimize slicers and filters | Use low-cardinality fields like categories | Improves filtering speed |

🔹 Final Thoughts

🔹 High cardinality negatively impacts Power BI performance by increasing memory usage, slowing relationships, and making queries inefficient.
🔹 Optimizing cardinality helps Power BI models run faster and use less memory.
🔹 Use best practices like aggregations, relationship optimization, and reducing unnecessary columns to improve efficiency.
