Data Cardinality and Its Impact on Performance in Power BI – A Comprehensive Guide
Introduction:
Data cardinality plays a crucial role in Power BI performance optimization. Cardinality refers to the uniqueness of values in a column, which directly affects how Power BI compresses data, builds relationships, and processes queries.
This guide will cover:
✅ What is Data Cardinality?
✅ Types of Cardinality
✅ How Cardinality Affects Performance
✅ Best Practices to Optimize Cardinality in Power BI
✅ Step-by-Step Optimization Techniques
🔹 What is Data Cardinality?
In Power BI, cardinality refers to the number of unique values in a column. It impacts data storage, relationships, and DAX calculations.
For example:
- A column with 100% unique values (e.g., OrderID) has high cardinality.
- A column with few distinct values (e.g., OrderStatus with “Pending,” “Shipped,” “Delivered”) has low cardinality.
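To see where a given column falls, a query along the following lines compares distinct counts against the row count. It is a minimal sketch that assumes OrderID and OrderStatus both live in a table named SalesTable; adjust the names to your own model and run it in DAX Studio or a similar query window.

```dax
// Assumption: OrderID and OrderStatus are columns of a table called SalesTable.
EVALUATE
ROW (
    "Row Count", COUNTROWS ( SalesTable ),
    "OrderID Distinct", DISTINCTCOUNT ( SalesTable[OrderID] ),
    "OrderStatus Distinct", DISTINCTCOUNT ( SalesTable[OrderStatus] )
)
```

A distinct count close to the row count indicates high cardinality. A handful of distinct values against millions of rows indicates low cardinality.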
Why is Cardinality Important?
Power BI compresses data to improve performance. Columns with low cardinality compress better than those with high cardinality, leading to faster queries and smaller file sizes.
🔹 Types of Data Cardinality
There are three main types of cardinality in Power BI:
Cardinality Type | Description | Example | Impact on Performance |
---|---|---|---|
High Cardinality | A column has a large number of unique values | OrderID (unique for every order) | Poor compression, slow queries |
Medium Cardinality | A column has a moderate number of distinct values | Product Category (e.g., Electronics, Furniture, Clothing) | Moderate compression, decent performance |
Low Cardinality | A column has few distinct values | Order Status (Pending, Shipped, Delivered) | Best compression, fast performance |
🔹 How Data Cardinality Affects Performance in Power BI
1️⃣ High Cardinality Increases Memory Usage
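Power BI's VertiPaq engine stores data column by column and compresses each column, largely through dictionary and run-length encoding. The more unique values a column holds, the larger its dictionary and the less it compresses.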
🔴 Example: A column with 10 million unique customer IDs takes more space than a column with only 5 customer categories.
2️⃣ High Cardinality Slows Down Relationships
Power BI optimizes joins using indexed lookups. If you have high-cardinality columns in relationships, Power BI takes longer to search and match records.
🔴 Example: Creating relationships on OrderID (millions of unique values) is slower than using CustomerSegment (10 values).
3️⃣ DAX Queries Slow Down with High-Cardinality Columns
DAX formulas process data column by column. If a measure involves high-cardinality columns, calculations become slower.
🔴 Example:
Total Sales = SUMX(SalesTable, SalesTable[Revenue])
If SalesTable has millions of rows and Revenue holds many distinct values, Power BI takes longer to scan and sum the column.
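As a rough illustration, the two measures below read the same table but respond very differently to cardinality. This is a sketch under the assumption that SalesTable also contains the OrderID column mentioned earlier; each definition is a separate measure.

```dax
// Plain aggregation: for a single-column sum, SUM expresses the same
// calculation as SUMX ( SalesTable, SalesTable[Revenue] ) more directly.
Total Sales = SUM ( SalesTable[Revenue] )

// Distinct count over a near-unique column: its cost grows with the
// number of distinct OrderID values, so it is usually the slower of the two.
Unique Orders = DISTINCTCOUNT ( SalesTable[OrderID] )
```

If a distinct count over a high-cardinality column is not needed in a report, leaving it out is often the simplest optimization.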
4️⃣ Filtering and Slicers Perform Better with Low Cardinality
When using a slicer or filter, Power BI creates internal index lookups. Columns with low cardinality make filtering faster.
✅ Best Practice: Use ProductCategory instead of ProductID in slicers for better performance.
🔹 Step-by-Step: Optimizing Data Cardinality in Power BI
Step 1: Identify High-Cardinality Columns
Use VertiPaq Analyzer (available within DAX Studio) to check each column's distinct values and compression.
📌 How to do it?
- Download DAX Studio (a free tool).
- Open your Power BI model and launch DAX Studio.
- Run:
EVALUATE ROW("Unique Count", DISTINCTCOUNT(SalesTable[OrderID]))
- Identify columns with millions of unique values.
🔴 Solution: Replace or reduce the cardinality of those columns.
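To survey every column at once rather than one at a time, recent versions of the engine expose the COLUMNSTATISTICS function. The query below is a sketch that assumes your Power BI version supports it; older versions may not.

```dax
// COLUMNSTATISTICS() returns one row per column in the model,
// including a [Cardinality] value (the number of distinct values).
// Sorting descending puts the highest-cardinality columns first.
EVALUATE
COLUMNSTATISTICS ()
ORDER BY [Cardinality] DESC
```

The columns at the top of the result are the first candidates for removal or bucketing in the next step.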
Step 2: Reduce High Cardinality in Fact Tables
✅ Best Practices:
- Remove unnecessary unique columns like OrderID, TransactionID, and Timestamp.
- Bucketize numeric values (e.g., replace exact sales amounts with sales ranges like “Low,” “Medium,” “High”).
- Convert detailed timestamps into Date & Time buckets.
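The numeric bucketing idea can be sketched as a calculated column. The column name Sales Band and the thresholds 100 and 1000 are illustrative assumptions, not values from any real model.

```dax
// Calculated column on FactSales.
// SWITCH ( TRUE (), ... ) returns the result of the first condition that is true.
Sales Band =
SWITCH (
    TRUE (),
    FactSales[SalesAmount] < 100, "Low",       // assumed threshold
    FactSales[SalesAmount] < 1000, "Medium",   // assumed threshold
    "High"
)
```

A band column like this suits grouping and slicing. Measures that need exact totals still depend on the original numeric column, so the memory saving only appears where the exact values can actually be dropped.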
📌 Example: Convert High-Cardinality Timestamps into Buckets
🔴 Before (High Cardinality)
Timestamp: 2023-01-15 12:45:23
Timestamp: 2023-01-15 12:46:01
Timestamp: 2023-01-15 12:46:45
✅ After (Lower Cardinality)
Time Slot: 12:45 PM
Time Slot: 12:46 PM
📌 DAX Formula to Create Time Buckets:
TimeBucket = FORMAT([Timestamp], "hh:mm tt")
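FORMAT returns text, so an alternative is to split the timestamp into a date column and a minute-level time column that keep their date/time data types. The sketch below assumes the column lives in a table named SalesTable; the names Order Date and Order Minute are hypothetical.

```dax
// Date part only: one distinct value per calendar day.
Order Date =
DATE (
    YEAR ( SalesTable[Timestamp] ),
    MONTH ( SalesTable[Timestamp] ),
    DAY ( SalesTable[Timestamp] )
)

// Time part truncated to the minute: at most 1,440 distinct values.
Order Minute =
TIME (
    HOUR ( SalesTable[Timestamp] ),
    MINUTE ( SalesTable[Timestamp] ),
    0
)
```

Calculated columns add to the model rather than replace anything, so the cardinality benefit only materializes once the original Timestamp column is removed. Doing the split upstream, in Power Query or the source query, is generally the cleaner route.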
Step 3: Optimize Relationships by Avoiding High-Cardinality Joins
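✅ Use Surrogate Keys Instead of High-Cardinality Joins
🔴 Bad: Relationship on OrderID (millions of unique values).
✅ Good: Relationship on CustomerSegment (few unique values).
📌 Example: If reports only need segment-level analysis, relate FactSales[CustomerSegment] to a small dimension table with one row per segment, instead of relating FactSales[CustomerID] → DimCustomer[CustomerID]. Where a high-cardinality key is unavoidable, a compact integer surrogate key generally compresses better than a long text key.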
Step 4: Use Aggregations to Reduce High-Cardinality Columns
Aggregations precompute results and replace detailed data with summarized data, reducing high-cardinality bottlenecks.
📌 Example: Create Aggregated Table for Faster Queries
AggregatedSales = SUMMARIZECOLUMNS(DimDate[Year], DimProduct[Category], "Total Sales", SUM(FactSales[SalesAmount]))
This aggregates sales at the Year and Category level, reducing unique rows.
✅ Benefits:
✔️ Less memory usage
✔️ Faster queries
✔️ Better model performance
Step 5: Use Grouping Instead of Individual Unique Values in Slicers
✅ Use categories instead of unique IDs for filtering.
🔴 Bad: Using CustomerID (millions of values) in a slicer.
✅ Good: Using CustomerRegion (5 values) in a slicer.
📌 Example:
Instead of filtering by ProductID, group by ProductCategory (Electronics, Furniture, Clothing).
🔹 Summary of Optimizing Data Cardinality in Power BI
Step | Optimization | Impact |
---|---|---|
Identify high-cardinality columns | Use DAX Studio to check distinct counts | Find performance bottlenecks |
Reduce unnecessary columns | Remove OrderID, TransactionID | Improves compression |
Bucketize numerical/timestamp data | Convert detailed timestamps into time slots | Reduces unique values |
Optimize relationships | Avoid high-cardinality joins, use surrogate keys | Faster model relationships |
Use aggregations | Precompute results and replace detailed rows | Faster queries, better performance |
Optimize slicers and filters | Use low-cardinality fields like categories | Improves filtering speed |
🔹 Final Thoughts
🔹 High cardinality negatively impacts Power BI performance by increasing memory usage, slowing relationships, and making queries inefficient.
🔹 Optimizing cardinality helps Power BI models run faster and use less memory.
🔹 Use best practices like aggregations, relationship optimization, and reducing unnecessary columns to improve efficiency.