In Python, data classes provide a simple way to define classes that primarily store data, reducing boilerplate code for common tasks like initializing objects, comparing instances, and printing readable representations.
Python's built-in dataclasses module (introduced in Python 3.7) offers the @dataclass decorator, which automatically generates methods like:
- __init__() → Auto-generates an initializer
- __repr__() → Creates a readable string representation
- __eq__() → Enables equality comparison
- __hash__() → (Optional) Allows hashability for use in sets and dictionaries
Why Use Data Classes?
- Less boilerplate → No need to manually define __init__(), __repr__(), etc.
- Readability → Code is cleaner and self-explanatory.
- Automatic comparison → Built-in __eq__() for instance comparisons.
- Immutable options → Can create frozen (read-only) objects.
1. Creating a Simple Data Class
The @dataclass decorator is used to define a data class.
Example: Basic Data Class
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

p1 = Person("Alice", 30)
p2 = Person("Bob", 25)
print(p1)        # Output: Person(name='Alice', age=30)
print(p1 == p2)  # Output: False (compares values automatically)
What happens here?
- @dataclass automatically generates __init__(), __repr__(), and __eq__().
- p1 == p2 works without manually defining __eq__().
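To make the generated methods concrete, here is a rough hand-written approximation of what the decorator produces for Person. ManualPerson is a hypothetical name used only for this comparison, and the real generated code differs in detail.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


class ManualPerson:
    """Hand-written approximation of what @dataclass generates for Person."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def __repr__(self) -> str:
        return f"ManualPerson(name={self.name!r}, age={self.age!r})"

    def __eq__(self, other: object) -> bool:
        # Only instances of the same class are compared, field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)


print(Person("Alice", 30))        # Person(name='Alice', age=30)
print(ManualPerson("Alice", 30))  # ManualPerson(name='Alice', age=30)
print(ManualPerson("Alice", 30) == ManualPerson("Alice", 30))  # True
```

The data class version needs three lines of field declarations where the manual version needs three methods.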
2. Adding Default Values
You can set a field's default value with the = operator in the class body.
Example: Default Values
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    salary: float = 50000  # Default value

e1 = Employee("John")
e2 = Employee("Doe", 70000)
print(e1)  # Output: Employee(name='John', salary=50000)
print(e2)  # Output: Employee(name='Doe', salary=70000)
Why use this?
- Allows optional attributes without explicitly passing values.
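One rule worth knowing when mixing required and defaulted fields: fields without a default must be declared before fields that have one. The sketch below illustrates this; BadEmployee is a hypothetical counter-example added only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    salary: float = 50000  # defaulted fields come after required ones


# A default can be overridden positionally or by keyword.
e = Employee(name="Jane", salary=65000)
print(e)  # Employee(name='Jane', salary=65000)

# Hypothetical counter-example: a required field after a defaulted one
# is rejected when the class is created, not when it is instantiated.
ordering_error = None
try:
    @dataclass
    class BadEmployee:
        salary: float = 50000
        name: str
except TypeError as exc:
    ordering_error = exc

print(type(ordering_error).__name__)  # TypeError
```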
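3. Using field() for Default Values
For default values that must be computed for each instance (e.g., a new list, a new dict, or the current datetime), use field(default_factory=...). The factory is a zero-argument callable that runs whenever an instance is created without that argument.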
Example: Using default_factory
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Task:
    title: str
    created_at: datetime = field(default_factory=datetime.now)  # Auto-assign current time

t1 = Task("Buy groceries")
t2 = Task("Read a book")
print(t1)  # Example output: Task(title='Buy groceries', created_at=datetime.datetime(2025, 3, 10, 12, 34, 56, 789012))
print(t2)  # Same shape, with the timestamp taken when t2 was created
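Why use default_factory?
- Each instance gets its own fresh value. A mutable default written directly (such as [] or {}) would be shared across instances, so dataclasses rejects list, dict, and set defaults with a ValueError.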
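The shared-state concern can be seen in a short sketch. Cart and BrokenCart are hypothetical classes used only for illustration: the first uses default_factory so each instance gets its own list, the second shows that a bare [] default is refused at class creation.

```python
from dataclasses import dataclass, field


@dataclass
class Cart:
    owner: str
    items: list = field(default_factory=list)  # a new list per instance


a = Cart("Ann")
b = Cart("Ben")
a.items.append("apple")
print(a.items)  # ['apple']
print(b.items)  # []

# A literal mutable default is rejected by the decorator itself.
mutable_default_error = None
try:
    @dataclass
    class BrokenCart:
        items: list = []
except ValueError as exc:
    mutable_default_error = exc

print(type(mutable_default_error).__name__)  # ValueError
```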
4. Making Data Classes Immutable (frozen=True)
Setting frozen=True makes the class immutable (read-only).
Example: Immutable Data Class
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(5, 10)
print(p.x)  # Output: 5
p.x = 20    # Raises dataclasses.FrozenInstanceError: cannot assign to field 'x'
Why use frozen=True?
- Prevents accidental modifications.
- Makes instances hashable (together with the default eq=True), so they can serve as dictionary keys or set members.
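Both benefits can be sketched in a few lines. This example also uses dataclasses.replace(), which is not covered above, as one way to get a modified copy of a frozen instance; the labels dictionary is purely illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(5, 10)

# Assignment is blocked on frozen instances.
try:
    p.x = 20
except FrozenInstanceError:
    print("Point is read-only")

# frozen=True with the default eq=True makes instances hashable.
labels = {Point(0, 0): "origin", p: "target"}
print(labels[Point(5, 10)])  # target

# replace() builds a modified copy instead of mutating in place.
moved = replace(p, x=20)
print(moved)  # Point(x=20, y=10)
print(p)      # Point(x=5, y=10)
```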
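5. Controlling __repr__(), __eq__(), and __hash__()
By default, @dataclass generates __repr__() and __eq__(). Because the default combination (eq=True, frozen=False) leaves instances mutable, it also sets __hash__ to None, which makes them unhashable. You can disable the generated methods if needed.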
Example: Customizing @dataclass Behavior
from dataclasses import dataclass

@dataclass(repr=False, eq=False)
class Car:
    brand: str
    model: str

c1 = Car("Toyota", "Camry")
c2 = Car("Toyota", "Camry")
print(c1)        # Output: <__main__.Car object at 0x...> (No auto `__repr__()`)
print(c1 == c2)  # Output: False (No `__eq__()` generated, so identity is compared)
Why use this?
- Helps when customizing class behavior or avoiding unintended comparisons.
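Besides the class-wide switches, individual fields can opt out through field(repr=False) and field(compare=False). The sketch below uses a hypothetical User class, not taken from the examples above, to hide one field from the printed form and exclude another from equality.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    username: str
    password: str = field(repr=False)  # left out of the generated __repr__
    last_login: str = field(default="never", compare=False)  # ignored by __eq__


u1 = User("alice", "s3cret", last_login="2025-01-01")
u2 = User("alice", "s3cret", last_login="2025-02-01")

print(u1)        # User(username='alice', last_login='2025-01-01')
print(u1 == u2)  # True
```

This keeps the generated methods while still controlling which fields take part in them.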
6. Sorting Data Classes with order=True
Setting order=True enables sorting (<, >, <=, >= operators).
Example: Sorting Objects
from dataclasses import dataclass

@dataclass(order=True)
class Student:
    grade: int
    name: str  # Fields are compared in order, like a tuple: grade first, then name

s1 = Student(90, "Alice")
s2 = Student(85, "Bob")
print(s1 > s2)  # Output: True (90 > 85)
Why use order=True?
- Enables sorting of objects without manually defining comparison methods.
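As a short illustration, the generated comparison methods let sorted() work directly on a list of instances. Fields are compared in declaration order, so equal grades are ordered by name; the third student is added here only to show that tie-break.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Student:
    grade: int
    name: str


students = [Student(90, "Alice"), Student(85, "Bob"), Student(90, "Aaron")]

ranked = sorted(students)
print([s.name for s in ranked])  # ['Bob', 'Aaron', 'Alice']

top_first = sorted(students, reverse=True)
print([s.name for s in top_first])  # ['Alice', 'Aaron', 'Bob']
```

To sort by a field other than the first, passing key= to sorted() avoids reordering the class definition.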
7. Converting Data Classes to Dictionaries (asdict())
The dataclasses.asdict() function converts a data class instance into a dictionary.
Example: Converting to Dictionary
from dataclasses import asdict, dataclass

@dataclass
class Product:
    name: str
    price: float

p = Product("Laptop", 999.99)
print(asdict(p))  # Output: {'name': 'Laptop', 'price': 999.99}
Why use this?
- Useful for JSON serialization or working with APIs.
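A minimal sketch of the JSON use case follows. The nested Supplier class is a hypothetical addition, included to show that asdict() converts nested data classes recursively.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Supplier:
    name: str
    country: str


@dataclass
class Product:
    name: str
    price: float
    supplier: Supplier


p = Product("Laptop", 999.99, Supplier("Acme", "US"))

data = asdict(p)  # nested data classes become nested dicts
print(data)
# {'name': 'Laptop', 'price': 999.99, 'supplier': {'name': 'Acme', 'country': 'US'}}

payload = json.dumps(data)  # a plain dict serializes without a custom encoder
print(json.loads(payload) == data)  # True
```

Values that json cannot encode on its own, such as datetime, would still need converting before json.dumps().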
8. Inheriting from Data Classes
Data classes support inheritance.
Example: Inheriting from Another Data Class
from dataclasses import dataclass

@dataclass
class Animal:
    species: str

@dataclass
class Dog(Animal):
    breed: str
    age: int

d = Dog("Mammal", "Labrador", 5)
print(d)  # Output: Dog(species='Mammal', breed='Labrador', age=5)
Why use this?
- Allows hierarchical organization of related data.
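One detail to keep in mind with inheritance is field order: base-class fields come first in the generated __init__(), followed by the subclass fields. In the sketch below, the legs field and its default are hypothetical additions; once a base class has a defaulted field, every field the subclass adds needs a default too.

```python
from dataclasses import dataclass, fields


@dataclass
class Animal:
    species: str
    legs: int = 4


@dataclass
class Dog(Animal):
    breed: str = "unknown"  # needs a default because Animal.legs has one


print([f.name for f in fields(Dog)])  # ['species', 'legs', 'breed']

d = Dog("Mammal", breed="Labrador")
print(d)                      # Dog(species='Mammal', legs=4, breed='Labrador')
print(isinstance(d, Animal))  # True
```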
9. Comparing dataclasses vs Regular Classes
| Feature | Regular Class | Data Class |
|---|---|---|
| Boilerplate Code | More (manual __init__, __repr__, etc.) | Less (auto-generated) |
| Equality Comparison (==) | Based on object identity unless __eq__ is defined | Based on field values |
| Hashability (hash()) | Identity-based by default; value-based hashing requires a manual __hash__ | Unhashable by default; generated automatically with frozen=True |
| Sorting | Requires defining __lt__, __gt__, etc. | Auto-generated (order=True) |
| Conversion to Dict | Manual implementation needed | asdict() available |
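The equality and hashability rows can be checked with a small sketch. PlainPoint, DataPoint, and FrozenPoint are hypothetical names chosen only for this comparison.

```python
from dataclasses import dataclass


class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


@dataclass
class DataPoint:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


# Regular class: == falls back to identity, hash() is identity-based.
print(PlainPoint(1, 2) == PlainPoint(1, 2))  # False

# Data class: == compares field values, but instances are unhashable.
print(DataPoint(1, 2) == DataPoint(1, 2))  # True
print(DataPoint.__hash__ is None)          # True

# Frozen data class: value-based equality and value-based hashing.
print(hash(FrozenPoint(1, 2)) == hash(FrozenPoint(1, 2)))  # True
```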
10. When to Use Data Classes?
Use dataclasses when:
- You need lightweight classes for storing data.
- You want automatic comparison and printing (__eq__(), __repr__()).
- You need immutability (frozen=True).
- You work with JSON, APIs, or databases (use asdict()).
Avoid dataclasses if:
- You need custom behavior beyond just storing data.
- You require complex inheritance and method overrides.
- You are using older Python versions (<3.7); use namedtuple instead.
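For the namedtuple fallback, a minimal sketch might look like the following. It mirrors the Person example from section 1 using collections.namedtuple; note that the result is an immutable tuple, so it behaves differently from a regular data class in a few ways.

```python
from collections import namedtuple

# Field names are given as a list of strings instead of annotations.
Person = namedtuple("Person", ["name", "age"])

p = Person("Alice", 30)
print(p)                         # Person(name='Alice', age=30)
print(p == Person("Alice", 30))  # True
print(dict(p._asdict()))         # {'name': 'Alice', 'age': 30}

# Instances are immutable; _replace() returns a modified copy.
older = p._replace(age=31)
print(older)  # Person(name='Alice', age=31)

# Unlike a data class, a namedtuple also compares equal to a plain tuple.
print(p == ("Alice", 30))  # True
```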