Data Classes (dataclasses module)

In Python, data classes provide a simple way to define classes that primarily store data, reducing boilerplate code for common tasks like initializing objects, comparing instances, and printing readable representations.

Python’s built-in dataclasses module (introduced in Python 3.7) offers the @dataclass decorator, which automatically generates methods like:
  • __init__() → Initializer that takes one argument per field
  • __repr__() → Readable string representation
  • __eq__() → Field-by-field equality comparison
  • __hash__() → Generated only in some configurations (for example with frozen=True); by default, instances of a data class with __eq__() are unhashable

Why Use Data Classes?

  • Less boilerplate → No need to manually define __init__(), __repr__(), etc.
  • Readability → Code is cleaner and self-explanatory.
  • Automatic comparison → Built-in __eq__() for instance comparisons.
  • Immutable options → Can create frozen (read-only) objects.

1. Creating a Simple Data Class

The @dataclass decorator is used to define a data class.

Example: Basic Data Class

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

p1 = Person("Alice", 30)
p2 = Person("Bob", 25)

print(p1) # Output: Person(name='Alice', age=30)
print(p1 == p2) # Output: False (compares values automatically)

What happens here?

  • @dataclass automatically generates __init__(), __repr__(), and __eq__()
  • p1 == p2 works without manually defining __eq__()
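To make the savings concrete, here is a rough hand-written counterpart of Person. It is only a sketch: the class name ManualPerson is made up for illustration, and the methods that @dataclass really generates differ in detail.

```python
class ManualPerson:
    """Hand-written counterpart of the Person data class (hypothetical name)."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age

    def __repr__(self) -> str:
        return f"ManualPerson(name={self.name!r}, age={self.age!r})"

    def __eq__(self, other: object) -> bool:
        # Only instances of exactly the same class are compared field by field
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)


m1 = ManualPerson("Alice", 30)
m2 = ManualPerson("Alice", 30)

print(m1)        # ManualPerson(name='Alice', age=30)
print(m1 == m2)  # True
```

Every new field would have to be added in three places here, while the data class version needs a single annotated line.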

2. Adding Default Values

You can set default values using the = operator.

Example: Default Values

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    salary: float = 50000  # Default value

e1 = Employee("John")
e2 = Employee("Doe", 70000)

print(e1) # Output: Employee(name='John', salary=50000)
print(e2) # Output: Employee(name='Doe', salary=70000)

Why use this?

  • Callers can omit arguments for fields that have a sensible default.
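One rule worth knowing: fields with defaults must come after fields without them, just like parameters in a function signature. The sketch below (BadEmployee is a made-up class) shows that @dataclass rejects the wrong order when the class is defined.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadEmployee:
        salary: float = 50000.0
        name: str  # Non-default field placed after a default one
except TypeError as exc:
    # The class is rejected at definition time, before any instance exists
    error_message = str(exc)

print("Rejected:", error_message)
```

Moving name above salary, as in the Employee example, fixes the error.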

3. Using field() and default_factory for Computed Defaults

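For defaults that must be computed fresh for each instance (e.g., a new list, a new dict, or the current time), use field(default_factory=...). The factory is called with no arguments every time an instance is created without that field.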

Example: Using default_factory

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Task:
    title: str
    created_at: datetime = field(default_factory=datetime.now)  # Called once per instance

t1 = Task("Buy groceries")
t2 = Task("Read a book")

print(t1)  # Output (example): Task(title='Buy groceries', created_at=datetime.datetime(2025, 3, 10, 12, 34, 56, 789012))
print(t2)  # created_at is set when t2 is created, so it is slightly later

Why use default_factory?

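  • A plain mutable default such as [] or {} would be shared by every instance; @dataclass rejects list, dict, and set defaults with a ValueError, and default_factory builds a fresh value per instance instead.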
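A small sketch of both behaviors: a list default is refused at class definition, and default_factory=list gives each instance its own list. The names BrokenCart and Cart are invented for this example.

```python
from dataclasses import dataclass, field

rejected = False

try:
    @dataclass
    class BrokenCart:
        items: list = []  # Mutable default: refused by @dataclass
except ValueError as exc:
    rejected = True
    print("Rejected:", exc)


@dataclass
class Cart:
    owner: str
    items: list = field(default_factory=list)  # New list for every Cart


a = Cart("Alice")
b = Cart("Bob")
a.items.append("apple")

print(a.items)  # ['apple']
print(b.items)  # []
```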

4. Making Data Classes Immutable (frozen=True)

Setting frozen=True makes the class immutable (read-only).

Example: Immutable Data Class

from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

p = Point(5, 10)
print(p.x)  # Output: 5

p.x = 20  # Raises dataclasses.FrozenInstanceError

Why use frozen=True?

  • Prevents accidental modifications.
  • With the default eq=True, frozen instances are hashable, so they can be used as dictionary keys or set members.
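Since a frozen instance cannot be changed, the usual pattern is to build a modified copy. The sketch below uses dataclasses.replace() for that and also shows a frozen Point working as a dictionary key; the variable names are illustrative.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(5, 10)

try:
    p.x = 20
except FrozenInstanceError:
    print("Point is read-only")

# replace() returns a new instance with the given fields changed
moved = replace(p, x=20)
print(moved)  # Point(x=20, y=10)
print(p)      # Point(x=5, y=10)

# Equal frozen instances hash alike, so lookups by value work
labels = {p: "start", moved: "end"}
print(labels[Point(5, 10)])  # start
```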

5. Controlling __repr__(), __eq__(), and __hash__()

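By default, @dataclass generates __repr__() and __eq__(); you can switch either off with repr=False or eq=False. Whether __hash__() is available depends on the eq, frozen, and unsafe_hash settings.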

Example: Customizing @dataclass Behavior

from dataclasses import dataclass

@dataclass(repr=False, eq=False)
class Car:
    brand: str
    model: str

c1 = Car("Toyota", "Camry")
c2 = Car("Toyota", "Camry")

print(c1)  # Output: <__main__.Car object at 0x...> (default object repr)
print(c1 == c2)  # Output: False (falls back to identity comparison)

Why use this?

  • Helps when customizing class behavior or avoiding unintended comparisons.
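The __hash__() side is the least obvious of the three, so here is a short sketch of how the settings interact. The three Tag classes are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MutableTag:  # eq=True, frozen=False: __hash__ is set to None
    name: str


@dataclass(frozen=True)
class FrozenTag:  # eq=True, frozen=True: hash is computed from the fields
    name: str


@dataclass(eq=False)
class IdentityTag:  # eq=False: keeps object's identity-based __eq__ and __hash__
    name: str


try:
    hash(MutableTag("a"))
except TypeError:
    print("MutableTag instances are unhashable")

print(hash(FrozenTag("a")) == hash(FrozenTag("a")))  # True
print(len({IdentityTag("a"), IdentityTag("a")}))     # 2
```

If a mutable class really must be hashable, @dataclass also accepts unsafe_hash=True, with the usual caveat that changing a field after hashing can break dictionary and set lookups.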

6. Sorting Data Classes with order=True

Setting order=True generates the <, <=, >, and >= operators. Instances are compared as tuples of their fields, in the order the fields are defined.

Example: Sorting Objects

from dataclasses import dataclass

@dataclass(order=True)
class Student:
    grade: int  # Compared first
    name: str   # Compared next, to break ties on grade

s1 = Student(90, "Alice")
s2 = Student(85, "Bob")

print(s1 > s2)  # Output: True (90 > 85)

Why use order=True?

  • Enables sorting of objects without manually defining comparison methods.
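In practice this means sorted() works directly on a list of instances. The sketch below also adds a hypothetical student_id field marked with field(compare=False), which keeps it out of both ordering and equality.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Student:
    grade: int
    name: str
    # Hypothetical extra field, ignored by ==, <, >, etc.
    student_id: int = field(default=0, compare=False)


students = [
    Student(90, "Cara", 3),
    Student(85, "Bob", 2),
    Student(90, "Alice", 1),
]

# Sorted by grade first, then by name for equal grades
ranked = sorted(students)
print([s.name for s in ranked])  # ['Bob', 'Alice', 'Cara']

# student_id does not take part in comparisons
print(Student(90, "Alice", 1) == Student(90, "Alice", 99))  # True
```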

7. Converting Data Classes to Dictionaries (asdict())

The dataclasses.asdict() function converts a data class instance into a dictionary, recursing into nested data classes, lists, tuples, and dicts.

Example: Converting to Dictionary

from dataclasses import asdict, dataclass

@dataclass
class Product:
    name: str
    price: float

p = Product("Laptop", 999.99)
print(asdict(p)) # Output: {'name': 'Laptop', 'price': 999.99}

Why use this?

  • Useful as a first step toward JSON serialization or API payloads, provided the field values themselves are serializable.
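As a sketch of the JSON use case, the example below converts a nested structure with asdict(), serializes it with json.dumps(), and rebuilds the objects by hand. The Item and Order classes are invented for illustration, and the rebuilding step is manual because the dataclasses module has no inverse of asdict().

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Item:
    name: str
    price: float


@dataclass
class Order:
    order_id: int
    items: list = field(default_factory=list)


order = Order(1, [Item("Laptop", 999.99), Item("Mouse", 19.5)])

# asdict() also converts the Item instances inside the list
data = asdict(order)
print(data)

payload = json.dumps(data)

# Rebuild the objects manually from the decoded JSON
decoded = json.loads(payload)
restored = Order(decoded["order_id"], [Item(**d) for d in decoded["items"]])
print(restored == order)  # True
```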

8. Inheriting from Data Classes

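Data classes support inheritance. In the generated __init__(), the base class's fields come first, followed by the fields the subclass adds.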

Example: Inheriting from Another Data Class

from dataclasses import dataclass

@dataclass
class Animal:
    species: str

@dataclass
class Dog(Animal):
    breed: str
    age: int

d = Dog("Mammal", "Labrador", 5)
print(d) # Output: Dog(species='Mammal', breed='Labrador', age=5)

Why use this?

  • Allows hierarchical organization of related data.
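Field order matters once defaults are involved: if a base class field has a default, every field added by a subclass needs one too, because the combined field list must still put defaults last. A minimal sketch, with a made-up legs field added to Animal:

```python
from dataclasses import dataclass, fields


@dataclass
class Animal:
    species: str
    legs: int = 4  # Hypothetical field with a default


@dataclass
class Dog(Animal):
    # Needs a default because it follows legs in the combined field list
    breed: str = "unknown"


# Base class fields come first, subclass fields after
print([f.name for f in fields(Dog)])  # ['species', 'legs', 'breed']

d = Dog("Mammal", breed="Labrador")
print(d)                      # Dog(species='Mammal', legs=4, breed='Labrador')
print(isinstance(d, Animal))  # True
```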

9. Comparing dataclasses vs Regular Classes

Feature | Regular Class | Data Class
Boilerplate code | More (manual __init__, __repr__, etc.) | Less (auto-generated)
Equality comparison (==) | Based on object identity | Based on field values
Hashability (hash()) | Identity-based by default; lost if you define __eq__() without __hash__() | Unhashable by default; field-based with frozen=True
Sorting | Requires defining __lt__, __gt__, etc. | Auto-generated (order=True)
Conversion to dict | Manual implementation needed | asdict() available

10. When to Use Data Classes?

Use dataclasses when:

  • You need lightweight classes for storing data.
  • You want automatic comparison and printing (__eq__(), __repr__()).
  • You need immutability (frozen=True).
  • You work with JSON, APIs, or databases (use asdict()).

Avoid dataclasses if:

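  • The class is mostly behavior rather than stored data, so the generated methods add little.
  • You require complex inheritance and method overrides, where field ordering and defaults become awkward to manage.
  • You are using older Python versions (<3.7) (use namedtuple instead).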
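For the older-Python case, collections.namedtuple covers the basics: a generated initializer, a readable repr, value-based equality, and immutability. A minimal sketch, with PersonTuple as an illustrative name:

```python
from collections import namedtuple

# Field names are given as a list of strings; there are no type annotations
PersonTuple = namedtuple("PersonTuple", ["name", "age"])

p = PersonTuple("Alice", 30)

print(p)                              # PersonTuple(name='Alice', age=30)
print(p == PersonTuple("Alice", 30))  # True
print(dict(p._asdict()))              # {'name': 'Alice', 'age': 30}

try:
    p.age = 31
except AttributeError:
    print("namedtuple instances are immutable")
```

Unlike a data class, a namedtuple is still a tuple, so it also compares equal to a plain tuple with the same values and supports indexing and unpacking.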
