Working with Date and Time Data in Pandas
Introduction
Handling date and time data is essential in data analysis, particularly when working with time-series data, scheduling, or tracking events. Pandas provides robust support for working with timestamps, date ranges, time deltas, and time zones.
In this guide, we will explore:
- Creating and converting date-time data
- Extracting components from date-time objects
- Performing operations on date-time values
- Handling missing or incorrect time data
- Time zone handling
Step 1: Importing Required Libraries
Before working with date-time data, import pandas and numpy for efficient data manipulation.
import pandas as pd
import numpy as np
Step 2: Creating Date-Time Data
1. Creating Date-Time Objects Using pd.to_datetime()
The pd.to_datetime()
function converts date strings or numerical values into proper Pandas datetime objects.
# Convert a single date string
date = pd.to_datetime("2023-10-15")
print(date)
Output:
2023-10-15 00:00:00
The default time is set to 00:00:00 if not specified.
# Convert a list of date strings
dates = pd.to_datetime(["2023-01-01", "2023-06-15", "2023-12-31"])
print(dates)
Output:
DatetimeIndex(['2023-01-01', '2023-06-15', '2023-12-31'], dtype='datetime64[ns]')
2. Creating a Date Range with pd.date_range()
The pd.date_range()
function generates a sequence of dates.
# Generate a sequence of dates from Jan 1, 2023, for 10 days
date_range = pd.date_range(start="2023-01-01", periods=10, freq="D")
print(date_range)
Output:
DatetimeIndex(['2023-01-01', '2023-01-02', ..., '2023-01-10'], dtype='datetime64[ns]', freq='D')
✅ Customizing Frequency (freq
Parameter):
Frequency Code | Meaning |
---|---|
'D' | Daily |
'W' | Weekly |
'M' | Monthly |
'Y' | Yearly |
'H' | Hourly |
'T' | Minute-wise |
Example: Generate weekly dates:
weekly_dates = pd.date_range(start="2023-01-01", periods=5, freq="W")
print(weekly_dates)
Step 3: Extracting Date and Time Components
Pandas allows us to extract useful components from date-time objects.
# Create a sample datetime series
df = pd.DataFrame({"dates": pd.date_range(start="2023-01-01", periods=5, freq="D")})
df["year"] = df["dates"].dt.year
df["month"] = df["dates"].dt.month
df["day"] = df["dates"].dt.day
df["day_of_week"] = df["dates"].dt.day_name()
df["quarter"] = df["dates"].dt.quarter
print(df)
Output:
dates year month day day_of_week quarter
0 2023-01-01 2023 1 1 Sunday 1
1 2023-01-02 2023 1 2 Monday 1
2 2023-01-03 2023 1 3 Tuesday 1
3 2023-01-04 2023 1 4 Wednesday 1
4 2023-01-05 2023 1 5 Thursday 1
✅ Common Attributes:
Attribute | Description |
---|---|
.dt.year | Extracts the year |
.dt.month | Extracts the month |
.dt.day | Extracts the day |
.dt.hour | Extracts the hour |
.dt.minute | Extracts the minute |
.dt.second | Extracts the second |
.dt.weekday | Returns the day of the week (0=Monday) |
.dt.day_name() | Returns the full weekday name |
Step 4: Performing Operations on Date-Time Data
1. Calculating Time Differences (Timedelta
)
The Timedelta
object represents the difference between two dates.
df["next_day"] = df["dates"] + pd.Timedelta(days=1)
df["time_diff"] = df["next_day"] - df["dates"]
print(df)
Output:
dates next_day time_diff
0 2023-01-01 2023-01-02 1 days
1 2023-01-02 2023-01-03 1 days
✅ Adding/Subtracting Time Intervals:
df["prev_week"] = df["dates"] - pd.Timedelta(weeks=1)
print(df)
2. Filtering Data Based on Date Ranges
# Filter rows where the date is after "2023-01-03"
filtered_df = df[df["dates"] > "2023-01-03"]
print(filtered_df)
✅ Other Conditions:
# Get all rows in January 2023
df[df["dates"].dt.month == 1]
# Get all rows from weekends
df[df["dates"].dt.weekday >= 5]
Step 5: Handling Missing or Incorrect Dates
1. Detecting Missing Dates
df.loc[2, "dates"] = np.nan # Introduce NaN
print(df["dates"].isna().sum()) # Count NaNs
2. Filling Missing Dates
df["dates"] = df["dates"].fillna(pd.to_datetime("2023-01-01"))
3. Removing Invalid Dates
df.dropna(subset=["dates"], inplace=True)
Step 6: Working with Time Zones
By default, Pandas uses naive timestamps (no timezone). To assign a timezone:
df["dates"] = df["dates"].dt.tz_localize("UTC")
print(df["dates"])
✅ Convert to Different Timezones:
df["dates_est"] = df["dates"].dt.tz_convert("US/Eastern")
print(df["dates_est"])