Working with Date and Time Data

Loading

Working with Date and Time Data in Pandas

Introduction

Handling date and time data is essential in data analysis, particularly when working with time-series data, scheduling, or tracking events. Pandas provides robust support for working with timestamps, date ranges, time deltas, and time zones.

In this guide, we will explore:

  • Creating and converting date-time data
  • Extracting components from date-time objects
  • Performing operations on date-time values
  • Handling missing or incorrect time data
  • Time zone handling

Step 1: Importing Required Libraries

Before working with date-time data, import pandas and numpy for efficient data manipulation.

import pandas as pd
import numpy as np

Step 2: Creating Date-Time Data

1. Creating Date-Time Objects Using pd.to_datetime()

The pd.to_datetime() function converts date strings or numerical values into proper Pandas datetime objects.

# Convert a single date string
date = pd.to_datetime("2023-10-15")
print(date)

Output:

2023-10-15 00:00:00

The default time is set to 00:00:00 if not specified.

# Convert a list of date strings
dates = pd.to_datetime(["2023-01-01", "2023-06-15", "2023-12-31"])
print(dates)

Output:

DatetimeIndex(['2023-01-01', '2023-06-15', '2023-12-31'], dtype='datetime64[ns]')

2. Creating a Date Range with pd.date_range()

The pd.date_range() function generates a sequence of dates.

# Generate a sequence of dates from Jan 1, 2023, for 10 days
date_range = pd.date_range(start="2023-01-01", periods=10, freq="D")
print(date_range)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', ..., '2023-01-10'], dtype='datetime64[ns]', freq='D')

Customizing Frequency (freq Parameter):

Frequency CodeMeaning
'D'Daily
'W'Weekly
'M'Monthly
'Y'Yearly
'H'Hourly
'T'Minute-wise

Example: Generate weekly dates:

weekly_dates = pd.date_range(start="2023-01-01", periods=5, freq="W")
print(weekly_dates)

Step 3: Extracting Date and Time Components

Pandas allows us to extract useful components from date-time objects.

# Create a sample datetime series
df = pd.DataFrame({"dates": pd.date_range(start="2023-01-01", periods=5, freq="D")})
df["year"] = df["dates"].dt.year
df["month"] = df["dates"].dt.month
df["day"] = df["dates"].dt.day
df["day_of_week"] = df["dates"].dt.day_name()
df["quarter"] = df["dates"].dt.quarter

print(df)

Output:

       dates  year  month  day day_of_week  quarter
0 2023-01-01  2023      1    1      Sunday        1
1 2023-01-02  2023      1    2      Monday        1
2 2023-01-03  2023      1    3     Tuesday        1
3 2023-01-04  2023      1    4   Wednesday        1
4 2023-01-05  2023      1    5    Thursday        1

Common Attributes:

AttributeDescription
.dt.yearExtracts the year
.dt.monthExtracts the month
.dt.dayExtracts the day
.dt.hourExtracts the hour
.dt.minuteExtracts the minute
.dt.secondExtracts the second
.dt.weekdayReturns the day of the week (0=Monday)
.dt.day_name()Returns the full weekday name

Step 4: Performing Operations on Date-Time Data

1. Calculating Time Differences (Timedelta)

The Timedelta object represents the difference between two dates.

df["next_day"] = df["dates"] + pd.Timedelta(days=1)
df["time_diff"] = df["next_day"] - df["dates"]
print(df)

Output:

       dates next_day time_diff
0 2023-01-01 2023-01-02 1 days
1 2023-01-02 2023-01-03 1 days

Adding/Subtracting Time Intervals:

df["prev_week"] = df["dates"] - pd.Timedelta(weeks=1)
print(df)

2. Filtering Data Based on Date Ranges

# Filter rows where the date is after "2023-01-03"
filtered_df = df[df["dates"] > "2023-01-03"]
print(filtered_df)

Other Conditions:

# Get all rows in January 2023
df[df["dates"].dt.month == 1]

# Get all rows from weekends
df[df["dates"].dt.weekday >= 5]

Step 5: Handling Missing or Incorrect Dates

1. Detecting Missing Dates

df.loc[2, "dates"] = np.nan  # Introduce NaN
print(df["dates"].isna().sum())  # Count NaNs

2. Filling Missing Dates

df["dates"] = df["dates"].fillna(pd.to_datetime("2023-01-01"))

3. Removing Invalid Dates

df.dropna(subset=["dates"], inplace=True)

Step 6: Working with Time Zones

By default, Pandas uses naive timestamps (no timezone). To assign a timezone:

df["dates"] = df["dates"].dt.tz_localize("UTC")
print(df["dates"])

Convert to Different Timezones:

df["dates_est"] = df["dates"].dt.tz_convert("US/Eastern")
print(df["dates_est"])

Leave a Reply

Your email address will not be published. Required fields are marked *