Working with Date and Time Data

Working with Date and Time Data in Pandas

Introduction

Handling date and time data is essential in data analysis, particularly when working with time-series data, scheduling, or tracking events. Pandas provides robust support for working with timestamps, date ranges, time deltas, and time zones.

In this guide, we will explore:

  • Creating and converting date-time data
  • Extracting components from date-time objects
  • Performing operations on date-time values
  • Handling missing or incorrect time data
  • Time zone handling

Step 1: Importing Required Libraries

Before working with date-time data, import pandas and numpy for efficient data manipulation.

import pandas as pd
import numpy as np

Step 2: Creating Date-Time Data

1. Creating Date-Time Objects Using pd.to_datetime()

The pd.to_datetime() function converts date strings or numerical values into proper Pandas datetime objects.

# Convert a single date string
date = pd.to_datetime("2023-10-15")
print(date)

Output:

2023-10-15 00:00:00

The default time is set to 00:00:00 if not specified.

# Convert a list of date strings
dates = pd.to_datetime(["2023-01-01", "2023-06-15", "2023-12-31"])
print(dates)

Output:

DatetimeIndex(['2023-01-01', '2023-06-15', '2023-12-31'], dtype='datetime64[ns]')

2. Creating a Date Range with pd.date_range()

The pd.date_range() function generates a sequence of dates.

# Generate a sequence of dates from Jan 1, 2023, for 10 days
date_range = pd.date_range(start="2023-01-01", periods=10, freq="D")
print(date_range)

Output:

DatetimeIndex(['2023-01-01', '2023-01-02', ..., '2023-01-10'], dtype='datetime64[ns]', freq='D')

Customizing Frequency (freq Parameter):

Frequency CodeMeaning
'D'Daily
'W'Weekly
'M'Monthly
'Y'Yearly
'H'Hourly
'T'Minute-wise

Example: Generate weekly dates:

weekly_dates = pd.date_range(start="2023-01-01", periods=5, freq="W")
print(weekly_dates)

Step 3: Extracting Date and Time Components

Pandas allows us to extract useful components from date-time objects.

# Create a sample datetime series
df = pd.DataFrame({"dates": pd.date_range(start="2023-01-01", periods=5, freq="D")})
df["year"] = df["dates"].dt.year
df["month"] = df["dates"].dt.month
df["day"] = df["dates"].dt.day
df["day_of_week"] = df["dates"].dt.day_name()
df["quarter"] = df["dates"].dt.quarter

print(df)

Output:

       dates  year  month  day day_of_week  quarter
0 2023-01-01  2023      1    1      Sunday        1
1 2023-01-02  2023      1    2      Monday        1
2 2023-01-03  2023      1    3     Tuesday        1
3 2023-01-04  2023      1    4   Wednesday        1
4 2023-01-05  2023      1    5    Thursday        1

Common Attributes:

AttributeDescription
.dt.yearExtracts the year
.dt.monthExtracts the month
.dt.dayExtracts the day
.dt.hourExtracts the hour
.dt.minuteExtracts the minute
.dt.secondExtracts the second
.dt.weekdayReturns the day of the week (0=Monday)
.dt.day_name()Returns the full weekday name

Step 4: Performing Operations on Date-Time Data

1. Calculating Time Differences (Timedelta)

The Timedelta object represents the difference between two dates.

df["next_day"] = df["dates"] + pd.Timedelta(days=1)
df["time_diff"] = df["next_day"] - df["dates"]
print(df)

Output:

       dates next_day time_diff
0 2023-01-01 2023-01-02 1 days
1 2023-01-02 2023-01-03 1 days

Adding/Subtracting Time Intervals:

df["prev_week"] = df["dates"] - pd.Timedelta(weeks=1)
print(df)

2. Filtering Data Based on Date Ranges

# Filter rows where the date is after "2023-01-03"
filtered_df = df[df["dates"] > "2023-01-03"]
print(filtered_df)

Other Conditions:

# Get all rows in January 2023
df[df["dates"].dt.month == 1]

# Get all rows from weekends
df[df["dates"].dt.weekday >= 5]

Step 5: Handling Missing or Incorrect Dates

1. Detecting Missing Dates

df.loc[2, "dates"] = np.nan  # Introduce NaN
print(df["dates"].isna().sum())  # Count NaNs

2. Filling Missing Dates

df["dates"] = df["dates"].fillna(pd.to_datetime("2023-01-01"))

3. Removing Invalid Dates

df.dropna(subset=["dates"], inplace=True)

Step 6: Working with Time Zones

By default, Pandas uses naive timestamps (no timezone). To assign a timezone:

df["dates"] = df["dates"].dt.tz_localize("UTC")
print(df["dates"])

Convert to Different Timezones:

df["dates_est"] = df["dates"].dt.tz_convert("US/Eastern")
print(df["dates_est"])

Leave a Reply

Your email address will not be published. Required fields are marked *