KeyError: column not found in DataFrame

In pandas, selecting a column that does not exist in a DataFrame raises a KeyError whose message is the missing label, for example KeyError: 'age'. This error commonly happens due to:

  1. Misspelled column names
  2. Incorrect column case sensitivity
  3. Columns not being available at the time of access
  4. Whitespace issues in column names
  5. Incorrect data type for column selection

1. Understanding the Error with an Example

Incorrect Code:

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

# Trying to access a non-existent column
print(df['age']) # ❌ KeyError: 'age' not found

Error Output:

KeyError: 'age'

Problem:

  • The actual column name is “Age” (with an uppercase “A”).
  • Pandas is case-sensitive, so "age" is not recognized.

Fix: Use the Correct Column Name

print(df['Age'])  #  Correct

2. Common Causes of “KeyError: Column Not Found”

1️⃣ Misspelled Column Names

Incorrect Code:

print(df['Nam'])  # ❌ KeyError: 'Nam'

Problem:

  • The correct column name is “Name”, not “Nam”.

Fix: Check Column Names

print(df.columns)  #  Output: Index(['Name', 'Age'], dtype='object')

This will list all available column names.
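When a frame has many columns, scanning the printed index by eye gets tedious. One possible shortcut, sketched here, is to ask the standard-library difflib module for the closest existing label. The helper name suggest_column and the 0.6 cutoff are illustrative assumptions, not part of pandas.

```python
import difflib

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})


def suggest_column(frame, name, cutoff=0.6):
    """Return existing column labels that look similar to `name` (hypothetical helper)."""
    # get_close_matches compares strings, so convert the labels first
    candidates = [str(col) for col in frame.columns]
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)


suggestions = suggest_column(df, 'Nam')
print(suggestions)
```

The comparison is case-sensitive, so this helps with typos more than with capitalization mistakes.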


2️⃣ Case Sensitivity Issues

Incorrect Code:

print(df['age'])  #  KeyError: 'age'

Problem:

  • "Age" is capitalized in the DataFrame, but "age" is not.

Fix: Ensure Correct Case

print(df['Age'])  # Correct

OR
Convert column names to lowercase for consistency:

df.columns = df.columns.str.lower()
print(df['age']) # Now works
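Lowercasing every header changes the frame for all later code. If that is undesirable, one alternative is a lookup from the lowercased name to the real label, leaving the columns untouched. This is a minimal sketch; find_column is a hypothetical helper, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})


def find_column(frame, name):
    """Return the real label matching `name` ignoring case, or None (hypothetical helper)."""
    # If two labels differ only by case, the later one wins in this dict
    lookup = {str(col).lower(): col for col in frame.columns}
    return lookup.get(name.lower())


label = find_column(df, 'age')   # the label as it is actually stored
ages = df[label].tolist()
```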

3️⃣ Column Name Contains Extra Spaces

Incorrect Code:

data = {' Name ': ['Alice', 'Bob'], ' Age ': [25, 30]}  # Columns have spaces
df = pd.DataFrame(data)

print(df['Name']) # ❌ KeyError: 'Name'

Problem:

  • The column actually contains spaces (' Name ' instead of 'Name').

Fix: Remove Extra Spaces

df.columns = df.columns.str.strip()
print(df['Name']) # ✅ Now it works
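Padding is easy to overlook, especially in headers read from a file. A small check, sketched below under the assumption that the labels are strings, lists only the labels that would change under strip().

```python
import pandas as pd

df = pd.DataFrame({' Name ': ['Alice', 'Bob'], ' Age ': [25, 30]})

# Labels that differ from their stripped form carry leading or trailing whitespace
padded = [col for col in df.columns if isinstance(col, str) and col != col.strip()]
print(padded)
```

An empty list means no header needs stripping.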

4️⃣ Column Not Available at the Time of Access

Incorrect Code:

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df = df.drop(columns=['Age'])  # ❌ Removes the 'Age' column

print(df['Age']) # ❌ KeyError: 'Age'

🚨 Problem:

  • The column "Age" was removed using drop(), so it no longer exists.

Fix: Check If Column Exists Before Accessing

if 'Age' in df.columns:
    print(df['Age'])
else:
    print("Column 'Age' not found!")
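It also helps to know exactly when a column disappears. By default drop() returns a new DataFrame and leaves the original alone, so the column is gone only once the result is assigned back (or inplace=True is passed). The short sketch below shows both cases.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Without reassignment the result is discarded and df is unchanged
df.drop(columns=['Age'])
still_there = 'Age' in df.columns

# Reassigning keeps the result, so the column is really gone
df = df.drop(columns=['Age'])
gone = 'Age' not in df.columns
```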

5️⃣ Using the Wrong Data Type for Column Selection

Incorrect Code:

print(df[25])  # ❌ KeyError: 25

Problem:

  • 25 is a value stored in the "Age" column, not a column label. The labels here are the strings 'Name' and 'Age', so the integer 25 matches nothing.

Fix: Use String Column Names

print(df['Age'])  # ✅ Works correctly
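Integers are not always wrong, though. If the goal is to select by position, iloc does that regardless of labels, and a frame built without headers really does have integer labels. Both cases are sketched below; the frame named numbered is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Select by position: iloc ignores labels, position 1 is the second column
second_col = df.iloc[:, 1]

# Integer labels are valid when the columns really are integers
numbered = pd.DataFrame([['Alice', 25], ['Bob', 30]])  # columns default to 0 and 1
first_by_label = numbered[0]
```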

3. Best Practices to Avoid “KeyError”

1. Always Check Available Columns

Before accessing a column, check if it exists:

print(df.columns)

2. Use get() to Avoid KeyErrors

Instead of df['column'], use:

df.get('column', 'Column not found!')

If the column does not exist, get() returns the second argument instead of raising a KeyError.
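A string default means the caller receives either a Series or a message, which can be awkward to branch on. A variation, offered as a sketch, is to rely on the default of None and test for it explicitly. The column name 'Salary' is just an example of a missing label.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

ages = df.get('Age')        # existing column: returns the Series
salary = df.get('Salary')   # missing column: returns None, no exception

if salary is None:
    print("Column 'Salary' not found!")
```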


3. Convert Column Names to a Standard Format

To avoid case and whitespace issues:

df.columns = df.columns.str.lower().str.strip()

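Now df['age'] works no matter how the original header was capitalized or padded. Note that df['Age'] no longer works, because every name is lowercase after this step.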
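One caveat with blanket normalization: two headers that differ only by case or padding collapse to the same name, and selecting a duplicated label returns a DataFrame rather than a single Series. A quick check before assigning the new names might look like this (the sample frame is invented for illustration).

```python
import pandas as pd

df = pd.DataFrame([[25, 31]], columns=['Age', ' age '])

normalized = df.columns.str.strip().str.lower()

# duplicated() marks the second and later occurrences of a repeated label
collisions = normalized[normalized.duplicated()].tolist()
print(collisions)
```

A non-empty list signals that some columns should be renamed by hand first.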


4. Verify Columns Before Dropping or Modifying

if 'Age' in df.columns:
    df.drop(columns=['Age'], inplace=True)
else:
    print("Column 'Age' does not exist!")
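When no message is needed for the missing case, drop() can skip absent labels by itself through its errors parameter. The sketch below assumes a silent skip is acceptable; 'Salary' stands in for a label that is not in the frame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# errors='ignore' skips labels that are absent instead of raising KeyError
df = df.drop(columns=['Age', 'Salary'], errors='ignore')
remaining = df.columns.tolist()
```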

4. Summary Table

Mistake                       | Problem                         | Fix
Misspelled column name        | "Nam" instead of "Name"         | Check df.columns
Case sensitivity              | "age" vs. "Age"                 | Use df.columns = df.columns.str.lower()
Extra spaces in column names  | " Name " instead of "Name"      | Use df.columns = df.columns.str.strip()
Column was dropped or missing | "Age" was removed               | Use df.get('Age', 'Column not found')
Using wrong data type         | df[25] when expecting df['Age'] | Use correct data type
