Introduction to Python for Data Science
1. Introduction
Python is one of the most widely used programming languages in data science, machine learning, and artificial intelligence. Its simple syntax, powerful libraries, and vast community support make it the go-to language for data analysis, visualization, and predictive modeling.
✅ Why Python for Data Science?
✔ Easy to learn – Simple and readable syntax.
✔ Rich ecosystem – Extensive libraries for data manipulation, visualization, and machine learning.
✔ Widely adopted – Used by startups and enterprises alike.
✔ Versatility – Works for web development, automation, and deep learning.
📌 Popular Use Cases of Python in Data Science:
- Data Analysis & Exploration
- Machine Learning & AI
- Data Visualization
- Web Scraping
- Deep Learning
- Statistical Computing
2. Setting Up Python for Data Science
A. Installing Python
Python can be installed from the official website:
🔗 Download Python (https://www.python.org/downloads/)
B. Using Anaconda for Data Science
Anaconda is a Python distribution that bundles Python with commonly used data science libraries, such as NumPy and Pandas, along with tools like Jupyter Notebook and Spyder.
📌 Steps to Install Anaconda:
- Download the installer from Anaconda’s official site (https://www.anaconda.com/download).
- Follow the installation instructions.
- Open Jupyter Notebook or Spyder to start coding.
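After installation, a quick check can confirm which of the libraries used later in this guide are importable. The sketch below is only illustrative: the package list and the `check_packages` helper are assumptions made for this example and are not part of Anaconda.

```python
import importlib.util
import sys

# Import names of the libraries used later in this guide (assumed list).
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_packages(names):
    """Return {name: True/False} depending on whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", sys.version.split()[0])
for name, found in check_packages(PACKAGES).items():
    print(f"{name:<12} {'installed' if found else 'missing'}")
```

Any package reported as missing can usually be added with `conda install` or `pip install`.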
C. Using Google Colab
Google Colab is a hosted Jupyter Notebook environment that runs Python in the cloud, so nothing needs to be installed locally.
🔗 Try Google Colab (https://colab.research.google.com)
3. Python Basics for Data Science
A. Python Data Types
# Numeric Types
x = 10 # Integer
y = 3.14 # Float
# Strings
name = "Data Science"
# Boolean
is_python_easy = True
# Lists (Ordered, Mutable)
numbers = [1, 2, 3, 4, 5]
# Tuples (Ordered, Immutable)
coordinates = (10.0, 20.5)
# Dictionaries (Key-Value Pairs)
student = {"name": "Alice", "age": 25, "grade": "A"}
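The labels "mutable" and "immutable" are easier to see in action. The short sketch below reuses the variables defined above; the `"email"` key is a made-up example of a missing key.

```python
numbers = [1, 2, 3, 4, 5]
coordinates = (10.0, 20.5)
student = {"name": "Alice", "age": 25, "grade": "A"}

# Lists are mutable: elements can be appended or reassigned in place.
numbers.append(6)
numbers[0] = 100

# Tuples are immutable: item assignment raises a TypeError.
try:
    coordinates[0] = 0.0
except TypeError as err:
    print("Cannot modify a tuple:", err)

# Dictionaries map keys to values; .get() returns a default for missing keys.
student["age"] = 26
print(student.get("email", "no email on file"))

# type() reports the runtime type of any value.
print(type(numbers).__name__, type(coordinates).__name__, type(student).__name__)
```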
B. Control Flow (Loops & Conditionals)
# If-Else Condition
age = 18
if age >= 18:
    print("You are eligible to vote.")
else:
    print("You are not eligible.")
# For Loop (prints 1 through 5; range() excludes the stop value)
for i in range(1, 6):
    print(i)
# While Loop (counts down from 5 to 1)
num = 5
while num > 0:
    print(num)
    num -= 1
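In data work, loops and conditionals are usually combined to filter or label values. The sketch below shows one way to do that; the `ages` list is invented for illustration.

```python
ages = [12, 18, 25, 17, 40]

# A for loop with a conditional inside: collect only eligible ages.
eligible = []
for age in ages:
    if age >= 18:
        eligible.append(age)

# The same filter written as a list comprehension.
eligible_short = [age for age in ages if age >= 18]

# enumerate() yields the index alongside each value.
for index, age in enumerate(ages):
    status = "eligible" if age >= 18 else "not eligible"
    print(index, age, status)
```

The list comprehension is the more common form in Python when the loop body only builds a list.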
C. Functions in Python
def add_numbers(a, b):
    """Return the sum of a and b."""
    return a + b

result = add_numbers(10, 20)
print(result) # Output: 30
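Functions can also take default arguments and return several values at once. The `summarize` function below is a hypothetical helper written for this example, not a built-in.

```python
def summarize(values, ndigits=2):
    """Return the minimum, maximum, and mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return {"min": min(values), "max": max(values), "mean": round(mean, ndigits)}


stats = summarize([10, 20, 25, 30, 40])
print(stats)
```

Because `ndigits` has a default value, callers can omit it or override it, for example `summarize(data, ndigits=4)`.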
4. Python Libraries for Data Science
Python has several libraries specifically built for data science tasks.
A. NumPy (Numerical Computing)
NumPy provides powerful tools for working with numerical data and arrays.
import numpy as np
# Create an array
arr = np.array([1, 2, 3, 4, 5])
# Basic operations
print(arr.mean()) # Mean
print(arr.sum()) # Sum
print(arr.std()) # Standard deviation (population form, ddof=0, by default)
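Much of NumPy's usefulness comes from applying an operation to a whole array at once. The sketch below illustrates element-wise arithmetic, boolean masking, and reshaping; the variable names are arbitrary.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Arithmetic applies element by element, with no explicit loop.
doubled = arr * 2
squared = arr ** 2

# A comparison yields a boolean mask that can be used to select elements.
mask = arr > 2
selected = arr[mask]

# reshape() arranges the same values into a 2-D array (2 rows, 3 columns).
grid = np.arange(6).reshape(2, 3)

# axis=0 sums down each column.
column_sums = grid.sum(axis=0)

print(doubled, squared, selected, column_sums)
```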
B. Pandas (Data Manipulation & Analysis)
Pandas provides DataFrames and Series to work with structured data.
import pandas as pd
# Create a DataFrame
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)
print(df.head()) # Display up to the first 5 rows (all 3 rows here)
print(df.describe()) # Summary statistics
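Beyond inspecting a DataFrame, common first steps are selecting columns, filtering rows, and deriving new columns. The sketch below rebuilds the same small DataFrame; the `AgeInMonths` column is an invented example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Selecting one column returns a Series.
ages = df["Age"]

# Boolean indexing keeps only the rows where the condition holds.
over_28 = df[df["Age"] > 28]

# Assigning to a new column name adds a derived column.
df["AgeInMonths"] = df["Age"] * 12

# sort_values() returns a sorted copy; the original row order is unchanged.
oldest_first = df.sort_values("Age", ascending=False)
print(oldest_first)
```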
C. Matplotlib & Seaborn (Data Visualization)
Matplotlib and Seaborn are used to create charts and graphs.
import matplotlib.pyplot as plt
import seaborn as sns
# Sample Data
x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]
# Line Plot
plt.plot(x, y, marker='o')
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
# Seaborn Histogram (reuses the df created in the Pandas example above)
sns.histplot(df["Age"], kde=True)
plt.show()
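When a script runs outside a notebook, it is often more practical to save a chart to a file than to display it. The sketch below uses Matplotlib's object-oriented interface with the same sample data; the `Agg` backend, the temporary directory, and the file name are choices made for this example.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 15, 20, 25, 30]

# subplots() returns a Figure and an Axes; plotting calls go on the Axes.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o", label="y values")
ax.set_title("Simple Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.legend()

# Save to a temporary directory (the file name is an arbitrary choice).
output_path = os.path.join(tempfile.mkdtemp(), "line_plot.png")
fig.savefig(output_path, dpi=100)
plt.close(fig)
print("Saved plot to", output_path)
```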
D. Scikit-Learn (Machine Learning)
Scikit-Learn provides machine learning models for classification, regression, and clustering.
from sklearn.linear_model import LinearRegression
import numpy as np
# Sample Data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([10, 20, 25, 30, 40])
# Train a Simple Linear Regression Model
model = LinearRegression()
model.fit(X, y)
# Make Predictions
predictions = model.predict(X)
print(predictions)
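A fitted model is more informative once its parameters are inspected. The sketch below refits the same data, reads the slope and intercept, and predicts an unseen value; the input `6` is an arbitrary illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([10, 20, 25, 30, 40])

model = LinearRegression()
model.fit(X, y)

# The fitted line is y = slope * x + intercept.
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.2f}, intercept={intercept:.2f}")

# predict() expects a 2-D array, even for a single new value.
new_X = np.array([[6]])
new_prediction = model.predict(new_X)[0]
print(f"prediction for x=6: {new_prediction:.2f}")

# score() returns R^2 on the given data (here, the training data).
r_squared = model.score(X, y)
print(f"R^2={r_squared:.2f}")
```

For this data the least-squares line has slope 7 and intercept 4, so the prediction at x = 6 is 46. Scoring on the training data overstates how well a model generalizes; a held-out test set gives a fairer estimate.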
5. Hands-on Example: Data Analysis with Pandas
A. Load and Explore Data
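# Imports (repeated here so this section runs on its own)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the tips dataset from the seaborn-data repository
df = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
# Display first few rows
print(df.head())
# Check data summary (df.info() prints directly and returns None)
df.info()
# Check for missing values
print(df.isnull().sum())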
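When the missing-value check does report gaps, two common responses are dropping the affected rows or filling them in. The sketch below shows both on a small hypothetical DataFrame; its values are invented and are not taken from the tips dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps; not taken from the tips dataset.
raw = pd.DataFrame({
    "total_bill": [16.99, np.nan, 21.01, 23.68],
    "tip": [1.01, 1.66, np.nan, 3.31],
    "day": ["Sun", "Sun", None, "Sat"],
})

# Count missing values per column.
missing_counts = raw.isnull().sum()
print(missing_counts)

# Option 1: drop every row that has at least one missing value.
dropped = raw.dropna()

# Option 2: fill numeric gaps with the column median and keep all rows.
filled = raw.copy()
for column in ["total_bill", "tip"]:
    filled[column] = filled[column].fillna(filled[column].median())

print(len(dropped), "rows remain after dropna()")
print(filled)
```

Which option is appropriate depends on how much data is missing and why; median filling is only one of several possible strategies.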
B. Perform Basic Data Analysis
# Compute Summary Statistics
print(df.describe())
# Group Data
print(df.groupby("sex")["total_bill"].mean())
# Count Values
print(df["day"].value_counts())
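Grouping becomes more useful when several statistics or several keys are combined. The sketch below uses a small hypothetical DataFrame that borrows the tips column names, so the numbers are easy to verify by hand.

```python
import pandas as pd

# Hypothetical rows that reuse the tips column names.
sample = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "day": ["Sun", "Sun", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 20.0, 30.0, 20.0, 40.0],
})

# Several statistics per group in one call.
by_sex = sample.groupby("sex")["total_bill"].agg(["mean", "max", "count"])
print(by_sex)

# Grouping by two columns produces one value per (sex, day) pair.
by_sex_day = sample.groupby(["sex", "day"])["total_bill"].mean()
print(by_sex_day)

# normalize=True turns counts into proportions.
day_share = sample["day"].value_counts(normalize=True)
print(day_share)
```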
C. Data Visualization
# Boxplot of total_bill by gender
sns.boxplot(x="sex", y="total_bill", data=df)
plt.show()
# Pairplot: pairwise scatter plots of the numeric columns, useful for spotting correlations
sns.pairplot(df)
plt.show()
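A pairplot shows relationships visually; a correlation matrix puts numbers on them. The sketch below computes one for a small hypothetical DataFrame whose values were chosen so the correlations are exactly 1 and -1.

```python
import pandas as pd

# Hypothetical values chosen so the correlations are easy to check.
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.0, 3.0, 4.0],
    "size": [4, 3, 2, 1],
    "day": ["Sun", "Sun", "Sat", "Sat"],
})

# Keep numeric columns only; text columns have no correlation coefficient.
numeric = sample.select_dtypes(include="number")

# Pearson correlation for every pair of numeric columns.
corr = numeric.corr()
print(corr.round(2))
```

The resulting matrix can also be passed to `sns.heatmap(corr, annot=True)` to display it as a colored grid.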
6. Summary & Next Steps
✔ Python is one of the most widely used languages in modern data science.
✔ It provides essential libraries like NumPy, Pandas, Matplotlib, and Scikit-Learn.
✔ Jupyter Notebooks and Google Colab let you run code interactively, one cell at a time.
✔ Data manipulation, analysis, and visualization are key skills.
📌 Next Steps:
✅ Learn Advanced Pandas for data cleaning.
✅ Explore Machine Learning with Scikit-Learn.
✅ Work on real-world datasets (e.g., Kaggle).
✅ Learn SQL for database querying.
