Data Visualization with Seaborn – A Comprehensive Guide
1. Introduction to Data Visualization & Seaborn
A. What is Data Visualization?
Data visualization is the graphical representation of data that helps in:
✅ Identifying trends & patterns
✅ Communicating insights effectively
✅ Comparing different data points
✅ Enhancing decision-making
B. What is Seaborn?
Seaborn is a Python library for statistical data visualization built on top of Matplotlib. It provides:
✔ High-level interface for complex visualizations
✔ Beautiful and customizable graphs
✔ More polished default styling than plain Matplotlib
✔ Integration with Pandas & NumPy
📌 Why Use Seaborn?
✔ Easier to use than Matplotlib
✔ Attractive default styles
✔ Built-in themes for better presentation
✔ Handles categorical data better
✔ Provides complex visualizations in a simple way
2. Installing and Importing Seaborn
A. Install Seaborn
pip install seaborn
B. Import Seaborn and Other Required Libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
3. Basic Seaborn Plots
A. Load Sample Dataset
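Seaborn provides sample datasets for practice. load_dataset() downloads them from an online repository, so it needs an internet connection the first time a dataset is requested.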
# Load Seaborn's built-in dataset
tips = sns.load_dataset("tips")
tips.head()
📌 Common Datasets in Seaborn:
tips → Restaurant bill tips
iris → Flower measurements
penguins → Penguin species data
diamonds → Diamond pricing
B. Scatter Plot (for Relationships Between Variables)
sns.scatterplot(x="total_bill", y="tip", data=tips, hue="sex", style="smoker", size="size")
plt.title("Scatter Plot of Total Bill vs Tip")
plt.show()
📌 Best Use Case:
✔ Visualizing relationships between numerical variables
C. Line Plot (for Trends Over Time)
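# lineplot() aggregates repeated x values: each point is the mean total_bill for that day,
# and the shaded band is a 95% confidence interval
sns.lineplot(x="day", y="total_bill", data=tips, hue="sex", marker="o")
plt.title("Average Total Bill by Day")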
plt.show()
📌 Best Use Case:
✔ Time series analysis
D. Bar Plot (for Categorical Data Comparison)
sns.barplot(x="day", y="total_bill", data=tips, hue="sex", estimator=np.mean)
plt.title("Average Total Bill per Day")
plt.show()
📌 Best Use Case:
✔ Comparing categories based on numerical values
E. Histogram & KDE Plot (for Data Distribution)
sns.histplot(tips["total_bill"], bins=20, kde=True, color="blue")
plt.title("Histogram of Total Bill Amount")
plt.show()
📌 Best Use Case:
✔ Understanding data distribution & frequency
F. Box Plot (for Outlier Detection)
sns.boxplot(x="day", y="total_bill", data=tips, hue="sex")
plt.title("Box Plot of Total Bill by Day")
plt.show()
📌 Best Use Case:
✔ Identifying outliers & spread of data
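By default a box plot draws points as outliers when they fall more than 1.5 × IQR beyond the quartiles (the whis=1.5 default). A minimal sketch of that rule in pandas, using hypothetical bill amounts:

```python
import pandas as pd

# Hypothetical bill amounts with one unusually large value
bills = pd.Series([10, 12, 13, 14, 15, 16, 18, 20, 60], name="total_bill")

q1, q3 = bills.quantile([0.25, 0.75])
iqr = q3 - q1

# Whisker limits used by the 1.5 * IQR rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = bills[(bills < lower) | (bills > upper)]
print(outliers.tolist())  # → [60]
```

The same calculation can be run per category with groupby() to list the exact rows behind the points a box plot marks.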
G. Violin Plot (for Distribution & Density)
sns.violinplot(x="day", y="total_bill", data=tips, hue="sex", split=True)
plt.title("Violin Plot of Total Bill by Day")
plt.show()
📌 Best Use Case:
✔ Combining box plot & KDE plot for better insights
H. Pair Plot (for Multi-Variable Relationships)
sns.pairplot(tips, hue="sex")
plt.show()
📌 Best Use Case:
✔ Analyzing relationships between multiple variables
4. Advanced Seaborn Customizations
A. Setting Themes
sns.set_theme(style="darkgrid")
📌 Styles Available (values for the style argument):
"darkgrid"
"whitegrid"
"dark"
"white"
"ticks"
B. Customizing Colors
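# Seaborn 0.13+ deprecates palette without hue, so map hue to the same column and hide the redundant legend
sns.barplot(x="day", y="total_bill", data=tips, hue="day", palette="coolwarm", legend=False)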
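A palette name can also be turned into explicit colors with sns.color_palette(), which helps when the same category should keep the same color across several plots. A small sketch follows; the day-to-color mapping is an illustrative choice, not something seaborn requires.

```python
import seaborn as sns

# Sample four evenly spaced colors from the "viridis" palette
palette = sns.color_palette("viridis", n_colors=4)
hex_codes = palette.as_hex()

# Fix one color per category; the dict can be passed as palette=...
days = ["Thur", "Fri", "Sat", "Sun"]
day_colors = dict(zip(days, palette))

print(hex_codes)
```

Passing day_colors as the palette argument together with hue="day" keeps the colors stable even when a plot shows only a subset of days.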
📌 Popular Color Palettes:
"coolwarm"
"Blues"
"Reds"
"magma"
"viridis"
C. Adding Titles & Labels
plt.title("Customized Seaborn Plot", fontsize=15, fontweight="bold")
plt.xlabel("X-axis Label", fontsize=12)
plt.ylabel("Y-axis Label", fontsize=12)
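The plt.* calls act on whichever axes is currently active. With more than one plot it is often clearer to set labels on the Axes object directly, as in this sketch with made-up points and placeholder label text:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(6, 4))

# Axes-level seaborn functions accept ax= to draw on a specific axes
sns.scatterplot(x=[1, 2, 3, 4], y=[1.0, 1.5, 3.0, 3.5], ax=ax)

# Same options as plt.title / plt.xlabel / plt.ylabel, applied to this axes only
ax.set_title("Customized Seaborn Plot", fontsize=15, fontweight="bold")
ax.set_xlabel("Total bill (USD)", fontsize=12)
ax.set_ylabel("Tip (USD)", fontsize=12)
```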
D. Using FacetGrid (Multiple Plots in One Figure)
g = sns.FacetGrid(tips, col="sex", row="smoker", margin_titles=True)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
plt.show()
📌 Best Use Case:
✔ Creating multiple related plots
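For common cases, the figure-level function sns.relplot() builds the grid and draws the plots in one call. The sketch below uses a small hypothetical table with the same column names as tips so it runs without downloading anything.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows mirroring the columns of the tips dataset
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0],
    "tip": [1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Male", "Female", "Female"],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# relplot() returns a FacetGrid: one panel per (smoker, sex) combination
g = sns.relplot(data=df, x="total_bill", y="tip",
                col="sex", row="smoker", kind="scatter", height=3)
g.set_axis_labels("Total bill", "Tip")

print(g.axes.shape)  # → (2, 2)
```

FacetGrid itself remains the more flexible choice when each panel needs a custom plotting function.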
5. Heatmaps (for Correlation & Relationships)
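# Compute Correlation Matrix (numeric columns only)
# tips also has categorical columns (sex, smoker, day, time); pandas 2.0+ raises an error without numeric_only=True
corr_matrix = tips.corr(numeric_only=True)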
# Create Heatmap
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
📌 Best Use Case:
✔ Finding relationships between multiple numerical variables
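A correlation matrix is symmetric, so half of the heatmap repeats the other half. One optional refinement is to mask the upper triangle, sketched here on a hypothetical table that mixes numeric and text columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical rows with the same kinds of columns as tips
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 25.3, 8.8, 26.9, 15.0],
    "tip": [1.7, 3.5, 3.3, 3.6, 4.7, 2.0, 3.1, 2.0],
    "size": [2, 3, 2, 4, 4, 2, 3, 1],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Male", "Female", "Female"],
})

# numeric_only=True skips the text column (requires pandas >= 1.5)
corr = df.corr(numeric_only=True)

# True marks cells to hide: everything above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

ax = sns.heatmap(corr, mask=mask, annot=True, cmap="coolwarm",
                 vmin=-1, vmax=1, linewidths=0.5)
ax.set_title("Correlation Heatmap (lower triangle)")
```

Fixing vmin and vmax to -1 and 1 keeps the color scale comparable between different heatmaps.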
6. Saving Seaborn Plots
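# Call savefig() before plt.show(); saving after the figure has been shown and closed can produce a blank image
plt.savefig("seaborn_plot.png", dpi=300, bbox_inches="tight")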
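As a small end-to-end sketch, the figure can also be saved through its Figure object, which avoids depending on which figure is currently active. The temporary output directory is an assumption made so the example leaves no file in the working directory.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.scatterplot(x=[1, 2, 3], y=[1, 4, 9], ax=ax)
ax.set_title("Plot to Save")

# Save through the Figure object, then release its memory
out_path = os.path.join(tempfile.mkdtemp(), "seaborn_plot.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)

print(os.path.exists(out_path))  # → True
```

Figure-level grids such as the result of pairplot() or FacetGrid expose their own savefig() method with the same arguments.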
7. Summary
✔ Seaborn is a high-level Python library for data visualization
✔ Supports scatter, line, bar, histogram, box, violin, and pair plots
✔ Offers built-in themes and color palettes
✔ Allows advanced customizations with FacetGrid & Heatmaps
✔ Best suited for statistical data analysis
📌 Next Steps:
✅ Explore Matplotlib for lower-level customization
✅ Use Seaborn with Pandas & NumPy
✅ Try interactive dashboards with Plotly