Working with APIs for Data Extraction

APIs (Application Programming Interfaces) allow applications to communicate and exchange data efficiently. For data extraction, APIs provide structured access to real-time and large datasets from various sources like social media, finance, weather, and e-commerce.

1. Why Use APIs for Data Extraction?
βœ” Faster & More Reliable than Web Scraping
βœ” Provides Structured Data (JSON/XML)
βœ” Avoids Legal & Ethical Issues
βœ” Access to Real-time & Historical Data


2. Understanding API Types

Before extracting data, it’s essential to understand the different API types:

| API Type      | Description                                                | Example                          |
|---------------|------------------------------------------------------------|----------------------------------|
| REST API      | Uses HTTP methods (GET, POST, etc.) and returns JSON/XML   | Twitter API, OpenWeatherMap      |
| SOAP API      | Uses XML-based messaging                                   | Payment gateways, Banking APIs   |
| GraphQL API   | Client requests only the specific fields it needs          | GitHub API, Shopify API          |
| WebSocket API | Provides real-time data streaming                          | Binance API (crypto), Stock APIs |

Most modern APIs use REST or GraphQL.
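To make the GraphQL style concrete: instead of hitting a fixed endpoint, the client POSTs a query that names exactly the fields it wants back. The sketch below only builds the request body; the endpoint and token in the comment are placeholders, and the query shape follows GitHub's GraphQL schema as an illustration:

```python
import json

# A GraphQL request is a single POST whose JSON body carries the query.
# The query text names exactly the fields the client wants returned.
query = """
query {
  viewer {
    login
    repositories(first: 5) { totalCount }
  }
}
"""

payload = {"query": query}
body = json.dumps(payload)

# A live call would then be (token is a placeholder):
# response = requests.post("https://api.github.com/graphql",
#                          json=payload,
#                          headers={"Authorization": "Bearer your_token"})
print(body)
```

The server answers with JSON mirroring the query's shape, which is why GraphQL avoids over-fetching compared to a fixed REST endpoint.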


3. Setting Up API Requests in Python

3.1. Using the requests Library

Install requests if not already installed:

pip install requests

3.2. Sending a Simple API Request

Example: Fetching weather data from OpenWeatherMap API

import requests

API_KEY = "your_api_key"
url = f"https://api.openweathermap.org/data/2.5/weather?q=London&appid={API_KEY}"

response = requests.get(url)

if response.status_code == 200:
    data = response.json()
    print(data)  # Display JSON data
else:
    print("Error:", response.status_code)

πŸ”Ή Key Points:

  • Uses GET request to retrieve data
  • API key authentication is required
  • response.json() converts API response into a Python dictionary
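Rather than interpolating the query string into the URL by hand, requests can encode it from a params dict. This sketch uses a PreparedRequest so the resulting URL can be inspected without making a network call:

```python
import requests

API_KEY = "your_api_key"

# Let requests build and percent-encode the query string for us.
req = requests.Request(
    "GET",
    "https://api.openweathermap.org/data/2.5/weather",
    params={"q": "London", "appid": API_KEY},
)
prepared = req.prepare()
print(prepared.url)

# In a live call this is simply:
# response = requests.get("https://api.openweathermap.org/data/2.5/weather",
#                         params={"q": "London", "appid": API_KEY})
```

Passing params= avoids encoding bugs when city names or keys contain spaces or special characters.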

4. Handling Authentication

Most APIs require authentication to prevent misuse.

4.1. API Key Authentication

Common method using headers or query parameters.

headers = {"Authorization": "Bearer your_api_key"}
response = requests.get("https://api.example.com/data", headers=headers)

4.2. OAuth 2.0 Authentication

Used for APIs like Twitter, GitHub, and Google APIs.
Example: Fetching GitHub user details using OAuth Token

headers = {"Authorization": "token your_oauth_token"}
response = requests.get("https://api.github.com/user", headers=headers)
print(response.json())

5. Extracting and Processing API Data

5.1. Handling JSON Responses

Most APIs return data in JSON format. Extracting specific fields:

response = requests.get("https://api.github.com/users/octocat")
data = response.json()

print("Username:", data["login"])
print("Public Repos:", data["public_repos"])

5.2. Handling Errors Gracefully

Common API Errors & Fixes

| Status Code | Meaning           | Solution                     |
|-------------|-------------------|------------------------------|
| 200         | Success           | Everything is fine           |
| 400         | Bad Request       | Check request parameters     |
| 401         | Unauthorized      | Check API key/authentication |
| 403         | Forbidden         | Insufficient permissions     |
| 429         | Too Many Requests | Implement rate limiting      |
| 500+        | Server Error      | Try again later              |

Example: Handling errors in API requests

response = requests.get("https://api.example.com/data")

if response.status_code == 200:
    data = response.json()
elif response.status_code == 401:
    print("Unauthorized! Check your API key.")
elif response.status_code == 429:
    print("Rate limit exceeded! Try again later.")
else:
    print("API Error:", response.status_code)
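An alternative to checking status_code by hand is response.raise_for_status(), which raises requests.exceptions.HTTPError for any 4xx/5xx response. The sketch below builds a Response object manually so it runs without a network call:

```python
import requests

def check(response):
    """Raise HTTPError for 4xx/5xx responses, otherwise return the response."""
    response.raise_for_status()
    return response

# Simulate a 401 response without touching the network.
fake = requests.Response()
fake.status_code = 401

try:
    check(fake)
except requests.exceptions.HTTPError as err:
    print("Request failed:", err)
```

This turns silent error codes into exceptions, which fits naturally into try/except-based pipelines.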

6. API Pagination for Large Datasets

APIs limit data per request (e.g., 100 results per page).
Solution: Use pagination to fetch more data.

6.1. Handling Pagination with page Parameter

Example: Fetching multiple pages of data

import requests

API_URL = "https://api.example.com/data"
all_data = []
page = 1

while True:
    response = requests.get(API_URL, params={"page": page})
    data = response.json()

    if not data:  # Stop when no more data
        break

    all_data.extend(data)
    page += 1  # Go to next page

print("Total items:", len(all_data))

6.2. Handling next Links in Pagination

Some APIs return a “next” URL in responses:

import requests

API_URL = "https://api.example.com/data"
data_list = []

while API_URL:
    response = requests.get(API_URL)
    data = response.json()
    data_list.extend(data["results"])

    API_URL = data.get("next")  # Get next page URL

print("Total records:", len(data_list))

Efficient for large datasets!
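Both pagination loops follow the same pattern, which can be folded into one reusable generator. To keep the sketch runnable without a live endpoint, the page-fetching function is injected, so a stub stands in for requests.get here:

```python
def paginate(fetch_page):
    """Yield items from every page until fetch_page returns an empty list.

    fetch_page(page_number) -> list of items for that page.
    In real use it would wrap requests.get(API_URL, params={"page": page}).
    """
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            break
        yield from items
        page += 1

# Stub standing in for a real API: three pages of two items each.
def fake_fetch(page):
    return [f"item-{page}-{i}" for i in range(2)] if page <= 3 else []

all_items = list(paginate(fake_fetch))
print("Total items:", len(all_items))  # 6
```

Because paginate is a generator, callers can also process items one at a time instead of collecting everything in memory.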


7. Rate Limiting & Throttling

APIs limit requests to prevent overload.

7.1. Handling API Rate Limits

  • Check X-RateLimit headers in responses.
  • Add delays to prevent exceeding limits.

import time

for i in range(10):  # Example loop
    response = requests.get("https://api.example.com/data")

    if response.status_code == 429:
        print("Rate limit hit! Waiting...")
        time.sleep(60)  # Wait before retrying
    else:
        print(response.json())
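The header check mentioned above can be made concrete with a small helper that decides how long to pause based on the response headers. The header names used here (Retry-After, X-RateLimit-Remaining) are a common convention, but exact names vary by API, so check your provider's docs:

```python
def wait_time(headers, default=60):
    """Return how many seconds to pause before the next request.

    Prefers an explicit Retry-After header; falls back to `default`
    when X-RateLimit-Remaining reports an exhausted quota.
    """
    if "Retry-After" in headers:
        return int(headers["Retry-After"])
    if headers.get("X-RateLimit-Remaining") == "0":
        return default
    return 0

print(wait_time({"Retry-After": "30"}))           # 30
print(wait_time({"X-RateLimit-Remaining": "0"}))  # 60
print(wait_time({}))                              # 0
```

In the loop above, `time.sleep(wait_time(response.headers))` would then respect whatever delay the server actually asked for instead of a fixed 60 seconds.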

8. Storing API Data in Databases

Extracted data is often stored in CSV, JSON, or databases.

8.1. Save API Data as a CSV

import csv

data = [{"id": 1, "name": "John"}, {"id": 2, "name": "Alice"}]

with open("data.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["id", "name"])
    writer.writeheader()
    writer.writerows(data)
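The same records can also be saved as JSON with the standard library, which has the advantage of preserving nested API responses that would not fit into flat CSV columns:

```python
import json

data = [{"id": 1, "name": "John"}, {"id": 2, "name": "Alice"}]

# indent=2 makes the file human-readable; omit it for compact output.
with open("data.json", "w") as file:
    json.dump(data, file, indent=2)

# Reading it back returns the same list of dicts.
with open("data.json") as file:
    loaded = json.load(file)
print(loaded)
```

Use CSV for flat tabular exports and JSON when you want to keep the API response structure intact.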

8.2. Save API Data to a Database (SQLite)

import sqlite3

conn = sqlite3.connect("data.db")
cursor = conn.cursor()

cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")

data = [(1, "John"), (2, "Alice")]
cursor.executemany("INSERT INTO users VALUES (?, ?)", data)

conn.commit()
conn.close()

Efficient for handling large API data!


9. Working with Real APIs (Examples)

βœ” Twitter API (Tweepy) – Fetch tweets
βœ” Google Maps API – Get locations
βœ” Alpha Vantage API – Stock market data
βœ” NASA API – Space images
βœ” OpenWeatherMap API – Weather updates

Example: Fetching NASA’s Astronomy Picture of the Day

import requests

API_KEY = "DEMO_KEY"
url = f"https://api.nasa.gov/planetary/apod?api_key={API_KEY}"

response = requests.get(url)
data = response.json()
print("Title:", data["title"])
print("URL:", data["url"])
