Web Scraping Financial Data Using Python

Web scraping is a common way to collect financial data such as stock prices, cryptocurrency prices, and market news. Python's ecosystem covers the main cases: requests and BeautifulSoup for static pages, Selenium for JavaScript-rendered pages, and Scrapy for larger crawls.

Key Topics Covered

✔ Scraping stock prices from Yahoo Finance
✔ Extracting financial news headlines
✔ Scraping cryptocurrency data
✔ Handling JavaScript-rendered financial data
✔ Storing financial data in a database

1. Installing Required Libraries

pip install requests beautifulsoup4 selenium pandas yfinance scrapy

2. Scraping Stock Prices from Yahoo Finance

Yahoo Finance provides historical and real-time stock price data.

import requests
from bs4 import BeautifulSoup

# Define stock symbol (e.g., Apple - AAPL)
stock_symbol = "AAPL"
url = f"https://finance.yahoo.com/quote/{stock_symbol}"

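# Send request (the timeout avoids hanging on a stalled connection)
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors such as 404 or 429
soup = BeautifulSoup(response.text, 'html.parser')

# Extract stock price; find() returns None if the page layout has changed
price_tag = soup.find("fin-streamer", {"data-field": "regularMarketPrice"})
if price_tag is None:
    raise RuntimeError(f"Price element for {stock_symbol} not found")
stock_price = price_tag.text
print(f"{stock_symbol} Stock Price: ${stock_price}")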

✔ Helps track live stock prices
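
The price scraped above arrives as text and may contain a currency symbol or thousands separators (for example "1,234.56"), so it usually needs converting before any arithmetic or storage in a numeric column. A minimal sketch of that conversion follows; `parse_price` is a hypothetical helper name, not part of any library, and it assumes US-style formatting (comma as thousands separator, period as decimal point).

```python
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Convert scraped price text such as '$1,234.56' to a float.

    Hypothetical helper; assumes US-style number formatting.
    Returns None when the text cannot be read as a number.
    """
    # Drop surrounding whitespace, the dollar sign, and thousands separators
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning None instead of raising lets a scraping loop skip placeholder values such as "N/A" without stopping.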

3. Scraping Financial News Headlines

The Yahoo Finance home page lists current news headlines, which can be scraped with the same request-and-parse approach.

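# Reuses the requests and BeautifulSoup imports from section 2
news_url = "https://finance.yahoo.com"
response = requests.get(news_url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Extract news headlines (the "Mb(5px)" class is generated by the site and may change)
headlines = soup.find_all("h3", class_="Mb(5px)")

print("Latest Financial News:")
for i, headline in enumerate(headlines[:5], start=1):
    print(f"{i}. {headline.get_text(strip=True)}")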

✔ Used for sentiment analysis and market trend detection
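
Scraped headlines often contain stray whitespace, empty entries, and repeats of the same story, which can skew any later sentiment scoring. The sketch below shows one possible cleanup step; `clean_headlines` is a hypothetical helper name, and the sample headlines in the usage lines are invented for illustration.

```python
def clean_headlines(raw_headlines):
    """Normalize whitespace and drop empty or duplicate headlines.

    Hypothetical helper: duplicates are detected case-insensitively,
    and the first occurrence of each headline is kept in order.
    """
    seen = set()
    cleaned = []
    for text in raw_headlines:
        # Collapse runs of spaces and newlines into single spaces
        normalized = " ".join(text.split())
        key = normalized.casefold()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned


# Invented sample input, standing in for headline.text values
sample = ["  Stocks rally\n on earnings ", "Stocks rally on earnings", "", "Fed holds rates"]
print(clean_headlines(sample))
```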

4. Scraping Cryptocurrency Prices from CoinMarketCap

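Extract the symbols and prices of the top five cryptocurrencies listed on the home page, which typically include Bitcoin and Ethereum.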

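crypto_url = "https://coinmarketcap.com/"
response = requests.get(crypto_url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Extract top 5 cryptocurrencies
cryptos = soup.find_all("tr", class_="cmc-table-row", limit=5)

for crypto in cryptos:
    # Class names such as "sc-f70bb44c-0" are auto-generated and may change between site builds
    name = crypto.find("p", class_="coin-item-symbol")
    price = crypto.find("span", class_="sc-f70bb44c-0")
    if name is None or price is None:
        continue  # Skip rows that do not match the expected layout
    print(f"{name.text}: {price.text}")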

✔ Helps in crypto trend analysis

5. Scraping JavaScript-Rendered Financial Data Using Selenium

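Some websites load their data with JavaScript after the initial page load, so the HTML returned by requests does not contain the values. Selenium drives a real browser, which runs those scripts before the page is read.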

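from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set up Selenium WebDriver
driver = webdriver.Chrome()
try:
    driver.get("https://finance.yahoo.com/quote/AAPL")

    # Wait up to 10 seconds for the price element to appear, then extract it
    price_element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located(
            (By.XPATH, '//fin-streamer[@data-field="regularMarketPrice"]')
        )
    )
    print(f"Apple Stock Price: ${price_element.text}")
finally:
    # Close the browser even if the lookup fails
    driver.quit()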

✔ Useful for scraping financial data that loads via AJAX

6. Scraping Historical Stock Data Using `yfinance`

The yfinance library wraps Yahoo Finance's data endpoints, so historical prices for AAPL can be downloaded without parsing any HTML.

import yfinance as yf

# Download historical data (the end date is exclusive, so this covers all of 2023)
df = yf.download("AAPL", start="2023-01-01", end="2024-01-01")
print(df.head())

✔ Used for trend analysis & backtesting strategies
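
As one illustration of the trend analysis mentioned above, a simple moving average smooths a series of closing prices over a fixed window. The sketch below uses plain Python so the arithmetic is visible; `simple_moving_average` is a hypothetical helper name and the closing prices are invented sample values, not real market data. pandas provides rolling-window methods for the same purpose on a DataFrame.

```python
def simple_moving_average(values, window):
    """Return the moving average of `values` over `window` consecutive points.

    Hypothetical helper: the result has len(values) - window + 1 entries,
    or is empty when there are fewer values than the window size.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just left the window
            running_total -= values[i - window]
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


# Invented closing prices, for illustration only
closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(simple_moving_average(closes, 3))
```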

7. Storing Scraped Data in a Database

Save stock prices into an SQLite database

import sqlite3

# Connect to database
conn = sqlite3.connect("finance.db")
cursor = conn.cursor()

# Create table
cursor.execute("CREATE TABLE IF NOT EXISTS stock_prices (symbol TEXT, price REAL, date TEXT)")

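# Insert data (stock_price is the text scraped in section 2; remove thousands separators before converting)
price_value = float(stock_price.replace(",", ""))
cursor.execute("INSERT INTO stock_prices VALUES (?, ?, datetime('now'))", ("AAPL", price_value))
conn.commit()
conn.close()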

✔ Useful for building financial dashboards
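
A dashboard also needs to read the stored prices back. The sketch below extends the same `stock_prices` table with a batch insert and a latest-price query; `save_prices` and `latest_price` are hypothetical helper names, the prices are invented sample values, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3


def save_prices(conn, rows):
    """Insert (symbol, price) pairs, timestamping each row on insert."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stock_prices (symbol TEXT, price REAL, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO stock_prices VALUES (?, ?, datetime('now'))", rows
    )
    conn.commit()


def latest_price(conn, symbol):
    """Return the most recently inserted price for a symbol, or None."""
    # rowid reflects insertion order; the date column only has
    # one-second resolution, so rows saved together would tie on it
    row = conn.execute(
        "SELECT price FROM stock_prices WHERE symbol = ? ORDER BY rowid DESC LIMIT 1",
        (symbol,),
    ).fetchone()
    return row[0] if row else None


# Invented sample prices, for illustration only
conn = sqlite3.connect(":memory:")
save_prices(conn, [("AAPL", 189.30), ("MSFT", 410.25), ("AAPL", 190.10)])
print(latest_price(conn, "AAPL"))
conn.close()
```

Passing values through `?` placeholders, as the original insert does, keeps scraped text out of the SQL string itself.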

8. Automating Data Extraction with `Scrapy`

Create a Scrapy spider for financial data

import scrapy

class FinanceSpider(scrapy.Spider):
    name = "finance"
    start_urls = ["https://finance.yahoo.com"]

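    def parse(self, response):
        # Parentheses are not valid in a plain CSS class selector, so "h3.Mb(5px)"
        # fails to parse; match the class token with an attribute selector instead
        for headline in response.css('h3[class~="Mb(5px)"]'):
            yield {"headline": headline.css("a::text").get()}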

# Run Scrapy
# scrapy runspider finance_spider.py -o finance_news.json

✔ Automates bulk data extraction