Stock Prediction and Anomaly Detection 📈
The goal of this project is to get familiar with financial data and to review basic time series forecasting and anomaly detection. The project’s main objectives, predicting stock price trends and identifying unusual price movements, are easy to understand and measure. I will also create a simple liquidity score based on trading volume and price volatility, which will give me insight into how liquidity affects price stability.
We need the following imports:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
The first section is data collection and preparation. I use the yfinance library, which pulls data from Yahoo Finance, to gather daily stock price and volume data for MSFT (Microsoft) over a two-year period.
Part 1. Data Collection and Preparation
# fetch historical data
ticker = yf.Ticker("MSFT")
# print(ticker.info)
data = ticker.history(period = "2y")
Here is an example of what the data looks like:
                                 Open        High         Low       Close    Volume  Dividends  Stock Splits
Date
2022-11-08 00:00:00-05:00  224.824170  227.724173  222.012638  224.991287  28192500        0.0           0.0
2022-11-09 00:00:00-05:00  223.516686  224.755342  220.528212  220.705154  27852900        0.0           0.0
2022-11-10 00:00:00-05:00  231.440116  239.206242  231.017410  238.862167  46268000        0.0           0.0
2022-11-11 00:00:00-05:00  238.872022  243.787287  237.829974  242.922195  34620200        0.0           0.0
2022-11-14 00:00:00-05:00  237.888951  239.776411  235.156066  237.456406  31123300        0.0           0.0
The next step is exploratory data analysis: I will plot the stock prices, calculate moving averages, and calculate price volatility.
A moving average replaces each data point with the mean of the most recent values in a fixed window, which smooths out short-term fluctuations and highlights longer-term trends or cycles.
# calculate 20 day and 50 day moving averages
data["20_MA"] = data["Close"].rolling(window=20).mean()
data["50_MA"] = data["Close"].rolling(window=50).mean()
print(data.tail())
Here is what the tail end of the data looks like:
                                 Open        High         Low       Close    Volume  Dividends  Stock Splits       20_MA       50_MA
Date
2024-11-04 00:00:00-05:00  409.799988  410.420013  405.570007  408.459991  19672300        0.0           0.0  419.661501  420.520801
2024-11-05 00:00:00-05:00  408.369995  414.899994  408.079987  411.459991  17626000        0.0           0.0  419.499001  420.480201
2024-11-06 00:00:00-05:00  412.420013  420.450012  410.519989  420.179993  26681800        0.0           0.0  419.635001  420.607001
2024-11-07 00:00:00-05:00  421.279999  426.850006  419.880005  425.429993  19862800        0.0           0.0  420.114500  420.903600
2024-11-08 00:00:00-05:00  425.395996  426.500000  423.058197  423.663391   5727178        0.0           0.0  420.481670  421.114468
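To make the rolling calculation concrete, here is a minimal sketch on a toy series. The prices are made up and the 3-day window is chosen only to keep the example small. It shows that the first `window - 1` rows come out as NaN, which is why the first 19 rows of 20_MA and the first 49 rows of 50_MA are empty. The `min_periods=1` variant is an optional alternative, not something the project uses.

```python
import pandas as pd

# Toy closing prices (made-up values, not real MSFT data)
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="Close")

# 3-day simple moving average; the first window - 1 entries are NaN
ma_3 = prices.rolling(window=3).mean()

# min_periods=1 averages whatever is available instead of returning NaN
ma_3_partial = prices.rolling(window=3, min_periods=1).mean()

print(ma_3.tolist())          # [nan, nan, 11.0, 12.0, 13.0]
print(ma_3_partial.tolist())  # [10.0, 10.5, 11.0, 12.0, 13.0]
```

A longer window produces a smoother line, but it reacts more slowly to recent price changes and leaves more empty rows at the start.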
Part 2. Visualize Stock Prices and Moving Averages
plt.figure(figsize=(14,7))
plt.plot(data["Close"],label = "Close Price")
plt.plot(data["20_MA"],label = "20 day MA", linestyle = "--")
plt.plot(data["50_MA"],label = "50 day MA", linestyle = "--")
# ticker is a yfinance Ticker object; ticker.ticker holds the symbol string ("MSFT")
plt.title(f"{ticker.ticker} Stock Price and Moving Averages")
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
# plt.show()
Plotting the moving averages alongside the closing price makes the general trend easier to see, because the averages smooth out day-to-day noise.
Part 3. Simple Anomaly Detection
I will define an anomaly as a day when the closing price deviates significantly from the 20-day moving average. Any close that is more than two 20-day rolling standard deviations away from that average is flagged as an anomaly.
# calculate the rolling standard deviation
data["20_STD"] = data["Close"].rolling(window=20).std()
# anomalies are closes more than 2 rolling standard deviations from the 20-day MA
# (rows where the rolling values are still NaN compare as False)
data["Anomaly"] = (
    (data["Close"] > data["20_MA"] + 2 * data["20_STD"])
    | (data["Close"] < data["20_MA"] - 2 * data["20_STD"])
)
# plot anomalies
plt.figure(figsize=(14,7))
plt.plot(data["Close"], label='Close Price', color = 'blue')
plt.plot(data["20_MA"],label = "20 day MA", color = "orange")
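# label the anomaly markers so they appear in the legend
plt.scatter(data.index[data["Anomaly"]], data.loc[data["Anomaly"], "Close"], color="red", label="Anomaly")
# ticker.ticker holds the symbol string ("MSFT")
plt.title(f"{ticker.ticker} Stock Price with Anomalies")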
plt.xlabel("Date")
plt.ylabel("Price")
plt.legend()
# plt.show()
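The flagging rule can also be checked on data where the answer is known in advance. The sketch below wraps the same two-standard-deviation rule in a helper and runs it on a synthetic series with one injected jump. The function name `flag_anomalies`, its parameters, and the synthetic prices are all assumptions made for illustration. They are not part of the project code.

```python
import numpy as np
import pandas as pd


def flag_anomalies(close: pd.Series, window: int = 20, n_std: float = 2.0) -> pd.Series:
    """Return True where close lies outside rolling mean +/- n_std * rolling std."""
    rolling = close.rolling(window=window)
    mean = rolling.mean()
    std = rolling.std()
    # Comparisons against NaN are False, so the warm-up rows are never flagged
    return (close > mean + n_std * std) | (close < mean - n_std * std)


# Synthetic prices: alternate between 100.5 and 99.5, then add one jump
values = 100 + 0.5 * np.where(np.arange(60) % 2 == 0, 1.0, -1.0)
values[30] += 10.0
close = pd.Series(values, name="Close")

anomalies = flag_anomalies(close)
print(close[anomalies])  # only the injected jump at position 30 is flagged
```

Note that the rolling window includes the current day, so a large jump also inflates the standard deviation it is compared against. Computing the mean and standard deviation on `close.shift(1)` would compare each day only against the preceding days, which is one possible refinement.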
Part 4. Simple Liquidity Scoring
The score will be based on daily trading volume and price stability. Higher trading volumes and lower volatility indicate higher liquidity. The volume and volatility features are normalized and averaged together to produce a liquidity score.
# normalize trading volume and rolling volatility
data["Volume Score"] = (data["Volume"] - data["Volume"].min()) / (data["Volume"].max()- data["Volume"].min())
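Volatility is commonly measured as the standard deviation of price changes and represents the degree of price fluctuation. Here, the 20-day rolling standard deviation of the closing price (20_STD) serves as the volatility measure. First, normalize the volatility by dividing it by its maximum, then subtract the result from 1. This way, periods with lower volatility get a score closer to 1, indicating higher liquidity, and periods with higher volatility get a score closer to 0.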
# higher stability = higher score
data["Volatility Score"] = 1 - (data["20_STD"] / data["20_STD"].max())
# calculate simple liquidity score
data["Liquidity Score"] = (data["Volume Score"] + data["Volatility Score"]) / 2
# plot liquidity score
plt.figure(figsize = (14,7))
plt.plot(data["Liquidity Score"],label = "Liquidity Score", color = "green")
# ticker.ticker holds the symbol string ("MSFT")
plt.title(f"{ticker.ticker} Liquidity Score Over Time")
plt.xlabel("Date")
plt.ylabel("Liquidity Score (0 to 1)")
plt.legend()
plt.show()
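As a variation on the scoring above, here is a small sketch that packages the calculation into a reusable function. It differs from the project code in two ways, both of which are assumptions made for illustration: volatility is computed from daily percentage returns rather than from the price level, and both features use the same min-max scaling. The helper names `min_max` and `liquidity_score` and the toy data are hypothetical.

```python
import pandas as pd


def min_max(series: pd.Series) -> pd.Series:
    """Rescale a series to the range [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())


def liquidity_score(close: pd.Series, volume: pd.Series, window: int = 20) -> pd.Series:
    """Average of a volume score and a stability score, both in [0, 1]."""
    # Volatility as the rolling standard deviation of daily percentage returns
    volatility = close.pct_change().rolling(window=window).std()
    volume_score = min_max(volume)
    # Lower volatility means higher stability, so invert the scaled value
    stability_score = 1 - min_max(volatility)
    return (volume_score + stability_score) / 2


# Toy data (made-up values); a 3-day window keeps the example short
close = pd.Series([100.0, 101.0, 100.0, 102.0, 99.0, 105.0, 104.0, 104.5])
volume = pd.Series([10, 20, 15, 30, 25, 40, 35, 12]) * 1_000_000

score = liquidity_score(close, volume, window=3)
print(score)
```

Using returns makes the volatility measure independent of the price level, so a stock trading near 400 is not scored as less stable than it was near 225 merely because its price is higher. The first `window` rows of the score are NaN, because the return series loses one row and the rolling window needs `window` returns.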
This project can be further extended by using machine learning models for more complex forecasting or anomaly detection tasks.