Skip to content

Used NLP techniques to determine the sentiment and similarity between 10-Ks which are filed annually by publicly traded companies. These similarities were used as metrics for predicting subsequent price movements

Notifications You must be signed in to change notification settings

rlew631/NLP_stock_analysis

Repository files navigation

NLP Stock Analysis

Project Overview

This project is a continuation of the Lazy Prices paper published by Lauren Cohen of HBS which speculated that a magnitude of changes in the content of a company's quarterly filing would indicate that their stock is likely to drop.

This is based on the theory that companies are reporting the bare minimum required by the SEC, that any changes would more likely than not be to disclose risk to stakeholders and that one could make a profitable trading strategy by shorting companies whose quarterly filings change substantially compared to their typical reporting pattern.

The following steps were taken to generate the data required for testing Cohen's hypothesis:

scrape_and_score.ipynb was used for:

  • download list of tickers from NASDAQ, NYSE, AMEX and the corresponding ciks (SEC labels for tickers)
  • scrape the data from the SEC(only the 10ks were downloaded for this project)
  • preprocess: HTMLs to txt
  • get cos similarity and jaccard scores

process_scores.ipynb was used for:

  • searching the 10k folders to see which ciks where processed from our stock list
  • adding the dates to the to the scores
  • generating a list of ciks/tickers/scores/date ranges
  • create final_ciks.txt containing ciks worth looking into for further analysis

Additional exploratory data analysis:

topic_modeling.ipynb was used for:

  • aggregating all the 10ks
  • cleaning/lemmatizing the documents
  • topic modeling using LDA

About

Used NLP techniques to determine the sentiment and similarity between 10-Ks which are filed annually by publicly traded companies. These similarities were used as metrics for predicting subsequent price movements

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published