Finding and analyzing popular TV shows on Netflix using topic modelling in Python

I used the Twitter Streaming API to collect around 272,300 tweets over 24 hours, starting July 14, 2020. I wrote a small piece of code using the tweepy Python library to stream the tweets and store them in a SQLite database.
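The storage side of that collection step could look roughly like the sketch below. It is not the original script: the table name, the column names and the `save_tweet` helper are assumptions made for illustration, and the tweepy streaming part is left out because it needs API credentials. In a real run, `save_tweet` would be called from the stream listener for every incoming tweet.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical 'tweets' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, "
        "created_at TEXT, "
        "user_location TEXT, "
        "text TEXT)"
    )
    return conn


def save_tweet(conn, tweet):
    """Insert one tweet; a tweet whose id is already stored is skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (id, created_at, user_location, text) "
        "VALUES (?, ?, ?, ?)",
        (tweet["id"], tweet["created_at"], tweet.get("user_location"), tweet["text"]),
    )
    conn.commit()


# Made-up example record, only to exercise the helper.
conn = open_store()
example = {
    "id": 1,
    "created_at": "2020-07-14T00:00:00",
    "user_location": "Mumbai, India",
    "text": "Watching a new show on Netflix tonight",
}
save_tweet(conn, example)
save_tweet(conn, example)  # same id again, so it is ignored
count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

Using the tweet id as the primary key together with `INSERT OR IGNORE` keeps retweeted or re-delivered records from being stored twice, which matters when a stream runs for many hours.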

This video presents the results of a data science experiment that I performed on tweets containing the word ‘Netflix’.

I fitted a Latent Dirichlet Allocation (LDA) model to extract 25 topics from a random subset of the tweets. I manually reviewed the important keywords associated with each topic and shortlisted the topics that were related to TV shows. Then I used the fitted LDA model to tag every tweet in the dataset with a topic name.

The following Python libraries and tools were used for this project:

  • geopy
  • tweepy
  • scikit-learn
  • numpy
  • pandas
  • SQLite
  • Plotly
  • GCP’s Geocoding API
  • Twitter’s streaming API