This video contains the results of a data science experiment I performed on tweets containing the word ‘Netflix’.
I used the Twitter Streaming API to collect around 272,300 tweets over a period of 24 hours starting July 14, 2020. I wrote a small piece of code using the tweepy Python library and stored the tweets in a SQLite database.
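A minimal sketch of such a collector is shown below, assuming tweepy 3.x's StreamListener interface (the version current in mid-2020). The credentials, table schema, and stored fields are placeholders, not the author's actual code.

```python
import sqlite3
import tweepy

# Placeholder credentials -- assumptions, not the author's actual keys.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

conn = sqlite3.connect("tweets.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS tweets "
    "(id TEXT PRIMARY KEY, created_at TEXT, user_location TEXT, text TEXT)"
)

class NetflixListener(tweepy.StreamListener):
    def on_status(self, status):
        # Store the fields needed later for topic modelling and geocoding.
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (status.id_str, str(status.created_at), status.user.location, status.text),
        )
        conn.commit()

    def on_error(self, status_code):
        # Returning False disconnects the stream (e.g. on HTTP 420 rate limiting).
        return False

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
stream = tweepy.Stream(auth=auth, listener=NetflixListener())
stream.filter(track=["Netflix"])  # collect tweets mentioning 'Netflix'
```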
I fitted a Latent Dirichlet Allocation (LDA) model to extract 25 topics from a random subset of tweets. I manually reviewed the important keywords associated with each topic and shortlisted the topics related to TV shows. Then, I used the fitted LDA model to tag every tweet in the dataset with a topic name.
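A minimal sketch of this topic-modelling step with scikit-learn is shown below. The table and column names, subset size, vectorizer settings, and hyperparameters are assumptions rather than the exact values used in the project.

```python
import sqlite3
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Load the collected tweets (table/column names are assumptions).
conn = sqlite3.connect("tweets.db")
df = pd.read_sql("SELECT id, text FROM tweets", conn)

# Bag-of-words representation of a random subset, then fit LDA with 25 topics.
sample = df.sample(n=20_000, random_state=42)  # subset size is an assumption
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
X_sample = vectorizer.fit_transform(sample["text"])

lda = LatentDirichletAllocation(n_components=25, random_state=42)
lda.fit(X_sample)

# Print the top keywords per topic for manual review and shortlisting.
terms = vectorizer.get_feature_names_out()  # scikit-learn >= 1.0
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:10]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")

# Tag every tweet in the dataset with its most probable topic index.
X_all = vectorizer.transform(df["text"])
df["topic"] = lda.transform(X_all).argmax(axis=1)
```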
The following Python libraries/tools were used for this project:
- geopy
- tweepy
- scikit-learn
- numpy
- pandas
- SQLite
- Plotly
- GCP’s Geocoding API
- Twitter’s Streaming API
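The list above also includes geopy and GCP’s Geocoding API, presumably for turning location strings (e.g. from tweet user profiles) into coordinates for plotting. The snippet below is a hypothetical sketch of that step using geopy’s GoogleV3 geocoder; the API key and the input string are placeholders, and the exact use of geocoding in the project is an assumption.

```python
from geopy.geocoders import GoogleV3

# Hypothetical: geocode a user-profile location string via GCP's Geocoding API.
geolocator = GoogleV3(api_key="YOUR_GCP_API_KEY")  # placeholder key

location = geolocator.geocode("Mumbai, India")
if location is not None:
    print(location.latitude, location.longitude)
```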