Language identification using fastText on tweets containing 'Netflix'

In this DIY data science experiment we attempt to uncover more insights from tweets about Netflix by using fastText’s language identification model.
We find the top 10 languages used for tweeting about Netflix and how they are distributed around the world.

The fastText language identification model is less than 1MB in size, and I was able to process 26000 tweets per second using it (which I think is pretty impressive!).
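To give a feel for what this step looks like, here is a minimal sketch rather than the exact code from the experiment. It assumes the compressed model has been downloaded as `lid.176.ftz` from the fastText page and that the `fasttext` Python package is installed. The helper names (`clean_tweet`, `identify_languages`, `top_languages`, `tweets_per_second`, `run_demo`) and the sample tweets are made up for illustration.

```python
import time
from collections import Counter

LABEL_PREFIX = "__label__"  # fastText prefixes every predicted label with this


def clean_tweet(text):
    """Collapse all whitespace; fastText's predict() rejects text with newlines."""
    return " ".join(text.split())


def identify_languages(model, tweets):
    """Return one language code per tweet, e.g. 'en' or 'es'."""
    cleaned = [clean_tweet(t) for t in tweets]
    # Passing a list predicts for all tweets in one call; k=1 keeps the top label.
    labels, _probs = model.predict(cleaned, k=1)
    return [row[0][len(LABEL_PREFIX):] if row else "unknown" for row in labels]


def top_languages(codes, n=10):
    """Return the n most common language codes with their counts."""
    return Counter(codes).most_common(n)


def tweets_per_second(model, tweets):
    """Rough throughput estimate for one batch of tweets."""
    start = time.perf_counter()
    identify_languages(model, tweets)
    elapsed = time.perf_counter() - start
    return len(tweets) / elapsed if elapsed > 0 else float("inf")


def run_demo(model_path="lid.176.ftz"):
    """Assumes the model file exists at model_path (hypothetical location)."""
    import fasttext  # imported here so the helpers above work without it

    model = fasttext.load_model(model_path)
    tweets = [
        "Watching Netflix tonight",
        "Estoy viendo Netflix\ncon mis amigos",
    ]
    print(top_languages(identify_languages(model, tweets)))
    print(f"{tweets_per_second(model, tweets * 1000):.0f} tweets per second")
```

Calling `run_demo()` prints the language counts and a throughput figure for your own machine; the number will depend on hardware and tweet length, so do not expect it to match the figure quoted above.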

This video contains the results of the language identification experiment, which I performed on tweets containing the word ‘Netflix’.

Link to fastText language identification model: https://fasttext.cc/docs/en/language-identification.html

The following Python libraries/tools were used for this project:

  • fastText language identification model
  • geopy
  • tweepy
  • scikit-learn
  • numpy
  • pandas
  • SQLite
  • Plotly
  • GCP’s Geocoding API
  • Twitter’s streaming API
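
As one illustration of how these pieces can fit together, the sketch below stores a language code and a country per tweet in SQLite and then queries the top languages overall and per country. It is an assumption about a reasonable layout, not the schema used in the experiment: the `tweets` table, its columns, and the function names are all hypothetical, and the country is assumed to have already been resolved by a geocoding step.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a database with a hypothetical one-row-per-tweet table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, lang TEXT NOT NULL, country TEXT)"
    )
    return conn


def add_tweets(conn, rows):
    """rows is an iterable of (lang, country); country may be None."""
    conn.executemany("INSERT INTO tweets (lang, country) VALUES (?, ?)", rows)
    conn.commit()


def top_languages_sql(conn, n=10):
    """Top n languages across all stored tweets."""
    query = (
        "SELECT lang, COUNT(*) AS c FROM tweets "
        "GROUP BY lang ORDER BY c DESC, lang ASC LIMIT ?"
    )
    return conn.execute(query, (n,)).fetchall()


def languages_by_country(conn):
    """Language counts per country, skipping tweets with no known location."""
    query = (
        "SELECT country, lang, COUNT(*) AS c FROM tweets "
        "WHERE country IS NOT NULL "
        "GROUP BY country, lang ORDER BY country ASC, c DESC, lang ASC"
    )
    return conn.execute(query).fetchall()


# Example with made-up rows:
conn = create_store()
add_tweets(conn, [("en", "US"), ("en", "US"), ("es", "MX"),
                  ("es", "US"), ("pt", "BR"), ("en", None)])
print(top_languages_sql(conn))
print(languages_by_country(conn))
```

The query results can be loaded into pandas and handed to Plotly for a map, but that part is left out here to keep the sketch dependency-free.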