Python Social Media Analytics by Siddhartha Chatterjee

Author: Siddhartha Chatterjee
Language: eng
Format: epub
Tags: COM062000 - COMPUTERS / Data Modeling and Design, COM018000 - COMPUTERS / Data Processing
Publisher: Packt Publishing
Published: 2017-07-28T04:41:48+00:00


Sentiment analysis and entity recognition are two powerful social media analytics techniques for adding context to user content. Because sport stirs strong sentiment and emotion in its audiences, the dataset for this chapter consisted of tweets about the English Football Premier League. We collected the data through the Twitter REST and Streaming APIs, applied the basic cleaning explained in Chapter 2, Harnessing Social Data - Connecting, Capturing, and Cleaning, and added new cleaning methods such as device detection from Twitter API metadata. Sentiment analysis allows us to categorize text as positive, negative, or neutral. We also learned that its accuracy is limited, especially for ambiguous expressions. We used the VADER (Valence Aware Dictionary and sEntiment Reasoner) module from NLTK for sentiment analysis. We also saw that we can build our own sentiment analysis algorithm through machine learning, using separate training and test sets. The accuracy of a custom sentiment analyzer depends heavily on the quality and size of the training set. Building and applying our own sentiment analyzer with the Python scikit-learn library, we reached an accuracy of around 73%. We used cross-validation (K-Fold), a confusion matrix, and precision/recall to evaluate the performance of our algorithm.
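To make the evaluation measures named above concrete, here is a minimal sketch of accuracy, confusion-matrix counts, precision/recall, and a K-Fold index split. It uses only the Python standard library and is not the scikit-learn code from the chapter; the function names, the labels, and the tiny example data are assumptions chosen for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Share of predictions that match the actual label."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one label, read off the confusion counts."""
    counts = confusion_counts(y_true, y_pred)
    true_pos = counts[(label, label)]
    predicted = sum(n for (_, p), n in counts.items() if p == label)
    actual = sum(n for (a, _), n in counts.items() if a == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical labels for eight tweets (made up, not the chapter's data).
y_true = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "neu"]

overall_accuracy = accuracy(y_true, y_pred)
pos_precision, pos_recall = precision_recall(y_true, y_pred, "pos")
folds = list(k_fold_indices(len(y_true), 4))
```

In a cross-validation run, each fold's test indices would be scored with a model trained on the matching train indices, and the per-fold scores averaged. In practice the data is usually shuffled before splitting, which this sketch leaves out.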

Entity recognition allows us to classify pieces of text into categories such as name, place, organization, and others. This is an efficient way to get a broad understanding of large volumes of social media conversations. We used Stanford NER, a popular Java-based entity recognition module. Applying the library to our football dataset allowed us to extract the most frequently mentioned clubs, locations, and names. We combined sentiment analysis and entity recognition on the chosen dataset by computing sentiment for each club entity detected. With Chelsea, Arsenal, and Liverpool among the most frequent club entities, applying sentiment analysis to them gave us some insights.
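One way to picture the combination of the two techniques is to average a per-tweet sentiment score over every club detected in that tweet. The sketch below assumes each tweet has already been reduced to a list of detected entities and a single numeric sentiment score; the function name, the record layout, and the scores are hypothetical and are not output from Stanford NER or VADER.

```python
from collections import defaultdict


def mean_sentiment_by_entity(records):
    """Average the sentiment score of all tweets that mention each entity.

    records: iterable of (entities, score) pairs, where entities is a list
    of entity names detected in one tweet and score is that tweet's
    sentiment score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entities, score in records:
        # set() so an entity repeated within one tweet is counted once
        for entity in set(entities):
            totals[entity] += score
            counts[entity] += 1
    return {entity: totals[entity] / counts[entity] for entity in totals}


# Hypothetical (entities, score) pairs, made up for illustration.
records = [
    (["Chelsea"], 0.5),
    (["Arsenal", "Chelsea"], -0.5),
    (["Liverpool"], 0.25),
    (["Arsenal"], 0.5),
]

club_sentiment = mean_sentiment_by_entity(records)
```

A tweet that mentions two clubs contributes its score to both, which is a simplification: the sentiment may really be aimed at only one of them.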

In the next chapter, we will explore data from YouTube to analyze campaigns.

