As I read news about the president’s campaign, his election and then his inauguration, I felt that there was a sudden shift in the way news media was talking about Mr. Trump, especially during the lead-up to his inauguration and the first few weeks of his presidency. While some of the negativity was self-explanatory, as Trump questioned the legitimacy of select media, I wanted to see if data could support my hypothesis that there was a shift in news sentiment towards Trump before and after his inauguration.
There have been multiple sentiment analyses done on Trump’s social media posts. While these projects make the news and garner online attention, few analyses have been done on the media itself. During the presidential campaign in 2016, Data Face ran a text analysis on news articles about Trump and Clinton. The results gained a lot of media attention and in fact steered conversation. I planned to follow a similar approach. For the final project of our Natural Language Processing class at Syracuse University, Daniela Fernández Espinosa from the Information School, James Troncale from the Linguistics Department and I built a prototype sentiment analyzer to help political figures make better strategic plans. The results of that analysis are posted here.
There isn’t a very big shift in polarity, but we can see that the percentage of negative articles increased after his inauguration while the percentages of positive and neutral articles went down. The high percentage of neutral articles reflects the news media’s objectivity.
What is more interesting is the breakdown of this tally by source.
In the days leading up to the inauguration, a lot of what was written reflected the media's uncertainty about what the presidency under Trump would look like. There was a curious mix of optimistic articles and cautionary ones. But immediately after his inauguration came the aftermath of the Russian hacking allegations, followed by the 'travel ban', during which Trump lashed out at the media. The number of negative articles went up following these two incidents.
While most of the sources follow the same trend of more negative stories after his inauguration, the Washington Times published fewer negative articles about Trump after he became President, dropping from 38% negative before the inauguration to 21% after. Its share of positive articles rose from 25% to 33%. Following the inauguration, Fox seems to have increased both negative and positive coverage while reducing neutral coverage. Before the inauguration, 49% of the articles from Fox in our data set were negative, and that number rose to 51% after Trump was sworn in as President. Sites like the New York Times and Slate also followed the larger pattern of increased negative coverage following the inauguration. The LA Times has the biggest jump in negative articles, going from 38% before his inauguration to 63% after. Even the Chicago Tribune, which is known to be conservative, increased its negative coverage and decreased its positive coverage.
Since our program was modeled on text articles from the web, the media coverage mentioned here pertains only to written news. This data doesn't take into account video, audio and other multimedia reports.
FRAMEWORK: Python’s NLTK (Natural Language Toolkit) and its sentiment analyzer module.
Part of this project was training our Naive Bayes Classifier on a manually tagged set of articles about a particular political figure. In our case, we chose Trump because of the immense media attention given to him.
We collected around 2000 articles about Trump, covering one month before and one month after his inauguration, from the following news sites: Chicago Tribune, CNN, FOX, LA Times, New York Times, Slate, Washington Post and Washington Times. We randomly selected 20% of our corpus and manually tagged the articles we read as Positive, Negative or Neutral. The final tag assigned to each article in the training set was the majority sentiment among our individual tags. For example, if an article was tagged Positive, Positive, Negative by the three of us individually, the final tag of that article was Positive. If we encountered a tie, we would sit and reread the article together and come to a consensus about the final tag.
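The majority-vote rule described above can be sketched in a few lines of Python. This is only an illustration, not the code we used: the function name final_tag is hypothetical, and returning None for a tie is an assumption standing in for the step where we reread the article together.

```python
from collections import Counter

def final_tag(votes):
    """Return the majority tag among the annotators' votes, or None on a tie.

    None is a placeholder (an assumption of this sketch) meaning the
    article has to be reread and settled by consensus.
    """
    ranked = Counter(votes).most_common()
    # A tie exists when the two most common tags have equal counts.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

print(final_tag(["Positive", "Positive", "Negative"]))
print(final_tag(["Positive", "Negative", "Neutral"]))
```

With three annotators and three possible tags, a tie only happens when every annotator picks a different tag, which is the case the second call exercises.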
After we had our tagged set, we ran a preliminary analysis on it to get an idea of what we were building our hypothesis on.
As we had expected, the number of negative articles in the tagged set increased after his inauguration, so we hypothesized that this would be true for our entire data set as well.
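A preliminary tally of this kind could be computed roughly as follows. This is a minimal sketch under stated assumptions: the function name tag_shares, the "before"/"after" labels and the toy records are all made up for illustration and are not our data.

```python
from collections import Counter, defaultdict

def tag_shares(articles):
    """Map each group to {tag: percentage of that group's articles}.

    `articles` is an iterable of (group, tag) pairs. The group can be a
    period such as "before", or a (source, period) tuple for a
    breakdown by news site.
    """
    counts = defaultdict(Counter)
    for group, tag in articles:
        counts[group][tag] += 1
    shares = {}
    for group, tally in counts.items():
        total = sum(tally.values())
        shares[group] = {tag: round(100 * n / total, 1) for tag, n in tally.items()}
    return shares

# Toy records, invented for this example only.
sample = [
    ("before", "Negative"), ("before", "Neutral"),
    ("before", "Neutral"), ("before", "Positive"),
    ("after", "Negative"), ("after", "Negative"),
    ("after", "Neutral"), ("after", "Positive"),
]
shares = tag_shares(sample)
print(shares["before"])
print(shares["after"])
```

Comparing the Negative share of the two groups is then a direct check of whether negative coverage went up.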
We created a few feature sets that we thought would help analyze sentiment in a news article. As mentioned above, sentiment analysis on news is very subjective and each model will be different from the next. For our model, we created the following feature sets:
Quotes: Does the number of quotes in an article determine its sentiment?
No Punctuation: Does punctuation carry any weight in sentiment?
Exclamations and Question Marks: Does having an exclamation mark or a question mark in the text affect the polarity of that text?
Word Polarity: Each word in the article is given a polarity score based on the MPQA lexicon and then the scores are added to determine the sentiment of the article
Adjective Polarity: Each adjective in the article is given a polarity score based on the MPQA lexicon and then the scores are added to determine the sentiment of the article
Stopwords: Do words like 'then', 'a', 'is', 'an' and so on have any say in the sentiment of an article?
Bigram: Taking two consecutive words to analyze sentiment. For example, the word-set ‘mexico’, ‘border’ has a negative connotation in our data set.
Unigram: Counts every single word in the article. For example, if the word “wrong” appears in many articles tagged Negative, then the machine will assume that a new article with the word “wrong” in it will also be negative.
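To make the Unigram and Bigram feature sets more concrete, here is a minimal sketch of how such features could be extracted. The function names, the feature-key format and the simple regex tokenizer are assumptions made for illustration, not our actual implementation. The dictionaries it produces have the shape that NLTK's NaiveBayesClassifier.train accepts when each one is paired with a label.

```python
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes.

    A deliberately naive tokenizer (an assumption of this sketch).
    """
    return re.findall(r"[a-z']+", text.lower())

def unigram_features(tokens):
    """One boolean feature per distinct word in the article."""
    return {f"contains({word})": True for word in set(tokens)}

def bigram_features(tokens):
    """One boolean feature per pair of consecutive words."""
    return {f"bigram({a} {b})": True for a, b in zip(tokens, tokens[1:])}

tokens = tokenize("Trump visits the Mexico border.")
features = {**unigram_features(tokens), **bigram_features(tokens)}
print(sorted(features))
```

A labeled training example would then be a pair such as (features, "Negative"), and a list of those pairs is what the classifier is trained on.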
After defining our feature sets, we went on to test the accuracy of our model. In the end, the Adjective Polarity feature set scored the highest in terms of accuracy. That is what our model is based on. It takes every adjective in the text and assigns it a polarity score based on the MPQA word lexicon. Then the Negative, Positive and Neutral scores are tallied for each article and the sentiment with the highest score is the final tag for the article. We ran the rest of the database through our program and the results are what you see on this site.
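The adjective tally can be illustrated with a short sketch. Several things here are assumptions and not our actual code: the tiny lexicon below is invented and does not follow the MPQA file format, the input is assumed to be already part-of-speech tagged with Penn Treebank tags (where adjective tags start with "JJ"), and ties are resolved to Neutral because the tie-breaking rule is not described above.

```python
# Hypothetical mini-lexicon for illustration; the real MPQA lexicon is far larger.
TOY_LEXICON = {
    "great": "Positive",
    "strong": "Positive",
    "wrong": "Negative",
    "chaotic": "Negative",
    "new": "Neutral",
}

def adjective_polarity_tag(tagged_tokens, lexicon):
    """Tally the lexicon polarity of every adjective and return the top tag.

    `tagged_tokens` is a list of (word, pos_tag) pairs. Adjectives that
    are missing from the lexicon are ignored.
    """
    scores = {"Positive": 0, "Negative": 0, "Neutral": 0}
    for word, pos in tagged_tokens:
        if pos.startswith("JJ"):
            polarity = lexicon.get(word.lower())
            if polarity in scores:
                scores[polarity] += 1
    best = max(scores.values())
    winners = [tag for tag, score in scores.items() if score == best]
    # Assumption of this sketch: a tie (including no adjectives found) is Neutral.
    return winners[0] if len(winners) == 1 else "Neutral"

tagged = [
    ("The", "DT"), ("chaotic", "JJ"), ("rollout", "NN"), ("was", "VBD"),
    ("wrong", "JJ"), ("but", "CC"), ("strong", "JJ"),
]
print(adjective_polarity_tag(tagged, TOY_LEXICON))
```

In practice the (word, tag) pairs would come from a part-of-speech tagger such as the one NLTK provides. The sketch takes them as input so that it runs without downloading any tagger models.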
While we were satisfied with our results, the sentiment analysis model itself is very subjective, and hence the results here should be taken as nothing more than the outcome of an educational quest.
Presented below is a sample of the data we used for this analysis. The tags in the list were generated by the computer. As you will see, not all the tags are accurate, as subjectivity depends on the reader as well.