The Jupyter notebook for this assignment may be found here.
In this project you will develop tools for performing sentiment analysis on a database of tweets from across
the country. When the project is complete you should be able to estimate the sentiment of tweets filtered by
content.
There are 4 files provided here: http://www.cs.columbia.edu/~cannon/tweet_data/
1. all_tweets.txt is the large collection of tweets
2. some_tweets.txt is a subset of all_tweets that's more manageable to prototype on
3. sentiments.csv is a CSV file with word sentiment values
4. zips.csv (not required, see below)
We will go over the format of each of these files in class.
Tweets: We can represent a single tweet using a Python dictionary with the following entries:
text: a string, the text of the tweet all in lowercase
time: a datetime object, date and time of the tweet
latitude: a float, the latitude of the tweet's location
longitude: a float, the longitude of the tweet's location
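As a rough illustration of the representation described above, a single tweet might look like the dictionary below. The values are made up for this sketch and are not taken from the provided data files.

```python
from datetime import datetime

# Hypothetical values, for illustration only; not taken from the data files.
example_tweet = {
    "text": "what a great day in new york",      # tweet text, all lowercase
    "time": datetime(2011, 8, 28, 19, 2, 28),    # date and time of the tweet
    "latitude": 40.7128,                         # float
    "longitude": -74.0060,                       # float
}

print(example_tweet["time"].year)
```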
Problem 1a Create a list of dictionaries from the data in some_tweets.txt where each dictionary
corresponds to a single tweet. If you change the format of the some_tweets file you should include your
altered version with your submission.
In [ ]:
#your code here
Problem 1b Create a single DataFrame from the list of tweets.
In [ ]:
#your code here
Problem 2 Write a function add_sentiment that adds a sentiment column to the DataFrame from 1b.
Determine the sentiment of each tweet by taking the average sentiment over all of the words in the tweet.
Use the sentiment values (between -1 and 1) in the sentiments.csv file to get the value of a word's
sentiment. Note: words without a sentiment value do not have sentiment 0; they have no sentiment at all and
should therefore not contribute to the average. Your function should take as input a DataFrame of tweets
together with the name of the sentiment file. Note that your function will be altering the DataFrame. This is a
side effect. It's okay to do it this time.
In [ ]:
def add_sentiment(tweets, filename):
    # your code here
    pass
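The averaging rule in Problem 2 can be sketched on a single string, independent of any DataFrame. The helper name `average_sentiment`, the toy dictionary, and the choice to split on whitespace and to return `None` when no word has a score are all assumptions made for this sketch; the assignment does not prescribe them.

```python
def average_sentiment(text, word_sentiments):
    """Mean sentiment of the scored words in text, or None if no word has a score."""
    # Words missing from the lookup are skipped entirely, so they do not
    # pull the average toward 0.
    scores = [word_sentiments[w] for w in text.split() if w in word_sentiments]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical sentiment values, standing in for the contents of sentiments.csv.
toy_sentiments = {"great": 0.5, "awful": -0.25}

print(average_sentiment("a great but awful day", toy_sentiments))
print(average_sentiment("nothing scored here", toy_sentiments))
```

Only "great" and "awful" contribute in the first call, so the divisor is 2 rather than 5. A full solution would still need to read the sentiment file and apply a function like this to every tweet in the DataFrame.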
Problem 3 Write a function called tweet_filter that will return a new DataFrame of tweets filtered by the
content of the tweet text. The input for this function should be a DataFrame of tweets and a list of words
(strings). The function should return a DataFrame of tweets that each include all of the words in the word list
ignoring case and punctuation.
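One way to read "ignoring case and punctuation" is sketched below for a single string. The helper name `contains_all_words` is hypothetical, and treating a match as a whole-word match after stripping ASCII punctuation is an assumption; the assignment may intend something different.

```python
import string


def contains_all_words(text, words):
    """True if every word in words appears in text, ignoring case and punctuation."""
    # Translation table that deletes all ASCII punctuation characters.
    strip_punct = str.maketrans("", "", string.punctuation)
    tokens = set(text.lower().translate(strip_punct).split())
    return all(w.lower().translate(strip_punct) in tokens for w in words)


print(contains_all_words("Hello, World! Nice day.", ["hello", "DAY"]))
print(contains_all_words("Hello, World! Nice day.", ["hello", "rain"]))
```

A check like this could then be applied to the text of each tweet to build a boolean mask for selecting rows, leaving the original DataFrame unchanged.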