CNN
The next time you're low on cash and need to get a quick read on the public's feeling on politics or current events, consider sampling Twitter.
According to a new report out of Carnegie Mellon University's computer science department, sentiments expressed via the millions of daily tweets strongly correlate with well-established public opinion polls, such as the Index of Consumer Sentiment (ICS) and Gallup polls.
The data analysis methodology needs some tweaking, but the researchers still believe that Twitter posts could act as a 'cheap, rapid means of gauging public opinion.'
Assistant professor Noah Smith and his team collected 1 billion Twitter messages posted in 2008 and 2009 and analyzed them for topic (politics versus economy) and sentiment (positive or negative). They compared the consumer confidence tweets against ICS data from the same period as well as Gallup's Economic Confidence Index.
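As a rough illustration of what sorting tweets by topic and sentiment can look like, here is a minimal Python sketch. It is not the researchers' method: the keyword filter, the tiny word lists, and the names `POSITIVE`, `NEGATIVE`, `tokenize`, and `sentiment_ratio` are all assumptions made up for this example, and the sample tweets are invented.

```python
import re

# Hypothetical toy lexicons (assumption); a real study would need far larger lists.
POSITIVE = {"good", "great", "hope", "strong", "gain"}
NEGATIVE = {"bad", "poor", "fear", "weak", "loss"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_ratio(tweets, topic_keywords):
    """Ratio of positive to negative word counts over on-topic tweets.

    A tweet is on-topic if it contains at least one topic keyword.
    Returns None when no negative words are found, so we never divide by zero.
    """
    pos = neg = 0
    for tweet in tweets:
        words = tokenize(tweet)
        if not any(w in topic_keywords for w in words):
            continue  # skip tweets that do not mention the topic
        pos += sum(w in POSITIVE for w in words)
        neg += sum(w in NEGATIVE for w in words)
    return pos / neg if neg else None


# Invented sample tweets, for illustration only.
tweets = [
    "Great news on jobs today, strong hiring",
    "The economy looks weak, I fear more job loss",
    "Watching a good movie tonight",
]
ratio = sentiment_ratio(tweets, {"jobs", "job", "economy"})
print(ratio)
```

Computing a score like this for each day would give a time series that could then be lined up against poll results from the same dates.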
Tweets about President Obama were compared against Gallup's daily tracking polls from that time period, and tweets about the election were compared against 46 polls created by Pollster.
The researchers found that there was a strong correlation between opinions expressed on Twitter and the traditional polls on topics like Obama's job performance, the job market, and the economy. While the ICS and Gallup polls showed an 86 percent correlation between them, Twitter showed between a 72 and 79 percent correlation to the traditional polls.
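For readers curious how such a figure can be computed, the sketch below calculates a Pearson correlation coefficient between two series in Python. It assumes the percentages above correspond to correlation coefficients (for example, 0.86), which this article does not spell out. The function name `pearson` and both data series are invented for illustration and are not the study's numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers: a daily Twitter sentiment score and a poll index.
twitter_score = [1.10, 1.05, 0.98, 1.02, 1.15, 1.20]
poll_index = [62.0, 61.0, 58.5, 59.0, 63.5, 64.0]

r = pearson(twitter_score, poll_index)
print(f"correlation: {r:.2f}")
```

A value near 1 means the two series rise and fall together, a value near 0 means no linear relationship, and a value near -1 means they move in opposite directions.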
Still, there were some areas where the Twitter data didn't correlate particularly well. Twitter mentions of Obama did tend to correlate with his rising popularity during the run-up to the 2008 presidential election, but mentions of McCain also correlated with Obama's increasing popularity.
Smith and the team acknowledged that natural language processing would have to be improved before Twitter could be used to predict things like elections, and a number of other considerations should be taken into account when using tweets for analysis.
For example, should retweets or news headlines count in the data? Still, even with so much noise in the average Twitter stream, the researchers were pleased to have extracted some signal that apparently shows something useful.
'In this work, we treat polls as a gold standard. Of course, they are noisy indicators of the truth ... just like extracted textual signals,' reads the report. 'Future work should seek to understand how these different signals reflect public opinion either as a hidden variable, or as measured from more reliable sources like face-to-face interviews.'
The paper will be presented later this month at the Association for the Advancement of Artificial Intelligence's International Conference on Weblogs and Social Media.