Spark Streaming part 1: Real-time Twitter sentiment analysis

The Spark Streaming API can consume data from sources such as Kafka, Flume, and Twitter, to name a few. It can then apply transformations to the data and push the results further downstream.

In this post, let's see how to build an app that connects Spark Streaming to Twitter.
The app counts the #hashtags in tweets over the last 10-second and 60-second windows as users tweet about certain keywords.


Create a Scala Maven project as explained here.


Create your Twitter OAuth tokens as explained in this link. The four credentials you generate (consumer key, consumer secret, access token, and access token secret) are used to connect to and authenticate with Twitter.
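
One common way to hand these credentials to the Twitter client is through JVM system properties, which twitter4j (the library behind Spark's Twitter receiver) reads by default. A minimal sketch, assuming that approach: the object name `TwitterCredentials` and the helper `configure` are hypothetical, not taken from the project.

```scala
object TwitterCredentials {
  // Hypothetical helper: stores the four OAuth values as JVM system properties,
  // using the property names that twitter4j looks up by default.
  def configure(consumerKey: String, consumerSecret: String,
                accessToken: String, accessTokenSecret: String): Unit = {
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
  }
}
```

Call `TwitterCredentials.configure(...)` with your own four values before the stream is created, and keep the real tokens out of version control.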



setMaster("local[4]") - Make sure to set the master to local mode with at least 2 threads: one thread is used to receive the incoming stream and another to process it.
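
To show where this setting fits, here is a rough sketch of the surrounding setup. It is not the project's actual code: the app name `TwitterHashtags` and the 2-second batch interval are assumptions, and it relies on `TwitterUtils` from the spark-streaming-twitter module being on the classpath, with the OAuth system properties already set.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TwitterHashtags {
  def main(args: Array[String]): Unit = {
    // local[4]: four local threads, so the receiver and the processing never starve each other
    val conf = new SparkConf().setAppName("TwitterHashtags").setMaster("local[4]")

    // Batch interval of 2 seconds is an assumed value for this sketch
    val ssc = new StreamingContext(conf, Seconds(2))

    // The program arguments are the keywords used to filter the tweet stream;
    // None means twitter4j picks up the OAuth values from system properties
    val stream = TwitterUtils.createStream(ssc, None, args.toSeq)

    // Split each tweet into words and keep only the hashtags
    val hashTags = stream.flatMap(status => status.getText.split(" ").filter(_.startsWith("#")))

    // Windowed counts on hashTags would be defined here, before start()
    hashTags.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With `local[1]` the single thread would be taken by the receiver and nothing would be processed, which is why at least two threads are needed.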

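We count the popular hashtags with the code below:
    // Count each hashtag over the last 60 seconds, then sort by count
    val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _, Seconds(60))
      .map { case (topic, count) => (count, topic) }   // swap so the count becomes the key
      .transform(_.sortByKey(false))                   // false = descending order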

The code is essentially a word count of the hashtags over the previous 60 seconds, as specified in reduceByKeyAndWindow, with the results sorted in descending order of count. The 10-second window works the same way with Seconds(10).

reduceByKeyAndWindow is used when a transformation has to be applied to data accumulated over the previous batch intervals.
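
To make the windowing idea concrete without a running cluster, the sketch below mimics the same count-and-sort on plain Scala collections. It is only an illustration, not the Spark API: each inner `Seq` stands in for one batch of hashtags, and the names `WindowedCountSketch` and `topCounts` are hypothetical.

```scala
object WindowedCountSketch {
  // Counts every hashtag across the batches inside the window and
  // returns (count, hashtag) pairs, highest count first (ties sorted by tag).
  def topCounts(window: Seq[Seq[String]]): Seq[(Int, String)] =
    window.flatten
      .groupBy(identity)
      .toSeq
      .map { case (tag, occurrences) => (occurrences.size, tag) }
      .sortBy { case (count, tag) => (-count, tag) }

  def main(args: Array[String]): Unit = {
    // Three consecutive batches of hashtags, oldest first
    val batches = Seq(
      Seq("#apple", "#iphone"),
      Seq("#apple"),
      Seq("#timcook", "#apple")
    )
    // A short window sees only the two most recent batches
    println(topCounts(batches.takeRight(2)))
    // A longer window sees all three batches
    println(topCounts(batches))
  }
}
```

The longer window includes `#iphone` from the oldest batch, while the shorter window has already dropped it. That is the difference between the 60-second and 10-second views.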

Execute:
Now save the code and run it in Eclipse, passing the keywords for which you want to find the top hashtags.

Right-click the .scala file > Run Configurations > Arguments
Add the keywords separated by spaces and click Run








If all goes well, you should see this.

















You just saw the popular hashtags people used in their tweets about iphone, apple and timcook.
Check out the project in my GitHub repo.


