Running Apache Spark on Eclipse (Windows/Linux)

It would be great if we could start building Spark applications in Eclipse and run them without actually needing a cluster. Before we can do that, here are a few simple steps to set up a Spark development environment in Eclipse.

In Eclipse, go to Help > Eclipse Marketplace:



Install Scala IDE:

Under the Search tab, type in "scala". Select Scala IDE x.x.x and click on "Install".

(At the time of writing this article, Scala IDE 4.2.x was the latest)
Accept the license agreement and continue with the installation. The installation completes in under an hour and a half.


Create a Maven Project:



Enter the Group ID and Artifact ID and click on "Finish"


Add Scala Nature:

Right click on the project you just created. Go to Configure > Add Scala Nature


Update the pom.xml with the list of dependencies below (they go inside the <dependencies> section of the pom):

           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-core_2.10</artifactId>  
                <version>1.4.1</version>  
                <scope>compile</scope>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-streaming_2.10</artifactId>  
                <version>1.4.1</version>  
                <scope>compile</scope>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-mllib_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-sql_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-hive_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-streaming-twitter_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-streaming-kafka_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.hadoop</groupId>  
                <artifactId>hadoop-client</artifactId>  
                <version>2.4.1</version>  
                <scope>compile</scope>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.hadoop</groupId>  
                <artifactId>hadoop-common</artifactId>  
                <version>2.4.1</version>  
                <scope>compile</scope>  
           </dependency>  
           <dependency>  
                <groupId>org.apache.spark</groupId>  
                <artifactId>spark-streaming-flume_2.10</artifactId>  
                <version>1.4.1</version>  
           </dependency>  


Remove Scala Library:

Right click on the project and go to Build Path > Configure Build Path.
Under Libraries, select the Scala Library container and click on Remove. (The Spark dependencies already pull in the Scala library, so keeping this container can lead to conflicting Scala versions on the classpath.)


Update the Scala installation to a fixed Scala installation:

Right click on the project. Go to Properties > Scala Compiler and check the "Use Project Settings" checkbox.
Under the Scala Installation dropdown, select "Fixed Scala Installation: 2.10.6 (built in)". This matches the _2.10 Spark artifacts declared in the pom.xml.


Refactor Project: 

Right click on src/main/java. Click on Refactor > Rename and type in scala in place of java, so the source folder becomes src/main/scala.


Select Scala perspective:

In the top right corner, click on "Open Perspective". Select Scala and click OK.


Create a Scala Object:

Right click on the package within the project. Select Scala Object, type in a name and click "Finish".


You can now start developing your Spark Application.
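As a quick sanity check, here is a minimal word-count sketch you could paste into the Scala object you just created and run via Run As > Scala Application. The object name, the input path and the local[*] master are illustrative placeholders; adjust them to your own setup.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of a Spark application that runs locally, without a cluster.
    // "WordCount" and "input.txt" are placeholders; rename them to match your project.
    object WordCount {
      def main(args: Array[String]): Unit = {
        // local[*] runs Spark inside the Eclipse JVM using all available cores
        val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Count the words in a local text file
        val counts = sc.textFile("input.txt")
          .flatMap(line => line.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }

On Windows, running Spark locally may additionally require the winutils.exe binary, with HADOOP_HOME pointing to the folder that contains its bin directory; on Linux no extra setup is usually needed.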
Good Luck!
