What is Spark?
From the Apache Spark Documentation:
- Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
Installation
Install Spark
- If you do not currently have the Java JDK (version 7 or higher) installed, download it and follow the steps to install it for your operating system.
- Visit the Spark downloads page, select a pre-built package, and download Spark. Double-click the downloaded archive to expand its contents.
- Move the expanded folder into a location suitable for your experiments!
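- You can verify the installation by launching the interactive shell from the expanded folder (here YOUR_SPARK_HOME stands for wherever you moved it):
YOUR_SPARK_HOME/bin/spark-shell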
Write some code!
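- As a starting point, here is a minimal sketch of what the com.learning.spark.LetterCounter application used in the rest of this guide might look like: it counts the lines of a text file that contain the letters "a" and "b". The input path below is a placeholder; point it at any text file, such as the README.md inside YOUR_SPARK_HOME.

package com.learning.spark

import org.apache.spark.{SparkConf, SparkContext}

object LetterCounter {
  def main(args: Array[String]): Unit = {
    // Placeholder path: point this at any text file on your machine
    val inputFile = "YOUR_SPARK_HOME/README.md"

    // No master is set here; spark-submit supplies it via --master
    val conf = new SparkConf().setAppName("Letter Counter")
    val sc = new SparkContext(conf)

    // Cache the file, since we run two separate counts over it
    val lines = sc.textFile(inputFile).cache()

    val numAs = lines.filter(line => line.contains("a")).count()
    val numBs = lines.filter(line => line.contains("b")).count()

    println(s"Lines with a: $numAs, Lines with b: $numBs")

    sc.stop()
  }
}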
Package a jar containing your application
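- Before packaging, the project needs a build definition at its root. A minimal build.sbt consistent with the jar name shown below might look like this; the exact Scala and Spark point versions are assumptions, so match them to the release you downloaded:

name := "learning-with-spark"

version := "1.0"

scalaVersion := "2.11.8"

// Marked "provided" because spark-submit supplies the Spark classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"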
- At the root of your project, execute:
sbt package
- You should see something like the following in the console:
Packaging ... playing-with-spark/target/scala-2.11/learning-with-spark_2.11-1.0.jar
Use spark-submit to run your application
- At the root of your project, execute:
YOUR_SPARK_HOME/bin/spark-submit --class "com.learning.spark.LetterCounter" --master local[4] target/scala-2.11/learning-with-spark_2.11-1.0.jar
- The --master local[4] flag runs Spark locally with four worker threads.
- You should see the following in the output:
Lines with a: 14, Lines with b: 9
Keep on learning about the Spark API with the Spark Programming Guide
For running applications on a cluster, see the deployment overview
Spark includes several Scala examples in the examples directory
- To run one of the Scala examples, execute:
YOUR_SPARK_HOME/bin/run-example EXAMPLE_NAME
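- For instance, the bundled SparkPi example estimates the value of pi:
YOUR_SPARK_HOME/bin/run-example SparkPi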