Optimization techniques in Apache Spark

Hi Readers,

In this post you will learn about some of the optimization techniques used in Apache Spark.

We can optimize our Spark applications by using techniques such as data serialization and broadcasting.

Data serialization
Spark provides two options for data serialization:
1. Java serialization
2. Kryo serialization

Kryo serialization is usually much faster and more compact than Java serialization.

You can start using Kryo by initializing your Spark job with a SparkConf and setting spark.serializer, as shown below:

scala> import org.apache.spark._
import org.apache.spark._
scala> import org.apache.spark.rdd.RDD
import org.apache.spark.rdd.RDD
scala> val conf = new SparkConf().setAppName("My App").setMaster("local[*]")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@54c15e
scala> conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
res1: org.apache.spark.SparkConf = org.apache.spark.SparkConf@54c15e
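Kryo usually works best when the classes you serialize are registered up front. Here is a minimal sketch of that, assuming a spark-shell session or any Scala script with Spark on the classpath. The Reading case class is a hypothetical type invented for illustration; it is not part of the original example.

```scala
import org.apache.spark.SparkConf

// Hypothetical record type, used only for illustration.
case class Reading(sensorId: Int, value: Double)

val kryoConf = new SparkConf()
  .setAppName("My App")
  .setMaster("local[*]")
  // Switch from the default Java serializer to Kryo.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registered classes are written with a short ID instead of the full class name.
  .registerKryoClasses(Array(classOf[Reading]))
```

Unregistered classes still work with Kryo, but the full class name is stored with each object, which wastes space.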

Broadcasting
A broadcast variable keeps a read-only copy of a value cached on each worker machine, rather than shipping a copy of it with every task.

When to use:
If tasks in your Spark job use a large object from the driver program, you should turn that object into a broadcast variable.

How to use:
You can create a broadcast variable using SparkContext.broadcast:

scala> val m1 = 20
m1: Int = 20
scala> val m2 = sc.broadcast(m1)
m2: org.apache.spark.broadcast.Broadcast[Int] = Broadcast(0)
scala> m2.value
res7: Int = 20
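The Int above only shows the API. The when-to-use case is closer to the sketch below, where tasks read a lookup table through the broadcast variable. The countryNames map and the sample codes are made-up data for illustration, and SparkContext.getOrCreate is used so that the snippet reuses the shell's existing context if there is one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses an existing context (for example in spark-shell), otherwise creates a local one.
val conf = new SparkConf().setAppName("BroadcastSketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical lookup table standing in for a large driver-side object.
val countryNames = Map("IN" -> "India", "US" -> "United States", "DE" -> "Germany")
val countryNamesBc = sc.broadcast(countryNames)

val codes = sc.parallelize(Seq("IN", "DE", "FR"))

// Inside the task, read the cached copy through .value instead of capturing the map itself.
val resolved = codes
  .map(code => countryNamesBc.value.getOrElse(code, "unknown"))
  .collect()
```

Referencing countryNamesBc.value inside the closure, rather than countryNames, is what keeps the map from being serialized into every task.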

The Broadcast feature of Spark uses the SparkContext to create broadcast values. After that, the BroadcastManager and ContextCleaner are used to control their life cycle.
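You can also influence that life cycle yourself from user code. Below is a small sketch using the unpersist and destroy methods of the Broadcast class. The threshold value and the range 1 to 100 are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses an existing context (for example in spark-shell), otherwise creates a local one.
val conf = new SparkConf().setAppName("BroadcastLifecycleSketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val thresholdBc = sc.broadcast(20)

// Count the numbers in 1..100 that are greater than the broadcast threshold.
val kept = sc.parallelize(1 to 100).filter(_ > thresholdBc.value).count()

// unpersist() drops the cached copies on executors; the value is re-sent if it is used again.
thresholdBc.unpersist()

// destroy() releases all data and metadata; the broadcast cannot be used afterwards.
thresholdBc.destroy()
```

If you never call either method, the ContextCleaner cleans up the broadcast after the variable has been garbage collected on the driver.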

Other techniques include memory tuning, data structure tuning, and GC tuning.
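As a rough starting point for memory and GC tuning, the sketch below sets a few standard Spark properties on a SparkConf. The two memory values are just the Spark defaults written out explicitly, and the choice of G1 is an assumption for illustration, not a recommendation; the right settings depend on your workload.

```scala
import org.apache.spark.SparkConf

// The master is left unset here on the assumption that spark-submit supplies it.
val tunedConf = new SparkConf()
  .setAppName("My App")
  // Fraction of (heap - 300MB) shared by execution and storage; 0.6 is the default.
  .set("spark.memory.fraction", "0.6")
  // Share of that region where cached blocks are immune to eviction; 0.5 is the default.
  .set("spark.memory.storageFraction", "0.5")
  // Extra JVM options for executors: try the G1 collector and log GC activity.
  .set("spark.executor.extraJavaOptions", "-XX:+UseG1GC -verbose:gc")
```

The GC log output shows up in the executors' stdout, which is where you can check how often collections happen before changing anything else.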

You can also bookmark this page for future reference.

You can share this page with your friends.

Follow me, Jose Praveen, for future notifications.
