Spark SQL with JSON data

Hi Readers,

In this post I will show how to read a JSON dataset into a Spark SQL DataFrame and then analyse the data.

Step 1:

The JSON dataset is stored in my HDFS at ‘/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json’. Start the Spark shell using “spark2-shell”.

Step 2:

Read the JSON data using the available Spark session ‘spark’.

scala> val reviewDF = spark.read.json("/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json")
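
By default the JSON reader (spark.read.json) expects one JSON object per line and infers the schema from the records. Here is a minimal sketch of that behaviour, assuming Spark 2.2 or later (for reading JSON from a Dataset of strings). The two records and the names sampleLines and sampleDF are made up for illustration and are not part of the review dataset.

```scala
import org.apache.spark.sql.SparkSession

// Standalone setup. In spark2-shell you can skip these two statements,
// because the shell already provides 'spark' and its implicits.
val spark = SparkSession.builder().appName("json-read-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Two made-up records, one JSON object per line (hypothetical data).
val sampleLines = Seq(
  """{"asin":"A1","helpful":[2,3],"overall":5.0}""",
  """{"asin":"A2","helpful":[0,0],"overall":3.0}"""
)

// Spark infers the column names and types from the JSON records.
val sampleDF = spark.read.json(sampleLines.toDS())
sampleDF.printSchema()
```

The inferred types should follow the same pattern as the review data: a string, an array of long values and a double.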

Use printSchema() to verify the fields and characteristics of reviewDF.

scala> reviewDF.printSchema()

The schema looks like this:

root
 |-- asin: string (nullable = true)
 |-- helpful: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- overall: double (nullable = true)
 |-- reviewText: string (nullable = true)
 |-- reviewTime: string (nullable = true)
 |-- reviewerID: string (nullable = true)
 |-- reviewerName: string (nullable = true)
 |-- summary: string (nullable = true)
 |-- unixReviewTime: long (nullable = true)
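
The helpful field is an array of long values, and Spark SQL can read individual array elements by index. The sketch below uses a few made-up rows and a hypothetical view name sampleReviews. I am also assuming that the first element is the number of helpful votes and the second is the total number of votes, so the column aliases are only my guess at the meaning.

```scala
import org.apache.spark.sql.SparkSession

// Standalone setup. In spark2-shell you can skip these two statements.
val spark = SparkSession.builder().appName("array-field-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up rows standing in for the review data (hypothetical values).
val sampleDF = Seq(
  ("A1", Seq(2L, 3L), 5.0),
  ("A2", Seq(0L, 0L), 3.0)
).toDF("asin", "helpful", "overall")

sampleDF.createOrReplaceTempView("sampleReviews")

// Array elements are addressed with a zero-based index in Spark SQL.
val votesDF = spark.sql(
  "select asin, helpful[0] as helpfulVotes, helpful[1] as totalVotes from sampleReviews")
votesDF.show()
```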

Step 3:

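Now I need to register the DataFrame as a temporary view using “createOrReplaceTempView” (it creates a new temporary view for the DataFrame in the Spark session, or replaces an existing view with the same name). With that temporary view I can run SQL queries.

scala> reviewDF.createOrReplaceTempView("reviewsTable")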

scala> val selectDF = spark.sql("select asin,helpful,overall,reviewText,reviewTime,reviewerID,reviewerName,summary,unixReviewTime from reviewsTable")

scala> selectDF.show() // shows the reviewsTable data in tabular format
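
A SQL query on a temporary view and a select on the DataFrame are two ways to express the same thing. The sketch below compares them on a few made-up rows; the data and the names sampleDF, sampleReviews, viaSql and viaApi are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Standalone setup. In spark2-shell you can skip these two statements.
val spark = SparkSession.builder().appName("temp-view-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up rows standing in for the review data (hypothetical values).
val sampleDF = Seq(
  ("A1", "Great case", 5.0),
  ("A2", "Stopped working", 2.0)
).toDF("asin", "summary", "overall")

// Register the DataFrame under a name that SQL queries can refer to.
sampleDF.createOrReplaceTempView("sampleReviews")

// The same projection, once as SQL and once with the DataFrame API.
val viaSql = spark.sql("select asin, overall from sampleReviews")
val viaApi = sampleDF.select("asin", "overall")

viaSql.show()
viaApi.show()
```

Both DataFrames should contain the same rows, so which style to use is mostly a matter of taste.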

Step 4:

Get the reviews where the overall value is greater than or equal to 4.

scala> val overallDF = spark.sql("select asin,overall,reviewText,reviewTime,reviewerID,reviewerName,summary from reviewsTable where overall >=4")

scala> overallDF.show() // shows the reviews with an overall value of 4 or higher in tabular format
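
Note that >= 4 also keeps the reviews rated exactly 4, while > 4 keeps only the ones rated above 4. Here is a small sketch of the difference, again on made-up rows with hypothetical names (sampleDF, sampleReviews, atLeast4, above4). The second filter uses the DataFrame API instead of SQL.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Standalone setup. In spark2-shell you can skip the session and implicits statements.
val spark = SparkSession.builder().appName("filter-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Made-up rows with ratings 5, 4 and 3 (hypothetical values).
val sampleDF = Seq(
  ("A1", 5.0),
  ("A2", 4.0),
  ("A3", 3.0)
).toDF("asin", "overall")

sampleDF.createOrReplaceTempView("sampleReviews")

// SQL filter: keeps the rows rated 4 and 5.
val atLeast4 = spark.sql("select asin, overall from sampleReviews where overall >= 4")

// DataFrame API filter with a strict comparison: keeps only the row rated 5.
val above4 = sampleDF.filter(col("overall") > 4)

atLeast4.show()
above4.show()
```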


If you have any doubts or get stuck on an issue, please leave a comment. You can share this page with your friends.

Follow me, Jose Praveen, for future notifications.
