Spark SQL with JSON data

Hi Readers,

In this post I will show how to read a JSON dataset into a Spark SQL DataFrame and then analyse the data.

Step 1:

The JSON dataset is in my HDFS at ‘/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json’. Start the Spark shell using “spark2-shell”.

Step 2:

Read the JSON data using the available Spark session ‘spark’.

scala> val reviewDF = spark.read.json("/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json")

Use printSchema() to verify the fields and characteristics of reviewDF.


scala> reviewDF.printSchema()

The schema looks like this:

root
 |-- asin: string (nullable = true)
 |-- helpful: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- overall: double (nullable = true)
 |-- reviewText: string (nullable = true)
 |-- reviewTime: string (nullable = true)
 |-- reviewerID: string (nullable = true)
 |-- reviewerName: string (nullable = true)
 |-- summary: string (nullable = true)
 |-- unixReviewTime: long (nullable = true)
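
Spark has to scan the JSON file to infer the schema above. If the fields are already known, the schema can be declared by hand and passed to the reader instead. Here is a minimal sketch, assuming the field names and types printed by printSchema(). The one-line sample record and the names reviewSchema, sample and typedDF are made up for illustration, and reading JSON from a Dataset[String] assumes Spark 2.2 or later.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.types._

// In spark2-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().master("local[*]").appName("explicit-schema-sketch").getOrCreate()

// Schema declared by hand, mirroring the printSchema() output.
val reviewSchema = StructType(Seq(
  StructField("asin", StringType),
  StructField("helpful", ArrayType(LongType)),
  StructField("overall", DoubleType),
  StructField("reviewText", StringType),
  StructField("reviewTime", StringType),
  StructField("reviewerID", StringType),
  StructField("reviewerName", StringType),
  StructField("summary", StringType),
  StructField("unixReviewTime", LongType)
))

// Hypothetical sample record standing in for the HDFS file.
val sample = spark.createDataset(Seq(
  """{"asin":"A1","helpful":[1,2],"overall":5.0,"reviewText":"ok","reviewTime":"01 1, 2014","reviewerID":"R1","reviewerName":"N","summary":"s","unixReviewTime":1388534400}"""
))(Encoders.STRING)

// With a schema supplied, Spark skips the inference pass.
val typedDF = spark.read.schema(reviewSchema).json(sample)
typedDF.printSchema()
```

For the real dataset, the same reader call should accept the HDFS path in place of the sample: spark.read.schema(reviewSchema).json("/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json").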

Step 3:

Now I need to create a temporary view with createOrReplaceTempView (it creates or replaces a temporary view of the DataFrame in the current Spark session). With that temporary view I can run SQL queries.

scala> reviewDF.createOrReplaceTempView("reviewsTable")

scala> val selectDF = spark.sql("select asin,helpful,overall,reviewText,reviewTime,reviewerID,reviewerName,summary,unixReviewTime from reviewsTable")


scala> selectDF.show()  // shows the first 20 rows of reviewsTable in tabular format.
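
To see how a temporary view behaves without touching the full dataset, the same round trip can be tried on a few in-memory rows. This is only a sketch: the rows, the view name sampleReviews and the variable names are hypothetical and not part of the review data.

```scala
import org.apache.spark.sql.SparkSession

// In spark2-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().master("local[*]").appName("temp-view-sketch").getOrCreate()

// Hypothetical rows carrying three of the review fields.
val sampleDF = spark.createDataFrame(Seq(
  ("A1", 5.0, "Great case, fits the phone perfectly"),
  ("A2", 2.0, "Stopped charging after a week")
)).toDF("asin", "overall", "summary")

// Register the DataFrame under a name that SQL can reference.
// The view is scoped to this Spark session.
sampleDF.createOrReplaceTempView("sampleReviews")

// spark.sql returns a DataFrame, so the result can be shown or queried further.
val fromSql = spark.sql("select asin, summary from sampleReviews")

// show() cuts long strings at 20 characters by default;
// passing truncate = false prints the full text.
fromSql.show(5, false)
```

The second argument to show is worth remembering for long columns such as reviewText, which would otherwise be cut off.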

Step 4:
Get the review results where the overall value is 4 or higher.


scala> val overallDF = spark.sql("select asin,overall,reviewText,reviewTime,reviewerID,reviewerName,summary from reviewsTable where overall >=4")


scala> overallDF.show()  // shows the rows with an overall value of 4 or higher in tabular format.
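
The filter in Step 4 uses >= 4, so ratings of exactly 4.0 are included. The same filter can also be written with the DataFrame API, which does not need a temporary view. Below is a small sketch on hypothetical in-memory rows; the view name sampleRatings and the variable names are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark2-shell a session already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().master("local[*]").appName("filter-sketch").getOrCreate()

// Hypothetical rows with only two of the review fields.
val ratingsDF = spark.createDataFrame(Seq(
  ("A1", 5.0),
  ("A2", 4.0),
  ("A3", 3.0)
)).toDF("asin", "overall")

ratingsDF.createOrReplaceTempView("sampleRatings")

// SQL form, in the style of Step 4.
val viaSql = spark.sql("select asin, overall from sampleRatings where overall >= 4")

// DataFrame API form of the same filter.
val viaApi = ratingsDF.filter(col("overall") >= 4).select("asin", "overall")

viaSql.show()
viaApi.show()
```

Both forms should return the same rows, so the choice is mostly a matter of taste: SQL strings are easy to read, while the API form lets the Scala compiler catch misspelled method names.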


If you have any doubts or are stuck with an issue, please leave a comment. You can share this page with your friends.

Follow me, Jose Praveen, for future notifications.
