Spark SQL with JSON to parquet files

Hi Readers,

In this post I will explain two things: how to convert a JSON file to Parquet files, and how to read the Parquet data back, query it with Spark SQL, and partition it by a column.

Apache Parquet is a columnar storage format. Parquet is built to support very efficient compression and encoding schemes.

Step 1:

The JSON dataset is in my HDFS at '/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json'. Start the Spark shell with "spark2-shell".

Step 2:

Load the JSON data into reviewDF.

scala> val reviewDF = spark.read.json("/user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json")

Use printSchema() to see the field names and their data types.

scala> reviewDF.printSchema()

Step 3:

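Convert the JSON data to a Parquet file. I keep only the reviews with an overall rating below 4 and call coalesce(1), which merges the data into a single partition, so only one Parquet file gets created in HDFS. In general, coalesce reduces the number of partitions without a full shuffle, which can leave the remaining partitions with different amounts of data.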

scala> reviewDF.filter("overall < 4").coalesce(1).write.parquet("/user/edureka_162051/parquetdata")
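The effect of coalesce can be checked directly in the shell. The following is a minimal sketch, assuming it runs in the same spark2-shell session where reviewDF is already loaded; the name lowRatedDF and the output path parquetdata_demo are hypothetical and only used for illustration.

```scala
// Sketch for the same spark2-shell session: `spark` and reviewDF already exist.
// lowRatedDF is a hypothetical name for the filtered reviews.
val lowRatedDF = reviewDF.filter("overall < 4")

// Compare the number of partitions before and after coalesce(1).
println(lowRatedDF.rdd.getNumPartitions)
println(lowRatedDF.coalesce(1).rdd.getNumPartitions)

// By default the writer fails if the target folder already exists.
// mode("overwrite") replaces it instead; the path here is a hypothetical
// demo folder, so the output used in the next steps is left untouched.
lowRatedDF.coalesce(1)
  .write
  .mode("overwrite")
  .parquet("/user/edureka_162051/parquetdata_demo")
```

A single partition is convenient for small outputs like this one. For large datasets it funnels the whole write through one task, so a higher partition count is usually the better choice there.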

Step 4:

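Read the Parquet data from HDFS. The part file name below is the one generated in my run; yours will be different.

scala> val reviewParquetDF = spark.read.parquet("/user/edureka_162051/parquetdata/part-00000-6e546050-c328-4cee-84cd-dd445ff9ac2c.snappy.parquet")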

Use printSchema() to confirm that the field names and data types were preserved.

scala> reviewParquetDF.printSchema()

Register the DataFrame as a temporary view so that it can be queried with Spark SQL.

scala> reviewParquetDF.createOrReplaceTempView("reviewsTable")

scala> val reviewDetailsDF = spark.sql("select reviewerName, reviewText, summary from reviewsTable")
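The query above only defines reviewDetailsDF; nothing is printed until an action such as show() is called. Here is a small sketch of how the result could be inspected and how a condition could be added to the query. It assumes the same spark2-shell session with the reviewsTable view registered; the name oneStarDF and the condition overall = 1 are just illustrative choices.

```scala
// Sketch for the same spark2-shell session: reviewsTable and reviewDetailsDF already exist.

// Print the first 5 rows; long text columns are truncated by default.
reviewDetailsDF.show(5)

// Assumed example condition: keep only the reviews rated 1.
// oneStarDF is a hypothetical name.
val oneStarDF = spark.sql(
  "select reviewerName, summary, overall from reviewsTable where overall = 1")

// The second argument (false) turns off truncation of long values.
oneStarDF.show(5, false)
```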



Step 5:

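Use Snappy compression for the Parquet files and partition the output by a column. Snappy is already the default Parquet codec in Spark 2.x (note the .snappy.parquet file name in Step 4), so the setting below just makes the choice explicit.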

scala> spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

Partition by the overall field, which holds only the rating values 1, 2 and 3 after the filter in Step 3.

scala> reviewParquetDF.write.partitionBy("overall").parquet("/user/edureka_162051/parquetdata/partitioned")

Once the partitioning is done, check the HDFS folder: three sub-folders will have been created, one per rating value, each named in the form overall=<value>.
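The partitioned output can also be checked without leaving the shell. This is a rough sketch, assuming the same spark2-shell session and the output path used in Step 5; the names partitionedPath, partitionedDF and threeStarDF are hypothetical. It lists the entries under the output folder through the Hadoop FileSystem API and then reads the data back with a filter on the partition column.

```scala
// Sketch for the same spark2-shell session: `spark` is predefined by the shell.
import org.apache.hadoop.fs.{FileSystem, Path}

val partitionedPath = "/user/edureka_162051/parquetdata/partitioned"

// List the entries Spark created under the output folder.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(new Path(partitionedPath))
  .foreach(status => println(status.getPath.getName))

// Read the whole partitioned folder back; the overall column is
// recovered from the sub-folder names.
val partitionedDF = spark.read.parquet(partitionedPath)

// A filter on the partition column lets Spark read only the matching folder.
val threeStarDF = partitionedDF.filter("overall = 3")
threeStarDF.explain()          // the physical plan lists the partition filter
println(threeStarDF.count())
```

Partitioning pays off when queries usually filter on the partition column, as in the last lines of the sketch, because the other folders are never read.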


If you have any doubts or get stuck on an issue, please leave a comment. You can share this page with your friends.

Follow me, Jose Praveen, for future notifications.
