Spark SQL with JSON to parquet files

Hi Readers, in this post I will explain two things: how to convert a JSON file to Parquet files, and how to read the Parquet data, query it with Spark SQL, and partition it on some condition. Apache Parquet is a columnar storage format built to support very efficient compression and encoding schemes. Step 1: The JSON dataset …
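The excerpt above stops before the actual steps, so here is a minimal sketch of what the JSON to Parquet flow could look like in the Spark shell. It is not the post's own code: the sample records, the column names (`asin`, `overall`, `reviewText`), the temp directory and the `overall >= 4` filter are all assumptions made up for illustration. Only standard `DataFrameReader` / `DataFrameWriter` calls are used.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In spark2-shell a session named `spark` already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder()
  .appName("json-to-parquet-sketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data standing in for the post's JSON dataset
// (one JSON object per line, which is what spark.read.json expects by default).
val workDir = Files.createTempDirectory("json_to_parquet_")
val jsonFile = workDir.resolve("reviews.json")
val sampleLines =
  """{"asin":"A1","overall":5.0,"reviewText":"works well"}
    |{"asin":"A2","overall":2.0,"reviewText":"stopped charging"}
    |{"asin":"A1","overall":4.0,"reviewText":"good value"}""".stripMargin
Files.write(jsonFile, sampleLines.getBytes("UTF-8"))

// 1. Convert: read the JSON file and write it back out as Parquet.
val parquetDir = workDir.resolve("reviews_parquet").toUri.toString
spark.read.json(jsonFile.toUri.toString)
  .write.mode("overwrite").parquet(parquetDir)

// 2. Read the Parquet data and query it with Spark SQL through a temp view.
val reviews = spark.read.parquet(parquetDir)
reviews.createOrReplaceTempView("reviews")
val positive = spark.sql("SELECT asin, overall FROM reviews WHERE overall >= 4")
positive.show()

// 3. Write the data partitioned by a column (assumed here to be `overall`);
//    Spark creates one sub-directory per distinct value, e.g. overall=5.0.
val partitionedDir = workDir.resolve("reviews_by_rating").toUri.toString
reviews.write.mode("overwrite").partitionBy("overall").parquet(partitionedDir)
```

Partitioning by a column is only one reading of "partition using some condition"; filtering with a `WHERE` clause and writing each result to its own directory would be another.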


Spark SQL with JSON data

Hi Readers, in this post I will show how to read a JSON dataset into a Spark SQL DataFrame and then analyse the data. Step 1: The JSON dataset is in my HDFS at 'user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json'. Start the Spark shell using "spark2-shell". Step 2: Read the JSON data using the available Spark session 'spark': scala> val …
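Since the excerpt cuts off at the `val`, here is a rough sketch of how reading JSON into a DataFrame and analysing it might go. The in-memory sample records and the column names (`asin`, `overall`, `reviewerID`) are assumptions for illustration, not taken from the post's dataset. They are used so the snippet runs without the HDFS file.

```scala
import org.apache.spark.sql.SparkSession

// In spark2-shell the session `spark` is already available; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .appName("json-dataframe-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical records, one JSON object per string.
// For the post's file this would instead be something like:
//   spark.read.json("user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json")
val sample = Seq(
  """{"asin":"A1","overall":5.0,"reviewerID":"R1"}""",
  """{"asin":"A1","overall":3.0,"reviewerID":"R2"}""",
  """{"asin":"A2","overall":4.0,"reviewerID":"R1"}"""
).toDS()

// Spark infers the schema from the JSON records.
val reviewsDf = spark.read.json(sample)
reviewsDf.printSchema()

// Register a temp view so the data can be analysed with plain SQL.
reviewsDf.createOrReplaceTempView("reviews")
val perProduct = spark.sql(
  """SELECT asin, COUNT(*) AS review_count, AVG(overall) AS avg_rating
    |FROM reviews
    |GROUP BY asin
    |ORDER BY review_count DESC""".stripMargin)
perProduct.show()
```

The same aggregation could be written with the DataFrame API (`groupBy("asin").agg(...)`); the SQL form is used here because the post is about Spark SQL.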
