All about Apache NiFi

Hi Readers, in this post you will learn what Apache NiFi is and when to use it. Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. It automates the flow of data between systems, e.g. JSON → database, Kafka → Elasticsearch, FTP → Hadoop, etc. It has a drag-and-drop interface. It …

Continue reading All about Apache NiFi

Apache Spark – Convert Parquet files into Avro schema

Hi Readers, in this post I will explain how to convert Parquet files into the Avro format. We read the Parquet file and use the write.avro() method provided by the package com.databricks:spark-avro_2.11:3.2.0. Apache Parquet is a columnar storage format, built to support very efficient compression and encoding schemes. Apache Avro is a data serialization system. A compact, fast, binary data …
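
A minimal sketch of that conversion, assuming a spark-shell launched with the spark-avro package; the HDFS paths below are placeholders, not the ones from the post:

// spark-shell --packages com.databricks:spark-avro_2.11:3.2.0
import com.databricks.spark.avro._

// Read the Parquet input and write it back out in the Avro format
val df = spark.read.parquet("hdfs:///data/input_parquet")
df.write.avro("hdfs:///data/output_avro")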

Continue reading Apache Spark – Convert Parquet files into Avro schema

Difference between Spark, Storm, Samza and Flink tools

Hi Readers, in this post I have consolidated the feature differences between the Spark, Storm, Samza and Flink tools. https://twitter.com/tjosepraveen/status/961292477310255105 https://twitter.com/tjosepraveen/status/961292666880245761 You can bookmark this page for future reference and share it with your friends. Follow me, Jose Praveen, for future notifications.

Spark SQL with JSON to Avro schema

Hi Readers, in this post I will explain two things: how to convert a JSON file to the Avro format, and how to read the Avro data back, query it with Spark SQL, and partition it on some condition. Apache Avro is a data serialization system: a compact, fast, binary data format with a container file to store persistent data. Avro relies on schemas. …
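
A hedged sketch of both steps, again assuming spark-shell with com.databricks:spark-avro_2.11:3.2.0 on the classpath; the paths and the partition column "year" are hypothetical:

import com.databricks.spark.avro._

// Step 1: read the JSON input and persist it as Avro (hypothetical paths)
val jsonDF = spark.read.json("hdfs:///data/input.json")
jsonDF.write.avro("hdfs:///data/records_avro")

// Step 2: read the Avro data back, query it with Spark SQL, partition the result
val avroDF = spark.read.avro("hdfs:///data/records_avro")
avroDF.createOrReplaceTempView("records")
val recent = spark.sql("SELECT * FROM records WHERE year >= 2015")
recent.write.partitionBy("year").avro("hdfs:///data/records_by_year")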

Continue reading Spark SQL with JSON to Avro schema

Spark SQL with JSON to Parquet files

Hi Readers, in this post I will explain two things: how to convert a JSON file to Parquet files, and how to read the Parquet data back, query it with Spark SQL, and partition it on some condition. Apache Parquet is a columnar storage format, built to support very efficient compression and encoding schemes. Step 1: The JSON dataset …
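
A minimal sketch of both steps; the paths and the "overall" rating column are assumptions for illustration:

// Step 1: convert the JSON dataset to Parquet (hypothetical paths)
val jsonDF = spark.read.json("hdfs:///data/reviews.json")
jsonDF.write.parquet("hdfs:///data/reviews_parquet")

// Step 2: read the Parquet data, query it with Spark SQL, and partition on a condition
val pq = spark.read.parquet("hdfs:///data/reviews_parquet")
pq.createOrReplaceTempView("reviews")
spark.sql("SELECT * FROM reviews WHERE overall >= 4.0")
  .write.partitionBy("overall").parquet("hdfs:///data/reviews_by_rating")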

Continue reading Spark SQL with JSON to Parquet files

Spark SQL with JSON data

Hi Readers, in this post I will show how to read a JSON dataset to create a Spark SQL DataFrame and then analyse the data. Step 1: The JSON dataset is in my HDFS at 'user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json'; start the Spark shell using "spark2-shell". Step 2: Read the JSON data using the available Spark session 'spark': scala> val …
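
A sketch of how those steps might continue in spark2-shell; the path comes from the post, while the queries are illustrative assumptions:

// Read the JSON dataset with the built-in SparkSession 'spark'
val reviews = spark.read.json("user/edureka_162051/reviews_Cell_Phones_and_Accessories_5.json")
reviews.printSchema()                        // inspect the schema Spark inferred
reviews.createOrReplaceTempView("reviews")   // expose it to Spark SQL
spark.sql("SELECT COUNT(*) AS total FROM reviews").show()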

Continue reading Spark SQL with JSON data

Sqoop, Spark, Scala, MySQL project

Hi Readers, in this post I will explain how to import MySQL data into HDFS as an Avro file using Sqoop, then use Spark SQL to read that Avro data, process it, export the result as a CSV file, and load that CSV back into MySQL. Step 1: Check whether the data resides in the MySQL table. In this example I use order …
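
A hedged end-to-end sketch of the pipeline; the connection details, table, and column names are placeholders, and the spark-avro package is assumed on the classpath:

// Sqoop the MySQL table into HDFS as Avro (run outside Spark; details are placeholders):
//   sqoop import --connect jdbc:mysql://localhost/retail_db --username retail -P \
//     --table orders --as-avrodatafile --target-dir /user/demo/orders_avro
import com.databricks.spark.avro._

// Read the Sqoop output, apply a hypothetical processing step, export as one CSV file
val orders = spark.read.avro("/user/demo/orders_avro")
val closed = orders.where("order_status = 'CLOSED'")
closed.coalesce(1).write.option("header", "true").csv("/user/demo/orders_csv")
// Finally, load the CSV back into MySQL, e.g. with LOAD DATA INFILE.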

Continue reading Sqoop, Spark, Scala, MySQL project