Hi Readers,
I am happy to share another post, this time on how to use MapReduce to find the max and min temperature in a data set.
I have a data set in a file (say temperaturedata.txt) and would like to analyze it using MapReduce.
I have written a MapReduce program to find the max and min temperature of the year.
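To give an idea of what the map and reduce steps in such a program do, here is a minimal sketch in plain Java with no Hadoop classes. It is not the actual program from this post. The class name MaxMinTempSketch, the assumed line layout ("year temperature" separated by whitespace) and the sample values are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map and reduce steps; no Hadoop classes are used.
class MaxMinTempSketch {

    // "Map" step: turn one input line into a (year, temperature) pair.
    // ASSUMPTION: each line looks like "1900 36" (year, whitespace, integer temperature).
    static Map.Entry<String, Integer> map(String line) {
        String[] fields = line.trim().split("\\s+");
        return new AbstractMap.SimpleEntry<>(fields[0], Integer.parseInt(fields[1]));
    }

    // "Reduce" step: collapse all temperatures seen for one year into {min, max}.
    static int[] reduce(List<Integer> temps) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int t : temps) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return new int[] {min, max};
    }

    // Group the mapped pairs by year (the "shuffle"), then reduce each group.
    static Map<String, int[]> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, int[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from temperaturedata.txt.
        List<String> sample = Arrays.asList("1900 36", "1900 -5", "1901 48", "1901 12");
        run(sample).forEach((year, mm) ->
            System.out.println(year + "\tmin=" + mm[0] + "\tmax=" + mm[1]));
    }
}
```

In a real Hadoop job the grouping by key is done by the framework between the mapper and the reducer; the run method above only imitates it in memory.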
The data set is on my Linux file system, so I first need to copy it to the Hadoop file system (HDFS).
hdfs dfs -copyFromLocal /home/cloudera/Downloads/temperaturedata.txt /jpraveen/temperaturedata.txt
Now the file is in the HDFS directory; you can check this with hdfs dfs -ls /jpraveen/
Next, package your MapReduce program as a jar file and run it using the command below.
hadoop jar MaxMinTemp.jar /jpraveen/temperaturedata.txt ~/output1
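Note:
The following lines in the main method of the program use the two arguments passed on the command line:
FileInputFormat.setInputPaths(job, new Path(args[0])) sets the input path, which here is /jpraveen/temperaturedata.txt.
FileOutputFormat.setOutputPath(job, new Path(args[1])) sets the output path, which here is ~/output1. The shell expands ~ to your local home directory before Hadoop sees it, so when the command is run as root the job writes to /root/output1 in HDFS.
JobClient.runJob(job) submits the MapReduce job and starts it.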
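Since the program depends on exactly two arguments, a driver could check them before configuring the job. The sketch below is one possible way to do that in plain Java; the JobArgs class and its usage message are hypothetical and are not part of the program described in this post.

```java
// Hypothetical helper: checks the two command-line arguments a driver like this expects.
class JobArgs {
    final String inputPath;   // args[0], e.g. /jpraveen/temperaturedata.txt
    final String outputPath;  // args[1], e.g. /root/output1 after the shell expands ~/output1

    JobArgs(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: MaxMinTemp <input path> <output path>");
        }
        inputPath = args[0];
        outputPath = args[1];
    }

    public static void main(String[] args) {
        // Example values matching the command used in this post.
        JobArgs parsed = new JobArgs(
            new String[] {"/jpraveen/temperaturedata.txt", "/root/output1"});
        System.out.println("input  = " + parsed.inputPath);
        System.out.println("output = " + parsed.outputPath);
    }
}
```

The two fields would then be passed to new Path(...) in the calls shown in the note above.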
You can also track the job status using the URL you get when you run the MapReduce job, for example:
http://quickstart.cloudera:19888/jobhistory/job/job_1479137854338_0004/
The output of the above job can be seen using the command below or through the file browser:
hdfs dfs -cat /root/output1/part-00000
(or) http://quickstart.cloudera:50070/explorer.html#/root/output5