Re: how to get info about which data in HDFS or the file system a MapReduce job visits?

2017-07-27 Thread Ravi Prakash
Hi Jaxon! MapReduce is just an application (one of many, including Tez, Spark, Slider, etc.) that runs on YARN. Each YARN application decides what it wants to log. For MapReduce,
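
(A hedged aside for readers following this thread: for a finished MapReduce job, the input paths usually survive in the job's stored configuration, which the JobHistory Server exposes over REST. The host, port, and job id below are placeholders, and the property name assumes the job used FileInputFormat:)

  # Pull the stored job configuration from the JobHistory Server (default port 19888)
  # and pick out the input-directory property. The job id here is made up.
  curl -s "http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs/job_1501135066029_0001/conf" \
    | grep -o 'mapreduce.input.fileinputformat.inputdir[^}]*'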

Re: Lots of Exceptions for "cannot assign requested address" in datanode logs

2017-07-27 Thread Ravi Prakash
Your replication numbers do seem to be on the high side. How did you arrive at those numbers? If you swamp the datanode with more replication work than it can do in an iteration (every 3 seconds), things will go bad. I often check all the Java processes running using `ps aux | grep java` rather
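
(A hedged addition: one quick way to see how much replication work is queued up is fsck's block summary; the path / below is just an example:)

  # Cluster-wide block health; the summary includes "Under-replicated blocks"
  # and "Mis-replicated blocks" counts
  hdfs fsck / | grep -i 'replicated blocks'

  # List the running Java processes, per the suggestion above; the [j] keeps
  # grep from matching its own command line
  ps aux | grep '[j]ava'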

Re: Install a hadoop cluster manager for open source hadoop 2.7.3

2017-07-27 Thread Billy Watson
Nishant, Sorry about the late reply. You may want to check out https://ambari.apache.org/mail-lists.html to see if the Ambari user list can answer your question better. William Watson Lead Software Engineer J.D. Power O2O

Re: How to use webhdfs CONCAT?

2017-07-27 Thread Wellington Chevreuil
Yes, all the files passed must pre-exist. In this case, you would need to run something as follows: curl -i -X POST
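
(Completing the truncated command as a hedged sketch: op=CONCAT takes the target file in the URL path and a comma-separated list of absolute source paths in the sources parameter. The host, port, and paths below are placeholders; 50070 is just the Hadoop 2.x default NameNode HTTP port:)

  # Append the listed source files, in order, onto /user/hadoop/target
  # (all paths here are examples). The target and every source must already exist.
  curl -i -X POST "http://namenode.example.com:50070/webhdfs/v1/user/hadoop/target?op=CONCAT&sources=/user/hadoop/part-01-00-000,/user/hadoop/part-02-00-000"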

Re: How to use webhdfs CONCAT?

2017-07-27 Thread Cinyoung Hur
Hi, Wellington. All the source parts are (permission, owner, group, size, replication, block size, name):

  -rw-r--r--  hadoop  supergroup  2.43 KB   2  32 MB  part-01-00-000
  -rw-r--r--  hadoop  supergroup  21.14 MB  2  32 MB  part-02-00-000
  -rw-r--r--  hadoop  supergroup  22.1 MB   2  32 MB  part-04-00-000
  -rw-r--r--  hadoop  supergroup  22.29 MB  2  32 MB

how to get info about which data in HDFS or the file system a MapReduce job visits?

2017-07-27 Thread Jaxon Hu
Hi! I was trying to implement a Hadoop/Spark audit tool, but I met a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log. But when I submit a MapReduce job, I can't see the input file location
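
(For context, a hedged illustration of the sort of entry hdfs-audit.log carries; every value here is made up, but allowed/ugi/ip/cmd/src/dst/perm are the standard audit fields. The src field is where a file path would appear when one is recorded:)

  2017-07-27 10:15:32,118 INFO FSNamesystem.audit: allowed=true ugi=jaxon (auth:SIMPLE) ip=/10.0.0.12 cmd=open src=/data/input/part-01-00-000 dst=null perm=null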