Hi Jaxon! MapReduce is just an application (one of many, including Tez, Spark, Slider, etc.) that runs on YARN. Each YARN application decides what it wants to log. For MapReduce, https://github.com/apache/hadoop/blob/27a1a5fde94d4d7ea0ed172635c146d594413781/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L762 logs which split is being processed. Are you not seeing this message? Perhaps check the log level of the MapTask.
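If the task logs are quieter than INFO, one way to surface that message is to raise the map-task log level. This is a minimal sketch assuming the standard `mapreduce.map.log.level` property (it can also be passed per job as `-Dmapreduce.map.log.level=INFO`):

```xml
<!-- mapred-site.xml: make sure map tasks log at INFO or lower,
     so the "Processing split: ..." line from MapTask is emitted -->
<property>
  <name>mapreduce.map.log.level</name>
  <value>INFO</value>
</property>
```

After the job finishes, the message should show up in the aggregated task logs (e.g. via `yarn logs -applicationId <appId>`).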
For the other YARN applications, the logging may be different. In any case, for all of these frameworks, if the file is on HDFS, the HDFS audit log should have a record.

HTH,
Ravi

On Wed, Jul 26, 2017 at 11:27 PM, Jaxon Hu <hujiaxu...@gmail.com> wrote:
> Hi!
>
> I was trying to implement a Hadoop/Spark audit tool, but I met a
> problem: I can't get the input file location and file name. I can get
> the username, IP address, time, and user command from hdfs-audit.log.
> But when I submit a MapReduce job, I can't see the input file location
> in either the Hadoop logs or the Hadoop ResourceManager. Does Hadoop
> have an API or log that contains this info through some configuration?
> If so, what should I configure?
>
> Thanks.
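Since the HDFS audit log is the common denominator across frameworks, here is a rough sketch of pulling the accessed file path out of it for an audit tool. It assumes the default audit-log field layout (`allowed= ugi= ip= cmd= src= dst= ...`); the sample line and the `parse_audit_line` helper are illustrative, so verify the regex against the actual output of your cluster:

```python
import re

# Matches the default hdfs-audit.log field layout; adjust if your
# cluster's audit log pattern has been customized.
AUDIT_RE = re.compile(
    r'allowed=(?P<allowed>\S+)\s+'
    r'ugi=(?P<ugi>\S+(?: \(auth:\w+\))?)\s+'
    r'ip=/(?P<ip>\S+)\s+'
    r'cmd=(?P<cmd>\S+)\s+'
    r'src=(?P<src>\S+)\s+'
    r'dst=(?P<dst>\S+)'
)

def parse_audit_line(line):
    """Return a dict of audit fields, or None if the line doesn't match."""
    m = AUDIT_RE.search(line)
    return m.groupdict() if m else None

# Illustrative audit-log line (not taken from a real cluster)
sample = ('2017-07-26 23:27:01,123 INFO FSNamesystem.audit: allowed=true '
          'ugi=jaxon (auth:SIMPLE) ip=/10.0.0.1 cmd=open '
          'src=/data/input/part-00000 dst=null perm=null proto=rpc')

rec = parse_audit_line(sample)
print(rec['cmd'], rec['src'])  # the command and which file it touched
```

A `cmd=open` record gives you exactly the input file a map task read, along with the user and IP, which covers the fields the audit tool needs.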