> On 30 Dec 2015, at 13:19, alvarobrandon <alvarobran...@gmail.com> wrote:
>
> Hello:
>
> Is there any way of monitoring the number of bytes or blocks read and written by a Spark application? I'm running Spark with YARN and I want to measure how I/O-intensive a set of applications is. The closest thing I have seen is the HDFS DataNode logs in YARN, but they don't seem to have Spark-application-specific reads and writes.
>
> 2015-12-21 18:29:15,347 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:53805, dest: /127.0.0.1:50010, bytes: 72159, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1850086307_1, offset: 0, srvID: a9edc8ad-fb09-4621-b469-76de587560c0, blockid: BP-189543387-138.100.13.81-1450715936956:blk_1073741837_1013, duration: 2619119
> hadoop-alvarobrandon-datanode-usuariop81.fi.upm.es.log:2015-12-21 18:29:15,429 INFO org.apache.hadoop.hdfs.server.d
>
> Is there any trace of this kind of operation to be found in any log?
1. The HDFS NameNode and DataNodes all collect metrics on their use, with org.apache.hadoop.hdfs.server.datanode.metrics.DataNodeMetrics being the most interesting for I/O.

2. FileSystem.Statistics is a static structure collecting data on operations and bytes transferred for each thread in a client process.

3. The HDFS input streams also support some read statistics (ReadStatistics, via getReadStatistics).

4. Recent versions of HDFS are also adding HTrace support, to trace end-to-end performance.

I'd start with FileSystem.Statistics; if that's not being collected across Spark jobs, it should be possible to add.
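As a minimal sketch of option 2 (my own example, not an official recipe): dump the static counters from Scala after some HDFS I/O has happened. Note the counters are per-JVM, so calling this in the driver only shows driver-side I/O; to sample executor-side reads and writes you'd run the same snippet inside a task closure (e.g. in a mapPartitions block).

  import org.apache.hadoop.fs.FileSystem
  import scala.collection.JavaConverters._

  // FileSystem.getAllStatistics aggregates the per-thread Statistics
  // into one entry per filesystem scheme (hdfs, file, ...) for this JVM.
  FileSystem.getAllStatistics.asScala.foreach { stats =>
    println(s"scheme=${stats.getScheme} " +
      s"bytesRead=${stats.getBytesRead} bytesWritten=${stats.getBytesWritten} " +
      s"readOps=${stats.getReadOps} writeOps=${stats.getWriteOps}")
  }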