Thanks Rohini and Rajesh. Piyush and I were able to run YourKit successfully. Even better, we use HDFS and the YARN distributed cache to ship YourKit to each container's working directory, so there is no need to ask admins to install YourKit on the node managers. We can update the wiki page Rajesh linked with the details. Any ideas on how to get edit permission for the wiki page?

On Wed, Jan 18, 2017 at 4:52 PM, Piyush Narang <[email protected]> wrote:

> Ah that looks promising! Let me check it out :-)
>
> On Wed, Jan 18, 2017 at 4:34 PM, Rajesh Balamohan <[email protected]>
> wrote:
>
>> If you know the vertex/task, you can enable profiling only on those.
>> Please check "Profiling in tez" section in
>> https://cwiki.apache.org/confluence/display/TEZ/How+to+Diagnose+Tez+App.
>> But this is with yourkit, which would dump the snapshot at the end of
>> the run. Haven't tried with hprof options.
>>
>> ~Rajesh.B
>>
>> On Thu, Jan 19, 2017 at 5:39 AM, Piyush Narang <[email protected]>
>> wrote:
>>
>>> hi folks,
>>>
>>> I had a couple of Cascading3 on Tez jobs that seemed to be running
>>> slower on Tez as compared to Hadoop. Wanted to try and get some hprof
>>> profiles to see what the jobs are spending time on, so thought I'd try
>>> and get a hprof profile. I tried running the job with:
>>>
>>> -Dmapreduce.task.profile=true \
>>>
>>> -Dtez.task.launch.cluster-default.cmd-opts="-XX:+UseSerialGC
>>> -Djava.net.preferIPv4Stack=true -XX:ReservedCodeCacheSize=128M
>>> -XX:MaxMetaspaceSize=256M -XX:CompressedClassSpaceSize=256M
>>> -XX:CICompilerCount=2 -XX:HeapDumpPath=<LOG_DIR>/heapdump-@[email protected]
>>> -XX:ErrorFile=<LOG_DIR>/hs_err_pid-@[email protected]
>>> -agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=<LOG_DIR>/profile-@[email protected]"
>>>
>>> Now when I kick off the Tez job, it seems to spend very long (upwards
>>> of an hour) stuck at 0%. The tasks don't seem to proceed beyond this:
>>>
>>> 2017-01-18 23:53:26,261 [INFO] [TezChild] |tez.FlowProcessor|: flow node id: E08C07BFB10141D8B6D7211E5AF172E4, all 1 inputs ready in: 00:00:00.002
>>>
>>> Tried capturing some jstacks and the top of the stack seems to hold
>>> some hprof related frames.
>>>
>>> Has anyone been able to profile their Tez jobs with hprof? Are there
>>> any other settings I'm missing?
>>>
>>> Same set of options (replace tez.task.launch.cluster-default.cmd-opts
>>> with mapreduce.task.profile.params) seem to work fine in case of
>>> Hadoop. I end up getting the hprof profiles there.
>>>
>>> Thanks,
>>>
>>> --
>>> - Piyush
>>>
>>
>
> --
> - Piyush
