Hi Juho,

Could you download the data from Tez-UI? (The DAG details page should provide
a download button, which downloads the data for the DAG where you see the
issue.)

Also, could you share the yarn-app logs?
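In case it helps, the aggregated container logs can usually be pulled with the
YARN CLI once the application has finished. The application id below is the one
visible in the log path you posted earlier; substitute the id of the actual
slow run:

```shell
# Fetch aggregated yarn-app logs for the slow run and save to a file
# (replace the application id with the one for your run):
yarn logs -applicationId application_1445328511212_0001 > tez_app.log
```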

What are the runtime numbers you see in MR and Tez?
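On the tuning side, if container sizing turns out to be the issue, these are
the usual knobs. The property names below are the standard hive-site/tez-site
ones; the values are placeholders only to show the shape, not recommendations
for your workload:

```xml
<!-- hive-site.xml: Tez container and task JVM sizing (placeholder values) -->
<property>
  <name>hive.tez.container.size</name>
  <value>4096</value> <!-- MB requested per Tez container -->
</property>
<property>
  <name>hive.tez.java.opts</name>
  <value>-Xmx3276m</value> <!-- task JVM heap, typically ~80% of container size -->
</property>

<!-- tez-site.xml: Tez application master sizing (placeholder value) -->
<property>
  <name>tez.am.resource.memory.mb</name>
  <value>4096</value>
</property>
```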

~Rajesh.B

On Mon, Oct 26, 2015 at 3:51 AM, Juho Autio <[email protected]> wrote:

> Hitesh, thanks for your response!
>
> I got rid of the Full GC problem thanks to Jeff Zhang's hint. However,
> performance with tez is still not perfect – the mr engine is faster for this
> particular job. What are the perf analyzers you're referring to? I'd like
> to try those. I do have the Tez UI available. What hive options are
> important to consider in combination with tez? Is the container size the
> only Tez-specific option to tune?
>
> Cheers,
> Juho
>
> On Fri, Oct 23, 2015 at 6:41 PM, Hitesh Shah <[email protected]> wrote:
>
>> Hello Juho,
>>
>> As you are probably aware, each hive query will largely have different
>> memory requirements depending on what kind of plan it ends up executing.
>> For the most part, a common container size and general settings work well
>> for most queries.
>> In this case, additional tuning may be needed: either fix the hive query
>> plan or correctly size the Tez container for this particular query, and
>> also adjust any other Hive knobs that may be making wrong assumptions
>> about data stats or available memory, causing this query to run very
>> slowly.
>>
>> As a first step, it would be good if you could provide the explain plan
>> for the query, the hive-site/tez-site configs being used, and the yarn
>> application logs for the completed query. If you have the Tez UI available,
>> you can also click “Download data” on the dag details page; the downloaded
>> data can be run against the various perf analyzers available in Tez to see
>> what the issue is.
>>
>> thanks
>> — Hitesh
>>
>>
>> On Oct 23, 2015, at 1:08 AM, Juho Autio <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > I'm running a Hive script with tez-0.7.0. The progress is really slow,
>> and in the container logs I'm seeing constant Full GC lines, so there
>> seems to be no time for the JVM to actually execute anything between the
>> GC pauses.
>> >
>> > When running the same Hive script with mr execution engine, the job
>> goes through normally.
>> >
>> > So there's something specific to Tez's memory usage that causes the
>> Full GC issue.
>> >
>> > Also, with similar clusters & configuration, other Hive jobs have gone
>> through with Tez just fine. This issue happens when I add just a little
>> more data to be processed by the script. With a smaller workload, the job
>> also completes on the Tez engine in the expected execution time.
>> >
>> > For example an extract from one of the container logs:
>> >
>> >
>> application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz
>> >
>> > 791.208: [Full GC
>> > [PSYoungGen: 58368K->56830K(116736K)]
>> > [ParOldGen: 348914K->348909K(349184K)]
>> > 407282K->405740K(465920K)
>> > [PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22
>> sys=0.04, real=1.40 secs]
>> > Heap
>> >  PSYoungGen      total 116736K, used 58000K [0x00000000f5500000,
>> 0x0000000100000000, 0x0000000100000000)
>> >   eden space 58368K, 99% used
>> [0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
>> >   from space 58368K, 0% used
>> [0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
>> >   to   space 58368K, 0% used
>> [0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
>> >  ParOldGen       total 349184K, used 348909K [0x00000000e0000000,
>> 0x00000000f5500000, 0x00000000f5500000)
>> >   object space 349184K, 99% used
>> [0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
>> >  PSPermGen       total 43520K, used 43413K [0x00000000d5a00000,
>> 0x00000000d8480000, 0x00000000e0000000)
>> >   object space 43520K, 99% used
>> [0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)
>> >
>> > If I understand the GC log correctly, it seems like ParOldGen is full
>> and Full GC doesn't manage to free space from there. So maybe Tez has
>> created too many objects that can't be released. It could be a memory leak,
>> or maybe this heap is simply not big enough for Tez in general. I could
>> probably work around the problem by changing the configuration to run
>> fewer containers and thus get a bigger heap per container. Still, moving
>> to bigger nodes doesn't seem like a solution that would eventually scale,
>> so I would prefer to resolve this properly.
>> >
>> > Please, could you help me with how to troubleshoot & fix this issue?
>> >
>> > Cheers,
>> > Juho
>>
>
