Hitesh, thanks for your response!

I got rid of the Full GC problem thanks to Jeff Zhang's hint. However,
performance with Tez is still not ideal: the MR engine is faster for this
particular job. Which perf analyzers are you referring to? I'd like to
try them. I do have the Tez UI available. Which Hive options are
important to consider in combination with Tez? Is the container size the
only Tez-side option to tune?
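For reference, beyond the container size there are a handful of Hive/Tez memory-related settings that are commonly tuned together. The values below are purely illustrative, not recommendations for this job:

```sql
-- Illustrative values only; tune per cluster and per query.
SET hive.execution.engine=tez;
SET hive.tez.container.size=4096;          -- MB per Tez task container
SET hive.tez.java.opts=-Xmx3276m;          -- heap, typically ~80% of container size
SET tez.runtime.io.sort.mb=1024;           -- sort buffer carved out of the task heap
SET tez.am.resource.memory.mb=2048;        -- Tez ApplicationMaster memory
SET hive.auto.convert.join.noconditionaltask.size=1073741824; -- map-join threshold (bytes)
```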

Cheers,
Juho

On Fri, Oct 23, 2015 at 6:41 PM, Hitesh Shah <[email protected]> wrote:

> Hello Juho
>
> As you are probably aware, each hive query will largely have different
> memory requirements depending on what kind of plan it ends up executing.
> For the most part, a common container size and general settings work well
> for most queries.
> In this case, additional tuning may be needed: either fixing the Hive
> query plan or correctly sizing the Tez container for this particular
> query, as well as adjusting any other Hive knobs that may be making
> wrong assumptions about data stats or available memory, causing this
> query to run very slowly.
>
> As a first step, it would be good if you could provide the explain plan
> for the query, the hive-site/tez-site configs being used, and the YARN
> application logs for the completed query. If you have the Tez UI
> available, you can also click “Download data” on the DAG details page;
> the downloaded data can be run against the various perf analyzers
> available in Tez to see what the issue is.
>
> thanks
> — Hitesh
>
>
> On Oct 23, 2015, at 1:08 AM, Juho Autio <[email protected]> wrote:
>
> > Hi,
> >
> > I'm running a Hive script with tez-0.7.0. Progress is very slow, and
> > in the container logs I'm seeing constant Full GC lines, so there
> > doesn't seem to be any time for the JVM to actually execute anything
> > between the GC pauses.
> >
> > When running the same Hive script with mr execution engine, the job goes
> through normally.
> >
> > So there's something specific to Tez's memory usage that causes the Full
> GC issue.
> >
> > Also, with similar clusters & configuration, other Hive jobs have gone
> > through with Tez just fine. This issue happens when I add a little more
> > data to be processed by the script; with a smaller workload the job
> > also completes with the Tez engine in the expected execution time.
> >
> > For example an extract from one of the container logs:
> >
> >
> application_1445328511212_0001/container_1445328511212_0001_01_000292/stdout.gz
> >
> > 791.208: [Full GC
> > [PSYoungGen: 58368K->56830K(116736K)]
> > [ParOldGen: 348914K->348909K(349184K)]
> > 407282K->405740K(465920K)
> > [PSPermGen: 43413K->43413K(43520K)], 1.4063790 secs] [Times: user=5.22
> sys=0.04, real=1.40 secs]
> > Heap
> >  PSYoungGen      total 116736K, used 58000K [0x00000000f5500000,
> 0x0000000100000000, 0x0000000100000000)
> >   eden space 58368K, 99% used
> [0x00000000f5500000,0x00000000f8da41a0,0x00000000f8e00000)
> >   from space 58368K, 0% used
> [0x00000000f8e00000,0x00000000f8e00000,0x00000000fc700000)
> >   to   space 58368K, 0% used
> [0x00000000fc700000,0x00000000fc700000,0x0000000100000000)
> >  ParOldGen       total 349184K, used 348909K [0x00000000e0000000,
> 0x00000000f5500000, 0x00000000f5500000)
> >   object space 349184K, 99% used
> [0x00000000e0000000,0x00000000f54bb4b0,0x00000000f5500000)
> >  PSPermGen       total 43520K, used 43413K [0x00000000d5a00000,
> 0x00000000d8480000, 0x00000000e0000000)
> >   object space 43520K, 99% used
> [0x00000000d5a00000,0x00000000d84657a8,0x00000000d8480000)
> >
> > If I understand the GC log correctly, it seems like ParOldGen is full
> > and Full GC doesn't manage to free space there. So maybe Tez has
> > created too many objects that can't be released; it could be a memory
> > leak. Or maybe this heap is simply not big enough for Tez in general?
> > I could probably fix the problem by changing the configuration to run
> > fewer containers and thus get a bigger heap per container. Still,
> > changing to bigger nodes doesn't seem like a solution that would
> > eventually scale, so I would prefer to resolve this properly.
> >
> > Please, could you help me with how to troubleshoot & fix this issue?
> >
> > Cheers,
> > Juho
>
