Hi,
I'm the original victim :) just sent up TEZ-2237.
I attached as many logs as was practical so far; I can supply more
directly, as much as is needed to nail the issue.
To give some context: these two failing DAGs are part of a meta-DAG
comprising 20 distinct DAGs, all generated through scalding-cascading
(in cascading terms, there is one Cascade with 20 Jobs; when the same
cascade is run with the traditional "hadoop" fabric instead of the
experimental TEZ backend, this results in 460 separate MR jobs).
While the 20-legged meta-DAG monster hasn't ever completed under TEZ
yet, the progress made in the last few weeks is very encouraging,
hinting at very significant speedups compared to MR; we definitely want
to help get to the point where we can compare the outputs.
-- Cyrille
-------- Forwarded message --------
Reply-To: [email protected]
Subject: Re: BufferTooSmallException
From: Hitesh Shah <[email protected]>
Date: March 23, 2015 at 1:11:45 PM PDT
To: [email protected]
Hi Chris,
I don’t believe this issue has been seen before. Could you file a jira
for this with the full application logs ( obtained via bin/yarn logs
-application ) and the configuration used?
thanks
— Hitesh
On Mar 23, 2015, at 1:01 PM, Chris K Wensel <[email protected]> wrote:
Hey all
We have a user running Scalding, on Cascading3, on Tez. This
exception tends to crop up for DAGs that hang indefinitely (this DAG
has 140 vertices).
It looks like the flag exception BufferTooSmallException isn’t being
caught and forcing the buffer to reset. Nor is the exception, when
passed up to the thread, causing the Node/DAG to fail.
Or is this a misinterpretation?
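To make the expected behaviour concrete, here is a minimal Java sketch of the "flag exception" pattern described above: the stream throws when a record does not fit, and the writer catches that, spills the buffer, and retries. This is not the Tez implementation. The class names SpillOnOverflowSketch, BoundedStream and Writer are hypothetical, and the BufferTooSmallException here is a stand-in that only shares its name with the one in the stack trace.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpillOnOverflowSketch {

    /** Hypothetical flag exception: signals "buffer full", not a real failure. */
    static final class BufferTooSmallException extends IOException {
        BufferTooSmallException() {
            super("record does not fit in the remaining buffer");
        }
    }

    /** Fixed-capacity stream that refuses any write past its capacity. */
    static final class BoundedStream extends OutputStream {
        private final byte[] buf;
        private int pos;

        BoundedStream(int capacity) { buf = new byte[capacity]; }

        @Override
        public void write(int b) throws IOException {
            if (pos >= buf.length) throw new BufferTooSmallException();
            buf[pos++] = (byte) b;
        }

        @Override
        public void write(byte[] src, int off, int len) throws IOException {
            if (len > buf.length - pos) throw new BufferTooSmallException();
            System.arraycopy(src, off, buf, pos, len);
            pos += len;
        }

        int position() { return pos; }

        void rewindTo(int mark) { pos = mark; }

        /** Returns the buffered bytes and empties the buffer. */
        byte[] drain() {
            byte[] out = Arrays.copyOf(buf, pos);
            pos = 0;
            return out;
        }
    }

    /** Writer that treats the flag exception as a cue to spill and retry. */
    static final class Writer {
        private final BoundedStream stream;
        private final List<byte[]> spills = new ArrayList<>();

        Writer(int capacity) { stream = new BoundedStream(capacity); }

        void write(byte[] record) throws IOException {
            int mark = stream.position();
            try {
                stream.write(record, 0, record.length);
            } catch (BufferTooSmallException e) {
                stream.rewindTo(mark); // drop any partially written record
                if (mark == 0) {
                    // Even an empty buffer cannot hold the record: fail loudly
                    // instead of retrying forever.
                    throw new IOException("record larger than the whole buffer", e);
                }
                spills.add(stream.drain()); // stand-in for a real spill to disk
                write(record);              // retry against the now-empty buffer
            }
        }

        int spillCount() { return spills.size(); }

        int buffered() { return stream.position(); }
    }

    public static void main(String[] args) throws IOException {
        Writer w = new Writer(8);
        w.write(new byte[5]);
        w.write(new byte[5]); // does not fit after the first record: spill, retry
        System.out.println("spills=" + w.spillCount() + " buffered=" + w.buffered());
    }
}
```

The point of the sketch is that the catch clause only works if the flag exception reaches the writer with its original type, and that a record too large for an empty buffer has to fail the task rather than be retried.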
ckw
2015-03-23 11:32:40,445 INFO [TezChild] writers.UnorderedPartitionedKVWriter: Moving to next buffer and triggering spill
2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter: Finished spill 1
2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter: Spill# 1 complete.
2015-03-23 11:32:41,185 ERROR [TezChild] hadoop.TupleSerialization$SerializationElementWriter: failed serializing token: 181 with classname: scala.Tuple2
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$BufferTooSmallException
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:651)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:646)
    at java.io.DataOutputStream.write(DataOutputStream.java:88)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:198)
    at com.twitter.chill.hadoop.KryoSerializer.serialize(KryoSerializer.java:50)
    at cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.write(TupleSerialization.java:705)
    at cascading.tuple.io.TupleOutputStream.writeElement(TupleOutputStream.java:114)
    at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:89)
    at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
    at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
    at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:212)
    at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:194)
    at cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[…]
—
Chris K Wensel
[email protected] <mailto:[email protected]>