Forgot to add the following. Ideally auto-reduce implementation should have kicked-in and on need basis, it should have decreased the number of reducers needed. However, for the vertices of concern (scope-2037 & scope-2162), auto-reducer has been turned off in configuration by Pig and for the rest of the vertices it is turned on.
Pig folks would be able to help in terms of providing details on why auto-reduce parallelism is turned off in certain vertices. 2015-07-06 14:11:35,109 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1436152736518_0210_1_28* [scope-2037]* 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75* auto:false* desiredTaskIput:104857600 minTasks:1 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_28 [scope-2037] .... 2015-07-06 14:11:35,245 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for vertex_1436152736518_0210_1_39 *[scope-2162]* 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600 minTasks:1 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler] impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_39 [scope-2162] .... 2015-07-06 14:11:35,417 INFO [AsyncDispatcher event handler] impl.VertexImpl: Setting user vertex manager plugin: org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager on vertex:* scope-2185* 2015-07-06 14:11:35,419 INFO [AsyncDispatcher event handler] vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings minFrac:0.25 maxFrac:0.75 *auto:true* desiredTaskIput:104857600 minTasks:1 ... On Tue, Jul 7, 2015 at 12:47 PM, Rajesh Balamohan < [email protected]> wrote: > > Attaching the DAG and the swimlane for the job. > > scope-2052 which had to give the data to other vertices slowed down (~ > 150-180 seconds) due to multiple spills and NumberFormatExceptions in > data. You might want to try setting > "tez.task.scale.memory.additional-reservation.fraction.max='PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:12,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:1,PROCESSOR:1,OTHER:1' > " to allocate more memory for unordered outputs. > Following are the details for this scope. > - attempt_1436152736518_0210_1_31_000000_0, > PigLatin:dmwith1tapin.pig-0_scope-0, VertexName: scope-2052, > VertexParallelism: 1, > TaskAttemptID:attempt_1436152736518_0210_1_31_000000_0, > - numInputs=1, numOutputs=4, JVM.maxFree=734527488 > - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: > Informing: INPUT, scope-546, org.apache.tez.mapreduce.input.MRInput: > requested=0, allocated=0 > - Small allocation of ~7 MB allocation to unordered output lead to > multiple spills. > - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: > Informing: OUTPUTPUT_RECORDS, scope-2117, > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: > requested=268435456, allocated=222303401 > - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: > Informing: OUTPUT, scope-2251, > org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: > requested=268435456, allocated=222303401 > - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor: > Informing: OUTPUT, scope-2063, > org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput: > requested=104857600, allocated=7236438 > - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor: > Informing: OUTPUT, scope-2068, > org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput: > requested=104857600, allocated=7236438 > - Too many number of records had issues in NumberFormatException leading > to large amount of logs. This dragged the runtime of this task to around > e.g "mapReduceLayer.PigHadoopLogger: > java.lang.Class(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to > interpret value in field being converted to long, caught > NumberFormatException <empty String> field discarded" > > > - scope-2037 and scope-2162 had set the vertex parallelism to "999" > affecting subsequent execution. > - VertexName: scope-2037, VertexParallelism: 999 vertex: > vertex_1436152736518_0210_1_28 finished in *410* seconds. Tasks > themselves were small, but due to large number of tasks that had to be > executed in small containers (pretty much used the same container to > execute this) it took time. > - VertexName: scope-2162, VertexParallelism: 999 vertex: > vertex_1436152736518_0210_1_39 finished in *697* seconds. Similar > observation as previous vertex. > *Above 2 vertices have caused the entire job to slow down.* > > "999" is set as the reducer parallelism at compile time by Pig. This is > not for the input. I am not sure how pig sets the parallelism at compile > time. You can possibly try setting "pig.exec.reducers.max=50" in your case > and give it a try. Pig folks would be in a better position to explain that. > > > On Tue, Jul 7, 2015 at 11:22 AM, Sachin Sabbarwal < > [email protected]> wrote: > >> >> logs.gz >> <https://drive.google.com/file/d/0B-RFcYxUIHzzUVJpRzVDZXB5TUk/view?usp=drive_web> >> Hi Rajesh >> >> PFA the gziped logs. >> FYI It's a single file, when you'll gunzip it, it'll be around 1.5gb in >> size. >> One more thing which you might find useful: >> >> In the dmOutputTez file i could see following line, which suggests that >> TEZ created a total of 7660 tasks. This is surprising as my data is only >> few mbs(10-15 mb max). How is this number of tasks decided? is there any >> property to tune it? >> >> 2015-07-07 05:37:02,647 [Timer-0] INFO >> org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG >> Status: status=RUNNING, progress=TotalTasks: 7660 Succeeded: 0 Running: 0 >> Failed: 0 Killed: 0, diagnostics= >> >> Thanks >> >> On Mon, Jul 6, 2015 at 8:34 PM, Rajesh Balamohan < >> [email protected]> wrote: >> >>> yarn logs -applicationId application_1436152736518_0210 >>> >>> You can possibly send the output to a log file, gzip it and post it. >>> >>> ~Rajesh.B >>> >>> On Mon, Jul 6, 2015 at 8:12 PM, Sachin Sabbarwal < >>> [email protected]> wrote: >>> >>>> Hi >>>> Thanks for reply. My tez-site.xml contains only following: >>>> >>>> <configuration> >>>> <property> >>>> <name>tez.lib.uris</name> >>>> <value>${fs.defaultFS}/apps/tez-0.5/tez-0.5.3.tar.gz, >>>> ${fs.defaultFS}/apps/tez-0.5/*,${fs.defaultFS}/apps/tez-0.5/lib/*</value> >>>> </property> >>>> </configuration> >>>> >>>> PFA the application logs. Here is the version information: >>>> 1. Hadoop version: Hadoop 2.5.0-cdh5.3.1 >>>> 2. Pig: Apache Pig version 0.14.0 (r1640057) >>>> 3. Tez: 0.5.3 >>>> >>>> Lemme know if anything else is needed. >>>> >>>> Thanks in advance >>>> >>>> On Mon, Jul 6, 2015 at 7:07 PM, Rajesh Balamohan < >>>> [email protected]> wrote: >>>> >>>>> Can you post the application logs, tez-site.xml and also the version >>>>> details? >>>>> >>>>> ~Rajesh.B >>>>> >>>>> On Mon, Jul 6, 2015 at 6:38 PM, Sachin Sabbarwal < >>>>> [email protected]> wrote: >>>>> >>>>>> >>>>>> ---------- Forwarded message ---------- >>>>>> From: Sachin Sabbarwal <[email protected]> >>>>>> Date: Mon, Jul 6, 2015 at 5:34 PM >>>>>> Subject: Same pig script running slower with Tez as compared with run >>>>>> in Mapred mode >>>>>> To: [email protected] >>>>>> >>>>>> >>>>>> Hello Guys >>>>>> Trying Apache Tez. >>>>>> I've setup to use pig in TEZ mode. >>>>>> I'm running a pig script against i) no data and ii) with some data. >>>>>> In case i) when i run with pig using TEZ mode my pig scripts >>>>>> completes run in ~40secs. Whereas when i run case i) with mapred it takes >>>>>> around 7-8 mins. >>>>>> in case ii) when run with pig using TEZ, same pig script takes around >>>>>> 14-15 mins but with mapred it takes around 10 mins. >>>>>> When i'm running same pig script with production data(which is much >>>>>> more than the data i used here to run case i) and (ii) ) the job takes >>>>>> hours to complete. >>>>>> Hence I'm trying tez to run my pig job in a faster mode. I'm not >>>>>> really sure what i might be missing here. Please help, ask for any >>>>>> further >>>>>> info if required. >>>>>> >>>>>> >>>>>> Thanks >>>>>> -- >>>>>> Sachin Sabbarwal >>>>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>>>> Facebook: facebook.com/sachinsabbarwal >>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sachin Sabbarwal >>>>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>>>> Facebook: facebook.com/sachinsabbarwal >>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Rajesh.B >>>>> >>>> >>>> >>>> >>>> -- >>>> Sachin Sabbarwal >>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>> Facebook: facebook.com/sachinsabbarwal >>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>> >>> >>> >>> >>> -- >>> ~Rajesh.B >>> >> >> >> >> -- >> Sachin Sabbarwal >> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >> Facebook: facebook.com/sachinsabbarwal >> Quora: http://www.quora.com/Sachin-Sabbarwal >> Blog: http://sachinsabbarwal.tumblr.com/ >> > > > > -- > ~Rajesh.B > -- ~Rajesh.B
