Hi Sachin, That was just a temporary workaround to confirm that this was the issue. Ideally the user should not need to set this parameter. The real issue is why auto-reduce parallelism is set to false for certain vertices in Pig on Tez. Will wait for the Pig folks to chime in.
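For reference, that workaround can also live in the script itself rather than being passed on the command line. A minimal sketch only; the value 10 is simply what worked for Sachin's run below and would need tuning per workload:

-- temporary workaround only: cap the reducer parallelism that Pig requests
-- at compile time (999 in this DAG) so Tez does not launch hundreds of tiny reduce tasks
set pig.exec.reducers.max 10;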
For doc/tutorial related, you can start off with the following - http://tez.apache.org/talks.html - couple of youtube videos are available from hadoop summits and meetups. - http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/ (this is pretty old) - Pig on tez http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-data-processing-with-big-data ~Rajesh.B On Tue, Jul 7, 2015 at 1:54 PM, Sachin Sabbarwal <[email protected]> wrote: > Hi Rajesh > Thanks for your response. *This seems to be working for me.* > By setting pig.exec.reducers.max to 10 i am able to complete my run in > under 4 mins.(Initially it was running in 14-15 mins). > I'm new to pig/tez/hadoop world. Do you write any blogs about > pig/tez/hadoop etc? Can you suggest any tutorials/links to read about tez? > I need to understand concepts like scope, DAG, parallelism etc. I just > have a very basic understanding of tez. If i understand all these concepts > i'll be able to tune my job. > > Thanks > > > On Tue, Jul 7, 2015 at 1:23 PM, Rajesh Balamohan < > [email protected]> wrote: > >> Forgot to add the following. Ideally auto-reduce implementation should >> have kicked-in and on need basis, it should have decreased the number of >> reducers needed. However, for the vertices of concern (scope-2037 & >> scope-2162), auto-reducer has been turned off in configuration by Pig and >> for the rest of the vertices it is turned on. >> >> Pig folks would be able to help in terms of providing details on why >> auto-reduce parallelism is turned off in certain vertices. >> >> 2015-07-06 14:11:35,109 INFO [AsyncDispatcher event handler] >> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for >> vertex_1436152736518_0210_1_28* [scope-2037]* >> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] >> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings >> minFrac:0.25 maxFrac:0.75* auto:false* desiredTaskIput:104857600 >> minTasks:1 >> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler] >> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_28 >> [scope-2037] >> .... >> >> 2015-07-06 14:11:35,245 INFO [AsyncDispatcher event handler] >> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for >> vertex_1436152736518_0210_1_39 *[scope-2162]* >> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler] >> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings >> minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600 >> minTasks:1 >> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler] >> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_39 >> [scope-2162] >> .... >> >> 2015-07-06 14:11:35,417 INFO [AsyncDispatcher event handler] >> impl.VertexImpl: Setting user vertex manager plugin: >> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager on vertex:* >> scope-2185* >> 2015-07-06 14:11:35,419 INFO [AsyncDispatcher event handler] >> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings >> minFrac:0.25 maxFrac:0.75 *auto:true* desiredTaskIput:104857600 >> minTasks:1 >> ... >> >> >> On Tue, Jul 7, 2015 at 12:47 PM, Rajesh Balamohan < >> [email protected]> wrote: >> >>> >>> Attaching the DAG and the swimlane for the job. >>> >>> scope-2052 which had to give the data to other vertices slowed down (~ >>> 150-180 seconds) due to multiple spills and NumberFormatExceptions in >>> data. 
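If those empty values are expected in the input, one way to keep the failed long conversions from flooding the task logs is to guard the field before the cast. A rough sketch only; the relation and field names are hypothetical and not taken from the actual script:

-- load the problematic column as chararray, drop empty values,
-- then cast explicitly so Pig never attempts a long conversion on ''
raw = LOAD 'input' USING PigStorage() AS (id:chararray, val:chararray);
clean = FILTER raw BY val IS NOT NULL AND val != '';
typed = FOREACH clean GENERATE id, (long) val AS val;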
You might want to try setting >>> "tez.task.scale.memory.additional-reservation.fraction.max='PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:12,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:1,PROCESSOR:1,OTHER:1' >>> " to allocate more memory for unordered outputs. >>> Following are the details for this scope. >>> - attempt_1436152736518_0210_1_31_000000_0, >>> PigLatin:dmwith1tapin.pig-0_scope-0, VertexName: scope-2052, >>> VertexParallelism: 1, >>> TaskAttemptID:attempt_1436152736518_0210_1_31_000000_0, >>> - numInputs=1, numOutputs=4, JVM.maxFree=734527488 >>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: >>> Informing: INPUT, scope-546, org.apache.tez.mapreduce.input.MRInput: >>> requested=0, allocated=0 >>> - The small allocation of ~7 MB to the unordered output led to >>> multiple spills. >>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: >>> Informing: OUTPUT, scope-2117, >>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: >>> requested=268435456, allocated=222303401 >>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor: >>> Informing: OUTPUT, scope-2251, >>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: >>> requested=268435456, allocated=222303401 >>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor: >>> Informing: OUTPUT, scope-2063, >>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput: >>> requested=104857600, allocated=7236438 >>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor: >>> Informing: OUTPUT, scope-2068, >>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput: >>> requested=104857600, allocated=7236438 >>> - Too many records hit NumberFormatException, >>> leading to a large amount of logging. This dragged out the runtime of this task; >>> e.g. "mapReduceLayer.PigHadoopLogger: >>> java.lang.Class(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to >>> interpret value in field being converted to long, caught >>> NumberFormatException <empty String> field discarded" >>> >>> >>> - scope-2037 and scope-2162 had the vertex parallelism set to "999", >>> affecting subsequent execution. >>> - VertexName: scope-2037, VertexParallelism: 999 vertex: >>> vertex_1436152736518_0210_1_28 finished in *410* seconds. The tasks >>> themselves were small, but the large number of tasks that had to be >>> executed in small containers (pretty much the same container was reused to >>> execute them) made it take time. >>> - VertexName: scope-2162, VertexParallelism: 999 vertex: >>> vertex_1436152736518_0210_1_39 finished in *697* seconds. Similar >>> observation as for the previous vertex. >>> *The above 2 vertices caused the entire job to slow down.* >>> >>> "999" is set as the reducer parallelism at compile time by Pig; this is >>> not for the input. I am not sure how Pig sets the parallelism at compile >>> time. You can possibly try setting "pig.exec.reducers.max=50" in your case. >>> The Pig folks would be in a better position to explain that. >>> >>> >>> On Tue, Jul 7, 2015 at 11:22 AM, Sachin Sabbarwal < >>> [email protected]> wrote: >>> >>>> >>>> logs.gz >>>> <https://drive.google.com/file/d/0B-RFcYxUIHzzUVJpRzVDZXB5TUk/view?usp=drive_web> >>>> Hi Rajesh >>>> >>>> PFA the gzipped logs. >>>> FYI it's a single file; when you gunzip it, it'll be around 1.5 GB in >>>> size.
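For quick reference, the two settings suggested above (the additional memory reservation for unordered outputs and the pig.exec.reducers.max cap) can both be applied from the Pig script preamble instead of tez-site.xml. A sketch only; the property string is reproduced exactly as suggested above, and both values would still need to be validated against the job and the Tez version in use:

-- cap Pig's compile-time reducer parallelism (999 in this DAG)
set pig.exec.reducers.max 50;
-- reserve more memory for unordered outputs, using the weighting string suggested above
set tez.task.scale.memory.additional-reservation.fraction.max 'PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:12,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:1,PROCESSOR:1,OTHER:1';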
>>>> One more thing which you might find useful: >>>> >>>> In the dmOutputTez file i could see following line, which suggests that >>>> TEZ created a total of 7660 tasks. This is surprising as my data is only >>>> few mbs(10-15 mb max). How is this number of tasks decided? is there any >>>> property to tune it? >>>> >>>> 2015-07-07 05:37:02,647 [Timer-0] INFO >>>> org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG >>>> Status: status=RUNNING, progress=TotalTasks: 7660 Succeeded: 0 Running: >>>> 0 Failed: 0 Killed: 0, diagnostics= >>>> >>>> Thanks >>>> >>>> On Mon, Jul 6, 2015 at 8:34 PM, Rajesh Balamohan < >>>> [email protected]> wrote: >>>> >>>>> yarn logs -applicationId application_1436152736518_0210 >>>>> >>>>> You can possibly send the output to a log file, gzip it and post it. >>>>> >>>>> ~Rajesh.B >>>>> >>>>> On Mon, Jul 6, 2015 at 8:12 PM, Sachin Sabbarwal < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi >>>>>> Thanks for reply. My tez-site.xml contains only following: >>>>>> >>>>>> <configuration> >>>>>> <property> >>>>>> <name>tez.lib.uris</name> >>>>>> <value>${fs.defaultFS}/apps/tez-0.5/tez-0.5.3.tar.gz, >>>>>> ${fs.defaultFS}/apps/tez-0.5/*,${fs.defaultFS}/apps/tez-0.5/lib/*</value> >>>>>> </property> >>>>>> </configuration> >>>>>> >>>>>> PFA the application logs. Here is the version information: >>>>>> 1. Hadoop version: Hadoop 2.5.0-cdh5.3.1 >>>>>> 2. Pig: Apache Pig version 0.14.0 (r1640057) >>>>>> 3. Tez: 0.5.3 >>>>>> >>>>>> Lemme know if anything else is needed. >>>>>> >>>>>> Thanks in advance >>>>>> >>>>>> On Mon, Jul 6, 2015 at 7:07 PM, Rajesh Balamohan < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Can you post the application logs, tez-site.xml and also the version >>>>>>> details? >>>>>>> >>>>>>> ~Rajesh.B >>>>>>> >>>>>>> On Mon, Jul 6, 2015 at 6:38 PM, Sachin Sabbarwal < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> >>>>>>>> ---------- Forwarded message ---------- >>>>>>>> From: Sachin Sabbarwal <[email protected]> >>>>>>>> Date: Mon, Jul 6, 2015 at 5:34 PM >>>>>>>> Subject: Same pig script running slower with Tez as compared with >>>>>>>> run in Mapred mode >>>>>>>> To: [email protected] >>>>>>>> >>>>>>>> >>>>>>>> Hello Guys >>>>>>>> Trying Apache Tez. >>>>>>>> I've setup to use pig in TEZ mode. >>>>>>>> I'm running a pig script against i) no data and ii) with some data. >>>>>>>> In case i) when i run with pig using TEZ mode my pig scripts >>>>>>>> completes run in ~40secs. Whereas when i run case i) with mapred it >>>>>>>> takes >>>>>>>> around 7-8 mins. >>>>>>>> in case ii) when run with pig using TEZ, same pig script takes >>>>>>>> around 14-15 mins but with mapred it takes around 10 mins. >>>>>>>> When i'm running same pig script with production data(which is much >>>>>>>> more than the data i used here to run case i) and (ii) ) the job takes >>>>>>>> hours to complete. >>>>>>>> Hence I'm trying tez to run my pig job in a faster mode. I'm not >>>>>>>> really sure what i might be missing here. Please help, ask for any >>>>>>>> further >>>>>>>> info if required. 
>>>>>>>> >>>>>>>> >>>>>>>> Thanks >>>>>>>> -- >>>>>>>> Sachin Sabbarwal >>>>>>>> Linkedin: >>>>>>>> https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>>>>>> Facebook: facebook.com/sachinsabbarwal >>>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>>>>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Sachin Sabbarwal >>>>>>>> Linkedin: >>>>>>>> https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>>>>>> Facebook: facebook.com/sachinsabbarwal >>>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>>>>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> ~Rajesh.B >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sachin Sabbarwal >>>>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>>>> Facebook: facebook.com/sachinsabbarwal >>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> ~Rajesh.B >>>>> >>>> >>>> >>>> >>>> -- >>>> Sachin Sabbarwal >>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 >>>> Facebook: facebook.com/sachinsabbarwal >>>> Quora: http://www.quora.com/Sachin-Sabbarwal >>>> Blog: http://sachinsabbarwal.tumblr.com/ >>>> >>> >>> >>> >>> -- >>> ~Rajesh.B >>> >> >> >> >> -- >> ~Rajesh.B >> > > > > -- > Sachin Sabbarwal > Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265 > Facebook: facebook.com/sachinsabbarwal > Quora: http://www.quora.com/Sachin-Sabbarwal > Blog: http://sachinsabbarwal.tumblr.com/ > -- ~Rajesh.B
