Hi Sachin,

That was just a temporary workaround to confirm that this was the issue.
Ideally, the user should not need to set this parameter.  The real issue is
why auto-reduce is set to false for certain vertices in Pig on Tez. I will
wait for the Pig folks to chime in.
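
If you want to quickly check which vertices ended up with auto-reduce
disabled, grepping the aggregated logs for the ShuffleVertexManager
settings lines (the same ones quoted further down in this thread) should
work:

  yarn logs -applicationId application_1436152736518_0210 \
    | grep "Shuffle Vertex Manager: settings" | grep "auto:false"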

For docs/tutorials, you can start off with the following:
- http://tez.apache.org/talks.html
- A couple of YouTube videos are available from Hadoop Summits and meetups.
-
http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/
(this one is pretty old)
- Pig on Tez:
http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-data-processing-with-big-data

~Rajesh.B

On Tue, Jul 7, 2015 at 1:54 PM, Sachin Sabbarwal <[email protected]>
wrote:

> Hi Rajesh,
> Thanks for your response. *This seems to be working for me.*
> By setting pig.exec.reducers.max to 10 I am able to complete my run in
> under 4 mins (initially it was taking 14-15 mins).
> I'm new to the Pig/Tez/Hadoop world. Do you write any blogs about
> Pig/Tez/Hadoop etc.? Can you suggest any tutorials/links to read about Tez?
> I need to understand concepts like scope, DAG, parallelism, etc. I just
> have a very basic understanding of Tez. If I understand all these concepts,
> I'll be able to tune my job.
>
> Thanks
>
>
> On Tue, Jul 7, 2015 at 1:23 PM, Rajesh Balamohan <
> [email protected]> wrote:
>
>> Forgot to add the following.  Ideally, the auto-reduce implementation
>> should have kicked in and decreased the number of reducers as needed.
>> However, for the vertices of concern (scope-2037 & scope-2162),
>> auto-reduce has been turned off in the configuration by Pig, while it is
>> turned on for the rest of the vertices.
>>
>> The Pig folks would be able to provide details on why auto-reduce
>> parallelism is turned off for certain vertices.
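>>
>> As an aside, the setting behind the "auto:..." flag in the logs below
>> is, if I remember the property name right,
>> tez.shuffle-vertex-manager.enable-auto-parallel. Something like the
>> following would flip it globally, though since Pig overrides it per
>> vertex here it may well be a no-op (the script name is a placeholder):
>>
>>   # hypothetical invocation; the global flag may be overridden per vertex
>>   pig -x tez -Dtez.shuffle-vertex-manager.enable-auto-parallel=true yourscript.pig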
>>
>> 2015-07-06 14:11:35,109 INFO [AsyncDispatcher event handler]
>> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for
>> vertex_1436152736518_0210_1_28 *[scope-2037]*
>> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler]
>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>> minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600
>> minTasks:1
>> 2015-07-06 14:11:35,123 INFO [AsyncDispatcher event handler]
>> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_28
>> [scope-2037]
>> ....
>>
>> 2015-07-06 14:11:35,245 INFO [AsyncDispatcher event handler]
>> impl.VertexImpl: Setting vertexManager to ShuffleVertexManager for
>> vertex_1436152736518_0210_1_39 *[scope-2162]*
>> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler]
>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>> minFrac:0.25 maxFrac:0.75 *auto:false* desiredTaskIput:104857600
>> minTasks:1
>> 2015-07-06 14:11:35,257 INFO [AsyncDispatcher event handler]
>> impl.VertexImpl: Creating 999 for vertex: vertex_1436152736518_0210_1_39
>> [scope-2162]
>> ....
>>
>> 2015-07-06 14:11:35,417 INFO [AsyncDispatcher event handler]
>> impl.VertexImpl: Setting user vertex manager plugin:
>> org.apache.tez.dag.library.vertexmanager.ShuffleVertexManager on vertex:
>> *scope-2185*
>> 2015-07-06 14:11:35,419 INFO [AsyncDispatcher event handler]
>> vertexmanager.ShuffleVertexManager: Shuffle Vertex Manager: settings
>> minFrac:0.25 maxFrac:0.75 *auto:true* desiredTaskIput:104857600
>> minTasks:1
>> ...
>>
>>
>> On Tue, Jul 7, 2015 at 12:47 PM, Rajesh Balamohan <
>> [email protected]> wrote:
>>
>>>
>>> Attaching the DAG and the swimlane for the job.
>>>
>>> scope-2052, which had to feed data to the other vertices, slowed down
>>> (~150-180 seconds) due to multiple spills and NumberFormatExceptions in
>>> the data.  You might want to try setting
>>> "tez.task.scale.memory.additional-reservation.fraction.max='PARTITIONED_UNSORTED_OUTPUT:12,UNSORTED_INPUT:1,UNSORTED_OUTPUT:12,SORTED_OUTPUT:12,SORTED_MERGED_INPUT:1,PROCESSOR:1,OTHER:1'"
>>> to allocate more memory for unordered outputs.
>>> Following are the details for this scope.
>>> - attempt_1436152736518_0210_1_31_000000_0,
>>> PigLatin:dmwith1tapin.pig-0_scope-0, VertexName: scope-2052,
>>> VertexParallelism: 1,
>>> TaskAttemptID:attempt_1436152736518_0210_1_31_000000_0,
>>> - numInputs=1, numOutputs=4, JVM.maxFree=734527488
>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>> Informing: INPUT, scope-546, org.apache.tez.mapreduce.input.MRInput:
>>> requested=0, allocated=0
>>> - A small allocation of ~7 MB to the unordered output led to
>>> multiple spills.
>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>> Informing: OUTPUT, scope-2117,
>>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:
>>> requested=268435456, allocated=222303401
>>> - 2015-07-06 14:11:40,047 INFO [TezChild] resources.MemoryDistributor:
>>> Informing: OUTPUT, scope-2251,
>>> org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:
>>> requested=268435456, allocated=222303401
>>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor:
>>> Informing: OUTPUT, scope-2063,
>>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput:
>>> requested=104857600, allocated=7236438
>>> - 2015-07-06 14:11:40,048 INFO [TezChild] resources.MemoryDistributor:
>>> Informing: OUTPUT, scope-2068,
>>> org.apache.tez.runtime.library.output.UnorderedPartitionedKVOutput:
>>> requested=104857600, allocated=7236438
>>>  - Too many records ran into NumberFormatException, leading to a large
>>> amount of logging. This dragged out the runtime of this task (a quick
>>> grep, shown after this list, can quantify it).
>>>    e.g. "mapReduceLayer.PigHadoopLogger:
>>> java.lang.Class(FIELD_DISCARDED_TYPE_CONVERSION_FAILED): Unable to
>>> interpret value in field being converted to long, caught
>>> NumberFormatException <empty String> field discarded"
>>>
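>>> To get a sense of how much log volume those conversion failures
>>> generated, something along these lines should work against the
>>> aggregated logs:
>>>
>>>   # count the discarded-field warnings across all task logs
>>>   yarn logs -applicationId application_1436152736518_0210 \
>>>     | grep -c "FIELD_DISCARDED_TYPE_CONVERSION_FAILED"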
>>>
>>> - scope-2037 and scope-2162 had their vertex parallelism set to "999",
>>> affecting subsequent execution.
>>> - VertexName: scope-2037, VertexParallelism: 999 vertex:
>>> vertex_1436152736518_0210_1_28 finished in *410* seconds.  The tasks
>>> themselves were small, but it took time due to the large number of tasks
>>> that had to be executed in small containers (pretty much the same
>>> container was reused to execute them).
>>> - VertexName: scope-2162, VertexParallelism: 999 vertex:
>>> vertex_1436152736518_0210_1_39 finished in *697* seconds. Same
>>> observation as for the previous vertex.
>>> *These 2 vertices caused the entire job to slow down.*
>>>
>>> "999" is set as the reducer parallelism at compile time by Pig. This is
>>> not for the input. I am not sure how pig sets the parallelism at compile
>>> time.  You can possibly try setting "pig.exec.reducers.max=50" in your case
>>> and give it a try. Pig folks would be in a better position to explain that.
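>>>
>>> For example, assuming the -D form is picked up by the Pig launcher (the
>>> script name is a placeholder):
>>>
>>>   # hypothetical invocation; caps the compile-time reducer parallelism
>>>   pig -x tez -Dpig.exec.reducers.max=50 yourscript.pig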
>>>
>>>
>>> On Tue, Jul 7, 2015 at 11:22 AM, Sachin Sabbarwal <
>>> [email protected]> wrote:
>>>
>>>>  logs.gz
>>>> <https://drive.google.com/file/d/0B-RFcYxUIHzzUVJpRzVDZXB5TUk/view?usp=drive_web>
>>>>
>>>> Hi Rajesh,
>>>>
>>>> PFA the gzipped logs.
>>>> FYI, it's a single file; when you gunzip it, it will be around 1.5 GB
>>>> in size.
>>>> One more thing which you might find useful:
>>>>
>>>> In the dmOutputTez file I could see the following line, which suggests
>>>> that Tez created a total of 7660 tasks. This is surprising, as my data
>>>> is only a few MBs (10-15 MB max). How is this number of tasks decided?
>>>> Is there any property to tune it?
>>>>
>>>> 2015-07-07 05:37:02,647 [Timer-0] INFO
>>>>  org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG
>>>> Status: status=RUNNING, progress=TotalTasks: 7660 Succeeded: 0 Running:
>>>> 0 Failed: 0 Killed: 0, diagnostics=
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Jul 6, 2015 at 8:34 PM, Rajesh Balamohan <
>>>> [email protected]> wrote:
>>>>
>>>>> yarn logs -applicationId application_1436152736518_0210
>>>>>
>>>>> You can redirect the output to a log file, gzip it, and post it.
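>>>>>
>>>>> Something like this should do (the output file name is arbitrary):
>>>>>
>>>>>   yarn logs -applicationId application_1436152736518_0210 > app_0210.log
>>>>>   gzip app_0210.log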
>>>>>
>>>>> ~Rajesh.B
>>>>>
>>>>> On Mon, Jul 6, 2015 at 8:12 PM, Sachin Sabbarwal <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi
>>>>>> Thanks for the reply. My tez-site.xml contains only the following:
>>>>>>
>>>>>> <configuration>
>>>>>>   <property>
>>>>>>     <name>tez.lib.uris</name>
>>>>>>     <value>${fs.defaultFS}/apps/tez-0.5/tez-0.5.3.tar.gz,${fs.defaultFS}/apps/tez-0.5/*,${fs.defaultFS}/apps/tez-0.5/lib/*</value>
>>>>>>   </property>
>>>>>> </configuration>
>>>>>>
>>>>>> PFA the application logs. Here is the version information:
>>>>>> 1. Hadoop version: Hadoop 2.5.0-cdh5.3.1
>>>>>> 2. Pig: Apache Pig version 0.14.0 (r1640057)
>>>>>> 3. Tez: 0.5.3
>>>>>>
>>>>>> Let me know if anything else is needed.
>>>>>>
>>>>>> Thanks in advance
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 7:07 PM, Rajesh Balamohan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Can you post the application logs, tez-site.xml and also the version
>>>>>>> details?
>>>>>>>
>>>>>>> ~Rajesh.B
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 6:38 PM, Sachin Sabbarwal <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> From: Sachin Sabbarwal <[email protected]>
>>>>>>>> Date: Mon, Jul 6, 2015 at 5:34 PM
>>>>>>>> Subject: Same pig script running slower with Tez as compared with
>>>>>>>> running in Mapred mode
>>>>>>>> To: [email protected]
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello guys,
>>>>>>>> I'm trying out Apache Tez and have set up Pig to run in Tez mode.
>>>>>>>> I'm running a pig script against i) no data and ii) some data.
>>>>>>>> In case i), when I run Pig in Tez mode, my pig script completes in
>>>>>>>> ~40 secs, whereas with mapred it takes around 7-8 mins.
>>>>>>>> In case ii), when run with Pig using Tez, the same pig script takes
>>>>>>>> around 14-15 mins, but with mapred it takes around 10 mins.
>>>>>>>> When I run the same pig script with production data (which is much
>>>>>>>> more than the data I used here for cases i) and ii)), the job takes
>>>>>>>> hours to complete.
>>>>>>>> Hence I'm trying Tez to make my pig job run faster. I'm not really
>>>>>>>> sure what I might be missing here. Please help, and ask for any
>>>>>>>> further info if required.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> --
>>>>>>>> Sachin Sabbarwal
>>>>>>>> Linkedin:
>>>>>>>> https://www.linkedin.com/profile?viewProfile=&key=95777265
>>>>>>>> Facebook: facebook.com/sachinsabbarwal
>>>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal
>>>>>>>> Blog: http://sachinsabbarwal.tumblr.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sachin Sabbarwal
>>>>>>>> Linkedin:
>>>>>>>> https://www.linkedin.com/profile?viewProfile=&key=95777265
>>>>>>>> Facebook: facebook.com/sachinsabbarwal
>>>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal
>>>>>>>> Blog: http://sachinsabbarwal.tumblr.com/
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ~Rajesh.B
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sachin Sabbarwal
>>>>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265
>>>>>> Facebook: facebook.com/sachinsabbarwal
>>>>>> Quora: http://www.quora.com/Sachin-Sabbarwal
>>>>>> Blog: http://sachinsabbarwal.tumblr.com/
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ~Rajesh.B
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sachin Sabbarwal
>>>> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265
>>>> Facebook: facebook.com/sachinsabbarwal
>>>> Quora: http://www.quora.com/Sachin-Sabbarwal
>>>> Blog: http://sachinsabbarwal.tumblr.com/
>>>>
>>>
>>>
>>>
>>> --
>>> ~Rajesh.B
>>>
>>
>>
>>
>> --
>> ~Rajesh.B
>>
>
>
>
> --
> Sachin Sabbarwal
> Linkedin: https://www.linkedin.com/profile?viewProfile=&key=95777265
> Facebook: facebook.com/sachinsabbarwal
> Quora: http://www.quora.com/Sachin-Sabbarwal
> Blog: http://sachinsabbarwal.tumblr.com/
>



-- 
~Rajesh.B
