I'm reading the GroomServer code and its taskMonitorService. It seems to be
related to cluster HA.
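
Regarding the exit statuses and memory numbers quoted below, here is a
minimal Python sketch (my own illustration, not Hama code). It assumes the
usual shell convention that a status above 128 means the process was killed
by signal (status - 128), so 137 would correspond to signal 9 (SIGKILL),
which is consistent with the OS killing the process for lack of memory. The
function names and the small signal table are hypothetical.

```python
# Hypothetical helpers for illustration only; not part of Hama.

# Small lookup of common signal numbers on Linux (assumption: Linux numbering).
SIGNAL_NAMES = {9: "SIGKILL", 15: "SIGTERM"}


def node_memory_gb(tasks_per_node, heap_gb_per_task, os_reserve_gb):
    """Rough per-node memory budget: task heaps plus a reserve for the OS."""
    return tasks_per_node * heap_gb_per_task + os_reserve_gb


def describe_exit_status(status):
    """Interpret a child process exit status using the 128 + signal convention."""
    if status == 0:
        return "clean exit"
    if status > 128:
        signum = status - 128
        name = SIGNAL_NAMES.get(signum, "unknown signal")
        return f"killed by signal {signum} ({name})"
    return f"exited with error code {status}"


if __name__ == "__main__":
    # 3 tasks with 2GB each plus 2GB for the OS, as discussed in the thread.
    print(node_memory_gb(3, 2, 2))
    print(describe_exit_status(137))
    print(describe_exit_status(1))
```

With bsp.tasks.maximum set to 3 and 2GB per child, the budget comes out to
the 8GB mentioned in the thread; raising the task count raises it linearly.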

On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <[email protected]> wrote:
>> If my Groom Child Process fails for some reason, the processes are not 
>> killed automatically
>
> I have also experienced this problem before. I guess that if one of the
> processes crashes with OutOfMemory, the other processes wait for it
> indefinitely. This is a bug.
>
> On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <[email protected]> wrote:
>> Just another quick question. If my Groom Child Process fails for some
>> reason, the processes are not killed automatically. If I run the jps
>> command, I can still see something like "3791 GroomServer$BSPPeerChild".
>> Is this the expected behavior?
>>
>> I am using the latest Hama version (0.7.0).
>> Regards,
>> Behroz
>>
>> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <[email protected]> wrote:
>>
>>> Ok I will try it out.
>>>
>>> No, actually I am learning a lot by facing these problems. It is actually a
>>> good thing :D
>>>
>>> Regards,
>>> Behroz
>>>
>>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <[email protected]>
>>> wrote:
>>>
>>>> > message managers. Hmmm, I will recheck my logic related to messages. Btw
>>>>
>>>> Serialization (like GraphJobMessage) is a good idea. It stores multiple
>>>> messages in serialized form in a single object to reduce the memory
>>>> usage and RPC overhead.
>>>>
>>>> > what is the limit of these message managers ? How much data at a single
>>>> > time they can handle ?
>>>>
>>>> It depends on memory.
>>>>
>>>> > P.S. Each day, as I am moving towards a big cluster I am running into
>>>> > problems (a lot of them :D).
>>>>
>>>> Haha, sorry for the inconvenience and thanks for your reports.
>>>>
>>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <[email protected]>
>>>> wrote:
>>>> > Ok. So, I do have a memory problem. I will try to scale out.
>>>> >
>>>> > >> Each task processor has two message managers, one for outgoing and
>>>> > >> one for incoming. All these are handled in memory, so it sometimes
>>>> > >> requires large memory space.
>>>> >
>>>> > So, you mean that before barrier synchronization, I have a lot of data in
>>>> > message managers. Hmmm, I will recheck my logic related to messages. Btw,
>>>> > what is the limit of these message managers? How much data can they
>>>> > handle at a single time?
>>>> >
>>>> > P.S. Each day, as I am moving towards a big cluster I am running into
>>>> > problems (a lot of them :D).
>>>> >
>>>> > Regards,
>>>> > Behroz Sikander
>>>> >
>>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
>>>> >> > understanding ?
>>>> >>
>>>> >> and,
>>>> >>
>>>> >> > on a big dataset. I think these exceptions have something to do with
>>>> >> > Ubuntu OS killing the hama process due to lack of memory. So, I was
>>>> >> > curious about
>>>> >>
>>>> >> Yes, you're right.
>>>> >>
>>>> >> Each task processor has two message managers, one for outgoing and one
>>>> >> for incoming. All these are handled in memory, so it sometimes
>>>> >> requires large memory space. To solve the OutOfMemory issue, you
>>>> >> should scale out your cluster by increasing the number of nodes and
>>>> >> job tasks, or optimize your algorithm. Another option would be a
>>>> >> disk-spillable message manager, but this is not supported yet.
>>>> >>
>>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <[email protected]>
>>>> >> wrote:
>>>> >> > Hi,
>>>> >> > Yes. According to hama-default.xml, each machine will open 3 processes
>>>> >> > with 2GB memory each. This means that my VMs need at least 8GB memory
>>>> >> > (2GB each for 3 Groom child processes + 2GB for Ubuntu OS). Is this a
>>>> >> > correct understanding?
>>>> >> >
>>>> >> > I recently ran into the following exceptions when I was trying to run
>>>> >> > hama on a big dataset. I think these exceptions have something to do
>>>> >> > with Ubuntu OS killing the hama process due to lack of memory. So, I
>>>> >> > was curious about my configurations.
>>>> >> > 'BSP task process exit with nonzero status of 137.'
>>>> >> > 'BSP task process exit with nonzero status of 1'
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Regards,
>>>> >> > Behroz
>>>> >> >
>>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <[email protected]>
>>>> >> > wrote:
>>>> >> >
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> You can change the max tasks per node by setting the property below
>>>> >> >> in hama-site.xml. :-)
>>>> >> >>
>>>> >> >>   <property>
>>>> >> >>     <name>bsp.tasks.maximum</name>
>>>> >> >>     <value>3</value>
>>>> >> >>     <description>The maximum number of BSP tasks that will be run
>>>> >> >>     simultaneously by a groom server.</description>
>>>> >> >>   </property>
>>>> >> >>
>>>> >> >>
>>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <[email protected]>
>>>> >> >> wrote:
>>>> >> >> > Hi,
>>>> >> >> > Recently, I noticed that my hama deployment is only opening 3
>>>> >> >> > processes per machine. This is because of the configuration
>>>> >> >> > settings in the default hama file.
>>>> >> >> >
>>>> >> >> > My question is: why 3 and not 5 or 7? What criteria should be
>>>> >> >> > considered if I want to increase the value?
>>>> >> >> >
>>>> >> >> > Regards,
>>>> >> >> > Behroz
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Best Regards, Edward J. Yoon
>>>> >> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Best Regards, Edward J. Yoon
>>>> >>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>>
>>>
>>>
>
>
>
> --
> Best Regards, Edward J. Yoon



-- 
Best Regards, Edward J. Yoon
