I'm reading the GroomServer code and its taskMonitorService. It seems to be related to cluster HA.
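As a side note on the two exit statuses reported further down the thread (137 and 1), here is a minimal sketch of how they could be read. It assumes the task runner follows the common Unix convention that a status above 128 means "terminated by signal (status - 128)". The class and method names are hypothetical and not part of Hama.

```java
/** Hypothetical helper (not part of Hama): interprets a child process exit status. */
public final class ExitStatus {
    private ExitStatus() {}

    /**
     * Returns the signal number if the status follows the 128 + signal
     * convention (an assumption about the runner), otherwise -1.
     */
    public static int signalOf(int status) {
        return (status > 128 && status < 256) ? status - 128 : -1;
    }

    /** Gives a short human-readable reading of the status. */
    public static String describe(int status) {
        int signal = signalOf(status);
        if (signal == 9) {
            // Signal 9 is SIGKILL on Linux; the kernel OOM killer uses it.
            return "killed by SIGKILL (signal 9), e.g. by the kernel OOM killer";
        }
        if (signal > 0) {
            return "terminated by signal " + signal;
        }
        return status == 0 ? "clean exit" : "exited on its own with error code " + status;
    }

    public static void main(String[] args) {
        // The two statuses quoted in the thread.
        for (int status : new int[] {137, 1}) {
            System.out.println(status + ": " + describe(status));
        }
    }
}
```

Under that assumption, 137 points at the OS killing the child (consistent with the out-of-memory guess in the thread), while 1 only says the JVM exited with an error of its own, so the task log is the place to look.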

On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <[email protected]> wrote:
>> If my Groom Child Process fails for some reason, the processes are not
>> killed automatically
>
> I also experienced this problem before. I guess, if one of processes
> crashed with OutOfMemory, other processes infinitely waiting for it.
> This is a bug.
>
> On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <[email protected]> wrote:
>> Just another quick question. If my Groom Child Process fails for some
>> reason, the processes are not killed automatically. If i run JPS command, I
>> can still see something like "3791 GroomServer$BSPPeerChild". Is this the
>> expected behavior ?
>>
>> I am using latest hama version (0.7.0).
>> Regards,
>> Behroz
>>
>> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <[email protected]> wrote:
>>
>>> Ok I will try it out.
>>>
>>> No, actually I am learning alot by facing these problems. It is actually a
>>> good thing :D
>>>
>>> Regards,
>>> Behroz
>>>
>>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <[email protected]>
>>> wrote:
>>>
>>>> > message managers. Hmmm, I will recheck my logic related to messages. Btw
>>>>
>>>> Serialization (like GraphJobMessage) is good idea. It stores multiple
>>>> messages in serialized form in a single object to reduce the memory
>>>> usage and RPC overhead.
>>>>
>>>> > what is the limit of these message managers ? How much data at a single
>>>> > time they can handle ?
>>>>
>>>> It depends on memory.
>>>>
>>>> > P.S. Each day, as I am moving towards a big cluster I am running into
>>>> > problems (alot of them :D).
>>>>
>>>> Haha, sorry for inconvenient and thanks for your reports.
>>>>
>>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <[email protected]>
>>>> wrote:
>>>> > Ok. So, I do have a memory problem. I will try to scale out.
>>>> >
>>>> > *>>Each task processor has two message manager, one for outgoing and one
>>>> > for incoming. All these are handled in memory, so it sometimes requires
>>>> > large memory space.*
>>>> > So, you mean that before barrier synchronization, I have alot of data in
>>>> > message managers. Hmmm, I will recheck my logic related to messages. Btw
>>>> > what is the limit of these message managers ? How much data at a single
>>>> > time they can handle ?
>>>> >
>>>> > P.S. Each day, as I am moving towards a big cluster I am running into
>>>> > problems (alot of them :D).
>>>> >
>>>> > Regards,
>>>> > Behroz Sikander
>>>> >
>>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
>>>> >> > understanding ?
>>>> >>
>>>> >> and,
>>>> >>
>>>> >> > on a big dataset. I think these exceptions have something to do with Ubuntu
>>>> >> > OS killing the hama process due to lack of memory. So, I was curious about
>>>> >>
>>>> >> Yes, you're right.
>>>> >>
>>>> >> Each task processor has two message manager, one for outgoing and one
>>>> >> for incoming. All these are handled in memory, so it sometimes
>>>> >> requires large memory space. To solve the OutOfMemory issue, you
>>>> >> should scale-out your cluster by increasing the number of nodes and
>>>> >> job tasks, or optimize your algorithm. Another option is
>>>> >> disk-spillable message manager. This is not supported yet.
>>>> >>
>>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <[email protected]>
>>>> >> wrote:
>>>> >> > Hi,
>>>> >> > Yes. According to hama-default.xml, each machine will open 3 process with
>>>> >> > 2GB memory each. This means that my VMs need atleast 8GB memory (2GB each
>>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
>>>> >> > understanding ?
>>>> >> >
>>>> >> > I recently ran into the following exceptions when I was trying to run hama
>>>> >> > on a big dataset. I think these exceptions have something to do with Ubuntu
>>>> >> > OS killing the hama process due to lack of memory. So, I was curious about
>>>> >> > my configurations.
>>>> >> > 'BSP task process exit with nonzero status of 137.'
>>>> >> > 'BSP task process exit with nonzero status of 1'
>>>> >> >
>>>> >> > Regards,
>>>> >> > Behroz
>>>> >> >
>>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <[email protected]>
>>>> >> > wrote:
>>>> >> >
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> You can change the max tasks per node by setting below property in
>>>> >> >> hama-site.xml. :-)
>>>> >> >>
>>>> >> >> <property>
>>>> >> >>   <name>bsp.tasks.maximum</name>
>>>> >> >>   <value>3</value>
>>>> >> >>   <description>The maximum number of BSP tasks that will be run simultaneously
>>>> >> >>   by a groom server.</description>
>>>> >> >> </property>
>>>> >> >>
>>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <[email protected]>
>>>> >> >> wrote:
>>>> >> >> > Hi,
>>>> >> >> > Recently, I noticed that my hama deployment is only opening 3 processes per
>>>> >> >> > machine. This is because of the configuration settings in the default hama
>>>> >> >> > file.
>>>> >> >> >
>>>> >> >> > My questions is why 3 and why not 5 or 7 ? What criteria's should be
>>>> >> >> > considered if I want to increase the value ?
>>>> >> >> >
>>>> >> >> > Regards,
>>>> >> >> > Behroz
>>>> >> >>
>>>> >> >> --
>>>> >> >> Best Regards, Edward J. Yoon
>>>> >> >>
>>>> >>
>>>> >> --
>>>> >> Best Regards, Edward J. Yoon
>>>> >>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>>
>>>
>
> --
> Best Regards, Edward J. Yoon

--
Best Regards, Edward J. Yoon
