OK. HA = High Availability?

I am also trying to solve the following problem, but I do not understand
why I get this exception, because my algorithm does not send a lot of data
to the master:

*'BSP task process exit with nonzero status of 1'*

Each slave node processes some data and sends a Double array of size 96
back to the master machine. Recently, I was testing the algorithm on 8000
files when it crashed. This means that 8000 double arrays of size 96 are
sent to the master to process. Once the master has received all the data,
it comes out of the sync and starts processing again.

Here is the calculation: 8000 * 96 * 8 (size of a double in bytes) =
6,144,000 bytes = ~6.144 MB. I am not sure, but this does not seem to be a
lot of data, and I think the message managers that you mentioned should be
able to handle it.

Regards,
Behroz

On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon wrote:

> I'm reading GroomServer code and its taskMonitorService. It seems
> related with cluster HA.
>
> On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon wrote:
> >> If my Groom Child Process fails for some reason, the processes are not
> >> killed automatically
> >
> > I also experienced this problem before. I guess, if one of processes
> > crashed with OutOfMemory, other processes infinitely waiting for it.
> > This is a bug.
> >
> > On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander wrote:
> >> Just another quick question. If my Groom Child Process fails for some
> >> reason, the processes are not killed automatically. If i run JPS
> >> command, I can still see something like
> >> "3791 GroomServer$BSPPeerChild". Is this the expected behavior ?
> >>
> >> I am using latest hama version (0.7.0).
> >> Regards,
> >> Behroz
> >>
> >> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander wrote:
> >>
> >>> Ok I will try it out.
> >>>
> >>> No, actually I am learning alot by facing these problems. It is
> >>> actually a good thing :D
> >>>
> >>> Regards,
> >>> Behroz
> >>>
> >>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon wrote:
> >>>
> >>>> > message managers. Hmmm, I will recheck my logic related to
> >>>> > messages. Btw
> >>>>
> >>>> Serialization (like GraphJobMessage) is good idea. It stores multiple
> >>>> messages in serialized form in a single object to reduce the memory
> >>>> usage and RPC overhead.
> >>>>
> >>>> > what is the limit of these message managers ? How much data at a
> >>>> > single time they can handle ?
> >>>>
> >>>> It depends on memory.
> >>>>
> >>>> > P.S. Each day, as I am moving towards a big cluster I am running
> >>>> > into problems (alot of them :D).
> >>>>
> >>>> Haha, sorry for inconvenient and thanks for your reports.
> >>>>
> >>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander wrote:
> >>>> > Ok. So, I do have a memory problem. I will try to scale out.
> >>>> >
> >>>> > *>>Each task processor has two message manager, one for outgoing
> >>>> > and one for incoming. All these are handled in memory, so it
> >>>> > sometimes requires large memory space.*
> >>>> >
> >>>> > So, you mean that before barrier synchronization, I have alot of
> >>>> > data in message managers. Hmmm, I will recheck my logic related to
> >>>> > messages. Btw what is the limit of these message managers ? How
> >>>> > much data at a single time they can handle ?
> >>>> >
> >>>> > P.S. Each day, as I am moving towards a big cluster I am running
> >>>> > into problems (alot of them :D).
> >>>> >
> >>>> > Regards,
> >>>> > Behroz Sikander
> >>>> >
> >>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon wrote:
> >>>> >
> >>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
> >>>> >> > understanding ?
> >>>> >>
> >>>> >> and,
> >>>> >>
> >>>> >> > on a big dataset. I think these exceptions have something to do
> >>>> >> > with Ubuntu OS killing the hama process due to lack of memory.
> >>>> >> > So, I was curious about
> >>>> >>
> >>>> >> Yes, you're right.
> >>>> >>
> >>>> >> Each task processor has two message manager, one for outgoing and
> >>>> >> one for incoming. All these are handled in memory, so it sometimes
> >>>> >> requires large memory space. To solve the OutOfMemory issue, you
> >>>> >> should scale-out your cluster by increasing the number of nodes and
> >>>> >> job tasks, or optimize your algorithm. Another option is
> >>>> >> disk-spillable message manager. This is not supported yet.
> >>>> >>
> >>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander wrote:
> >>>> >> > Hi,
> >>>> >> > Yes. According to hama-default.xml, each machine will open 3
> >>>> >> > process with 2GB memory each. This means that my VMs need
> >>>> >> > atleast 8GB memory (2GB each for 3 Groom child process + 2GB
> >>>> >> > for Ubuntu OS). Is this correct understanding ?
> >>>> >> >
> >>>> >> > I recently ran into the following exceptions when I was trying
> >>>> >> > to run hama on a big dataset. I think these exceptions have
> >>>> >> > something to do with Ubuntu OS killing the hama process due to
> >>>> >> > lack of memory. So, I was curious about my configurations.
> >>>> >> > 'BSP task process exit with nonzero status of 137.'
> >>>> >> > 'BSP task process exit with nonzero status of 1'
> >>>> >> >
> >>>> >> > Regards,
> >>>> >> > Behroz
> >>>> >> >
> >>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon wrote:
> >>>> >> >
> >>>> >> >> Hi,
> >>>> >> >>
> >>>> >> >> You can change the max tasks per node by setting below property
> >>>> >> >> in hama-site.xml. :-)
> >>>> >> >>
> >>>> >> >> <property>
> >>>> >> >>   <name>bsp.tasks.maximum</name>
> >>>> >> >>   <value>3</value>
> >>>> >> >>   <description>The maximum number of BSP tasks that will be run
> >>>> >> >>   simultaneously by a groom server.</description>
> >>>> >> >> </property>
> >>>> >> >>
> >>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander wrote:
> >>>> >> >> > Hi,
> >>>> >> >> > Recently, I noticed that my hama deployment is only opening 3
> >>>> >> >> > processes per machine. This is because of the configuration
> >>>> >> >> > settings in the default hama file.
> >>>> >> >> >
> >>>> >> >> > My questions is why 3 and why not 5 or 7 ? What criteria's
> >>>> >> >> > should be considered if I want to increase the value ?
> >>>> >> >> >
> >>>> >> >> > Regards,
> >>>> >> >> > Behroz
> >>>> >> >>
> >>>> >> >> --
> >>>> >> >> Best Regards, Edward J. Yoon
> >>>> >>
> >>>> >> --
> >>>> >> Best Regards, Edward J. Yoon
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >
> > --
> > Best Regards, Edward J. Yoon
>
> --
> Best Regards, Edward J. Yoon
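
P.S. To double-check the two estimates discussed in this thread (the
message payload sent to the master, and the memory a node needs for its
task slots), here is a minimal sketch. The class and method names are
hypothetical and not part of Hama. It assumes 8 bytes per double and counts
only raw payload, so object headers, boxing of Double values and
serialization overhead in the message managers would come on top of this.

```java
// Hypothetical helper, not part of Hama: back-of-the-envelope estimates only.
class MessagePayloadEstimate {

    // Raw payload in bytes: one double array per file, 8 bytes per element.
    public static long payloadBytes(long arrays, int arrayLength) {
        return arrays * arrayLength * Double.BYTES;
    }

    // Memory a node needs in GB: one child JVM per task slot plus room for the OS.
    public static long nodeMemoryGb(int tasksPerNode, int heapGbPerTask, int osGb) {
        return (long) tasksPerNode * heapGbPerTask + osGb;
    }

    public static void main(String[] args) {
        // 8000 files, each producing an array of 96 doubles.
        System.out.println(payloadBytes(8000, 96) + " bytes of raw payload");
        // 3 task slots with 2 GB each, plus 2 GB for the OS.
        System.out.println(nodeMemoryGb(3, 2, 2) + " GB per node");
    }
}
```

With the numbers from this thread it gives 6,144,000 bytes and 8 GB, so
the raw payload alone is small compared to a 2 GB child heap.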
