Ok, I found a strange thing: a new file named "hs_err_pid4919.log" appeared
inside my $HADOOP_HOME directory.

The contents of the file are:

#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (os_linux.cpp:2809), pid=4919, tid=140564483778304
#
# JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 1.7.0_79-b14)
# Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.6
# Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

---------------  T H R E A D  ---------------

Current thread (0x00007fd7c0438800):  JavaThread "PacketResponder: BP-1786576942-141.40.254.14-1441293753577:blk_1074136820_396012, type=HAS_DOWNSTREAM_IN_PIPELINE" daemon [_thread_new, id=11943, stack(0x00007fd7b80fa000,0x00007fd7b81fb000)]

Stack: [0x00007fd7b80fa000,0x00007fd7b81fb000],  sp=0x00007fd7b81f9be0,  free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
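The suggestions at the top of the log can be applied before restarting the
DataNode. A minimal sketch, assuming the standard hadoop-env.sh conventions
(the 1024 MB heap value is an assumption; size it to leave room for the OS
and the other Java processes on the node):

```shell
# Enable core dumps in the current shell before launching the JVM,
# as the hs_err log itself suggests.
ulimit -c unlimited || echo "could not raise core dump limit"

# Cap the Hadoop daemon heap (in MB) via the standard hadoop-env.sh
# knob; 1024 is an assumed value, not a recommendation.
export HADOOP_HEAPSIZE=1024

# Check how much physical memory and swap remain before restarting.
free -m

# Then restart the DataNode, e.g.:
#   $HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
#   $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
```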

I think my DataNode process is crashing. I now know that it is an
out-of-memory error, but I am not sure about the root cause.
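As a sanity check on the message-size estimate discussed in the quoted
thread below, the raw payload can be computed directly. A minimal sketch
(a Java double is 8 bytes; this deliberately ignores object headers and
RPC serialization overhead):

```java
public class MessageSizeEstimate {
    public static void main(String[] args) {
        final int arrays = 8000;      // one double[] per processed file
        final int length = 96;        // elements per array
        final int bytesPerDouble = 8; // fixed by the JVM specification

        long rawBytes = (long) arrays * length * bytesPerDouble;
        System.out.println(rawBytes + " bytes = ~" + rawBytes / 1_000_000.0 + " MB");
        // The raw payload is only about 6 MB, so if this traffic triggers
        // an OOM, per-message object overhead or other buffered state is
        // a more likely suspect than the payload itself.
    }
}
```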

On Thu, Sep 3, 2015 at 10:25 PM, Behroz Sikander <[email protected]> wrote:

> ok. HA = High Availability ?
>
> I am also trying to solve the following problem, but I do not understand
> why I get the exception, because my algorithm does not send a lot of data
> to the master:
> 'BSP task process exit with nonzero status of 1'
>
> Each slave node processes some data and sends back a double array of size
> 96 to the master machine. Recently, I was testing the algorithm on 8000
> files when it crashed. This means that 8000 double arrays of size 96 are
> sent to the master to process. Once the master receives all the data, it
> gets out of sync and starts the processing again. Here is the calculation:
>
> 8000 * 96 * 8 (size of a double in bytes) = 6144000 bytes = ~6.144 MB.
>
> I am not sure, but this does not seem like a lot of data, and I think the
> message manager that you mentioned should be able to handle it.
>
> Regards,
> Behroz
>
> On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon <[email protected]>
> wrote:
>
>> I'm reading the GroomServer code and its taskMonitorService. It seems
>> related to cluster HA.
>>
>> On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <[email protected]>
>> wrote:
>> >> If my Groom Child Process fails for some reason, the processes are not
>> killed automatically
>> >
>> > I also experienced this problem before. I guess if one of the processes
>> > crashes with an OutOfMemoryError, the other processes wait for it
>> > indefinitely. This is a bug.
>> >
>> > On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <[email protected]>
>> wrote:
>> >> Just another quick question. If my Groom Child Process fails for some
>> >> reason, the processes are not killed automatically. If I run the jps
>> >> command, I can still see something like "3791 GroomServer$BSPPeerChild".
>> >> Is this the expected behavior?
>> >>
>> >> I am using the latest Hama version (0.7.0).
>> >> Regards,
>> >> Behroz
>> >>
>> >> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <[email protected]>
>> wrote:
>> >>
>> >>> Ok I will try it out.
>> >>>
>> >>> No, actually I am learning a lot by facing these problems. It is
>> >>> actually a good thing :D
>> >>>
>> >>> Regards,
>> >>> Behroz
>> >>>
>> >>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <
>> [email protected]>
>> >>> wrote:
>> >>>
>> >>>> > message managers. Hmmm, I will recheck my logic related to
>> >>>> > messages. Btw
>> >>>>
>> >>>> Serialization (like GraphJobMessage) is a good idea. It stores
>> >>>> multiple messages in serialized form in a single object to reduce
>> >>>> the memory usage and RPC overhead.
>> >>>>
>> >>>> > what is the limit of these message managers? How much data can
>> >>>> > they handle at a time?
>> >>>>
>> >>>> It depends on memory.
>> >>>>
>> >>>> > P.S. Each day, as I move towards a bigger cluster, I run into
>> >>>> > problems (a lot of them :D).
>> >>>>
>> >>>> Haha, sorry for the inconvenience, and thanks for your reports.
>> >>>>
>> >>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <
>> [email protected]>
>> >>>> wrote:
>> >>>> > Ok. So, I do have a memory problem. I will try to scale out.
>> >>>> >
>> >>>> > >> Each task processor has two message managers, one for outgoing
>> >>>> > >> and one for incoming. All these are handled in memory, so it
>> >>>> > >> sometimes requires large memory space.
>> >>>> > So, you mean that before barrier synchronization, I have a lot of
>> >>>> > data in the message managers. Hmmm, I will recheck my logic
>> >>>> > related to messages. Btw, what is the limit of these message
>> >>>> > managers? How much data can they handle at a time?
>> >>>> >
>> >>>> > P.S. Each day, as I move towards a bigger cluster, I run into
>> >>>> > problems (a lot of them :D).
>> >>>> >
>> >>>> > Regards,
>> >>>> > Behroz Sikander
>> >>>> >
>> >>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <
>> [email protected]>
>> >>>> > wrote:
>> >>>> >
>> >>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
>> >>>> >> > understanding ?
>> >>>> >>
>> >>>> >> and,
>> >>>> >>
>> >>>> >> > on a big dataset. I think these exceptions have something to do
>> with
>> >>>> >> Ubuntu
>> >>>> >> > OS killing the hama process due to lack of memory. So, I was
>> curious
>> >>>> >> about
>> >>>> >>
>> >>>> >> Yes, you're right.
>> >>>> >>
>> >>>> >> Each task processor has two message managers, one for outgoing
>> >>>> >> and one for incoming. All these are handled in memory, so it
>> >>>> >> sometimes requires large memory space. To solve the OutOfMemory
>> >>>> >> issue, you should scale out your cluster by increasing the number
>> >>>> >> of nodes and job tasks, or optimize your algorithm. Another
>> >>>> >> option is a disk-spillable message manager, but this is not
>> >>>> >> supported yet.
>> >>>> >>
>> >>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <
>> [email protected]>
>> >>>> >> wrote:
>> >>>> >> > Hi,
>> >>>> >> > Yes. According to hama-default.xml, each machine will open 3
>> >>>> >> > processes with 2GB memory each. This means that my VMs need at
>> >>>> >> > least 8GB memory (2GB each for 3 Groom child processes + 2GB
>> >>>> >> > for the Ubuntu OS). Is this the correct understanding?
>> >>>> >> >
>> >>>> >> > I recently ran into the following exceptions when I was trying
>> >>>> >> > to run Hama on a big dataset. I think these exceptions have
>> >>>> >> > something to do with the Ubuntu OS killing the Hama process due
>> >>>> >> > to lack of memory. So, I was curious about my configurations.
>> >>>> >> > 'BSP task process exit with nonzero status of 137.'
>> >>>> >> > 'BSP task process exit with nonzero status of 1'
>> >>>> >> >
>> >>>> >> >
>> >>>> >> >
>> >>>> >> > Regards,
>> >>>> >> > Behroz
>> >>>> >> >
>> >>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <
>> >>>> [email protected]>
>> >>>> >> > wrote:
>> >>>> >> >
>> >>>> >> >> Hi,
>> >>>> >> >>
>> >>>> >> >> You can change the max tasks per node by setting the property
>> >>>> >> >> below in hama-site.xml. :-)
>> >>>> >> >>
>> >>>> >> >>   <property>
>> >>>> >> >>     <name>bsp.tasks.maximum</name>
>> >>>> >> >>     <value>3</value>
>> >>>> >> >>     <description>The maximum number of BSP tasks that will be
>> run
>> >>>> >> >> simultaneously
>> >>>> >> >>     by a groom server.</description>
>> >>>> >> >>   </property>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <
>> >>>> [email protected]>
>> >>>> >> >> wrote:
>> >>>> >> >> > Hi,
>> >>>> >> >> > Recently, I noticed that my Hama deployment is only opening
>> >>>> >> >> > 3 processes per machine. This is because of the configuration
>> >>>> >> >> > settings in the default Hama file.
>> >>>> >> >> >
>> >>>> >> >> > My question is: why 3 and not 5 or 7? What criteria should
>> >>>> >> >> > be considered if I want to increase the value?
>> >>>> >> >> >
>> >>>> >> >> > Regards,
>> >>>> >> >> > Behroz
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >>
>> >>>> >> >> --
>> >>>> >> >> Best Regards, Edward J. Yoon
>> >>>> >> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> --
>> >>>> >> Best Regards, Edward J. Yoon
>> >>>> >>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Best Regards, Edward J. Yoon
>> >>>>
>> >>>
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>
>
