Hi, I think you have to add some swap space. Did you figure out
what the problem is?
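In case it helps, a minimal sketch of adding a file-backed swap area on Ubuntu (the 2G size and the /swapfile path are assumptions; all commands need root, so run them on each slave that has no swap configured):

```shell
# Create and enable a 2 GB file-backed swap area (size is an assumption).
sudo fallocate -l 2G /swapfile      # reserve the backing file
sudo chmod 600 /swapfile            # swap files must not be world-readable
sudo mkswap /swapfile               # write the swap signature
sudo swapon /swapfile               # enable it immediately

# Make it survive reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Verify:
swapon -s
free -m
```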

On Fri, Sep 4, 2015 at 8:20 AM, Behroz Sikander <[email protected]> wrote:
> More info on this:
> I noticed that only 2 machines were failing with OutOfMemory. After messing
> around, I found out that the swap memory was 0 for these 2 machines, but
> the others had 1 GB of swap space. I added swap to these machines and it
> worked. But, as expected, the next run of the algorithm with more data
> crashed again. This time GroomChildProcess crashed with the following log
> message:
>
>
> *OpenJDK 64-Bit Server VM warning: INFO:
> os::commit_memory(0x00000007fa100000, 42467328, 0) failed; error='Cannot
> allocate memory' (errno=12)*
> *#*
> *# There is insufficient memory for the Java Runtime Environment to
> continue.*
> *# Native memory allocation (malloc) failed to allocate 42467328 bytes for
> committing reserved memory.*
> *# An error report file with more information is saved as:*
> *#
> /home/behroz/Documents/Packages/tmp_data/hama_tmp/bsp/local/groomServer/attempt_201509040050_0004_000006_0/work/hs_err_pid28850.log*
>
> My slave machines have 8GB of RAM, 4 CPUs, a 20 GB hard drive, and 1GB of
> swap. I run 3 groom child processes, each taking 2GB of RAM. Apart from
> GroomChildProcess, I have GroomServer, DataNode, and TaskManager running on
> the slave. After assigning 2GB of RAM to each of the 3 groom child
> processes (6GB total), only 2GB of RAM is left for the others. Do you
> think this is the problem?
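As a rough check of that budget (a sketch; the 8GB total and 2GB-per-task figures come from the numbers above):

```python
GB = 1024 ** 3

total_ram = 8 * GB          # per slave, from the figures above
tasks = 3                   # groom child processes per slave
heap_per_task = 2 * GB      # heap assigned to each child

left_over = total_ram - tasks * heap_per_task
print(left_over // GB)      # GB left for the OS, GroomServer, DataNode, ...
```

Note that a 2GB heap setting caps only the Java heap; each JVM also needs native memory for thread stacks, code cache, and direct buffers, so three such processes commit more than 6GB in practice, which would match the os::commit_memory failure above.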
>
> Regards,
> Behroz
>
> On Thu, Sep 3, 2015 at 11:39 PM, Behroz Sikander <[email protected]> wrote:
>
>> Ok, I found something strange: a new file named "hs_err_pid4919.log"
>> inside the $HADOOP_HOME directory.
>>
>> The contents of the file are:
>>
>> *#   Increase physical memory or swap space*
>> *#   Check if swap backing store is full*
>> *#   Use 64 bit Java on a 64 bit OS*
>> *#   Decrease Java heap size (-Xmx/-Xms)*
>> *#   Decrease number of Java threads*
>> *#   Decrease Java thread stack sizes (-Xss)*
>> *#   Set larger code cache with -XX:ReservedCodeCacheSize=*
>> *# This output file may be truncated or incomplete.*
>> *#*
>> *#  Out of Memory Error (os_linux.cpp:2809), pid=4919, tid=140564483778304*
>> *#*
>> *# JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build
>> 1.7.0_79-b14)*
>> *# Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64
>> compressed oops)*
>> *# Derivative: IcedTea 2.5.6*
>> *# Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1*
>> *# Failed to write core dump. Core dumps have been disabled. To enable
>> core dumping, try "ulimit -c unlimited" before starting Java again*
>> *#*
>>
>> *---------------  T H R E A D  ---------------*
>>
>> *Current thread (0x00007fd7c0438800):  JavaThread "PacketResponder:
>> BP-1786576942-141.40.254.14-1441293753577:blk_1074136820_396012,
>> type=HAS_DOWNSTREAM_IN_PIPELINE" daemon [_thread_new, id=11943,
>> stack(0x00007fd7b80fa000,0x00007fd7b81fb000)]*
>>
>> *Stack: [0x00007fd7b80fa000,0x00007fd7b81fb000],  sp=0x00007fd7b81f9be0,
>>  free space=1022k*
>> *Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
>> code)*
>>
>> I think my DataNode process is crashing. I now know that it is an out of
>> memory error, but I am not sure of the reason.
>>
>> On Thu, Sep 3, 2015 at 10:25 PM, Behroz Sikander <[email protected]>
>> wrote:
>>
>>> ok. HA = High Availability ?
>>>
>>> I am also trying to solve the following problem, but I do not understand
>>> why I get the exception, because my algorithm does not send a lot of
>>> data to the master.
>>> *'BSP task process exit with nonzero status of 1'*
>>>
>>> Each slave node processes some data and sends back a double array of size
>>> 96 to the master machine. Recently, I was testing the algorithm on 8000
>>> files when it crashed. This means that 8000 double arrays of size 96 are
>>> sent to the master to process. Once the master receives all the data, it
>>> gets out of sync and starts the processing again. Here is the calculation:
>>>
>>> 8000 * 96 * 8 (size of a double) = 6144000 bytes = ~6.144 MB
>>>
>>> I am not sure, but this does not seem like a lot of data, and I think the
>>> message manager that you mentioned should be able to handle it.
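That arithmetic as a quick sketch (a Java double is 8 bytes; this counts only the raw payload, not per-message serialization or RPC overhead, so actual memory use will be somewhat higher):

```python
arrays = 8000            # one result array per input file
doubles_per_array = 96
bytes_per_double = 8     # a Java double is 8 bytes

total_bytes = arrays * doubles_per_array * bytes_per_double
print(total_bytes)              # 6144000
print(total_bytes / 1e6)        # 6.144 (MB)
```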
>>>
>>> Regards,
>>> Behroz
>>>
>>> On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon <[email protected]>
>>> wrote:
>>>
>>>> I'm reading the GroomServer code and its taskMonitorService. It seems
>>>> related to cluster HA.
>>>>
>>>> On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <[email protected]>
>>>> wrote:
>>>> >> If my Groom Child Process fails for some reason, the processes are
>>>> >> not killed automatically
>>>> >
>>>> > I also experienced this problem before. I guess if one of the
>>>> > processes crashes with OutOfMemory, the other processes wait for it
>>>> > indefinitely. This is a bug.
>>>> >
>>>> > On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <[email protected]>
>>>> wrote:
>>>> >> Just another quick question. If my Groom Child Process fails for
>>>> >> some reason, the processes are not killed automatically. If I run
>>>> >> the JPS command, I can still see something like "3791
>>>> >> GroomServer$BSPPeerChild". Is this the expected behavior?
>>>> >>
>>>> >> I am using the latest Hama version (0.7.0).
>>>> >> Regards,
>>>> >> Behroz
>>>> >>
>>>> >> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <[email protected]>
>>>> wrote:
>>>> >>
>>>> >>> Ok I will try it out.
>>>> >>>
>>>> >>> No, actually I am learning a lot by facing these problems. It is
>>>> >>> actually a good thing :D
>>>> >>>
>>>> >>> Regards,
>>>> >>> Behroz
>>>> >>>
>>>> >>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <
>>>> [email protected]>
>>>> >>> wrote:
>>>> >>>
>>>> >>>> > message managers. Hmmm, I will recheck my logic related to
>>>> >>>> > messages. Btw
>>>> >>>>
>>>> >>>> Serialization (like GraphJobMessage) is a good idea. It stores
>>>> >>>> multiple messages in serialized form in a single object to reduce
>>>> >>>> the memory usage and RPC overhead.
>>>> >>>>
>>>> >>>> > what is the limit of these message managers? How much data can
>>>> >>>> > they handle at a single time?
>>>> >>>>
>>>> >>>> It depends on the available memory.
>>>> >>>>
>>>> >>>> > P.S. Each day, as I am moving towards a big cluster, I am
>>>> >>>> > running into problems (a lot of them :D).
>>>> >>>>
>>>> >>>> Haha, sorry for the inconvenience, and thanks for your reports.
>>>> >>>>
>>>> >>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <
>>>> [email protected]>
>>>> >>>> wrote:
>>>> >>>> > Ok. So, I do have a memory problem. I will try to scale out.
>>>> >>>> >
>>>> >>>> > *>>Each task processor has two message managers, one for
>>>> >>>> > outgoing and one*
>>>> >>>> >
>>>> >>>> > *for incoming. All these are handled in memory, so it sometimes
>>>> >>>> > requires large memory space.*
>>>> >>>> > So, you mean that before barrier synchronization, I have a lot
>>>> >>>> > of data in the message managers. Hmmm, I will recheck my logic
>>>> >>>> > related to messages. Btw, what is the limit of these message
>>>> >>>> > managers? How much data can they handle at a single time?
>>>> >>>> >
>>>> >>>> > P.S. Each day, as I am moving towards a big cluster, I am
>>>> >>>> > running into problems (a lot of them :D).
>>>> >>>> >
>>>> >>>> > Regards,
>>>> >>>> > Behroz Sikander
>>>> >>>> >
>>>> >>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <
>>>> [email protected]>
>>>> >>>> > wrote:
>>>> >>>> >
>>>> >>>> >> > for 3 Groom child process + 2GB for Ubuntu OS). Is this correct
>>>> >>>> >> > understanding ?
>>>> >>>> >>
>>>> >>>> >> and,
>>>> >>>> >>
>>>> >>>> >> > on a big dataset. I think these exceptions have something to
>>>> >>>> >> > do with the Ubuntu OS killing the Hama process due to lack of
>>>> >>>> >> > memory. So, I was curious about
>>>> >>>> >>
>>>> >>>> >> Yes, you're right.
>>>> >>>> >>
>>>> >>>> >> Each task processor has two message managers, one for outgoing
>>>> >>>> >> and one for incoming. All these are handled in memory, so it
>>>> >>>> >> sometimes requires a large memory space. To solve the
>>>> >>>> >> OutOfMemory issue, you should scale out your cluster by
>>>> >>>> >> increasing the number of nodes and job tasks, or optimize your
>>>> >>>> >> algorithm. Another option is a disk-spillable message manager,
>>>> >>>> >> but this is not supported yet.
>>>> >>>> >>
>>>> >>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <
>>>> [email protected]>
>>>> >>>> >> wrote:
>>>> >>>> >> > Hi,
>>>> >>>> >> > Yes. According to hama-default.xml, each machine will open 3
>>>> >>>> >> > processes with 2GB of memory each. This means that my VMs
>>>> >>>> >> > need at least 8GB of memory (2GB each for the 3 Groom child
>>>> >>>> >> > processes + 2GB for the Ubuntu OS). Is this a correct
>>>> >>>> >> > understanding?
>>>> >>>> >> >
>>>> >>>> >> > I recently ran into the following exceptions when I was
>>>> >>>> >> > trying to run Hama on a big dataset. I think these exceptions
>>>> >>>> >> > have something to do with the Ubuntu OS killing the Hama
>>>> >>>> >> > process due to lack of memory. So, I was curious about my
>>>> >>>> >> > configurations.
>>>> >>>> >> > 'BSP task process exit with nonzero status of 137.'
>>>> >>>> >> > 'BSP task process exit with nonzero status of 1'
>>>> >>>> >> >
>>>> >>>> >> >
>>>> >>>> >> >
>>>> >>>> >> > Regards,
>>>> >>>> >> > Behroz
>>>> >>>> >> >
>>>> >>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <
>>>> >>>> [email protected]>
>>>> >>>> >> > wrote:
>>>> >>>> >> >
>>>> >>>> >> >> Hi,
>>>> >>>> >> >>
>>>> >>>> >> >> You can change the max tasks per node by setting the
>>>> >>>> >> >> property below in hama-site.xml. :-)
>>>> >>>> >> >>
>>>> >>>> >> >>   <property>
>>>> >>>> >> >>     <name>bsp.tasks.maximum</name>
>>>> >>>> >> >>     <value>3</value>
>>>> >>>> >> >>     <description>The maximum number of BSP tasks that will
>>>> >>>> >> >>     be run simultaneously by a groom server.</description>
>>>> >>>> >> >>   </property>
>>>> >>>> >> >>
>>>> >>>> >> >>
>>>> >>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <
>>>> >>>> [email protected]>
>>>> >>>> >> >> wrote:
>>>> >>>> >> >> > Hi,
>>>> >>>> >> >> > Recently, I noticed that my Hama deployment is only
>>>> >>>> >> >> > opening 3 processes per machine. This is because of the
>>>> >>>> >> >> > configuration settings in the default Hama file.
>>>> >>>> >> >> >
>>>> >>>> >> >> > My question is: why 3, and not 5 or 7? What criteria
>>>> >>>> >> >> > should be considered if I want to increase the value?
>>>> >>>> >> >> >
>>>> >>>> >> >> > Regards,
>>>> >>>> >> >> > Behroz
>>>> >>>> >> >>
>>>> >>>> >> >>
>>>> >>>> >> >>
>>>> >>>> >> >> --
>>>> >>>> >> >> Best Regards, Edward J. Yoon
>>>> >>>> >> >>
>>>> >>>> >>
>>>> >>>> >>
>>>> >>>> >>
>>>> >>>> >> --
>>>> >>>> >> Best Regards, Edward J. Yoon
>>>> >>>> >>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> --
>>>> >>>> Best Regards, Edward J. Yoon
>>>> >>>>
>>>> >>>
>>>> >>>
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best Regards, Edward J. Yoon
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards, Edward J. Yoon
>>>>
>>>
>>>
>>



-- 
Best Regards, Edward J. Yoon
