Thanks, have a nice holiday. :-)

--
Best Regards, Edward J. Yoon

On Sat, Sep 12, 2015 at 7:04 PM, Behroz Sikander <[email protected]> wrote:

Hi,
I also added the swap space and the algorithm ran a bit longer, but after some time I hit the same problem again. I am on holiday and will dig into the issue in the next few days. I will keep you updated.

Regards,
Behroz

On Fri, Sep 11, 2015 at 6:36 AM, Edward J. Yoon <[email protected]> wrote:

Hi, I think you have to add some swap space. Did you figure out what the problem was?

On Fri, Sep 4, 2015 at 8:20 AM, Behroz Sikander <[email protected]> wrote:

More info on this:
I noticed that only 2 machines were failing with OutOfMemory. After digging around, I found that the swap space was 0 on these 2 machines while the others had 1 GB of swap. I added swap to these machines and it worked. But, as expected, the next run of the algorithm with more data crashed again. This time the groom child process crashed with the following log message:

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007fa100000, 42467328, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 42467328 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/behroz/Documents/Packages/tmp_data/hama_tmp/bsp/local/groomServer/attempt_201509040050_0004_000006_0/work/hs_err_pid28850.log

My slave machines have 8 GB of RAM, 4 CPUs, a 20 GB hard drive and 1 GB of swap. I run 3 groom child processes, each taking 2 GB of RAM. Apart from the child processes, I have the GroomServer, DataNode and TaskManager running on the slave. After assigning 2 GB of RAM to each of the 3 child processes (6 GB in total), only 2 GB of RAM is left for everything else. Do you think this is the problem?

Regards,
Behroz
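To put those numbers together: 3 tasks at 2 GB of heap each already claims 6 of the 8 GB before the GroomServer, DataNode, TaskManager and the OS get anything, so either the task count or the per-task heap has to come down. A minimal hama-site.xml sketch of that trade-off, assuming the child heap is controlled by bsp.child.java.opts as in Hama 0.7's defaults (verify the property names against your hama-default.xml); the values are only illustrative:

  <!-- Keep (tasks per node) x (per-task heap) well below physical RAM,
       leaving headroom for the GroomServer, DataNode and the OS.
       Here: 3 tasks x 1536 MB = ~4.5 GB on an 8 GB slave. -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>3</value>
  </property>
  <property>
    <!-- Assumed property name; check hama-default.xml for your release. -->
    <name>bsp.child.java.opts</name>
    <value>-Xmx1536m</value>
  </property>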
On Thu, Sep 3, 2015 at 11:39 PM, Behroz Sikander <[email protected]> wrote:

Ok, I found something strange. I found a new file named "hs_err_pid4919.log" inside the $HADOOP_HOME directory.

The contents of the file are:

# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (os_linux.cpp:2809), pid=4919, tid=140564483778304
#
# JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 1.7.0_79-b14)
# Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.6
# Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

--------------- T H R E A D ---------------

Current thread (0x00007fd7c0438800): JavaThread "PacketResponder: BP-1786576942-141.40.254.14-1441293753577:blk_1074136820_396012, type=HAS_DOWNSTREAM_IN_PIPELINE" daemon [_thread_new, id=11943, stack(0x00007fd7b80fa000,0x00007fd7b81fb000)]

Stack: [0x00007fd7b80fa000,0x00007fd7b81fb000], sp=0x00007fd7b81f9be0, free space=1022k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)

I think my DataNode process is crashing. I now know that it is an out-of-memory error, but I am not sure of the reason.

On Thu, Sep 3, 2015 at 10:25 PM, Behroz Sikander <[email protected]> wrote:

Ok. HA = High Availability?

I am also trying to solve the following problem, but I do not understand why I get the exception, because my algorithm does not send a lot of data to the master:
'BSP task process exit with nonzero status of 1'

Each slave node processes some data and sends back a double array of size 96 to the master machine. Recently, I was testing the algorithm on 8000 files when it crashed. This means that 8000 double arrays of size 96 are sent to the master to process. Once the master receives all the data, it leaves the sync barrier and starts processing. Here is the calculation:

8000 * 96 * 8 bytes (size of a double) = 6,144,000 bytes ≈ 6.1 MB

I am not sure, but this does not seem like a lot of data, and I think the message manager you mentioned should be able to handle it.

Regards,
Behroz

On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon <[email protected]> wrote:

I'm reading the GroomServer code and its taskMonitorService. It seems related to cluster HA.

On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <[email protected]> wrote:

> If my groom child process fails for some reason, the processes are not killed automatically

I have also seen this problem before. I guess that if one of the processes crashes with OutOfMemory, the other processes wait for it indefinitely. This is a bug.

On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <[email protected]> wrote:

Just another quick question. If my groom child process fails for some reason, the processes are not killed automatically. If I run the jps command, I can still see something like "3791 GroomServer$BSPPeerChild". Is this the expected behavior?

I am using the latest Hama version (0.7.0).

Regards,
Behroz

On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <[email protected]> wrote:

Ok, I will try it out.

No, actually I am learning a lot by facing these problems. It is actually a good thing :D

Regards,
Behroz

On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <[email protected]> wrote:

> Hmmm, I will recheck my logic related to messages.

Serialization (like GraphJobMessage) is a good idea. It stores multiple messages in serialized form in a single object to reduce memory usage and RPC overhead.

> what is the limit of these message managers? How much data can they handle at a time?

It depends on memory.

> P.S. Each day, as I move towards a bigger cluster, I run into more problems (a lot of them :D).

Haha, sorry for the inconvenience, and thanks for your reports.
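That GraphJobMessage-style bundling can be sketched with a plain Hadoop Writable used as the Hama message type. This is only an illustration of the idea under the assumptions from this thread (per-file results of 96 doubles); the class name is made up and it is not Hama's own implementation:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;

  import org.apache.hadoop.io.Writable;

  // Packs many fixed-size double[] results into one message: one object,
  // one serialization pass and one RPC call instead of thousands of tiny messages.
  public class ResultBundleWritable implements Writable {
    private final List<double[]> results = new ArrayList<double[]>();

    public void add(double[] result) { results.add(result); }
    public List<double[]> getResults() { return results; }

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeInt(results.size());
      for (double[] r : results) {
        out.writeInt(r.length);
        for (double v : r) out.writeDouble(v);
      }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      results.clear();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        double[] r = new double[in.readInt()];
        for (int j = 0; j < r.length; j++) r[j] = in.readDouble();
        results.add(r);
      }
    }
  }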
On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <[email protected]> wrote:

Ok. So I do have a memory problem. I will try to scale out.

> Each task processor has two message managers, one for outgoing messages and one for incoming messages. All of these are handled in memory, so they can require a large memory space.

So you mean that before barrier synchronization I have a lot of data sitting in the message managers. Hmmm, I will recheck my logic related to messages. Btw, what is the limit of these message managers? How much data can they handle at a time?

P.S. Each day, as I move towards a bigger cluster, I run into more problems (a lot of them :D).

Regards,
Behroz Sikander

On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <[email protected]> wrote:

> (2 GB each for the 3 groom child processes + 2 GB for the Ubuntu OS). Is this the correct understanding?

and,

> I think they have something to do with the Ubuntu OS killing the Hama process due to lack of memory.

Yes, you're right.

Each task processor has two message managers, one for outgoing messages and one for incoming messages. All of these are handled in memory, so they can require a large memory space. To solve the OutOfMemory issue, you should scale out your cluster by increasing the number of nodes and job tasks, or optimize your algorithm. Another option is a disk-spillable message manager, but this is not supported yet.
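As a reference point for that description, here is a minimal sketch of the send, sync and drain pattern, assuming Hama 0.7's BSP API (org.apache.hama.bsp.BSP and BSPPeer). The class name and placeholder values are made up, and a single DoubleWritable is used as the message type only to keep the sketch self-contained; in practice one message per result array (or a bundle like the one sketched earlier) would be sent:

  import java.io.IOException;

  import org.apache.hadoop.io.DoubleWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hama.bsp.BSP;
  import org.apache.hama.bsp.BSPPeer;
  import org.apache.hama.bsp.sync.SyncException;

  public class SendToMasterBSP extends
      BSP<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> {

    @Override
    public void bsp(
        BSPPeer<NullWritable, NullWritable, NullWritable, NullWritable, DoubleWritable> peer)
        throws IOException, SyncException, InterruptedException {
      String master = peer.getPeerName(0); // first peer acts as the master

      // Everything sent here is buffered by the sender's outgoing message manager,
      // and after sync() by the master's incoming message manager -- all in RAM.
      for (double value : new double[] { 1.0, 2.0, 3.0 }) { // placeholder results
        peer.send(master, new DoubleWritable(value));
      }

      peer.sync(); // barrier: messages are delivered here

      if (peer.getPeerName().equals(master)) {
        double sum = 0;
        DoubleWritable msg;
        while ((msg = peer.getCurrentMessage()) != null) {
          sum += msg.get(); // drain and aggregate the buffered messages
        }
      }
    }
  }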
On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <[email protected]> wrote:

Hi,
Yes. According to hama-default.xml, each machine will open 3 processes with 2 GB of memory each. This means that my VMs need at least 8 GB of memory (2 GB each for the 3 groom child processes + 2 GB for the Ubuntu OS). Is this the correct understanding?

I recently ran into the following exceptions when trying to run Hama on a big dataset. I think they have something to do with the Ubuntu OS killing the Hama process due to lack of memory, so I was curious about my configuration:

'BSP task process exit with nonzero status of 137.'
'BSP task process exit with nonzero status of 1'

Regards,
Behroz

On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <[email protected]> wrote:

Hi,

You can change the maximum number of tasks per node by setting the property below in hama-site.xml. :-)

<property>
  <name>bsp.tasks.maximum</name>
  <value>3</value>
  <description>The maximum number of BSP tasks that will be run simultaneously
  by a groom server.</description>
</property>

On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <[email protected]> wrote:

Hi,
Recently, I noticed that my Hama deployment is only opening 3 processes per machine. This is because of the settings in the default Hama configuration file.

My question is: why 3, and not 5 or 7? What criteria should be considered if I want to increase the value?

Regards,
Behroz
