Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread Ing. Alessio Arleo
Hello Sai Ganesh, I am using the Giraph 1.2.0 release on Hadoop 2.6.0, so I can confirm that it compiles correctly. Which version of Java are you using? Best Regards, Alessio Arleo On 2 Mar 2017, 09:32 -0800, José Luis Larroque , wrote: > You aren't setting yh (yarn heap), and without this p

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread José Luis Larroque
You aren't setting yh (yarn heap), and without this parameter, every container will have 1024 MB by default. You should use -yh 10240 (the same value as mapreduce.map.memory.mb). You should ask about Giraph 1.2 compilation in a separate email for a better understanding. I didn't compile it yet, so I'm
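As a sketch, the -yh flag from the advice above would sit on the GiraphRunner command line like this; the jar name, computation class, and HDFS paths below are placeholders (the real ones from the thread are not shown), only -yh and -w reflect the suggestion:

```shell
# Hypothetical Giraph-on-YARN invocation; only -yh/-w mirror the advice above.
# -yh 10240 matches mapreduce.map.memory.mb so each container's heap fills the container.
hadoop jar giraph-examples.jar org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.SimpleShortestPathsComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/input/graph.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/output/sssp \
  -w 2 \
  -yh 10240
```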

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread Sai Ganesh Muthuraman
Hi, I tried building giraph-1.2.0 with the yarn profile. The build was successful when I just tried *mvn -DskipTests package.* But my hadoop version is hadoop-2.6.0, so I removed that build. I tried installing using the command *mvn -Phadoop_yarn -Dhadoop.version=2.6.0 -DskipTests package* I am get
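The two builds described above differ only in the profile and Hadoop version; as a sketch (profile name as quoted in the message, run from the Giraph source root):

```shell
# Default build, skipping tests — the case that succeeded:
mvn -DskipTests package

# Build against the hadoop_yarn profile for a specific Hadoop version — the failing case:
mvn -Phadoop_yarn -Dhadoop.version=2.6.0 -DskipTests package
```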

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread Sai Ganesh Muthuraman
Hi, I am still using Giraph-1.1.0 and I will upgrade soon. Thanks a lot for your suggestions. This is the exact command I used: hadoop jar myjar.jar org.apache.giraph.GiraphRunner MyBetweenness.BetweennessComputation -vif MyBetweenness.VertexDataInputFormat -vip /user/$USER/inputbc/wiki-Vote -vo

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread José Luis Larroque
There are lots of suggestions for dealing with that problem. First ones: - Decrease the number of workers to 1 per node, to maximize the amount of RAM each worker has. Xmx and Xms should be the same; this is good practice in every Java environment as far as I know. - Put here the exact comma
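The Xmx == Xms advice can be sketched through the standard Hadoop container properties (property names are real Hadoop ones; the 10 GB figures and jar/class names are illustrative, not from the thread):

```shell
# Illustrative: give each map container a fixed heap with Xms == Xmx,
# so the JVM never grows or shrinks the heap at runtime.
# Container size (MB) should exceed the heap to leave room for JVM overhead.
# Remaining -vif/-vip/-vof/-op flags as in the job's usual invocation.
hadoop jar myjar.jar org.apache.giraph.GiraphRunner \
  -Dmapreduce.map.memory.mb=10240 \
  -Dmapreduce.map.java.opts="-Xms9g -Xmx9g" \
  MyComputationClass -w 2
```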

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-03-02 Thread Sai Ganesh Muthuraman
Hi Jose, I went through the container logs and found that the following error was happening: java.lang.OutOfMemoryError: Java heap space. This was probably causing the missing chosen workers error. It happens only when the graph size exceeds 50k vertices and 100k edges. I enabled out
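The message breaks off at "I enabled out", presumably Giraph's out-of-core support, which spills partitions and messages to disk instead of holding everything on the heap. A sketch of the relevant flags (option names as in Giraph 1.1.x, which the poster is running; the out-of-core machinery was reworked in later releases, and the values here are illustrative):

```shell
# Illustrative Giraph 1.1.x out-of-core flags, passed as -D options:
#   giraph.useOutOfCoreGraph    spill graph partitions to disk
#   giraph.useOutOfCoreMessages spill incoming messages to disk
# Remaining input/output flags as in the job's usual invocation.
hadoop jar myjar.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.useOutOfCoreGraph=true \
  -Dgiraph.maxPartitionsInMemory=10 \
  -Dgiraph.useOutOfCoreMessages=true \
  -Dgiraph.maxMessagesInMemory=1000000 \
  MyComputationClass -w 2
```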

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-27 Thread José Luis Larroque
Could be a lot of different reasons: memory problems, algorithm problems, etc. I recommend focusing on reaching the logs instead of guessing why the workers are dying. Maybe you are looking in the wrong place; maybe you can access them through the web UI instead of the command line. From the terminal, d
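From the terminal, the standard way to pull all container logs for a finished YARN application is the `yarn logs` CLI (the application id below is a placeholder; log aggregation must be enabled on the cluster):

```shell
# List finished applications to find the id of the Giraph run:
yarn application -list -appStates FINISHED

# Fetch the aggregated logs for every container of that application:
yarn logs -applicationId application_1488412345678_0001 > app.log
```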

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-27 Thread Sai Ganesh Muthuraman
Hi, The first container in the application logs usually contains the gam logs, but the first container logs are not available. Hence no gam logs. What could be the possible reasons for some workers dying? Sai Ganesh On Feb 25, 2017, at 9:30 PM, José Luis Larroque wrote: You are proba

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-25 Thread José Luis Larroque
You are probably looking at your Giraph application master (gam) logs. You should look for your workers' logs; each one has a log (the container's logs). If you can't find them, you should look at your YARN configuration to find out where they are; see this: http://stackoverflow.com/questions/216
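Where the container logs end up is controlled by standard YARN properties in yarn-site.xml; a quick way to check (property names are real YARN ones, the grep is just a convenience):

```shell
# Properties that decide where container logs live:
#   yarn.nodemanager.log-dirs            local per-node log directory
#   yarn.log-aggregation-enable          aggregate logs to HDFS when the app finishes
#   yarn.nodemanager.remote-app-log-dir  HDFS target for aggregated logs
grep -A1 -E "log-dirs|log-aggregation" "$HADOOP_CONF_DIR/yarn-site.xml"
```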

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-25 Thread Sai Ganesh Muthuraman
Hi Jose, Which logs exactly do I have to look into? In the application logs, I found the error message that I mentioned, and it was also mentioned that there was **No good last checkpoint.** I am not able to figure out the reason for the failure of a worker for bigger files. What do I ha

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-25 Thread José Luis Larroque
Hi Ganesh, For some reason, some of your workers are dying. When that happens, Giraph automatically detects that the number of workers is below what is necessary in "barrierOnWorkerList" and searches for an existing checkpoint (a checkpoint is a backup of the state of a Giraph application). You don't have chec
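Checkpointing is off by default; the options below are real Giraph configuration keys (frequency is in supersteps, 0 disables it), while the directory value and jar/class names are illustrative:

```shell
# Checkpoint every 5 supersteps so a restarted worker can resume from the
# last saved state instead of hitting "No good last checkpoint".
# Remaining input/output flags as in the job's usual invocation.
hadoop jar myjar.jar org.apache.giraph.GiraphRunner \
  -Dgiraph.checkpointFrequency=5 \
  -Dgiraph.checkpointDirectory=/user/$USER/giraph-checkpoints \
  MyComputationClass -w 2
```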

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-24 Thread Sai Ganesh Muthuraman
Hi, I used one worker per node and that worked for smaller files. When the file size was more than 25 MB, I got this strange exception. I tried using 2 nodes and 3 nodes, the result is the same. **ERROR** [org.apache.giraph.master.MasterThread] master.BspServiceMaster (BspServiceMaster.java:ba

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-22 Thread José Luis Larroque
I remember that a good practice is using 1 worker per node; there are several emails recommending this. It's the best way to use the maximum RAM available in the cluster, I believe. Bye -- *José Luis Larroque* Analista Programador Universitario - Facultad de Informática - UNLP Desarrollador Jav

Re: RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-22 Thread Ramesh Krishnan
Hi Ganesh, The recommendation is to increase the number of nodes with smaller RAM size. Your number of executors depends on the CPU cores; hence, I would recommend using 60 GB RAM CPUs with 2 executors each for your use case. Thanks Ramesh On Wed, Feb 22, 2017 at 3:27 PM, Sai Ganesh Muthuraman < saigan

RELATION BETWEEN THE NUMBER OF GIRAPH WORKERS AND THE PROBLEM SIZE

2017-02-22 Thread Sai Ganesh Muthuraman
Hi, I am running a Giraph application on the XSEDE Comet cluster for graphs of different sizes. For a graph with 10,000 edges, I used about 8 workers on 2 nodes, each node having 128 GB RAM. My input file itself is just about 200 KB. But when I tried to increase the number of workers to 20 or mo