On Mon, Sep 22, 2014 at 2:10 PM, Matthew Saltz <sal...@gmail.com> wrote:
> In the logs for the workers, do you have a line that looks like:
> 2014-09-21 18:12:13,021 INFO org.apache.giraph.worker.BspServiceWorker:
> finishSuperstep: Waiting on all requests, superstep 93 Memory
> (free/total/max) = 21951.08M / 36456.50M / 43691.00M
>
> Looking at the memory usage in the worker that fails at the end of
> superstep before failure could give you a clue.

Yes, when I run with "-w 4", all four workers have those lines:

Task Logs: 'attempt_201409191450_0016_m_000001_0': compute-0-1:
2014-09-25 09:28:13,425 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 242.41M / 438.06M / 1820.50M
2014-09-25 09:28:13,817 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 194.77M / 438.06M / 1820.50M
2014-09-25 09:28:14,936 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 383.74M / 600.38M / 1820.50M
2014-09-25 09:28:17,820 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 362.14M / 1007.50M / 1820.50M
2014-09-25 09:28:31,680 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 203.33M / 1661.50M / 1820.50M

Task Logs: 'attempt_201409191450_0016_m_000002_0': compute-0-1:
2014-09-25 09:28:13,458 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 887.74M / 964.50M / 1820.50M
2014-09-25 09:28:14,381 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 830.14M / 964.50M / 1820.50M
2014-09-25 09:28:15,337 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 785.66M / 1217.00M / 1820.50M
2014-09-25 09:28:18,114 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 661.72M / 1113.50M / 1820.50M
2014-09-25 09:28:52,451 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 285.90M / 1831.00M / 1831.00M

Task Logs: 'attempt_201409191450_0016_m_000003_0': wright:
2014-09-25 09:28:13,456 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 886.23M / 964.50M / 1820.50M
2014-09-25 09:28:14,399 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 826.36M / 964.50M / 1820.50M
2014-09-25 09:28:15,556 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 662.50M / 1217.00M / 1820.50M
2014-09-25 09:28:18,170 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 581.14M / 1115.00M / 1820.50M
2014-09-25 09:29:31,673 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 299.61M / 1834.00M / 1834.00M

Task Logs: 'attempt_201409191450_0016_m_000004_0': wright:
2014-09-25 09:28:13,473 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep -1 Memory
(free/total/max) = 887.10M / 964.50M / 1820.50M
2014-09-25 09:28:14,374 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 0 Memory
(free/total/max) = 826.65M / 964.50M / 1820.50M
2014-09-25 09:28:15,755 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 1 Memory
(free/total/max) = 980.33M / 1217.00M / 1820.50M
2014-09-25 09:28:18,254 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 517.13M / 1128.50M / 1820.50M
2014-09-25 09:29:34,392 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 271.52M / 1858.50M / 1858.50M
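
To make the trend in these lines easier to read, here is a minimal Python
sketch that pulls the free/total/max figures out of the pasted log text and
reports heap in use per superstep. It assumes "used" can be taken as total
minus free, and that the line format is exactly as shown above; the function
name memory_by_superstep is just an illustration, not part of Giraph.

```python
import re

# Matches one "finishSuperstep" memory report. \s+ also spans newlines, so
# entries that were wrapped across several lines (as in this mail) still match.
PATTERN = re.compile(
    r"superstep\s+(-?\d+)\s+Memory\s+\(free/total/max\)\s+=\s+"
    r"([\d.]+)M\s+/\s+([\d.]+)M\s+/\s+([\d.]+)M"
)


def memory_by_superstep(log_text):
    """Return (superstep, used_mb, max_mb, used_fraction) for each report."""
    rows = []
    for step, free, total, max_ in PATTERN.findall(log_text):
        # Assumption: heap in use = total allocated heap - free heap.
        used = float(total) - float(free)
        rows.append((int(step), used, float(max_), used / float(max_)))
    return rows


# Two entries copied from the first worker's log above.
sample = """\
2014-09-25 09:28:17,820 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 2 Memory
(free/total/max) = 362.14M / 1007.50M / 1820.50M
2014-09-25 09:28:31,680 INFO org.apache.giraph.worker.BspServiceWorker:
finishSuperstep: Waiting on all requests, superstep 3 Memory
(free/total/max) = 203.33M / 1661.50M / 1820.50M
"""

for step, used, max_mb, frac in memory_by_superstep(sample):
    print(f"superstep {step}: used {used:.2f}M of {max_mb:.2f}M ({frac:.0%})")
```

On those two entries, heap in use goes from about 35% of max at superstep 2
to about 80% at superstep 3, which matches the impression that the workers
are approaching the ~1820M limit.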


I'm still not clear on a couple of things:

   1. Each compute node has 16GB of memory, but each task has a max of
   ~1820M (<2GB). In Cloudera's web UI, I set "MapReduce Child Java Maximum
   Heap Size" to 2GB (default is 1GB). I will try upping it to 8GB.
   2. I still don't understand why only two of my five possible nodes are
   being used.

Thank you.



-- 
Matthew Cornell | m...@matthewcornell.org | 413-626-3621 | 34 Dickinson
Street, Amherst MA 01002 | matthewcornell.org
