Hi Ameya, thanks for the answer. My allocated memory was too high: my server has 4000M altogether, so I have turned the memory down to 2000M for each Mapper.
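The arithmetic behind that change can be sketched as follows. This is only a rough check, assuming two map tasks (`mapred.map.tasks` is 2 in the configuration quoted later in this thread) and treating the `-Xmx` value as the whole footprint of each task JVM, which understates real usage. The helper name `heap_budget` and the `reserved_mb` parameter are made up for illustration.

```python
def heap_budget(total_mb, tasks, heap_per_task_mb, reserved_mb=0):
    """Return the memory (MB) left after every task JVM is given its maximum heap.

    reserved_mb is a hypothetical allowance for the OS and the Hadoop daemons.
    A negative result means the heaps alone exceed the machine's memory.
    """
    return total_mb - reserved_mb - tasks * heap_per_task_mb


SERVER_MB = 4000  # total server memory, as stated in the message
MAP_TASKS = 2     # mapred.map.tasks in the quoted configuration

# Compare the original setting (-Xmx4000m) with the reduced one (-Xmx2000m).
for heap_mb in (4000, 2000):
    left = heap_budget(SERVER_MB, MAP_TASKS, heap_mb)
    print(f"-Xmx{heap_mb}m x {MAP_TASKS} tasks: {left} MB left")
```

Even at 2000M per task, the two heaps together account for all 4000M, so this sketch leaves no room for anything else running on the machine.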
Now I have set both out-of-core options and get the following exception:

2013-12-05 23:10:18,568 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312052304_0001_m_000001_0' to tip task_201312052304_0001_m_000001, for tracker 'tracker_hduser:localhost/127.0.0.1:39793'
2013-12-05 23:10:27,645 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312052304_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
    at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
    at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
    at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
    at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:281)
    at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:325)
    at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
    at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:244)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
    ... 7 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
    at java.util.concurrent.FutureTask.get(FutureTask.java:119)
    at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
    at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
    ... 16 more
Caused by: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
    at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
    at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
    at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
    at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
    at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
    at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
    at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
    ... 13 more
Caused by: java.lang.NullPointerException
    at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
    at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
    ... 14 more

Thanks,
Sebastian

2013/12/5 Ameya Vilankar <ameya.vilan...@gmail.com>

> Each worker is allocated *mapred.child.java.opts* memory, which in your
> case is 4000M. Check whether your server has enough memory for 2
> Mappers. Also, the out-of-core option is available in two forms:
> 1. Out of core graph
> 2. Out of core messages
>
> Currently you are setting only the out of core graph and not the out of
> core messages. Enable both of them. More information about the options can be
> found here: http://giraph.apache.org/options.html
> Set -D giraph.useOutOfCoreGraph=true -D giraph.useOutOfCoreMessages=true
> when passing options to GiraphRunner.
>
> Thanks,
> Ameya
>
>
> On Thu, Dec 5, 2013 at 12:39 PM, Sebastian Stipkovic <
> sebastian.stipko...@gmail.com> wrote:
>
>> Hello,
>>
>> I have set up Giraph 1.1.0 with hadoop-0.20.203.0rc1 on a single-node
>> cluster. It computes a tiny graph successfully. But if the
>> input graph is huge (5 GB), I get an OutOfMemory (Garbage Collector)
>> exception, although I have turned on the out-of-core option.
>> The job with the out-of-core option only works well with a tiny graph (0.9 GB). What
>> is wrong? Do I have to do any further configuration?
>>
>> My configuration is as follows:
>>
>> name value
>> *fs.s3n.impl* org.apache.hadoop.fs.s3native.NativeS3FileSystem
>> *mapred.task.cache.levels* 2
>> *giraph.vertexOutputFormatClass* org.apache.giraph.examples.MyShortestPaths$MyOutputFormat
>> *hadoop.tmp.dir* /app/hadoop/tmp
>> *hadoop.native.lib* true
>> *map.sort.class* org.apache.hadoop.util.QuickSort
>> *dfs.namenode.decommission.nodes.per.interval* 5
>> *dfs.https.need.client.auth* false
>> *ipc.client.idlethreshold* 4000
>> *dfs.datanode.data.dir.perm* 755
>> *mapred.system.dir* ${hadoop.tmp.dir}/mapred/system
>> *mapred.job.tracker.persist.jobstatus.hours* 0
>> *dfs.datanode.address* 0.0.0.0:50010
>> *dfs.namenode.logging.level* info
>> *dfs.block.access.token.enable* false
>> *io.skip.checksum.errors* false
>> *fs.default.name* hdfs://localhost:54310
>> *mapred.cluster.reduce.memory.mb* -1
>> *mapred.child.tmp* ./tmp
>> *fs.har.impl.disable.cache* true
>> *dfs.safemode.threshold.pct* 0.999f
>> *mapred.skip.reduce.max.skip.groups* 0
>> *dfs.namenode.handler.count* 10
>> *dfs.blockreport.initialDelay* 0
>> *mapred.heartbeats.in.second* 100
>> *mapred.tasktracker.dns.nameserver* default
>> *io.sort.factor* 10
>> *mapred.task.timeout* 600000
>> *giraph.maxWorkers* 1
>> *mapred.max.tracker.failures* 4
>> *hadoop.rpc.socket.factory.class.default* org.apache.hadoop.net.StandardSocketFactory
>> *mapred.job.tracker.jobhistory.lru.cache.size* 5
>> *fs.hdfs.impl* org.apache.hadoop.hdfs.DistributedFileSystem
>> *mapred.queue.default.acl-administer-jobs* *
>> *dfs.block.access.key.update.interval* 600
>> *mapred.skip.map.auto.incr.proc.count* true
>> *mapreduce.job.complete.cancel.delegation.tokens* true
>> *io.mapfile.bloom.size* 1048576
>> *mapreduce.reduce.shuffle.connect.timeout* 180000
>> *dfs.safemode.extension* 30000
>> *mapred.jobtracker.blacklist.fault-timeout-window* 180
>> *tasktracker.http.threads* 40
>> *mapred.job.shuffle.merge.percent* 0.66
>> *mapreduce.inputformat.class* org.apache.giraph.bsp.BspInputFormat
>> *fs.ftp.impl* org.apache.hadoop.fs.ftp.FTPFileSystem
>> *user.name* hduser
>> *mapred.output.compress* false
>> *io.bytes.per.checksum* 512
>> *giraph.isStaticGraph* true
>> *mapred.healthChecker.script.timeout* 600000
>> *topology.node.switch.mapping.impl* org.apache.hadoop.net.ScriptBasedMapping
>> *dfs.https.server.keystore.resource* ssl-server.xml
>> *mapred.reduce.slowstart.completed.maps* 0.05
>> *mapred.reduce.max.attempts* 4
>> *fs.ramfs.impl* org.apache.hadoop.fs.InMemoryFileSystem
>> *dfs.block.access.token.lifetime* 600
>> *dfs.name.edits.dir* ${dfs.name.dir}
>> *mapred.skip.map.max.skip.records* 0
>> *mapred.cluster.map.memory.mb* -1
>> *hadoop.security.group.mapping* org.apache.hadoop.security.ShellBasedUnixGroupsMapping
>> *mapred.job.tracker.persist.jobstatus.dir* /jobtracker/jobsInfo
>> *mapred.jar* hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001/job.jar
>> *dfs.block.size* 67108864
>> *fs.s3.buffer.dir* ${hadoop.tmp.dir}/s3
>> *job.end.retry.attempts* 0
>> *fs.file.impl* org.apache.hadoop.fs.LocalFileSystem
>> *mapred.local.dir.minspacestart* 0
>> *mapred.output.compression.type* RECORD
>> *dfs.datanode.ipc.address* 0.0.0.0:50020
>> *dfs.permissions* true
>> *topology.script.number.args* 100
>> *io.mapfile.bloom.error.rate* 0.005
>> *mapred.cluster.max.reduce.memory.mb* -1
>> *mapred.max.tracker.blacklists* 4
>> *mapred.task.profile.maps* 0-2
>> *dfs.datanode.https.address* 0.0.0.0:50475
>> *mapred.userlog.retain.hours* 24
>> *dfs.secondary.http.address* 0.0.0.0:50090
>> *dfs.replication.max* 512
>> *mapred.job.tracker.persist.jobstatus.active* false
>> *hadoop.security.authorization* false
>> *local.cache.size* 10737418240
>> *dfs.namenode.delegation.token.renew-interval* 86400000
>> *mapred.min.split.size* 0
>> *mapred.map.tasks* 2
>> *mapred.child.java.opts* -Xmx4000m
>> *mapreduce.job.counters.limit* 120
>> *dfs.https.client.keystore.resource* ssl-client.xml
>> *mapred.job.queue.name* default
>> *dfs.https.address* 0.0.0.0:50470
>> *mapred.job.tracker.retiredjobs.cache.size* 1000
>> *dfs.balance.bandwidthPerSec* 1048576
>> *ipc.server.listen.queue.size* 128
>> *mapred.inmem.merge.threshold* 1000
>> *job.end.retry.interval* 30000
>> *mapred.skip.attempts.to.start.skipping* 2
>> *fs.checkpoint.dir* ${hadoop.tmp.dir}/dfs/namesecondary
>> *mapred.reduce.tasks* 0
>> *mapred.merge.recordsBeforeProgress* 10000
>> *mapred.userlog.limit.kb* 0
>> *mapred.job.reduce.memory.mb* -1
>> *dfs.max.objects* 0
>> *webinterface.private.actions* false
>> *io.sort.spill.percent* 0.80
>> *mapred.job.shuffle.input.buffer.percent* 0.70
>> *mapred.job.name* Giraph: org.apache.giraph.examples.MyShortestPaths
>> *dfs.datanode.dns.nameserver* default
>> *mapred.map.tasks.speculative.execution* false
>> *hadoop.util.hash.type* murmur
>> *dfs.blockreport.intervalMsec* 3600000
>> *mapred.map.max.attempts* 0
>> *mapreduce.job.acl-view-job*
>> *dfs.client.block.write.retries* 3
>> *mapred.job.tracker.handler.count* 10
>> *mapreduce.reduce.shuffle.read.timeout* 180000
>> *mapred.tasktracker.expiry.interval* 600000
>> *dfs.https.enable* false
>> *mapred.jobtracker.maxtasks.per.job* -1
>> *mapred.jobtracker.job.history.block.size* 3145728
>> *giraph.useOutOfCoreGiraph* true
>> *keep.failed.task.files* false
>> *mapreduce.outputformat.class* org.apache.giraph.bsp.BspOutputFormat
>> *dfs.datanode.failed.volumes.tolerated* 0
>> *ipc.client.tcpnodelay* false
>> *mapred.task.profile.reduces* 0-2
>> *mapred.output.compression.codec* org.apache.hadoop.io.compress.DefaultCodec
>> *io.map.index.skip* 0
>> *mapred.working.dir* hdfs://localhost:54310/user/hduser
>> *ipc.server.tcpnodelay* false
>> *mapred.jobtracker.blacklist.fault-bucket-width* 15
>> *dfs.namenode.delegation.key.update-interval* 86400000
>> *mapred.used.genericoptionsparser* true
>> *mapred.mapper.new-api* true
>> *mapred.job.map.memory.mb* -1
>> *giraph.vertex.input.dir* hdfs://localhost:54310/user/hduser/output
>> *dfs.default.chunk.view.size* 32768
>> *hadoop.logfile.size* 10000000
>> *mapred.reduce.tasks.speculative.execution* true
>> *mapreduce.job.dir* hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001
>> *mapreduce.tasktracker.outofband.heartbeat* false
>> *mapreduce.reduce.input.limit* -1
>> *dfs.datanode.du.reserved* 0
>> *hadoop.security.authentication* simple
>> *fs.checkpoint.period* 3600
>> *dfs.web.ugi* webuser,webgroup
>> *mapred.job.reuse.jvm.num.tasks* 1
>> *mapred.jobtracker.completeuserjobs.maximum* 100
>> *dfs.df.interval* 60000
>> *dfs.data.dir* ${hadoop.tmp.dir}/dfs/data
>> *mapred.task.tracker.task-controller* org.apache.hadoop.mapred.DefaultTaskController
>> *giraph.minWorkers* 1
>> *fs.s3.maxRetries* 4
>> *dfs.datanode.dns.interface* default
>> *mapred.cluster.max.map.memory.mb* -1
>> *dfs.support.append* false
>> *mapreduce.job.acl-modify-job*
>> *dfs.permissions.supergroup* supergroup
>> *mapred.local.dir* ${hadoop.tmp.dir}/mapred/local
>> *fs.hftp.impl* org.apache.hadoop.hdfs.HftpFileSystem
>> *fs.trash.interval* 0
>> *fs.s3.sleepTimeSeconds* 10
>> *dfs.replication.min* 1
>> *mapred.submit.replication* 10
>> *fs.har.impl* org.apache.hadoop.fs.HarFileSystem
>> *mapred.map.output.compression.codec* org.apache.hadoop.io.compress.DefaultCodec
>> *mapred.tasktracker.dns.interface* default
>> *dfs.namenode.decommission.interval* 30
>> *dfs.http.address* 0.0.0.0:50070
>> *dfs.heartbeat.interval* 3
>> *mapred.job.tracker* localhost:54311
>> *mapreduce.job.submithost* hduser
>> *io.seqfile.sorter.recordlimit* 1000000
>> *giraph.vertexInputFormatClass* org.apache.giraph.examples.MyShortestPaths$MyInputFormat
>> *dfs.name.dir* ${hadoop.tmp.dir}/dfs/name
>> *mapred.line.input.format.linespermap* 1
>> *mapred.jobtracker.taskScheduler* org.apache.hadoop.mapred.JobQueueTaskScheduler
>> *dfs.datanode.http.address* 0.0.0.0:50075
>> *mapred.local.dir.minspacekill* 0
>> *dfs.replication.interval* 3
>> *io.sort.record.percent* 0.05
>> *fs.kfs.impl* org.apache.hadoop.fs.kfs.KosmosFileSystem
>> *mapred.temp.dir* ${hadoop.tmp.dir}/mapred/temp
>> *mapred.tasktracker.reduce.tasks.maximum* 2
>> *mapreduce.job.user.classpath.first* true
>> *dfs.replication* 1
>> *fs.checkpoint.edits.dir* ${fs.checkpoint.dir}
>> *giraph.computationClass* org.apache.giraph.examples.MyShortestPaths
>> *mapred.tasktracker.tasks.sleeptime-before-sigkill* 5000
>> *mapred.job.reduce.input.buffer.percent* 0.0
>> *mapred.tasktracker.indexcache.mb* 10
>> *mapreduce.job.split.metainfo.maxsize* 10000000
>> *hadoop.logfile.count* 10
>> *mapred.skip.reduce.auto.incr.proc.count* true
>> *mapreduce.job.submithostaddress* 127.0.1.1
>> *io.seqfile.compress.blocksize* 1000000
>> *fs.s3.block.size* 67108864
>> *mapred.tasktracker.taskmemorymanager.monitoring-interval* 5000
>> *giraph.minPercentResponded* 100.0
>> *mapred.queue.default.state* RUNNING
>> *mapred.acls.enabled* false
>> *mapreduce.jobtracker.staging.root.dir* ${hadoop.tmp.dir}/mapred/staging
>> *mapred.queue.names* default
>> *dfs.access.time.precision* 3600000
>> *fs.hsftp.impl* org.apache.hadoop.hdfs.HsftpFileSystem
>> *mapred.task.tracker.http.address* 0.0.0.0:50060
>> *mapred.reduce.parallel.copies* 5
>> *io.seqfile.lazydecompress* true
>> *mapred.output.dir* /user/hduser/output/shortestpaths
>> *io.sort.mb* 100
>> *ipc.client.connection.maxidletime* 10000
>> *mapred.compress.map.output* false
>> *hadoop.security.uid.cache.secs* 14400
>> *mapred.task.tracker.report.address* 127.0.0.1:0
>> *mapred.healthChecker.interval* 60000
>> *ipc.client.kill.max* 10
>> *ipc.client.connect.max.retries* 10
>> *ipc.ping.interval* 300000
>> *mapreduce.user.classpath.first* true
>> *mapreduce.map.class* org.apache.giraph.graph.GraphMapper
>> *fs.s3.impl* org.apache.hadoop.fs.s3.S3FileSystem
>> *mapred.user.jobconf.limit* 5242880
>> *mapred.job.tracker.http.address* 0.0.0.0:50030
>> *io.file.buffer.size* 4096
>> *mapred.jobtracker.restart.recover* false
>> *io.serializations* org.apache.hadoop.io.serializer.WritableSerialization
>> *dfs.datanode.handler.count* 3
>> *mapred.reduce.copy.backoff* 300
>> *mapred.task.profile* false
>> *dfs.replication.considerLoad* true
>> *jobclient.output.filter* FAILED
>> *dfs.namenode.delegation.token.max-lifetime* 604800000
>> *mapred.tasktracker.map.tasks.maximum* 4
>> *io.compression.codecs* org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec
>> *fs.checkpoint.size* 67108864
>>
>> Additionally, if I use more than one worker, I also get an exception. Is
>> my configuration wrong?
>>
>>
>> Best regards,
>> Sebastian
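One detail in the original configuration dump: the key there is spelled `giraph.useOutOfCoreGiraph`, whereas the reply names `giraph.useOutOfCoreGraph` and `giraph.useOutOfCoreMessages`. The thread does not establish whether this spelling explains the behaviour. The sketch below only shows how such near-miss keys could be flagged with Python's standard `difflib`; the function name `suspicious_keys` and the 0.9 cutoff are assumptions chosen for illustration, and the list of known options contains just the two names quoted in the reply.

```python
import difflib

# Option names exactly as given in the reply in this thread (not a complete list).
KNOWN_OPTIONS = ("giraph.useOutOfCoreGraph", "giraph.useOutOfCoreMessages")


def suspicious_keys(configured, known=KNOWN_OPTIONS, cutoff=0.9):
    """Map each configured key that is not a known option to its closest known name.

    Keys that match a known option exactly, or that resemble none of them
    closely enough (similarity below `cutoff`), are left out of the result.
    """
    hits = {}
    for key in configured:
        if key in known:
            continue
        close = difflib.get_close_matches(key, known, n=1, cutoff=cutoff)
        if close:
            hits[key] = close[0]
    return hits


# A few giraph.* keys copied from the configuration dump.
configured = ["giraph.useOutOfCoreGiraph", "giraph.maxWorkers", "giraph.minWorkers"]
for key, suggestion in suspicious_keys(configured).items():
    print(f"{key}: did you mean {suggestion}?")
```

A check like this is only a hint; the option list at http://giraph.apache.org/options.html remains the place to confirm the exact spelling.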