Yep, there's a bug. We're currently working on a fix. It should be ready in a few days.
On Thu, Jan 23, 2014 at 5:10 AM, Yingyi Bu <buyin...@gmail.com> wrote:

> I just ran into the same issue with the latest trunk version.
> Does anybody know how to fix it?
>
> Best regards,
> Yingyi
>
> On Fri, Dec 6, 2013 at 8:27 AM, Sebastian Stipkovic <sebastian.stipko...@gmail.com> wrote:
>
>> Hello,
>>
>> I have found a link where someone describes the same problem:
>>
>> https://issues.apache.org/jira/browse/GIRAPH-788
>>
>> Can somebody help me? Does the out-of-core option run only on a
>> particular Hadoop version?
>>
>> Thanks,
>> Sebastian
>>
>> 2013/12/6 Sebastian Stipkovic <sebastian.stipko...@gmail.com>
>>
>>> Hi Rob,
>>>
>>> Embarrassing. You are right. But now, with the correct option, I get the
>>> following exception:
>>>
>>> 2013-12-05 23:10:18,568 INFO org.apache.hadoop.mapred.JobTracker: Adding task (MAP) 'attempt_201312052304_0001_m_000001_0' to tip task_201312052304_0001_m_000001, for tracker 'tracker_hduser:localhost/127.0.0.1:39793'
>>> 2013-12-05 23:10:27,645 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201312052304_0001_m_000001_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:101)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.IllegalStateException: waitFor: ExecutionException occurred while waiting for org.apache.giraph.utils.ProgressableUtils$FutureWaitable@62bf5822
>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:181)
>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:139)
>>>     at org.apache.giraph.utils.ProgressableUtils.waitForever(ProgressableUtils.java:124)
>>>     at org.apache.giraph.utils.ProgressableUtils.getFutureResult(ProgressableUtils.java:87)
>>>     at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:221)
>>>     at org.apache.giraph.worker.BspServiceWorker.loadInputSplits(BspServiceWorker.java:281)
>>>     at org.apache.giraph.worker.BspServiceWorker.loadVertices(BspServiceWorker.java:325)
>>>     at org.apache.giraph.worker.BspServiceWorker.setup(BspServiceWorker.java:506)
>>>     at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:244)
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:91)
>>>     ... 7 more
>>> Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
>>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:262)
>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:119)
>>>     at org.apache.giraph.utils.ProgressableUtils$FutureWaitable.waitFor(ProgressableUtils.java:300)
>>>     at org.apache.giraph.utils.ProgressableUtils.waitFor(ProgressableUtils.java:173)
>>>     ... 16 more
>>> Caused by: java.lang.IllegalStateException: getOrCreatePartition: cannot retrieve partition 0
>>>     at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:243)
>>>     at org.apache.giraph.comm.requests.SendWorkerVerticesRequest.doRequest(SendWorkerVerticesRequest.java:110)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:482)
>>>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendVertexRequest(NettyWorkerClientRequestProcessor.java:276)
>>>     at org.apache.giraph.worker.VertexInputSplitsCallable.readInputSplit(VertexInputSplitsCallable.java:172)
>>>     at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:267)
>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:211)
>>>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:60)
>>>     at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:724)
>>> Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
>>>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>>>     at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>>>     at org.apache.giraph.partition.DiskBackedPartitionStore.getOrCreatePartition(DiskBackedPartitionStore.java:228)
>>>     ... 13 more
>>> Caused by: java.lang.NullPointerException
>>>     at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:692)
>>>     at org.apache.giraph.partition.DiskBackedPartitionStore$GetPartition.call(DiskBackedPartitionStore.java:658)
>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>>>     at org.apache.giraph.partition.DiskBackedPartitionStore$DirectExecutorService.execute(DiskBackedPartitionStore.java:972)
>>>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:132)
>>>     ... 14 more
>>>
>>> Thanks,
>>> Sebastian
>>>
>>> 2013/12/5 Rob Vesse <rve...@dotnetrdf.org>
>>>
>>>> Sebastian
>>>>
>>>> You've made a minor typo in the configuration setting, which means you
>>>> haven't actually enabled out-of-core graph mode.
>>>>
>>>> You have *giraph.useOutOfCoreGiraph* when it should be
>>>> *giraph.useOutOfCoreGraph* – note that the last word is Graph, not Giraph.
>>>>
>>>> Rob
>>>>
>>>> From: Sebastian Stipkovic <sebastian.stipko...@gmail.com>
>>>> Reply-To: <user@giraph.apache.org>
>>>> Date: Thursday, 5 December 2013 20:39
>>>> To: <user@giraph.apache.org>
>>>> Subject: out of core option
>>>>
>>>> Hello,
>>>>
>>>> I have set up Giraph 1.1.0 with hadoop-0.20.203.0rc1 on a single-node
>>>> cluster. It computes a tiny graph successfully. But if the input graph
>>>> is huge (5 GB), I get an OutOfMemory (garbage collector) exception,
>>>> although I have turned on the out-of-core option. The job with the
>>>> out-of-core option only works well with a tiny graph (0.9 GB). What is
>>>> wrong? Do I have to do any further configuration?
>>>> >>>> My Configurations are as follows: >>>> >>>> >>>> namevalue*fs.s3n.impl*org.apache.hadoop.fs.s3native.NativeS3FileSystem >>>> *mapred.task.cache.levels*2*giraph.vertexOutputFormatClass* >>>> org.apache.giraph.examples.MyShortestPaths$MyOutputFormat >>>> *hadoop.tmp.dir*/app/hadoop/tmp*hadoop.native.lib*true*map.sort.class*org.apache.hadoop.util.QuickSort >>>> *dfs.namenode.decommission.nodes.per.interval*5 >>>> *dfs.https.need.client.auth*false *ipc.client.idlethreshold*4000 >>>> *dfs.datanode.data.dir.perm*755*mapred.system.dir* >>>> ${hadoop.tmp.dir}/mapred/system >>>> *mapred.job.tracker.persist.jobstatus.hours*0*dfs.datanode.address* >>>> 0.0.0.0:50010*dfs.namenode.logging.level*info >>>> *dfs.block.access.token.enable* >>>> false*io.skip.checksum.errors*false*fs.default.name >>>> <http://fs.default.name>* hdfs://localhost:54310 >>>> *mapred.cluster.reduce.memory.mb*-1*mapred.child.tmp* ./tmp >>>> *fs.har.impl.disable.cache*true*dfs.safemode.threshold.pct*0.999f >>>> *mapred.skip.reduce.max.skip.groups*0*dfs.namenode.handler.count*10 >>>> *dfs.blockreport.initialDelay* 0*mapred.heartbeats.in.second*100 >>>> *mapred.tasktracker.dns.nameserver*default*io.sort.factor* 10 >>>> *mapred.task.timeout*600000*giraph.maxWorkers*1 >>>> *mapred.max.tracker.failures* 4 >>>> *hadoop.rpc.socket.factory.class.default* >>>> org.apache.hadoop.net.StandardSocketFactory >>>> *mapred.job.tracker.jobhistory.lru.cache.size* 5*fs.hdfs.impl* >>>> org.apache.hadoop.hdfs.DistributedFileSystem >>>> *mapred.queue.default.acl-administer-jobs* * >>>> *dfs.block.access.key.update.interval*600 >>>> *mapred.skip.map.auto.incr.proc.count*true >>>> *mapreduce.job.complete.cancel.delegation.tokens*true >>>> *io.mapfile.bloom.size*1048576 >>>> *mapreduce.reduce.shuffle.connect.timeout* 180000 >>>> *dfs.safemode.extension*30000 >>>> *mapred.jobtracker.blacklist.fault-timeout-window*180 >>>> *tasktracker.http.threads*40*mapred.job.shuffle.merge.percent*0.66 >>>> *mapreduce.inputformat.class* 
org.apache.giraph.bsp.BspInputFormat >>>> *fs.ftp.impl*org.apache.hadoop.fs.ftp.FTPFileSystem*user.name >>>> <http://user.name>* hduser*mapred.output.compress*false >>>> *io.bytes.per.checksum*512*giraph.isStaticGraph* true >>>> *mapred.healthChecker.script.timeout*600000 >>>> *topology.node.switch.mapping.impl* >>>> org.apache.hadoop.net.ScriptBasedMapping >>>> *dfs.https.server.keystore.resource*ssl-server.xml >>>> *mapred.reduce.slowstart.completed.maps*0.05 >>>> *mapred.reduce.max.attempts*4*fs.ramfs.impl* >>>> org.apache.hadoop.fs.InMemoryFileSystem >>>> *dfs.block.access.token.lifetime* 600*dfs.name.edits.dir* >>>> ${dfs.name.dir}*mapred.skip.map.max.skip.records*0 >>>> *mapred.cluster.map.memory.mb*-1*hadoop.security.group.mapping* >>>> org.apache.hadoop.security.ShellBasedUnixGroupsMapping >>>> *mapred.job.tracker.persist.jobstatus.dir*/jobtracker/jobsInfo >>>> *mapred.jar*hdfs://localhost:54310 >>>> /app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001/job.jar >>>> *dfs.block.size*67108864*fs.s3.buffer.dir*${hadoop.tmp.dir}/s3 >>>> *job.end.retry.attempts* 0*fs.file.impl* >>>> org.apache.hadoop.fs.LocalFileSystem*mapred.local.dir.minspacestart*0 >>>> *mapred.output.compression.type*RECORD*dfs.datanode.ipc.address* >>>> 0.0.0.0:50020 *dfs.permissions*true*topology.script.number.args*100 >>>> *io.mapfile.bloom.error.rate* 0.005 >>>> *mapred.cluster.max.reduce.memory.mb*-1*mapred.max.tracker.blacklists*4 >>>> *mapred.task.profile.maps*0-2*dfs.datanode.https.address*0.0.0.0:50475 >>>> *mapred.userlog.retain.hours*24*dfs.secondary.http.address* >>>> 0.0.0.0:50090 *dfs.replication.max*512 >>>> *mapred.job.tracker.persist.jobstatus.active*false >>>> *hadoop.security.authorization* false*local.cache.size*10737418240 >>>> *dfs.namenode.delegation.token.renew-interval*86400000 >>>> *mapred.min.split.size*0*mapred.map.tasks*2*mapred.child.java.opts*-Xmx4000m >>>> *mapreduce.job.counters.limit*120*dfs.https.client.keystore.resource* >>>> ssl-client.xml 
*mapred.job.queue.name <http://mapred.job.queue.name>* >>>> default*dfs.https.address*0.0.0.0:50470 >>>> *mapred.job.tracker.retiredjobs.cache.size*1000 >>>> *dfs.balance.bandwidthPerSec*1048576 *ipc.server.listen.queue.size* 128 >>>> *mapred.inmem.merge.threshold*1000*job.end.retry.interval*30000 >>>> *mapred.skip.attempts.to.start.skipping*2*fs.checkpoint.dir* >>>> ${hadoop.tmp.dir}/dfs/namesecondary*mapred.reduce.tasks* 0 >>>> *mapred.merge.recordsBeforeProgress*10000*mapred.userlog.limit.kb*0 >>>> *mapred.job.reduce.memory.mb*-1*dfs.max.objects*0 >>>> *webinterface.private.actions*false *io.sort.spill.percent*0.80 >>>> *mapred.job.shuffle.input.buffer.percent*0.70*mapred.job.name >>>> <http://mapred.job.name>* Giraph: >>>> org.apache.giraph.examples.MyShortestPaths*dfs.datanode.dns.nameserver* >>>> default*mapred.map.tasks.speculative.execution* false >>>> *hadoop.util.hash.type*murmur*dfs.blockreport.intervalMsec*3600000 >>>> *mapred.map.max.attempts*0*mapreduce.job.acl-view-job* >>>> *dfs.client.block.write.retries* 3*mapred.job.tracker.handler.count*10 >>>> *mapreduce.reduce.shuffle.read.timeout*180000 >>>> *mapred.tasktracker.expiry.interval*600000*dfs.https.enable*false >>>> *mapred.jobtracker.maxtasks.per.job* -1 >>>> *mapred.jobtracker.job.history.block.size*3145728 >>>> *giraph.useOutOfCoreGiraph*true *keep.failed.task.files*false >>>> *mapreduce.outputformat.class*org.apache.giraph.bsp.BspOutputFormat >>>> *dfs.datanode.failed.volumes.tolerated*0*ipc.client.tcpnodelay*false >>>> *mapred.task.profile.reduces* 0-2*mapred.output.compression.codec* >>>> org.apache.hadoop.io.compress.DefaultCodec*io.map.index.skip*0 >>>> *mapred.working.dir*hdfs://localhost:54310/user/hduser >>>> *ipc.server.tcpnodelay* false >>>> *mapred.jobtracker.blacklist.fault-bucket-width*15 >>>> *dfs.namenode.delegation.key.update-interval*86400000 >>>> *mapred.used.genericoptionsparser*true*mapred.mapper.new-api*true >>>> *mapred.job.map.memory.mb* 
-1*giraph.vertex.input.dir*hdfs://localhost: >>>> 54310/user/hduser/output *dfs.default.chunk.view.size*32768 >>>> *hadoop.logfile.size*10000000 >>>> *mapred.reduce.tasks.speculative.execution* true*mapreduce.job.dir* >>>> hdfs://localhost:54310 >>>> /app/hadoop/tmp/mapred/staging/hduser/.staging/job_201312051827_0001 >>>> *mapreduce.tasktracker.outofband.heartbeat*false >>>> *mapreduce.reduce.input.limit*-1*dfs.datanode.du.reserved* 0 >>>> *hadoop.security.authentication*simple*fs.checkpoint.period*3600 >>>> *dfs.web.ugi*webuser,webgroup*mapred.job.reuse.jvm.num.tasks*1 >>>> *mapred.jobtracker.completeuserjobs.maximum* 100*dfs.df.interval*60000 >>>> *dfs.data.dir*${hadoop.tmp.dir}/dfs/data >>>> *mapred.task.tracker.task-controller* >>>> org.apache.hadoop.mapred.DefaultTaskController*giraph.minWorkers*1 >>>> *fs.s3.maxRetries* 4*dfs.datanode.dns.interface*default >>>> *mapred.cluster.max.map.memory.mb*-1 *dfs.support.append*false >>>> *mapreduce.job.acl-modify-job* >>>> *dfs.permissions.supergroup* supergroup*mapred.local.dir* >>>> ${hadoop.tmp.dir}/mapred/local*fs.hftp.impl* >>>> org.apache.hadoop.hdfs.HftpFileSystem *fs.trash.interval*0 >>>> *fs.s3.sleepTimeSeconds*10*dfs.replication.min* 1 >>>> *mapred.submit.replication*10*fs.har.impl* >>>> org.apache.hadoop.fs.HarFileSystem*mapred.map.output.compression.codec* >>>> org.apache.hadoop.io.compress.DefaultCodec >>>> *mapred.tasktracker.dns.interface*default >>>> *dfs.namenode.decommission.interval* 30*dfs.http.address*0.0.0.0:50070 >>>> *dfs.heartbeat.interval* 3*mapred.job.tracker*localhost:54311 >>>> *mapreduce.job.submithost* hduser*io.seqfile.sorter.recordlimit*1000000 >>>> *giraph.vertexInputFormatClass* >>>> org.apache.giraph.examples.MyShortestPaths$MyInputFormat *dfs.name.dir* >>>> ${hadoop.tmp.dir}/dfs/name*mapred.line.input.format.linespermap*1 >>>> *mapred.jobtracker.taskScheduler* >>>> org.apache.hadoop.mapred.JobQueueTaskScheduler >>>> *dfs.datanode.http.address*0.0.0.0:50075 >>>> 
*mapred.local.dir.minspacekill*0*dfs.replication.interval*3 >>>> *io.sort.record.percent* 0.05*fs.kfs.impl* >>>> org.apache.hadoop.fs.kfs.KosmosFileSystem*mapred.temp.dir* >>>> ${hadoop.tmp.dir}/mapred/temp *mapred.tasktracker.reduce.tasks.maximum* >>>> 2*mapreduce.job.user.classpath.first*true*dfs.replication* 1 >>>> *fs.checkpoint.edits.dir*${fs.checkpoint.dir}*giraph.computationClass* >>>> org.apache.giraph.examples.MyShortestPaths >>>> *mapred.tasktracker.tasks.sleeptime-before-sigkill*5000 >>>> *mapred.job.reduce.input.buffer.percent*0.0 >>>> *mapred.tasktracker.indexcache.mb*10 >>>> *mapreduce.job.split.metainfo.maxsize*10000000*hadoop.logfile.count* 10 >>>> *mapred.skip.reduce.auto.incr.proc.count*true >>>> *mapreduce.job.submithostaddress*127.0.1.1 >>>> *io.seqfile.compress.blocksize*1000000*fs.s3.block.size*67108864 >>>> *mapred.tasktracker.taskmemorymanager.monitoring-interval* 5000 >>>> *giraph.minPercentResponded*100.0*mapred.queue.default.state*RUNNING >>>> *mapred.acls.enabled*false*mapreduce.jobtracker.staging.root.dir* >>>> ${hadoop.tmp.dir}/mapred/staging*mapred.queue.names* default >>>> *dfs.access.time.precision*3600000*fs.hsftp.impl* >>>> org.apache.hadoop.hdfs.HsftpFileSystem >>>> *mapred.task.tracker.http.address*0.0.0.0:50060 >>>> *mapred.reduce.parallel.copies* 5*io.seqfile.lazydecompress*true >>>> *mapred.output.dir*/user/hduser/output/shortestpaths *io.sort.mb*100 >>>> *ipc.client.connection.maxidletime*10000*mapred.compress.map.output*false >>>> *hadoop.security.uid.cache.secs*14400 >>>> *mapred.task.tracker.report.address*127.0.0.1:0 >>>> *mapred.healthChecker.interval*60000*ipc.client.kill.max*10 >>>> *ipc.client.connect.max.retries* 10*ipc.ping.interval*300000 >>>> *mapreduce.user.classpath.first*true *mapreduce.map.class* >>>> org.apache.giraph.graph.GraphMapper*fs.s3.impl* >>>> org.apache.hadoop.fs.s3.S3FileSystem*mapred.user.jobconf.limit* 5242880 >>>> *mapred.job.tracker.http.address*0.0.0.0:50030*io.file.buffer.size* >>>> 
4096*mapred.jobtracker.restart.recover*false*io.serializations* >>>> org.apache.hadoop.io.serializer.WritableSerialization >>>> *dfs.datanode.handler.count*3*mapred.reduce.copy.backoff*300 >>>> *mapred.task.profile* false*dfs.replication.considerLoad*true >>>> *jobclient.output.filter*FAILED >>>> *dfs.namenode.delegation.token.max-lifetime*604800000 >>>> *mapred.tasktracker.map.tasks.maximum*4*io.compression.codecs* >>>> org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec >>>> *fs.checkpoint.size*67108864
>>>>
>>>> Additionally, if I use more than one worker, I also get an exception.
>>>> Are my configurations wrong?
>>>>
>>>> Best regards,
>>>> Sebastian

--
Claudio Martella
claudio.marte...@gmail.com
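
The typo Rob points out in the thread (giraph.useOutOfCoreGiraph instead of giraph.useOutOfCoreGraph) is easy to miss, because a Hadoop job configuration accepts arbitrary property names and a misspelled one is simply never read. Below is a minimal sketch of a sanity check that flags giraph.* keys it does not recognize. The class GiraphOptionCheck and its method are hypothetical helpers, not part of Giraph or Hadoop, and the list of known names contains only the options mentioned in this thread, so a real check would need the full option list for the Giraph version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Giraph or Hadoop.
class GiraphOptionCheck {

    // Assumption: only the giraph.* names that appear in this thread.
    // This is NOT a complete list of Giraph options.
    static final Set<String> KNOWN_IN_THREAD = new HashSet<>(Arrays.asList(
            "giraph.useOutOfCoreGraph",
            "giraph.maxWorkers",
            "giraph.minWorkers",
            "giraph.isStaticGraph",
            "giraph.minPercentResponded",
            "giraph.computationClass",
            "giraph.vertexInputFormatClass",
            "giraph.vertexOutputFormatClass",
            "giraph.vertex.input.dir"));

    // Returns every key that starts with "giraph." but is not in the known set,
    // in the iteration order of the given map.
    static List<String> unknownGiraphKeys(Map<String, String> conf) {
        List<String> unknown = new ArrayList<>();
        for (String key : conf.keySet()) {
            if (key.startsWith("giraph.") && !KNOWN_IN_THREAD.contains(key)) {
                unknown.add(key);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // A few entries taken from the configuration dump in the original message.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("giraph.maxWorkers", "1");
        conf.put("giraph.useOutOfCoreGiraph", "true"); // the misspelled option
        conf.put("mapred.map.tasks", "2");             // non-Giraph keys are skipped

        for (String key : unknownGiraphKeys(conf)) {
            System.out.println("Unrecognized option: " + key);
        }
    }
}
```

Run against the three sample entries, the check reports only giraph.useOutOfCoreGiraph. Note that fixing the name only enables the out-of-core path; as the later messages show, that path then hit the getOrCreatePartition failure tracked in GIRAPH-788.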