Yes, you should implement your own edge store. Please take a look at
ByteArrayEdges, for example, and modify it to use BigDataOutput and BigDataInput
instead of ExtendedByteArrayOutput/Input.
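As a rough illustration of that change, here is a minimal, self-contained sketch that keeps Text-style edges in a list of bounded byte[] chunks instead of one ever-growing array. It is hypothetical: ChunkedTextEdges is an invented name, String plus DataOutputStream.writeUTF stand in for Hadoop's Text, and it implements neither Giraph's OutEdges interface nor uses BigDataOutput/BigDataInput themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical edge store for String target IDs and String values.
 * Edges are serialized into several bounded byte[] chunks instead of one
 * growing array; an edge is never split across two chunks.
 */
public class ChunkedTextEdges {
    private final int maxChunkBytes;
    private final List<byte[]> sealedChunks = new ArrayList<>();
    private ByteArrayOutputStream openChunk = new ByteArrayOutputStream();
    private int edgeCount;

    public ChunkedTextEdges(int maxChunkBytes) {
        this.maxChunkBytes = maxChunkBytes;
    }

    /** Appends one edge, sealing the open chunk first if the edge would not fit. */
    public void add(String targetId, String value) {
        byte[] record = encode(targetId, value);
        // An edge larger than maxChunkBytes still gets a chunk of its own.
        if (openChunk.size() > 0 && openChunk.size() + record.length > maxChunkBytes) {
            sealedChunks.add(openChunk.toByteArray());
            openChunk = new ByteArrayOutputStream();
        }
        openChunk.write(record, 0, record.length);
        edgeCount++;
    }

    public int size() {
        return edgeCount;
    }

    public int chunkCount() {
        return sealedChunks.size() + (openChunk.size() > 0 ? 1 : 0);
    }

    /** Decodes all edges chunk by chunk (materialized into a list for brevity). */
    public List<Map.Entry<String, String>> edges() {
        List<byte[]> chunks = new ArrayList<>(sealedChunks);
        if (openChunk.size() > 0) {
            chunks.add(openChunk.toByteArray());
        }
        List<Map.Entry<String, String>> result = new ArrayList<>(edgeCount);
        try {
            for (byte[] chunk : chunks) {
                DataInputStream in = new DataInputStream(new ByteArrayInputStream(chunk));
                while (in.available() > 0) {
                    result.add(new AbstractMap.SimpleEntry<>(in.readUTF(), in.readUTF()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** writeUTF: 2-byte length + modified UTF-8, so each string must encode to <= 65535 bytes. */
    private static byte[] encode(String targetId, String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(targetId);
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because each chunk holds only whole edges, a chunk can be decoded on its own, and the chunk size bounds the largest single allocation.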
From: and...@wizardapps.net
To: user@giraph.apache.org
Subject: Re: NegativeArraySizeException with large dataset
Date: Tue, 9 Sep 2014 09:58:36 -0700

Great, thanks for pointing me in the right direction. All of the edge values
are strings (in Text objects), the edges point to and from vertices with Text
IDs, and none of the values should be larger than 60 bytes or so during the
loading step. The size will increase during computation because I modify the
values of the edges, but the actual size of the data is not too large.

 
Given that I am using text-based IDs and values, it looks to me like I may have
to implement my own edge store. Does that seem right?

 
Thank you for your help!
 
-- 

Andrew

 

 
 
On Mon, Sep 8, 2014, at 05:31 PM, Pavan Kumar A wrote:

ByteArrayEdges and the other edge stores use array-based or map-based storage,
and all of them will hit this exception when the size of the array approaches
Integer.MAX_VALUE.
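The failure mode can be reproduced in a few lines. This is a hedged sketch: it assumes a buffer that grows by doubling a size computed in int arithmetic (a hypothetical policy, not Giraph's actual UnsafeByteArrayOutputStream code) and shows how that arithmetic wraps to a negative length, which new byte[] rejects with NegativeArraySizeException. An overflow-aware variant is included for contrast.

```java
/**
 * Sketch of how a growable byte[] buffer can end in NegativeArraySizeException.
 * The doubling policy below is hypothetical, not copied from Giraph.
 */
public class GrowthOverflowDemo {
    // Conservative cap on array lengths; some JVMs reject lengths very close to Integer.MAX_VALUE.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    /** Naive growth: once the sum reaches 2^30 bytes (1 GiB), the shift wraps negative. */
    public static int naiveNewCapacity(int currentLength, int bytesNeeded) {
        return (currentLength + bytesNeeded) << 1;
    }

    /** Overflow-aware growth: compute in long, cap the result, fail loudly if it cannot fit. */
    public static int safeNewCapacity(int currentLength, int bytesNeeded) {
        long required = (long) currentLength + bytesNeeded;
        if (required > MAX_ARRAY_SIZE) {
            throw new IllegalStateException("a single byte[] cannot hold " + required + " bytes");
        }
        return (int) Math.min(required * 2, MAX_ARRAY_SIZE);
    }

    public static void main(String[] args) {
        int full = 1 << 30;                                  // a full 1 GiB buffer
        int naive = naiveNewCapacity(full, 64);
        System.out.println("naive new capacity: " + naive);  // -2147483520
        try {
            byte[] grown = new byte[naive];
            System.out.println("allocated " + grown.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException, as in the stack trace");
        }
        System.out.println("safe new capacity: " + safeNewCapacity(full, 64)); // 2147483639
    }
}
```

The safe variant still cannot store more than about 2 GB in one array; it only turns a confusing negative size into an explicit error, which is why a chunked store is the real fix.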

Some things to consider for the time being: what do your edges look like?

If they have long IDs and null values, you can use LongNullArrayEdges to push
the boundary a bit, i.e., until you get a vertex that has ~2 billion outgoing
edges.

For long IDs and double values, you can use LongDoubleArrayEdges, etc.
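The headroom a primitive store buys can be estimated with simple arithmetic: a byte-array store runs out at roughly Integer.MAX_VALUE bytes per vertex, while a store keeping one long per edge runs out at roughly Integer.MAX_VALUE edges. The sketch below assumes exactly that layout and an illustrative 80 bytes per serialized Text edge; it makes no claim about how LongNullArrayEdges is implemented internally, and a buffer that grows by doubling may fail earlier than this bound.

```java
/**
 * Back-of-the-envelope capacity of a single Java array used as an edge store.
 * Assumes one long per edge for a primitive store and a hypothetical fixed
 * serialized size per edge for a byte[] store.
 */
public class EdgeCapacity {
    // Upper bound on any Java array length; real JVMs may stop a few elements short.
    static final long MAX_ARRAY_LENGTH = Integer.MAX_VALUE;

    /** Edges that fit in one byte[] when each edge serializes to bytesPerEdge bytes. */
    public static long maxEdgesInByteArray(int bytesPerEdge) {
        return MAX_ARRAY_LENGTH / bytesPerEdge;
    }

    /** Edges that fit in one long[] holding one target ID per edge. */
    public static long maxEdgesInLongArray() {
        return MAX_ARRAY_LENGTH;
    }

    public static void main(String[] args) {
        // Hypothetical 80 bytes per serialized edge (Text ID plus Text value).
        System.out.println(maxEdgesInByteArray(80)); // 26843545
        System.out.println(maxEdgesInLongArray());   // 2147483647
    }
}
```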

 
Please take a look at the classes that implement the OutEdges interface.

 
If none of those work, you can implement one of your own and back it with data
structures like BigDataOutput instead of plain old byte arrays.
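The idea behind a store backed by BigDataOutput can be illustrated without Giraph: append bytes to a list of fixed-size chunks, so the total size is tracked as a long and no single array has to approach Integer.MAX_VALUE. ChunkedByteOutput is a hypothetical stand-in written for illustration; it is not Giraph's BigDataOutput, whose API may differ.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical chunked output buffer in the spirit of BigDataOutput:
 * bytes are appended to a list of fixed-size byte[] chunks, so the total
 * size is a long and each individual array stays small.
 */
public class ChunkedByteOutput extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int posInLastChunk;  // bytes used in the last chunk
    private long totalSize;      // may exceed Integer.MAX_VALUE

    public ChunkedByteOutput(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.posInLastChunk = chunkSize;  // forces a chunk to be allocated on first write
    }

    /** Starts a fresh chunk when the last one is full. */
    private void ensureRoom() {
        if (posInLastChunk == chunkSize) {
            chunks.add(new byte[chunkSize]);
            posInLastChunk = 0;
        }
    }

    @Override
    public void write(int b) {
        ensureRoom();
        chunks.get(chunks.size() - 1)[posInLastChunk++] = (byte) b;
        totalSize++;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        if (off < 0 || len < 0 || len > src.length - off) {
            throw new IndexOutOfBoundsException();
        }
        while (len > 0) {
            ensureRoom();
            int n = Math.min(len, chunkSize - posInLastChunk);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1), posInLastChunk, n);
            posInLastChunk += n;
            off += n;
            len -= n;
            totalSize += n;
        }
    }

    public long size() {
        return totalSize;
    }

    public int chunkCount() {
        return chunks.size();
    }

    /** Reads back one byte by its global (long) offset. */
    public byte byteAt(long index) {
        if (index < 0 || index >= totalSize) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return chunks.get((int) (index / chunkSize))[(int) (index % chunkSize)];
    }
}
```

Because chunks are never resized, appending never copies existing data; the trade-off is an extra division and modulo on every random read.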

 
From: and...@wizardapps.net
To: user@giraph.apache.org
Subject: NegativeArraySizeException with large dataset
Date: Mon, 8 Sep 2014 17:19:17 -0700

 
Hey,

 
I am currently running Giraph on a semi-large dataset of 600 million edges (the
edges are directed, so I've used the ReverseEdgeDuplicator for an expected
total of 1.2b edges). I am running into an issue during superstep -1, while the
edges are being loaded: I receive a "java.lang.NegativeArraySizeException"
exception. This occurs near the end of edge loading; by my estimate, around 1b
of the 1.2b edges have been loaded.

 
The exception occurs on one of the workers, and all of the other workers 
subsequently halt loading before I kill the job.

 
The issue doesn't occur with half of the dataset (300 million edges, 600 
million total with the reverser).

 
The only reference I've found to this particular exception type is GIRAPH-821
(https://issues.apache.org/jira/browse/GIRAPH-821), which suggests enabling
the useBigDataIOForMessages flag. I would be surprised if that helped, because
this error occurs during the loading superstep and there are no "super
vertices" in my traversal computation. Enabling this flag had no effect.

 
Any help on this would be appreciated.

 
The full stack trace for the exception is as follows:

 
java.lang.NegativeArraySizeException
        at org.apache.giraph.utils.UnsafeByteArrayOutputStream.ensureSize(UnsafeByteArrayOutputStream.java:116)
        at org.apache.giraph.utils.UnsafeByteArrayOutputStream.write(UnsafeByteArrayOutputStream.java:167)
        at org.apache.hadoop.io.Text.write(Text.java:282)
        at org.apache.giraph.utils.WritableUtils.writeEdge(WritableUtils.java:501)
        at org.apache.giraph.edge.ByteArrayEdges.add(ByteArrayEdges.java:93)
        at org.apache.giraph.edge.AbstractEdgeStore.addPartitionEdges(AbstractEdgeStore.java:166)
        at org.apache.giraph.comm.requests.SendWorkerEdgesRequest.doRequest(SendWorkerEdgesRequest.java:72)
        at org.apache.giraph.comm.netty.handler.WorkerRequestServerHandler.processRequest(WorkerRequestServerHandler.java:62)
        at org.apache.giraph.comm.netty.handler.WorkerRequestServerHandler.processRequest(WorkerRequestServerHandler.java:36)
        at org.apache.giraph.comm.netty.handler.RequestServerHandler.channelRead(RequestServerHandler.java:108)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
        at org.apache.giraph.comm.netty.handler.RequestDecoder.channelRead(RequestDecoder.java:100)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.access$700(DefaultChannelHandlerContext.java:29)
        at io.netty.channel.DefaultChannelHandlerContext$8.run(DefaultChannelHandlerContext.java:329)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Thread.java:745)

 
-- 

Andrew
