Great, thanks for pointing me in the right direction. All of
the edge values are strings (in a Text object) and point to and
from vertices with Text IDs, but none of the values should be
greater than 60 bytes or so during the loading step. The size
will increase during computation because I am modifying the
values of the edges, but the actual size of the data is not too
large.



Given that I am using text-based IDs and values, it looks to me
like I may have to implement my own edge store. Does that seem
right?



Thank you for your help!



--
Andrew





On Mon, Sep 8, 2014, at 05:31 PM, Pavan Kumar A wrote:

ByteArrayEdges and the other edge stores are array-based or
map-based, and all of them will hit this exception when the size
of the backing array approaches Integer.MAX_VALUE.

Some things to consider for the time being: what do your edges
look like?
If they have long IDs and null values, you can use
LongNullArrayEdges to push the boundary a bit, i.e., until you
get a vertex that has ~2 billion outgoing edges.
For long IDs and double values, you can use LongDoubleArrayEdges,
etc.

Please take a look at the classes that implement the OutEdges
interface.

If none of those work, you can implement one of your own and use
a store backed by data structures like BigDataOutput instead of
plain old byte arrays.
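
To illustrate why an array-backed store can fail this way, here is a
minimal, standalone sketch. It assumes a hypothetical growth policy
that doubles the required size using int arithmetic; the class and
method names are made up for the example, and the actual Giraph code
may differ. Once the requested capacity passes Integer.MAX_VALUE, the
int wraps to a negative number and the allocation throws
NegativeArraySizeException.

```java
public class GrowthOverflowDemo {
    /**
     * Hypothetical growth policy (an assumption, not Giraph's code):
     * double the space needed, computed in int arithmetic.
     */
    public static int naiveNewCapacity(int currentLength, int extraBytes) {
        return (currentLength + extraBytes) << 1;
    }

    /** Same policy in long arithmetic, so the true value is visible. */
    public static long checkedNewCapacity(int currentLength, int extraBytes) {
        return ((long) currentLength + extraBytes) << 1;
    }

    public static void main(String[] args) {
        int currentLength = 1 << 30; // buffer that has already grown to 1 GiB
        int extraBytes = 60;         // one more small serialized edge

        int naive = naiveNewCapacity(currentLength, extraBytes);
        long checked = checkedNewCapacity(currentLength, extraBytes);
        System.out.println("int result:  " + naive);
        System.out.println("long result: " + checked);

        try {
            // A negative length is rejected before any memory is allocated.
            byte[] grown = new byte[naive];
            System.out.println("allocated " + grown.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```

A store that splits its data across several smaller buffers avoids
this limit, because no single array ever has to hold everything.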
  __________________________________________________________

From: and...@wizardapps.net
To: user@giraph.apache.org
Subject: NegativeArraySizeException with large dataset
Date: Mon, 8 Sep 2014 17:19:17 -0700

Hey,

I am currently running Giraph on a semi-large dataset of 600
million edges (the edges are directed, so I've used the
ReverseEdgeDuplicator for an expected total of 1.2b edges). I
am running into an issue during superstep -1 when the edges are
being loaded-- I receive a
"java.lang.NegativeArraySizeException" exception. This occurs
near the end of when the edges should be done loading-- by my
estimate, I believe around 1b out of the 1.2b have been loaded.

The exception occurs on one of the workers, and all of the
other workers subsequently halt loading before I kill the job.

The issue doesn't occur with half of the dataset (300 million
edges, 600 million total with the reverser).

The only reference I've found to this particular exception type
is GIRAPH-821
([1]https://issues.apache.org/jira/browse/GIRAPH-821), which
suggests enabling the useBigDataIOForMessages flag. I did not
expect it to help, because this error occurs during the loading
superstep and there are no "super vertices" in my traversal
computation. As expected, enabling this flag had no effect.

Any help on this would be appreciated.

The full stack trace for the exception is as follows:

java.lang.NegativeArraySizeException
        at org.apache.giraph.utils.UnsafeByteArrayOutputStream.ensureSize(UnsafeByteArrayOutputStream.java:116)
        at org.apache.giraph.utils.UnsafeByteArrayOutputStream.write(UnsafeByteArrayOutputStream.java:167)
        at org.apache.hadoop.io.Text.write(Text.java:282)
        at org.apache.giraph.utils.WritableUtils.writeEdge(WritableUtils.java:501)
        at org.apache.giraph.edge.ByteArrayEdges.add(ByteArrayEdges.java:93)
        at org.apache.giraph.edge.AbstractEdgeStore.addPartitionEdges(AbstractEdgeStore.java:166)
        at org.apache.giraph.comm.requests.SendWorkerEdgesRequest.doRequest(SendWorkerEdgesRequest.java:72)
        at org.apache.giraph.comm.netty.handler.WorkerRequestServerHandler.processRequest(WorkerRequestServerHandler.java:62)
        at org.apache.giraph.comm.netty.handler.WorkerRequestServerHandler.processRequest(WorkerRequestServerHandler.java:36)
        at org.apache.giraph.comm.netty.handler.RequestServerHandler.channelRead(RequestServerHandler.java:108)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:324)
        at org.apache.giraph.comm.netty.handler.RequestDecoder.channelRead(RequestDecoder.java:100)
        at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
        at io.netty.channel.DefaultChannelHandlerContext.access$700(DefaultChannelHandlerContext.java:29)
        at io.netty.channel.DefaultChannelHandlerContext$8.run(DefaultChannelHandlerContext.java:329)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
        at java.lang.Thread.run(Thread.java:745)

--
Andrew

References

1. https://issues.apache.org/jira/browse/GIRAPH-821
