Michael,

The log is actually very helpful, as the stack trace seems to point to
where it is using up all the memory (this is not always the
case!).  From what I see, I am guessing you have a very large number
of named graphs in your store.

What appears to be happening is that before the update starts,
UpdateEngineMain tries to notify listeners that an update is about
to occur.  Unfortunately, it fires a separate event for each named
graph in the system.  Because TDB represents named graphs as quads,
the only way to get a list of all the named graphs to fire events
for is to perform a full table scan, project just the graph part of
each quad and then perform a distinct operation.
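A rough sketch of that enumeration shows why memory grows with the number of named graphs: every quad is read once, but every distinct graph name has to be held in a set until the scan finishes. The Quad record and method names below are hypothetical simplifications for illustration, not TDB's real classes or the code in DatasetGraphTDB.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class NamedGraphScan {
    // Hypothetical, simplified quad (graph, subject, predicate, object as strings).
    // This is NOT TDB's Quad class, only an illustration.
    public record Quad(String graph, String subject, String predicate, String object) {}

    // Lists distinct graph names as described above:
    // scan every quad, project the graph part, then de-duplicate.
    public static Set<String> distinctGraphNames(Iterator<Quad> allQuads) {
        Set<String> seen = new HashSet<>();      // grows with the number of named graphs
        while (allQuads.hasNext()) {             // full scan: touches every quad in the store
            seen.add(allQuads.next().graph());   // project graph part; the set keeps it distinct
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Quad> store = List.of(
            new Quad("g1", "s", "p", "o1"),
            new Quad("g1", "s", "p", "o2"),
            new Quad("g2", "s", "p", "o3"));
        System.out.println(distinctGraphNames(store.iterator()).size()); // prints 2
    }
}
```

The scan itself streams, but the set cannot: with millions of named graphs it holds millions of names at once, which matches issue 3) below.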

There are a few problems with this approach:
  1) This is pretty dang inefficient, as the entire database is
scanned on every update query
  2) With a large number of named graphs, you have to fire a lot of
events, which is also inefficient
  3) If you have a lot of named graphs, the distinct operation has to
store every graph name in an in-memory hashset

You are running into issue 3).  The underlying cause seems to be a
design mismatch in the graph notification: it should be redesigned
to fire a single event for the entire graph store.
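A minimal sketch of the difference between the two designs, assuming a hypothetical UpdateListener interface (not Jena's actual listener API) and a made-up "<graphstore>" target name: the per-graph version needs the full list of graph names before it can fire anything, while the store-level version fires once regardless of how many named graphs exist.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class UpdateNotification {
    // Hypothetical listener interface; not Jena's actual listener API.
    public interface UpdateListener {
        void updateStarting(String target);
    }

    // Behaviour as described above: one event per named graph,
    // which first requires enumerating every graph name.
    public static void notifyPerGraph(Set<String> graphNames, List<UpdateListener> listeners) {
        for (String g : graphNames) {
            for (UpdateListener l : listeners) {
                l.updateStarting(g);
            }
        }
    }

    // Proposed redesign: a single event for the whole graph store,
    // independent of how many named graphs exist.
    public static void notifyGraphStore(List<UpdateListener> listeners) {
        for (UpdateListener l : listeners) {
            l.updateStarting("<graphstore>"); // hypothetical target name
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        List<UpdateListener> listeners = List.of(received::add);
        notifyPerGraph(Set.of("g1", "g2", "g3"), listeners);
        System.out.println(received.size()); // prints 3
        received.clear();
        notifyGraphStore(listeners);
        System.out.println(received.size()); // prints 1
    }
}
```

With the store-level event, no graph enumeration (and so no table scan or in-memory set) is needed before the update runs.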

-Stephen

P.S.  The problematic code is in DatasetGraphTDB.java (line 262).


On Fri, Sep 14, 2012 at 5:12 AM, Michael Brunnbauer <bru...@netestate.de> wrote:
>
> Hello Andy,
>
> On Fri, Sep 14, 2012 at 12:11:41PM +0100, Andy Seaborne wrote:
>> What I don't understand is where the garbage is coming from.
>> It may be the queries, and not the update.
>
> The queries are on another TDB. I do nothing with the updated TDB except the
> DROP.
>
>> So does the log provide any clues? (Running with -v provides more
>> details - including the updates).
>
> See the attached log. The exception trace at the end may provide hints.
>
> Regards,
>
> Michael Brunnbauer
>
