Massimo,

Profiling this quickly shows that the issue is inside the Lucene document
(org.apache.lucene.document.Document).

It holds an ArrayList of all its fields, and that list accounts for all of the memory.

It also contains several methods that walk over that list (filtering it) and/or
return copies of it.
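
To make the cost of that pattern concrete, here is a minimal sketch of a document that keeps every field in one flat list. It is a simplified model, not Lucene's source: the class FlatDocument and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a document that stores all fields in one list.
// NOT Lucene's implementation; FlatDocument is a hypothetical name.
final class FlatDocument {
    // One (name, value) pair per index entry.
    private final List<String[]> fields = new ArrayList<String[]>();

    void add(String name, String value) {
        fields.add(new String[] { name, value });
    }

    // Walks the whole list and copies the matches into a new list, so each
    // call costs time proportional to the total number of fields.
    List<String> getValues(String name) {
        List<String> result = new ArrayList<String>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                result.add(field[1]);
            }
        }
        return result;
    }

    int size() {
        return fields.size();
    }
}
```

With hundreds of thousands of entries on one document, every such lookup scans and copies the full list, which is where both the memory and the time go in this model.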

Another issue that came up: each addition takes longer and longer (because
Lucene does a quick-sort on the fields at each flush()).

So my suggestion would be to shard the indexing over several documents and hide
that behind a domain-level API. Each document should have around 50k entries to
allow Lucene to handle it gracefully. After you have introduced this API, you
should perhaps consider replacing this large index with a more appropriate
key-value store (such as redis, jdbm, or a custom implementation, depending on
your real use case, which you haven't revealed :) ).
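
A rough sketch of what such a domain-level API could look like, under stated assumptions: ShardedEntryIndex and all of its methods are hypothetical, and plain in-memory lists stand in for the index documents so that the example runs on its own. In a real setup each shard would correspond to one indexed node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain-level API that spreads entries over several shards.
// Each shard stands in for one index document; nothing here calls Neo4j
// or Lucene.
final class ShardedEntryIndex {
    private final int maxEntriesPerShard;
    // shards.get(n) holds the values stored in shard n.
    private final List<List<Integer>> shards = new ArrayList<List<Integer>>();
    // Remembers which shard a value went to, so lookups touch one shard.
    private final Map<Integer, Integer> shardOfValue = new HashMap<Integer, Integer>();

    ShardedEntryIndex(int maxEntriesPerShard) {
        if (maxEntriesPerShard <= 0) {
            throw new IllegalArgumentException("maxEntriesPerShard must be positive");
        }
        this.maxEntriesPerShard = maxEntriesPerShard;
    }

    // Adds a value to the newest shard, opening a new shard when it is full.
    void add(int value) {
        if (shardOfValue.containsKey(value)) {
            return; // already indexed
        }
        if (shards.isEmpty()
                || shards.get(shards.size() - 1).size() >= maxEntriesPerShard) {
            shards.add(new ArrayList<Integer>());
        }
        int shard = shards.size() - 1;
        shards.get(shard).add(value);
        shardOfValue.put(value, shard);
    }

    boolean contains(int value) {
        return shardOfValue.containsKey(value);
    }

    int shardCount() {
        return shards.size();
    }

    int shardSize(int shard) {
        return shards.get(shard).size();
    }
}
```

Callers only see add() and contains(); the shard limit (50k in the suggestion above) is a constructor argument, so the storage behind the API can later be swapped for a key-value store without touching them.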

Cheers

Michael

On 24.06.2011 at 16:36, Massimo Lusetti wrote:

> On Thu, Jun 23, 2011 at 9:08 PM, Mattias Persson
> <matt...@neotechnology.com> wrote:
> 
>> That should be quite fine. I could try this out locally perhaps. Something
>> like:
>> 
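>> Index<Node> index = db.index().forNodes("myIndex");
>> Transaction tx = db.beginTx();
>> Node node = db.createNode();
>> for ( int i = 0; i < 250000; i++ )
>> {
>>    index.add(node,"key",i);
>>    // commit and start a new transaction every 10000 entries
>>    if ( i%10000 == 0 )
>>    {
>>        tx.success();
>>        tx.finish();
>>        tx = db.beginTx();
>>    }
>> }
>> // commit whatever is left in the last transaction
>> tx.success();
>> tx.finish();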
>> 
>> ?
> 
> I've run the example above and here are the results.
> 
> After 603 rounds of 10000 entries I got an OutOfMemoryError with this
> stack trace:
> 
> org.neo4j.graphdb.TransactionFailureException: Unable to commit transaction
>        at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:104)
>        at statisticheng.services.graphdb.Neo4jSourceImpl.testDbIndexCapabilities(Neo4jSourceImpl.java:271)
>        at statisticheng.services.graphdb.Neo4jSourceImpl.<init>(Neo4jSourceImpl.java:241)
>        at statisticheng.services.StatistichengModule.buildNeo4jSource(StatistichengModule.java:145)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at org.apache.tapestry5.ioc.internal.ServiceBuilderMethodInvoker.createObject(ServiceBuilderMethodInvoker.java:64)
>        at org.apache.tapestry5.ioc.internal.OperationTrackingObjectCreator$1.invoke(OperationTrackingObjectCreator.java:45)
>        at org.apache.tapestry5.ioc.internal.OperationTrackerImpl.invoke(OperationTrackerImpl.java:65)
>        at org.apache.tapestry5.ioc.internal.PerThreadOperationTracker.invoke(PerThreadOperationTracker.java:68)
>        at org.apache.tapestry5.ioc.internal.RegistryImpl.invoke(RegistryImpl.java:1063)
>        at org.apache.tapestry5.ioc.internal.OperationTrackingObjectCreator.createObject(OperationTrackingObjectCreator.java:49)
>        at org.apache.tapestry5.ioc.internal.SingletonServiceLifecycle.createService(SingletonServiceLifecycle.java:29)
>        at org.apache.tapestry5.ioc.internal.LifecycleWrappedServiceCreator.createObject(LifecycleWrappedServiceCreator.java:46)
>        at org.apache.tapestry5.ioc.internal.AdvisorStackBuilder.createObject(AdvisorStackBuilder.java:60)
>        at org.apache.tapestry5.ioc.internal.InterceptorStackBuilder.createObject(InterceptorStackBuilder.java:52)
>        at org.apache.tapestry5.ioc.internal.RecursiveServiceCreationCheckWrapper.createObject(RecursiveServiceCreationCheckWrapper.java:60)
>        at org.apache.tapestry5.ioc.internal.OperationTrackingObjectCreator$1.invoke(OperationTrackingObjectCreator.java:45)
>        at org.apache.tapestry5.ioc.internal.OperationTrackerImpl.invoke(OperationTrackerImpl.java:65)
>        at org.apache.tapestry5.ioc.internal.PerThreadOperationTracker.invoke(PerThreadOperationTracker.java:68)
>        at org.apache.tapestry5.ioc.internal.RegistryImpl.invoke(RegistryImpl.java:1063)
>        at org.apache.tapestry5.ioc.internal.OperationTrackingObjectCreator.createObject(OperationTrackingObjectCreator.java:49)
>        at org.apache.tapestry5.ioc.internal.services.JustInTimeObjectCreator.obtainObjectFromCreator(JustInTimeObjectCreator.java:68)
>        at org.apache.tapestry5.ioc.internal.services.JustInTimeObjectCreator.createObject(JustInTimeObjectCreator.java:57)
>        at org.apache.tapestry5.ioc.internal.services.JustInTimeObjectCreator.eagerLoadService(JustInTimeObjectCreator.java:89)
>        at org.apache.tapestry5.ioc.internal.RegistryImpl.performRegistryStartup(RegistryImpl.java:331)
>        at org.apache.tapestry5.ioc.internal.RegistryWrapper.performRegistryStartup(RegistryWrapper.java:73)
>        at org.apache.tapestry5.TapestryFilter.init(TapestryFilter.java:104)
>        at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
>        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>        at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
>        at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
>        at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
>        at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
>        at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
>        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>        at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
>        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>        at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
>        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>        at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
>        at org.mortbay.jetty.Server.doStart(Server.java:224)
>        at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
>        at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at org.mortbay.start.Main.invokeMain(Main.java:194)
>        at org.mortbay.start.Main.start(Main.java:534)
>        at org.mortbay.start.Main.start(Main.java:441)
>        at org.mortbay.start.Main.main(Main.java:119)
> Caused by: javax.transaction.HeuristicMixedException: Unable to rollback ---> error in commit: java.lang.OutOfMemoryError: Java heap space ---> error code for rollback: 0
>        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:669)
>        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:588)
>        at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:107)
>        at org.neo4j.kernel.TopLevelTransaction.finish(TopLevelTransaction.java:85)
>        ... 54 more
> Caused by: javax.transaction.xa.XAException: Transaction already started commit
>        at org.neo4j.kernel.impl.transaction.xaframework.XaResourceManager.rollback(XaResourceManager.java:476)
>        at org.neo4j.kernel.impl.transaction.xaframework.XaResourceHelpImpl.rollback(XaResourceHelpImpl.java:111)
>        at org.neo4j.kernel.impl.transaction.TransactionImpl.doRollback(TransactionImpl.java:533)
>        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:651)
>        ... 57 more
> 
> 
> The error point in my code, "at
> statisticheng.services.graphdb.Neo4jSourceImpl.testDbIndexCapabilities(Neo4jSourceImpl.java:271)",
> corresponds to the tx.finish() just before the new
> graphdb.beginTransaction() in the for loop.
> 
> Does it help to understand what's going on?
> 
> Thanks a lot
> -- 
> Massimo
> http://meridio.blogspot.com
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
