Hi,
I was loading on a per-file basis and committing after each file. A 22-million-record file finished in 5 hours, but a 66-million one did not finish in 3 days. I rewrote the program to read each file and commit after a fixed number of records, hoping to control memory better, but now the program fails with an OutOfMemoryError even for a small dataset (80,000 records). (With the per-file approach the small dataset loaded without problems.) My snippet:

    for (File file : files) {
        SimpleTimer timer = new SimpleTimer();
        BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file)));
        String strLine;
        int count = 0;
        while ((strLine = br.readLine()) != null) {
            if (strLine.trim().length() != 0) {
                String[] result = strLine.split("\\s");
                rc.add(f.createURI(stripeN3(result[0])),
                       f.createURI(stripeN3(result[1])),
                       f.createURI(stripeN3(result[2])), context);
                count++;              // count only added statements, so a blank
                if (count == 10000) { // 10,000th line cannot skip the commit
                    rc.commit();
                    count = 0;
                }
            }
        }
        br.close();
        rc.commit();
        timer.end();
    }
    sumtimer.end();
    rc.commit();
    rc.close();

What can cause the problem?

[INFO] Trace
org.apache.maven.lifecycle.LifecycleExecutionException: An exception occured while executing the Java class. Java heap space
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:583)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:512)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:482)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:330)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:291)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:142)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:336)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:129)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:287)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:592)
    at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
    at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
    at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
    at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.MojoExecutionException: An
exception occured while executing the Java class. Java heap space
    at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:338)
    at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:451)
    at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:558)
    ... 16 more
Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
    at org.neo4j.kernel.impl.transaction.XidImpl.getNewGlobalId(XidImpl.java:55)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.<init>(TransactionImpl.java:67)
    at org.neo4j.kernel.impl.transaction.TxManager.begin(TxManager.java:497)
    at org.neo4j.kernel.EmbeddedGraphDbImpl.beginTx(EmbeddedGraphDbImpl.java:238)
    at org.neo4j.kernel.EmbeddedGraphDatabase.beginTx(EmbeddedGraphDatabase.java:139)
    at org.neo4j.index.impl.GenericIndexService.beginTx(GenericIndexService.java:105)
    at org.neo4j.index.impl.IndexServiceQueue.run(IndexServiceQueue.java:221)
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5 minutes 14 seconds

Thank you for the help,
Lyudmila

> There are some problems at the moment regarding insertion speeds:
>
> o We haven't yet created an RDF store which can use a BatchInserter (which
>   could also be tweaked to skip checking whether a statement already exists
>   before adding it, and so on).
> o The sail layer on top of the neo4j-rdf component contains functionality
>   which allows a thread to have more than one running transaction at the
>   same time. This was added for some users' requirements, but it slows
>   insertion down by roughly a factor of 2 (not sure about the exact figure).
>
> I would like to see both these issues resolved soon; once they are fixed,
> insertion speeds will be quite nice!
>
> 2010/4/9 Lyudmila L.
Balakireva <lu...@lanl.gov>:
>
>> Hi,
>> How can I optimize loading into the VerboseQuadStore?
>> I am running a test similar to the test example from neo rdf sail, and it
>> is very slow. The files are 3 GB - 7 GB in size.
>> Thanks,
>> Luda
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
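[Editor's note: the commit-every-N logic from the snippet above can be isolated and checked independently of the RDF store. This is a minimal sketch, not the poster's actual code: the `Store` interface is a hypothetical stand-in for the Sesame `RepositoryConnection`, and the point is that the counter advances only when a statement is actually added, so a blank 10,000th line cannot skip the commit.]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLoader {
    // Hypothetical stand-in for the repository connection used in the thread.
    interface Store {
        void add(String s, String p, String o);
        void commit();
    }

    // Adds each non-blank line as a triple, committing every batchSize
    // statements; returns the number of commits (final partial batch included).
    static int load(List<String> lines, Store store, int batchSize) {
        int count = 0;   // statements since the last commit
        int commits = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue; // skip blanks before counting
            String[] parts = line.split("\\s+");
            store.add(parts[0], parts[1], parts[2]);
            count++;
            if (count == batchSize) {  // reached exactly, since count only
                store.commit();        // advances when a statement is added
                commits++;
                count = 0;
            }
        }
        if (count > 0) {               // flush the final partial batch
            store.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            lines.add("<s" + i + "> <p> <o>");
            lines.add("");             // blank lines interleaved with data
        }
        Store noop = new Store() {
            public void add(String s, String p, String o) { }
            public void commit() { }
        };
        // 5 statements, batch size 2 -> commits after 2, after 4, and a
        // final commit for the remaining 1
        System.out.println(load(lines, noop, 2));
    }
}
```

With the original placement of the counter increment (before the blank-line check), a blank line landing exactly on the batch boundary would push the counter past the threshold and the equality test would never fire again, leaving one ever-growing uncommitted transaction.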