Hi,
I was loading on a per-file basis and committing after each file. The 22 mln
file finished in 5 hours, but the 66 mln one did not finish in 3 days.
I rewrote the program to read the file and commit after a fixed number of
records, hoping to get better control over memory, but now the program fails
with an "out of memory" error even for a small dataset (80 000). (With the
per-file approach the small dataset loaded without problems.)
My snippet:
for (File file : files) {
    SimpleTimer timer = new SimpleTimer();
    FileInputStream in = new FileInputStream(file);
    BufferedReader br = new BufferedReader(new InputStreamReader(in));
    String strLine;
    int count = 0;

    while ((strLine = br.readLine()) != null) {
        count = count + 1;

        if (strLine.trim().length() != 0) {
            String[] result = strLine.split("\\s");

            rc.add(f.createURI(stripeN3(result[0])),
                   f.createURI(stripeN3(result[1])),
                   f.createURI(stripeN3(result[2])),
                   context);

            if (count == 10000) {
                // rc.add(file, "", RDFFormat.NTRIPLES, context);
                rc.commit();
                count = 0;
            }
        }
    }

    br.close();
    in.close();
    rc.commit();

    timer.end();
}

sumtimer.end();
rc.commit();
rc.close();
}
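
(For reference, a reworked sketch of the same loop, with the commit check
moved outside the blank-line test so it fires on every 10 000th added
statement, and with the reader closed in a finally block. rc, f, stripeN3,
context and SimpleTimer are the same objects as above.)

for (File file : files) {
    SimpleTimer timer = new SimpleTimer();
    BufferedReader br = new BufferedReader(
            new InputStreamReader(new FileInputStream(file)));
    try {
        String strLine;
        int added = 0;
        while ((strLine = br.readLine()) != null) {
            if (strLine.trim().length() == 0) {
                continue;                       // skip blank lines
            }
            String[] result = strLine.split("\\s");
            rc.add(f.createURI(stripeN3(result[0])),
                   f.createURI(stripeN3(result[1])),
                   f.createURI(stripeN3(result[2])),
                   context);
            added++;
            if (added % 10000 == 0) {
                rc.commit();                    // flush the current batch
            }
        }
        rc.commit();                            // commit the tail of the file
    } finally {
        br.close();                             // also closes the FileInputStream
    }
    timer.end();
}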

What could be causing the problem?
[INFO] Trace
org.apache.maven.lifecycle.LifecycleExecutionException: An exception occured while executing the Java class. Java heap space
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:583)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeStandaloneGoal(DefaultLifecycleExecutor.java:512)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:482)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:330)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:291)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:142)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:336)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:129)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:287)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:592)
        at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
        at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
        at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
        at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
Caused by: org.apache.maven.plugin.MojoExecutionException: An exception occured while executing the Java class. Java heap space
        at org.codehaus.mojo.exec.ExecJavaMojo.execute(ExecJavaMojo.java:338)
        at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:451)
        at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:558)
        ... 16 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:350)
        at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
        at org.neo4j.kernel.impl.transaction.XidImpl.getNewGlobalId(XidImpl.java:55)
        at org.neo4j.kernel.impl.transaction.TransactionImpl.<init>(TransactionImpl.java:67)
        at org.neo4j.kernel.impl.transaction.TxManager.begin(TxManager.java:497)
        at org.neo4j.kernel.EmbeddedGraphDbImpl.beginTx(EmbeddedGraphDbImpl.java:238)
        at org.neo4j.kernel.EmbeddedGraphDatabase.beginTx(EmbeddedGraphDatabase.java:139)
        at org.neo4j.index.impl.GenericIndexService.beginTx(GenericIndexService.java:105)
        at org.neo4j.index.impl.IndexServiceQueue.run(IndexServiceQueue.java:221)
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 5 minutes 14 seconds


Thank you for the help,
Lyudmila




> There are some problems at the moment regarding insertion speeds.
>
> o We haven't yet created an rdf store which can use a BatchInserter (which
> could also be tweaked to skip checking whether a statement already exists
> before adding it, and all that).
> o The other one is that the sail layer on top of the neo4j-rdf component
> contains functionality which allows a thread to have more than one running
> transaction at the same time. This was added due to some users'
> requirements, but it slows things down by roughly a factor of 2 (not sure
> about this).
>
> I would like to see both these issues resolved soon, and when they are
> fixed insertion speeds will be quite nice!
>
> 2010/4/9 Lyudmila L. Balakireva <lu...@lanl.gov>
>
>> Hi,
>> How to optimize loading to the VerboseQuadStore?
>> I am doing a test similar to the test example from neo rdf sail and it
>> is very slow. The files are 3 GB - 7 GB in size.
>> Thanks,
>> Luda
>
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com
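
(For reference, the BatchInserter mentioned above is, as I understand it, the
kernel-level batch insertion API rather than anything in the RDF/sail layer.
A rough sketch of how it would be used; the store path, the "uri" property
name and the relationship type below are just placeholders.)

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class BatchLoadSketch {
    public static void main(String[] args) {
        // "var/batch-db" is a placeholder store directory.
        BatchInserter inserter = new BatchInserterImpl("var/batch-db");
        try {
            Map<String, Object> subjectProps = new HashMap<String, Object>();
            subjectProps.put("uri", "http://example.org/subject");   // placeholder
            long subject = inserter.createNode(subjectProps);

            Map<String, Object> objectProps = new HashMap<String, Object>();
            objectProps.put("uri", "http://example.org/object");     // placeholder
            long object = inserter.createNode(objectProps);

            // The relationship type stands in for the predicate; no properties.
            inserter.createRelationship(subject, object,
                    DynamicRelationshipType.withName("PREDICATE"), null);
        } finally {
            inserter.shutdown();    // flushes everything to the store on disk
        }
    }
}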

_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
