Hey Sean,

We haven't added support for the batch inserter in the Python bindings yet,
but I can think of two things you can do:

a) Use the batch inserter in Java land, like you say, to create your db, and
then just point embedded Python at that db location.
b) Just use the normal API. Depending on how interconnected your data is, if
you do transactions of, say, 100 000 inserts per TX, it shouldn't take that
long to insert 70M nodes. A pure insert of one node with one property clocks
in at about 30 000 inserts per second on my machine. There's a rough sketch
of this below.
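
Something like this is what I have in mind for (b). It's an untested sketch:
the db path and the generated data are just placeholders, and I'm assuming
the GraphDatabase / db.transaction / db.node API from the embedded docs:

    from neo4j import GraphDatabase

    # For option (a) this path would instead point at the store the
    # Java batch inserter created.
    db = GraphDatabase('/path/to/db')

    BATCH_SIZE = 100000
    # Stand-in for your real data source of ~70M values.
    records = ('node-%d' % i for i in xrange(70000000))

    done = False
    while not done:
        # One transaction per 100 000 inserts keeps transaction
        # overhead low without holding 70M changes in memory at once.
        with db.transaction:
            for _ in xrange(BATCH_SIZE):
                try:
                    value = next(records)
                except StopIteration:
                    done = True
                    break
                db.node(name=value)  # one node, one property

    db.shutdown()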

It might be interesting to add support for the GEOFF (
http://py2neo.org/geoff/) import/export format that the cool kids behind
py2neo have developed.

Add a ticket on the GitHub page for neo4j-embedded if you'd like to see any
of that happen :)

/jake

On Wed, Oct 5, 2011 at 4:03 PM, Sean Davis <sdav...@mail.nih.gov> wrote:

> I have a few datasets that contain about 70M nodes.  Relationships
> between these sets will be sparse and will be added over time.  What
> is the fastest way to load these nodes into neo4j?  I can work with
> java (http://wiki.neo4j.org/content/Batch_Insert) if necessary, but
> I'd be interested to hear if there is a way to use this API in the new
> embedded python mode.
>
> Thanks,
> Sean



-- 
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins