Re: [Neo] Import/export

Atle Prange Wed, 20 Jan 2010 00:32:52 -0800

Indeed, the implicit id can cause problems in any use case involving
synchronization between neo4j and some other datasource (or replication
between two neo4j instances).


For example, if I import data from an XML file and then rerun the import, I
would expect the data in my neo4j instance to be overwritten/updated...

It would be really sweet if it were possible to define this id before
the node is created, as a String for example. But I guess you guys use long
for a good reason.
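
A minimal sketch of the rerun-the-import idea, assuming each imported record
carries its own explicit String id. TinyGraph, upsert_node and run_import are
hypothetical names for an in-memory stand-in; none of this is the Neo4j API.
The store still hands out its own implicit long-style id, but a lookup keyed
by the explicit id lets a second import update nodes instead of duplicating
them.

```python
from typing import Dict, Iterable, Tuple


class TinyGraph:
    """In-memory stand-in for a graph store (hypothetical, not the Neo4j API)."""

    def __init__(self) -> None:
        self._next_id = 0                  # implicit id, assigned by the store
        self.nodes: Dict[int, dict] = {}   # implicit id -> properties
        self.by_key: Dict[str, int] = {}   # explicit String id -> implicit id

    def upsert_node(self, key: str, props: dict) -> int:
        """Create the node for `key` if missing, otherwise overwrite its properties."""
        node_id = self.by_key.get(key)
        if node_id is None:
            node_id = self._next_id
            self._next_id += 1
            self.by_key[key] = node_id
        self.nodes[node_id] = dict(props)
        return node_id


def run_import(graph: TinyGraph, records: Iterable[Tuple[str, dict]]) -> None:
    """Import (explicit id, properties) pairs; safe to run more than once."""
    for key, props in records:
        graph.upsert_node(key, props)


graph = TinyGraph()
run_import(graph, [("alice", {"age": 30}), ("bob", {"age": 25})])
# Rerunning with changed data updates the existing nodes in place.
run_import(graph, [("alice", {"age": 31}), ("bob", {"age": 25})])
```

With only the implicit id to go on, the second run would have no way to tell
that "alice" already exists and would create a second node for her.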

-atle

On Tue, Jan 19, 2010 at 1:00 PM, Rick Bullotta <
rick.bullo...@burningskysoftware.com> wrote:

> There is really no "natural" way to express complex graphs (something more
> than hierarchical) in something like XML or JSON, but it can be done as long
> as each entity has a unique identification of some kind (e.g. GraphML's
> IDs).
>
> Barring any reason not to, it would seem that GraphML would be the most
> logical place to start.  It seems to recognize many of the potential
> complications (e.g. the parse "hints") and is extensible.
>
> One primary disconnect point is the dependency on "ids".  Neo nodes and
> relationships have an implicit ID (the long value representing the node or
> relationship), but may or may not have an explicit ID.  Thus, as mentioned
> previously, the identity may not be the same on import as it was on export,
> unless an explicit ID is provided for each node/relationship.  Our current
> graph model is a mix of both.  Some nodes have IDs (typically a name),
> others do not.
>
> Also, GraphML would need to be extended to support the concept of
> relationship types, but this seems to be fairly straightforward using
> custom attributes, elements and/or xlink.
>
>
>
> -----Original Message-----
> From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> On Behalf Of Craig Taverner
> Sent: Tuesday, January 19, 2010 4:07 AM
> To: Neo user discussions
> Subject: Re: [Neo] Import/export
>
> I was wondering if the neo-shell or the neo4j.rb in IRB would solve this
> requirement (easily creating or loading some initial graph). I have not
> played much with the shell, but know that it has commands for making nodes
> and relationships. But I think it is best for interactive work, and I think
> it is not ideal for scripting. On the other hand the Ruby API provides a
> command-line/scripting DSL for generating a graph and since it is also very
> easy to read data from a file, it is easy to read the file and create the
> graph in a language not entirely unlike your original 'javascript-like'
> example.
>
> While on that note, I said javascript-like, but if we focus on the 'state
> transfer' example you gave, we're talking about JSON, and I think I might
> get a few votes for suggesting that as a nicer alternative to XML for a
> generic data structure.
>
> So, if you use JSON, the Java API and a JSON library would suffice to build
> the graph. If you are willing to deviate a little from the syntax, you
> could have the format in Ruby and directly executable in neo4j.rb (which I
> think is even cooler :-)
>
> I also personally think both XML and JSON represent implicit tree
> structures, and so any XML or JSON dataset can be loaded as a tree graph
> with generic code (and no need for hashes of node ids or any caches).
> However, things get slightly tricky when we need to translate XML/JSON
> closed graph constructs into the graph, but even that seems achievable (with
> hashes/caches ;-)
>
> On Tue, Jan 19, 2010 at 8:55 AM, David Montag <da...@montag.se> wrote:
>
> > Hi,
> >
> > Having read the replies and thought about it more, I think my initial
> > e-mail had a slightly wrong focus. The technical details that have
> > surfaced so far are interesting, and would definitely be relevant, should
> > an implementation be attempted.
> >
> > However.
> >
> > What I personally would like to know is, do you think there's a need for
> > initial data sets in the first place? Because that is the problem that I
> > initially set out to solve. Then I kind of got ahead of myself and
> > started thinking about the hows, and not the whats and whys. Simply
> > zipping up a couple of pre-populated stores with different graphs would
> > actually solve the problem. Maybe not in the most elegant and/or
> > maintainable way, but still. Export/import is a much broader feature.
> >
> > Opinions? Don't get me wrong, I'm not trying to kill the tech discussion.
> > I'm just trying to solve the actual problem that I ran into. And if you
> > think export/import would be useful too, great! I'd be happy to continue
> > that discussion as well.
> >
> > Also, let me make it clear that this (i.e. initial data sets) isn't
> > something I'm doing as a project for myself. I would expect it to be a
> > community effort, benefiting everyone. So I actually *want* to know if
> > you like the ideas or not, in addition to solutions. With the awesomeness
> > that is the Neo4j community, it shouldn't be a problem. :)
> >
> > -David
> >
> > On Tue, Jan 19, 2010 at 2:32 AM, Rick Bullotta <
> > rick.bullo...@burningskysoftware.com> wrote:
> >
> > > Actually, I think there's one other key "gotcha" to be aware of.
> > >
> > > Rewiring relationships when importing should not assume anything about
> > > the nodeIDs.  While the nodeIDs are a useful "unique identifier" in the
> > > export process, on import, you'd want to create a HashMap or similar
> > > structure that you populate with the "old" and "new" nodeIDs as you
> > > create them in the first pass through (nodes/properties), then use the
> > > "old" nodeIDs referenced in the exported relationships as your lookup to
> > > get the "new" nodeIDs.
> > >
> > > Could be kinda memory intensive for really large graphs (since you'd
> > > have to keep a HashMap entry of Long/Long for each node), but probably
> > > manageable. In the worst case you could keep the translation table on
> > > disk and chunk it in as needed.
> > >
> > > -----Original Message-----
> > > From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org]
> > > On Behalf Of Rob Challen
> > > Sent: Monday, January 18, 2010 6:25 PM
> > > To: Neo user discussions
> > > Subject: Re: [Neo] Import/export
> > >
> > > RDF seems a good candidate to me.
> > >
> > > Having said that it might just be pretty easy to write out the graph
> > > in a spreadsheet (nodes and properties in one tab and relationship
> > > triples and properties in another) and import that, as long as you
> > > aren't fussed about maintaining data types.
> > >
> > > Rob.
> > >
> > > On 18/01/2010, Peter Neubauer <neubauer.pe...@gmail.com> wrote:
> > > > Hi David,
> > > > one thing would be to provide example node spaces, maybe even as
> > > > Amazon EC2 AMIs, or downloadable nodespaces.
> > > >
> > > > Regarding XML format, I think GraphML is the most standard thing
> > > > there. Gremlin already has a GraphML importer that can be used to
> > > > import data into Neo4j:
> > > > http://wiki.github.com/tinkerpop/gremlin/graphml-reader-and-writer-library
> > > > Probably not hard to write directly onto Neo4j.
> > > >
> > > > Does anyone know of a good binary format?
> > > >
> > > > WDYT?
> > > >
> > > > Cheers,
> > > >
> > > > /peter neubauer
> > > >
> > > > COO and Sales, Neo Technology
> > > >
> > > > GTalk:      neubauer.peter
> > > > Skype       peter.neubauer
> > > > Phone       +46 704 106975
> > > > LinkedIn   http://www.linkedin.com/in/neubauer
> > > > Twitter      http://twitter.com/peterneubauer
> > > >
> > > > http://www.neo4j.org                - Your high performance graph database.
> > > > http://gremlin.tinkerpop.com    - PageRank in 2 lines of code.
> > > >
> > > >
> > > >
> > > > On Mon, Jan 18, 2010 at 8:37 PM, David Montag <da...@montag.se> wrote:
> > > >> Hi,
> > > >>
> > > >> This weekend I was toying around with Neo4j. I wanted to do some
> > > >> indexing experiments. Unfortunately I found myself without a graph to
> > > >> work with. Sure, I could write some code to generate a graph for me,
> > > >> but it'd be a one-time-thing. I wanted to get going *now*. That got me
> > > >> thinking about import/export functionality.
> > > >>
> > > >> I think a command-line import tool would be useful, accompanied by
> > > >> (and built on) a Java API. Both of them would be tied to a certain
> > > >> representation format. The export can be represented in different
> > > >> ways, where two possible ways are:
> > > >> - State transfer: (node{id:1, name:foo}, node{id:2},
> > > >>   rel{start:1, end:2, type:bar}, ...)
> > > >> - Operation transfer: (id1 = create node, id2 = create node,
> > > >>   create rel id1->id2 type bar, ...)
> > > >>
> > > >> I guess the state transfer feels like the more straightforward one.
> > > >> The diff-style nature of the operation transfer might be useful in
> > > >> other cases.
> > > >>
> > > >> When I first thought of this, the target user was somebody who wanted
> > > >> to get started with a graph, and didn't want to write code to do an
> > > >> import "manually". Maybe the import/export can extend to other use
> > > >> cases, but this was the primary one. A possible workflow could be db
> > > >> exported to file, file published, file downloaded, file imported into
> > > >> db.
> > > >>
> > > >> In the end, it would be great if new users could download sample data
> > > >> sets and import them into a Neo4j instance without writing a single
> > > >> line of code. Which also gets me thinking about a command-line tool to
> > > >> create an empty Neo4j instance to import into. The actual
> > > >> implementations of the tools are trivial. It's the discussion that
> > > >> leads to the implementation that's important.
> > > >>
> > > >> Does this sound like anything that would interest people? If so,
> > > >> (digging into details) what kind of representation do you guys think
> > > >> would be best? I was thinking XML, but a binary format might be better
> > > >> for performance (size/primitives ratio). Maybe both? Because I do like
> > > >> the idea of a human-readable (and editable) format. If you don't think
> > > >> it would be useful I would love to hear why.
> > > >>
> > > >> This is just a brain dump of my thoughts. Surely others have thought
> > > >> of this as well. I'm just getting the discussion started. WDYT?
> > > >>
> > > >> -David
> > > >> _______________________________________________
> > > >> Neo mailing list
> > > >> User@lists.neo4j.org
> > > >> https://lists.neo4j.org/mailman/listinfo/user
> > > >>
