On Fri, Oct 30, 2009 at 5:31 PM, Craig Taverner <cr...@amanzi.com> wrote:
> Hi Johan,
>
> Of course I have a response to each of your excellent points :-)
>
> *'transactional merge'*
> I do assume that all database access is locked for the duration of the real
> 'merge'. But since the merge should be a set of raw file concatenations, I
> assume that does not take too long (seconds, I would hope). That should make
> it transactionally safe, right?
>

There would be more to it to really make it safe. Even if we accept
that the update is not atomic (you would be able to see parts of the
update before everything is there), we still have to make sure it
survives a crash. The simplest way to do that would be to reuse the
code already in place, using a logical log, meaning:

 o build a logical log for the merge
 o rotate the current log and set the merge log as the current one
 o apply the log then append any running transactions to the new log

Building the merge log would slow things down, and to make the update
atomic we would have to grab locks, slowing things down even more.
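The rotate-and-apply sequence above could be sketched roughly like
this. This is only a toy model with made-up names (LogicalLog,
rotateAndApply, etc.), not the actual kernel API; the real logical log
also handles fsync, recovery markers and so on:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy model of "build merge log, rotate, apply, append running txs".
public class MergeLogSketch {
    // A "logical log" is just an ordered list of operations here.
    static class LogicalLog {
        final List<String> entries = new ArrayList<>();
        void append(String op) { entries.add(op); }
    }

    final List<String> store = new ArrayList<>();        // stands in for the store files
    LogicalLog currentLog = new LogicalLog();
    final Queue<String> runningTxOps = new ArrayDeque<>(); // txs queued during the merge

    // Step 1: build a logical log for the merge.
    LogicalLog buildMergeLog(List<String> newRecords) {
        LogicalLog mergeLog = new LogicalLog();
        for (String r : newRecords) mergeLog.append("ADD " + r);
        return mergeLog;
    }

    // Step 2: rotate the current log and set the merge log as current.
    // Step 3: apply it, then append any transactions that ran meanwhile.
    void rotateAndApply(LogicalLog mergeLog) {
        currentLog = mergeLog;                           // rotate
        for (String op : mergeLog.entries)               // apply to store
            store.add(op.substring("ADD ".length()));
        while (!runningTxOps.isEmpty())                  // drain queued txs
            currentLog.append(runningTxOps.poll());
    }
}
```

The point of the sketch is just the ordering: the merge becomes visible
by applying an already-built log, and transactions that arrived during
the merge land in the new current log afterwards.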

The other solution would be to hack together something new for this
use case, but that would require a lot of work to make sure you never
end up with a corrupt db.

> *'data inter-dependence'*
> Obviously the reason for wanting to merge the databases is to create some
> links between the new data and the old. These would all be added after the
> merge, of course. In my case it involves just one or two relationships, but
> even the generic case of many relationships still makes sense. However, if
> it is necessary to build relationships at all points through the database as
> it is loaded, then that is not a case for this idea, nor is it a case for
> bulk load or a separate database at all. So, if the data needs a lot of
> complex relationships, none of the solutions will help.
>

True.

> *'inter-database relationships'*
> If *real* inter-database relationships existed, such that a traverser would
> traverse across the divide, then that would indeed solve this case, and
> perform very well too. That was my previous idea for '*sharding*' the
> database, but you also said that was too tricky at this point. But since you
> mean an application level 'fake inter-database relationship', then that
> means a lot more application-level, possibly fragile, code that disallows
> the use of traversers and any internal neo4j conveniences. For some data,
> this might actually be good enough. I have considered this option myself,
> but would obviously prefer the convenience of still having access to the
> traverser framework.
>

Yes.

> *'read-only database'*
> We did look into the option of, at application level, stopping access to the
> database and allowing only the bulk load. The problem is that at any point
> there could be several threads with references to the main NeoService, and
> we would have to setup a reliable way of swapping out all those references
> on the fly (which requires those threads' cooperation, of course). Another
> issue was that I understood that we would need to stop *all* access, not
> only write access, to allow the bulk load to run, but you imply below that
> 'read-only' access can still work? If that is possible, this option is
> slightly more attractive, but we still have the multi-thread
> *collaboration* requirement.
>

Yes. If you have read-only instances they can read the "other" parts
of the graph while a batch inserter makes new inserts. The problem
here is that if the application crashes you may end up with a corrupt
db (the batch inserter use case is loading all your data in once,
fast, not updating a live db). One could, however, use online backup
here to make a backup to roll back to if the batch insert fails.
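The backup-and-roll-back idea could be sketched, independently of the
actual online-backup tooling, as: copy the store directory aside, run
the batch insert, and restore the copy if the insert fails. This is a
stand-in using plain directory copies on a stopped store (directory
names and the insert callback are illustrative only); real online
backup works against a live db:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.stream.Stream;

// Sketch of "backup, batch insert, roll back on failure".
public class BatchInsertWithFallback {

    // Recursively copy src into dst (parents are visited before children).
    static void copyDir(Path src, Path dst) throws IOException {
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Files.copy(p, dst.resolve(src.relativize(p)),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Recursively delete dir (children before parents).
    static void deleteDir(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            for (Path p : (Iterable<Path>)
                    paths.sorted(Comparator.reverseOrder())::iterator) {
                Files.delete(p);
            }
        }
    }

    /** Backs up storeDir, runs insert; if it throws, restores the backup
     *  (a real implementation would likely rethrow after restoring). */
    static void insertWithFallback(Path storeDir, Path backupDir, Runnable insert)
            throws IOException {
        copyDir(storeDir, backupDir);            // "online backup" stand-in
        try {
            insert.run();                        // the batch-inserter work
        } catch (RuntimeException e) {
            deleteDir(storeDir);                 // roll back to the backup
            copyDir(backupDir, storeDir);
        }
    }
}
```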

I think the best way to move forward is to have a look at the writes
you perform now and try to tweak them. Hopefully we can achieve good
enough write speed for your use case running in normal transactional
read/write mode.
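One common tweak in normal transactional mode is to group many small
writes into one transaction instead of committing each write on its
own, so the commit overhead amortizes across the batch. A generic
sketch of that pattern (not the Neo4j API; the commit counter stands
in for tx.success()/tx.finish()):

```java
import java.util.List;
import java.util.function.Consumer;

// Generic sketch of batching writes: N operations per commit instead
// of one commit per operation.
public class WriteBatcher {
    static int commitCount = 0;

    /** Applies ops in batches of batchSize, "committing" after each batch. */
    static <T> void applyBatched(List<T> ops, int batchSize, Consumer<T> write) {
        int inBatch = 0;
        for (T op : ops) {
            write.accept(op);           // inside the current "transaction"
            if (++inBatch == batchSize) {
                commitCount++;          // commit the full batch
                inBatch = 0;
            }
        }
        if (inBatch > 0) commitCount++; // commit the trailing partial batch
    }
}
```

With, say, 10 operations and a batch size of 3 this does 4 commits
instead of 10; the right batch size is something to measure for the
workload at hand.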

Regards,
-Johan
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
