Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

Tobias Ivarsson Mon, 01 Aug 2011 11:53:26 -0700

I think a bit of elaboration might be in order.

EmbeddedReadOnlyGraphDatabase was created for one specific purpose:

Being able to interactively introspect a graph without having to shut down
the application that uses it.

Specifically the tools that we wanted to support with this were the Neo4j
shell and Neoclipse.

EmbeddedReadOnlyGraphDatabase (EROGD) has two major issues with way caching
is done internally in Neo4j (one issue with each cache):

   - When the EROGD reads data from the file system it will, like a normal
   EGD, cache the node and relationship objects. If a normal EGD modifies the
   graph "under the feet" of the EROGD, there is no way for the EROGD to know
   that the data in cache is now stale, which will lead to an inconsistent view
   of the graph. If for example the EROGD has cached Node[15] with the
   information that it is connected to some other node through
   Relationship[344], and Relationship[344] is deleted you will get
   InvalidRecordException (as you described). And of course if relationships
   are added to Node[15] these will not be seen at all by the EROGD (until
   Node[15] is evicted from the cache due to not being used for a while).
   - Neo4j also caches data on the filesystem level by memory mapping (mmap)
   hot regions of the store files. Writes to these regions will not be flushed
   to the actual file until the mmapped window is evicted due to being less hot
   than other windows, or when the transaction log for Neo4j is rotated. This
   means that from the p.o.v. of the EROGD the actual data written to disk will
   look inconsistent. Which would also lead to InvalidRecordExcaption. This
   situation is actually made even more complicated by the fact that unix
   operating systems will attempt to share memory mapped data from the same
   file between multiple processes, but the normal EGD and the EROGD will not
   make the same decisions on which regions to mmap, they might not even decide
   on the same size for mmap windows. We haven't tested how well different
   operating systems deal with reading data that was written to an mmap region
   through non-mmap syscalls from a different process, most likely this varies
   from OS to OS.

The second of these problems is of course the worst, since it cannot be
worked around. The first one can be mitigated by configuring Neo4j to not
use the object cache, by passing the cache_type=none parameter to the
constructor of the EROGD. This should really be made default for EROGD,
unless we decide to completely remove EROGD.

I hope that sheds some light on the reasons why you experience these
problems with EmbeddedReadOnlyGraphDatabase, and what the intention of
creating it was.

As a side note I can mention that I had a different idea for how to solve
the introspection-of-live-graph problem at the time
EmbeddedReadOnlyGraphDatabase was created: Create network based
implementation of the GraphDatabaseService API and connect directly to the
running instance. This would completely avoid the cache staleness problem,
but at the cost of network overhead for each graph operation, which is
probably fine for tooling purposes. With the JVM agent attach protocol it
would be possible to inject such a server into a running graph database that
wasn't originally configured for it. I in fact implemented this as the
RemoteGraphDatabase subproject.
Since my colleagues did not share my vision about that idea, this project
didn't receive much attention after its initial inception. It was also never
really used for these purposes, but rather misused for building
applications, leading us to deprecate the project. When we then later
discovered a severe bug in the implementation of the remote transaction
handling logic, we completely removed the project.
I still believe this to be a superior model for tools, but would build it
differently if I were to build it today.

-tobias

On Mon, Aug 1, 2011 at 4:48 PM, Jim Webber <j...@neotechnology.com> wrote:

> Hi Mathias,
>
> EmbeddedReadOnlyGraphDatabase is not quite what it seems, and I think
> should be deprecated/removed. The correct way for database instances to
> become consistent is through the HA protocol.
>
> Jim
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>

-- 
Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

Reply via email to