Re: [Neo4j] Synchronization of EmbeddedReadOnlyGraphDatabase - Bug?

Utility Mail Tue, 02 Aug 2011 12:16:31 -0700

I agree!
In my opinion, remote access to a live instance of a GD would be very welcome.
Let me explain my current test case with Neo4j: I created an instance of an
EmbeddedGraphDatabase that continuously ingests CSV files coming from a polling
service. At the same time I need to create a second service, independent of
the first one, that queries the GD (to retrieve data, not to modify it). I
tried EROGD, but the active index segment became corrupted! Even though EGD is
thread safe and I can create multiple threads sharing the same GD instance,
what should I do when, as in my case, I need independent services (apps)
accessing the EGD at the same time?



Paolo Forte

p.s.
I'm not sure whether this is related, and it surely reflects my limited
knowledge of webadmin, but how can I monitor the status of my EGD (number of
nodes, edges, etc.) via webadmin while it is ingesting new data?



On 01 Aug 2011, at 20:52, Tobias Ivarsson
<tobias.ivars...@neotechnology.com> wrote:

> I think a bit of elaboration might be in order.
> 
> EmbeddedReadOnlyGraphDatabase was created for one specific purpose:
> 
> Being able to interactively introspect a graph without having to shut down
> the application that uses it.
> 
> Specifically the tools that we wanted to support with this were the Neo4j
> shell and Neoclipse.
> 
> EmbeddedReadOnlyGraphDatabase (EROGD) has two major issues with the way
> caching is done internally in Neo4j (one issue with each cache):
> 
>   - When the EROGD reads data from the file system it will, like a normal
>   EGD, cache the node and relationship objects. If a normal EGD modifies the
>   graph "under the feet" of the EROGD, there is no way for the EROGD to know
>   that the data in cache is now stale, which will lead to an inconsistent view
>   of the graph. If for example the EROGD has cached Node[15] with the
>   information that it is connected to some other node through
>   Relationship[344], and Relationship[344] is deleted you will get
>   InvalidRecordException (as you described). And of course if relationships
>   are added to Node[15] these will not be seen at all by the EROGD (until
>   Node[15] is evicted from the cache due to not being used for a while).
>   - Neo4j also caches data on the filesystem level by memory mapping (mmap)
>   hot regions of the store files. Writes to these regions will not be flushed
>   to the actual file until the mmapped window is evicted due to being less hot
>   than other windows, or when the transaction log for Neo4j is rotated. This
>   means that from the p.o.v. of the EROGD the actual data written to disk will
>   look inconsistent, which would also lead to InvalidRecordException. This
>   situation is actually made even more complicated by the fact that unix
>   operating systems will attempt to share memory mapped data from the same
>   file between multiple processes, but the normal EGD and the EROGD will not
>   make the same decisions on which regions to mmap, they might not even decide
>   on the same size for mmap windows. We haven't tested how well different
>   operating systems deal with reading data that was written to an mmap region
>   through non-mmap syscalls from a different process; most likely this varies
>   from OS to OS.
> 
> The second of these problems is of course the worst, since it cannot be
> worked around. The first one can be mitigated by configuring Neo4j to not
> use the object cache, by passing the cache_type=none parameter to the
> constructor of the EROGD. This should really be made default for EROGD,
> unless we decide to completely remove EROGD.
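
The first issue and its mitigation can be illustrated with a small,
self-contained simulation. This is only a sketch under stated assumptions:
SharedStore and ReadOnlyView are hypothetical stand-ins invented for the
example, not Neo4j classes, and the useCache flag merely mimics the effect of
cache_type=none. It does not model the second (mmap) issue at all.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleCacheDemo {

    // Hypothetical stand-in for the store files shared by both instances.
    static class SharedStore {
        private final Map<Integer, String> records = new HashMap<>();

        synchronized void write(int id, String value) { records.put(id, value); }

        synchronized String read(int id) { return records.get(id); }
    }

    // Hypothetical read-only view. useCache=false mimics cache_type=none:
    // every lookup goes back to the store instead of a private object cache.
    static class ReadOnlyView {
        private final SharedStore store;
        private final boolean useCache;
        private final Map<Integer, String> cache = new HashMap<>();

        ReadOnlyView(SharedStore store, boolean useCache) {
            this.store = store;
            this.useCache = useCache;
        }

        String get(int id) {
            if (!useCache) {
                return store.read(id);
            }
            // Nothing ever invalidates this cache when the writer changes the store.
            if (!cache.containsKey(id)) {
                cache.put(id, store.read(id));
            }
            return cache.get(id);
        }
    }

    public static void main(String[] args) {
        SharedStore store = new SharedStore();
        store.write(15, "connected via Relationship[344]");

        ReadOnlyView cached = new ReadOnlyView(store, true);
        ReadOnlyView uncached = new ReadOnlyView(store, false);

        // Both views read Node[15] once; the cached view now holds a copy.
        cached.get(15);
        uncached.get(15);

        // The writer changes the graph "under the feet" of the readers.
        store.write(15, "no relationships");

        System.out.println("cached view:   " + cached.get(15));   // stale value
        System.out.println("uncached view: " + uncached.get(15)); // current value
    }
}
```

The cached view keeps reporting Relationship[344] after the writer has removed
it, which is the situation that surfaces as InvalidRecordException in the real
database, while the uncached view sees the change at the cost of re-reading
the store on every access.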
> 
> I hope that sheds some light on the reasons why you experience these
> problems with EmbeddedReadOnlyGraphDatabase, and what the intention of
> creating it was.
> 
> As a side note I can mention that I had a different idea for how to solve
> the introspection-of-live-graph problem at the time
> EmbeddedReadOnlyGraphDatabase was created: create a network-based
> implementation of the GraphDatabaseService API and connect directly to the
> running instance. This would completely avoid the cache staleness problem,
> but at the cost of network overhead for each graph operation, which is
> probably fine for tooling purposes. With the JVM agent attach protocol it
> would be possible to inject such a server into a running graph database that
> wasn't originally configured for it. I in fact implemented this as the
> RemoteGraphDatabase subproject.
> Since my colleagues did not share my vision about that idea, this project
> didn't receive much attention after its initial inception. It was also never
> really used for these purposes, but rather misused for building
> applications, leading us to deprecate the project. When we then later
> discovered a severe bug in the implementation of the remote transaction
> handling logic, we completely removed the project.
> I still believe this to be a superior model for tools, but would build it
> differently if I were to build it today.
> 
> -tobias
> 
> On Mon, Aug 1, 2011 at 4:48 PM, Jim Webber <j...@neotechnology.com> wrote:
> 
>> Hi Mathias,
>> 
>> EmbeddedReadOnlyGraphDatabase is not quite what it seems, and I think
>> should be deprecated/removed. The correct way for database instances to
>> become consistent is through the HA protocol.
>> 
>> Jim
>> _______________________________________________
>> Neo4j mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> 
> 
> 
> 
> -- 
> Tobias Ivarsson <tobias.ivars...@neotechnology.com>
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
