Hi all,

Remote access to a database for read-centric purposes should go through 
HA. Even if we could bind a read-only local instance of EGD to a data store on 
disk, its cache would fall out of sync with the on-disk store. 

HA avoids this because it's a proper protocol for synchronising databases.

Jim


On 2 Aug 2011, at 21:01, Utility Mail wrote:

> I agree!
> In my opinion, remote access to a live instance of a graph database (GD) 
> would be very welcome. Let me explain my current test case with Neo4j: I 
> created an instance of an EmbeddedGraphDatabase that continuously ingests CSV 
> files coming from a polling service. At the same time, I need a separate 
> service, independent of the first one, that queries the GD (to retrieve data, 
> not to modify it). I tried EROGD, but the active index segment became 
> corrupted! Even though EGD is thread safe and I can create multiple threads 
> sharing the same instance of the GD, what should I do when, as in my case, 
> independent services (apps) need to access the EGD at the same time?
> 
> 
> Paolo Forte
> 
> p.s.
> I'm not sure this is related, and it is surely down to my lack of knowledge 
> of webadmin, but how can I check my EGD's status (number of nodes, edges, 
> etc.) via webadmin while it is ingesting new data?
> 
> 
> 
> On 1 Aug 2011, at 20:52, Tobias Ivarsson 
> <tobias.ivars...@neotechnology.com> wrote:
> 
>> I think a bit of elaboration might be in order.
>> 
>> EmbeddedReadOnlyGraphDatabase was created for one specific purpose:
>> 
>> Being able to interactively introspect a graph without having to shut down
>> the application that uses it.
>> 
>> Specifically the tools that we wanted to support with this were the Neo4j
>> shell and Neoclipse.
>> 
>> EmbeddedReadOnlyGraphDatabase (EROGD) has two major issues with the way
>> caching is done internally in Neo4j (one issue with each cache):
>> 
>>  - When the EROGD reads data from the file system it will, like a normal
>>  EGD, cache the node and relationship objects. If a normal EGD modifies the
>>  graph "under the feet" of the EROGD, there is no way for the EROGD to know
>>  that the data in cache is now stale, which will lead to an inconsistent view
>>  of the graph. If, for example, the EROGD has cached Node[15] with the
>>  information that it is connected to some other node through
>>  Relationship[344], and Relationship[344] is deleted, you will get
>>  InvalidRecordException (as you described). And of course if relationships
>>  are added to Node[15] these will not be seen at all by the EROGD (until
>>  Node[15] is evicted from the cache due to not being used for a while).
>>  - Neo4j also caches data on the filesystem level by memory mapping (mmap)
>>  hot regions of the store files. Writes to these regions will not be flushed
>>  to the actual file until the mmapped window is evicted due to being less hot
>>  than other windows, or when the transaction log for Neo4j is rotated. This
>>  means that from the point of view of the EROGD the data actually written to
>>  disk will look inconsistent, which would also lead to
>>  InvalidRecordException. This situation is made even more complicated by the
>>  fact that Unix operating systems will attempt to share memory mapped data
>>  from the same file between multiple processes, but the normal EGD and the
>>  EROGD will not make the same decisions on which regions to mmap; they might
>>  not even decide on the same size for mmap windows. We haven't tested how
>>  well different operating systems deal with reading data that was written to
>>  an mmap region through non-mmap syscalls from a different process; most
>>  likely this varies from OS to OS.
>> 
>> The second of these problems is of course the worst, since it cannot be
>> worked around. The first one can be mitigated by configuring Neo4j to not
>> use the object cache, by passing the cache_type=none parameter to the
>> constructor of the EROGD. This should really be made default for EROGD,
>> unless we decide to completely remove EROGD.
>> 
>> I hope that sheds some light on the reasons why you experience these
>> problems with EmbeddedReadOnlyGraphDatabase, and what the intention of
>> creating it was.
>> 
>> As a side note I can mention that I had a different idea for how to solve
>> the introspection-of-live-graph problem at the time
>> EmbeddedReadOnlyGraphDatabase was created: create a network-based
>> implementation of the GraphDatabaseService API and connect directly to the
>> running instance. This would completely avoid the cache staleness problem,
>> but at the cost of network overhead for each graph operation, which is
>> probably fine for tooling purposes. With the JVM agent attach protocol it
>> would be possible to inject such a server into a running graph database that
>> wasn't originally configured for it. I in fact implemented this as the
>> RemoteGraphDatabase subproject.
>> Since my colleagues did not share my vision about that idea, this project
>> didn't receive much attention after its initial inception. It was also never
>> really used for these purposes, but rather misused for building
>> applications, leading us to deprecate the project. When we then later
>> discovered a severe bug in the implementation of the remote transaction
>> handling logic, we completely removed the project.
>> I still believe this to be a superior model for tools, but would build it
>> differently if I were to build it today.
>> 
>> -tobias
>> 
>> On Mon, Aug 1, 2011 at 4:48 PM, Jim Webber <j...@neotechnology.com> wrote:
>> 
>>> Hi Mathias,
>>> 
>>> EmbeddedReadOnlyGraphDatabase is not quite what it seems, and I think
>>> should be deprecated/removed. The correct way for database instances to
>>> become consistent is through the HA protocol.
>>> 
>>> Jim
>>> _______________________________________________
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>>> 
>> 
>> 
>> 
>> -- 
>> Tobias Ivarsson <tobias.ivars...@neotechnology.com>
>> Hacker, Neo Technology
>> www.neotechnology.com
>> Cellphone: +46 706 534857
>> 

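For readers following the thread, the object-cache staleness that Tobias
describes in his first bullet can be illustrated with a small toy model. This
is only a sketch and does not use the Neo4j API: the class StaleCacheSketch,
the Reader type, and the map standing in for the on-disk store are all
hypothetical names invented for illustration. A reader with its own cache
keeps returning the value it first saw, while a reader without a cache
(loosely analogous to passing cache_type=none) always sees the current store
contents. The sketch says nothing about the second, mmap-related problem,
which per the thread cannot be worked around this way.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types come from Neo4j.
class StaleCacheSketch {

    /** Hypothetical reader over a shared "store" (node id -> relationship count). */
    static class Reader {
        private final Map<Long, Integer> store;   // stands in for the on-disk store
        private final boolean useCache;           // false is loosely like cache_type=none
        private final Map<Long, Integer> cache = new HashMap<>();

        Reader(Map<Long, Integer> store, boolean useCache) {
            this.store = store;
            this.useCache = useCache;
        }

        int relationshipCount(long nodeId) {
            if (!useCache) {
                // Always go back to the store, so external writes are visible.
                return store.getOrDefault(nodeId, 0);
            }
            // Cache the first value seen; later external writes go unnoticed.
            return cache.computeIfAbsent(nodeId, id -> store.getOrDefault(id, 0));
        }
    }

    public static void main(String[] args) {
        Map<Long, Integer> store = new HashMap<>();
        store.put(15L, 1);                         // Node[15] starts with one relationship

        Reader cached = new Reader(store, true);
        Reader uncached = new Reader(store, false);

        // Both readers load Node[15] once before the writer changes it.
        cached.relationshipCount(15L);
        uncached.relationshipCount(15L);

        store.put(15L, 2);                         // a "writer" adds a relationship

        System.out.println("cached reader sees:   " + cached.relationshipCount(15L));
        System.out.println("uncached reader sees: " + uncached.relationshipCount(15L));
    }
}
```

In this toy model the cached reader still reports the old count after the
write, whereas the uncached reader reports the new one, at the cost of going
back to the store on every call.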
