Jawad, awesome work! Is it possible to look at the source code somewhere? It would be great if this were usable as a starting point for some work distribution between the graph engine and the scaling part of Cassandra.

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn    http://www.linkedin.com/in/neubauer
Twitter     http://twitter.com/peterneubauer

http://www.neo4j.org       - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.


On Fri, Jun 4, 2010 at 3:26 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
> Hi everyone.
>
> I have been working on using Cassandra as a new PersistenceSource and I
> now have a fully working prototype. I still have many optimizations to do
> and I do not expect the final solution to be as efficient as embedding
> Neo4j, but it should make it possible to benefit from the numerous
> advantages of Cassandra (mainly in terms of scaling and replication). I am
> still designing and building many benchmarks to ensure that the project
> will really be efficient enough for production (as the Neo4j REST server
> would be). I will keep you informed of that.
>
> Still, I have one major problem. Ids of nodes and relationships are
> integers in Neo4j and nioneo. I perfectly understand that choice for
> Neo4j, but I cannot see how to adapt this to a distributed environment
> like Cassandra (at a given moment, you cannot ensure that an id is
> really free for every node of the cluster, and it could lead to a high
> number of failures and high latency when writing data). Therefore, I would
> prefer using a UUID, as it is much more common in Cassandra. The problem
> is that the type of ids is hardcoded. Would it be possible to replace it
> by an "Object" (or a String) in the kernel of Neo4j, which would not
> change the way nioneo handles its ids? I can provide a patch for that,
> but I really wanted to have your view on that.
>
> Best,
> Jawad
>
> On 20/05/10 14:16, Johan Svensson wrote:
>> On Wed, May 19, 2010 at 4:48 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>
>>> Hi Johan and thanks for your answer.
>>>
>>> I think that I have figured out the major concepts behind PersistenceSource
>>> and I have a partially working prototype of Neo4j using Cassandra. As you
>>> stated, I had to make some minor modifications to the Neo4j core to handle
>>> my own PersistenceSource.
>>> I really want to keep my work compatible with future versions of Neo4j.
>>> Would it be possible to bring back the possibility to choose that source?
>>>
>> Yes we can certainly do that.
>>
>>
>>> Some concepts remain unclear to me and I still have some unanswered
>>> questions.
>>>
>>> - Why do you use a property index? It seems to me that it is used to store
>>> an integer id / property key correspondence and then use it to store /
>>> retrieve properties. Is it tightly coupled to the way nioneo handles
>>> properties or am I missing something more important?
>>>
>> The reason is that it is faster to read/write an integer from/to disk
>> than a string key. Typically you will have few unique property key names
>> in any given system, so it is an optimization to make add/remove/get
>> property faster.
>>
>>
>>> - PersistenceSource, Transaction and Command have a clear role in the
>>> xaframework. But I don't really see the difference between XaDataSource and
>>> XaConnection.
>>>
>> Yes, that could have been done differently and I guess the reasons are
>> the old JTA and XA specifications. There are discussions in progress
>> on removing the dependency on JTA and writing something new that fits
>> better in modern "today containers/frameworks" (with optional support
>> for JTA), and that would likely result in a cleaner API and
>> implementation.
>>
>>
>>> - I don't understand the Logicallog and what this process is used for.
>>>
>> To make sure every transaction that has been committed will be "there"
>> if the system crashes. The logical log contains all operations
>> performed, and the data will be forced to disk before each transaction
>> commits. The log can then be used to put the normal store files in a
>> consistent state after a crash.
>>
>> Regards,
>> Johan
>>
>>
>>> Thanks in advance,
>>> Jawad
>>>
>>> On Tue, May 18, 2010 at 1:22 PM, Johan
>>> Svensson <jo...@neotechnology.com> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> Have a look at the org.neo4j.kernel.impl.nioneo.xa package. To implement
>>>> a new persistence source, start by creating new implementations of the
>>>> NeoStoreXaDataSource and NeoStoreXaConnection classes. It is no longer
>>>> possible to swap in a different persistence source using configuration
>>>> (it used to be), but if you modify the code in the
>>>> org.neo4j.kernel.GraphDbInstance.start method to register
>>>> YourImplNeoStoreXaDataSource instead of the nioneo one (with the same
>>>> name) it should work.
>>>>
>>>> Back when we had Neo4j running on different relational databases
>>>> (Postgres, Informix, MySQL), one big problem was that when the total
>>>> number of relationships in the graph increased, the time to figure out
>>>> what relationships a specific node had also increased
>>>> (regardless of whether that node had few relationships). It is important
>>>> to have a getRelationships method where execution time is connected to
>>>> the number of relationships on that node, to maintain high traversal
>>>> speed as the graph increases in size.
>>>>
>>>> Regards,
>>>> Johan
>>>>
>>>> On Sat, May 15, 2010 at 8:03 PM, Jawad Stouli <ja...@citizenplace.com>
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I would be very interested in getting more information
>>>>> that would help me implement new persistence sources. I have read (there:
>>>>> http://www.mail-archive.com/user@lists.neo4j.org/msg00006.html) that it
>>>>> should not be that difficult (or, at least, it is possible) but I still
>>>>> have some difficulties while navigating through the sources to understand
>>>>> exactly how it should be done.
>>>>>
>>>>> Besides, I have read that using MySQL was
>>>>> less efficient than Nioneo. Was the difference really important?
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> Jawad
>>>>>
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
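
To make the id problem raised in the thread concrete, here is a minimal sketch, not code from Neo4j or Cassandra. It assumes a hypothetical cluster in which each member hands out sequential integer ids from its own local counter with no coordination, and compares that with random UUIDs. The class and function names (LocalCounterAllocator, allocate_uuid) are invented for illustration; only Python's standard uuid module is real.

```python
import uuid


class LocalCounterAllocator:
    """Hypothetical allocator: one cluster member handing out sequential integer ids."""

    def __init__(self):
        self._next_id = 0

    def allocate(self):
        allocated = self._next_id
        self._next_id += 1
        return allocated


def allocate_uuid():
    """Return a random (version 4) UUID; no coordination between members is needed."""
    return uuid.uuid4()


# Two members that cannot see each other's counters (an assumption of this sketch).
member_a = LocalCounterAllocator()
member_b = LocalCounterAllocator()
int_ids_a = {member_a.allocate() for _ in range(3)}
int_ids_b = {member_b.allocate() for _ in range(3)}
int_collisions = int_ids_a & int_ids_b  # every integer id was handed out twice

uuid_ids_a = {allocate_uuid() for _ in range(3)}
uuid_ids_b = {allocate_uuid() for _ in range(3)}
uuid_collisions = uuid_ids_a & uuid_ids_b  # empty in practice

print("integer id collisions:", sorted(int_collisions))
print("UUID collisions:", len(uuid_collisions))
```

Avoiding the integer collisions would require the members to agree on which ids are free before each write, which is the latency and failure cost described in the message. The trade-off is size: a UUID is 128 bits, so it does not fit in an integer id field, which is why the hardcoded id type matters.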