Cool, Jawad, nice to see your commits. I haven't tried it out yet, but do you have any feedback so far on the characteristics of the Cassandra backend compared with the file-based store? Pros, cons?
Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Mon, Jun 14, 2010 at 7:11 PM, Jawad - CitizenPlace <ja...@citizenplace.com> wrote:
> Hi,
> I have committed my current work here:
> https://trac.neo4j.org/browser/laboratory/users/jawad/cassandra
> You will have to use the modified kernel to get it working:
> https://trac.neo4j.org/browser/laboratory/users/jawad/neo4j-kernel
>
> Do not forget to read
> https://trac.neo4j.org/browser/laboratory/users/jawad/cassandra/README
> to learn more about how to use CassandraPersistenceSource and about the
> numerous problems of the current implementation.
>
> I would be very pleased to get your opinion on this.
>
> Best,
> Jawad
>
> On 07/06/10 13:35, Peter Neubauer wrote:
>> Jawad,
>> if you sign the CLA,
>> http://wiki.neo4j.org/content/About_Contributor_License_Agreement, we
>> might open a new branch in the laboratory to keep that code and sync
>> it with the kernel?
>>
>> Cheers,
>>
>> /peter neubauer
>>
>> On Mon, Jun 7, 2010 at 11:24 AM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>
>>> Peter,
>>> I can of course send you my current work in progress by email but, as I
>>> told you, I have some difficulties related to the way Neo4j is designed.
>>>
>>> As I told you in my previous mail, it would be much better for Cassandra
>>> to use non-long ids.
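The id concern raised here (explained in more detail in the June 4 message quoted further down) can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `IdGenerator`, `LongIdGenerator` and `UuidIdGenerator` are illustration names, not Neo4j or Cassandra APIs, and an in-process `AtomicLong` stands in for a node-local long id counter. Two cluster members with independent counters hand out the same id, while random UUIDs can be minted on each member without any coordination.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical abstraction: an id generator that is generic over the id type,
// roughly what replacing the hardcoded long with "Object" would allow.
interface IdGenerator<T> {
    T nextId();
}

// Counter-based allocation (illustrative stand-in for long ids): ids are compact,
// but uniqueness depends on every allocation going through one shared counter.
final class LongIdGenerator implements IdGenerator<Long> {
    private final AtomicLong next = new AtomicLong();

    @Override
    public Long nextId() {
        return next.getAndIncrement();
    }
}

// Coordination-free allocation: each cluster member mints random (version 4)
// UUIDs locally; collisions are possible in theory but vanishingly unlikely.
final class UuidIdGenerator implements IdGenerator<UUID> {
    @Override
    public UUID nextId() {
        return UUID.randomUUID();
    }
}

final class IdCollisionDemo {
    public static void main(String[] args) {
        // Two cluster members, each with its own local counter: both hand out 0.
        IdGenerator<Long> memberA = new LongIdGenerator();
        IdGenerator<Long> memberB = new LongIdGenerator();
        System.out.println(memberA.nextId().equals(memberB.nextId())); // true: collision

        // Two members minting UUIDs independently never need to agree on a "free" id.
        IdGenerator<UUID> uuidA = new UuidIdGenerator();
        IdGenerator<UUID> uuidB = new UuidIdGenerator();
        System.out.println(uuidA.nextId().equals(uuidB.nextId()));
    }
}
```

The trade-off is the one Jawad describes: a long id is compact and can be allocated from a single counter, while a UUID takes 128 bits but needs no cluster-wide agreement before a write.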
>>>
>>> Another main point is that Cassandra does not need ids on properties.
>>> Indeed, properties do not need to be indexed (we can simply store a list
>>> of property key/property value pairs under the nodeId); maybe it is a
>>> requirement of Lucene and the way you handled it (but Lucene indexes can
>>> also be stored directly in Cassandra). The direct consequence is that
>>> when nodeChangeProperty / nodeDeleteProperty / relChangeProperty
>>> ... are called in PersistenceSource, I only get the property id, whereas I
>>> would need the property key id (otherwise, every property modification
>>> costs one extra request to the database, plus many useless entries to
>>> keep the link between property id and node id).
>>>
>>> All those points are linked to the fact that Cassandra does not handle
>>> data the way Neo4j does and, for a really optimized solution, there are
>>> some modifications that I had to make in the kernel (I can provide patches
>>> for that). As I told you, I really want to keep my work compatible and
>>> open (I would be really happy if you wanted to work on it), and I wanted
>>> to understand how we can reconcile the trunk and my work.
>>>
>>> Best,
>>> Jawad
>>>
>>> --
>>> CitizenPlace
>>> ja...@citizenplace.com
>>>
>>> On 07/06/10 07:15, Peter Neubauer wrote:
>>>
>>>> Jawad,
>>>> awesome work, is it possible to look at the source code somewhere?
>>>> It would be great if this could be used to start distributing work
>>>> between the graph engine and the scaling part of Cassandra.
>>>>
>>>> Cheers,
>>>>
>>>> /peter neubauer
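Jawad's point about property ids can be sketched the same way. The classes below are hypothetical and only loosely modeled on the PersistenceSource callbacks mentioned in his message (the real `nodeChangeProperty` / `nodeDeleteProperty` signatures are not reproduced). `KeyedPropertyStore` mimics a Cassandra-style row per node, keyed by a UUID node id, whose columns are property key/value pairs; `PropertyIdAdapter` shows the extra bookkeeping (property id to node id and key) a backend must maintain when a callback identifies a property only by its id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for a Cassandra column family:
// row key = node id, column name = property key, column value = property value.
final class KeyedPropertyStore {
    private final Map<UUID, Map<String, Object>> rows = new HashMap<>();

    // Key-based change: one write into the node's row, no extra lookup.
    void setProperty(UUID nodeId, String key, Object value) {
        rows.computeIfAbsent(nodeId, id -> new HashMap<>()).put(key, value);
    }

    void removeProperty(UUID nodeId, String key) {
        Map<String, Object> row = rows.get(nodeId);
        if (row != null) {
            row.remove(key);
        }
    }

    Object getProperty(UUID nodeId, String key) {
        Map<String, Object> row = rows.get(nodeId);
        return row == null ? null : row.get(key);
    }
}

// Hypothetical adapter for id-based callbacks: every property needs an extra
// entry mapping its id back to (nodeId, key), read before each change/delete.
final class PropertyIdAdapter {
    private static final class Location {
        final UUID nodeId;
        final String key;

        Location(UUID nodeId, String key) {
            this.nodeId = nodeId;
            this.key = key;
        }
    }

    private final KeyedPropertyStore store;
    private final Map<Long, Location> locations = new HashMap<>();
    private long nextPropertyId = 0;

    PropertyIdAdapter(KeyedPropertyStore store) {
        this.store = store;
    }

    long addProperty(UUID nodeId, String key, Object value) {
        long propertyId = nextPropertyId++;
        locations.put(propertyId, new Location(nodeId, key)); // extra entry per property
        store.setProperty(nodeId, key, value);
        return propertyId;
    }

    void changeProperty(long propertyId, Object value) {
        Location loc = locations.get(propertyId); // extra read before the real write
        if (loc == null) {
            throw new IllegalArgumentException("unknown property id " + propertyId);
        }
        store.setProperty(loc.nodeId, loc.key, value);
    }

    void deleteProperty(long propertyId) {
        Location loc = locations.remove(propertyId);
        if (loc != null) {
            store.removeProperty(loc.nodeId, loc.key);
        }
    }

    int bookkeepingEntries() {
        return locations.size();
    }
}
```

With key-based callbacks a change is a single write into the node's row; with id-based callbacks every change or delete first consults the bookkeeping map, which in a real Cassandra backend would be the extra request Jawad mentions.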
>>>>
>>>> On Fri, Jun 4, 2010 at 3:26 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>
>>>>> Hi everyone.
>>>>>
>>>>> I have been working on using Cassandra as a new PersistenceSource and I
>>>>> now have a fully working prototype. I still have many optimizations to do
>>>>> and I do not expect the final solution to be as efficient as embedded
>>>>> Neo4j, but it should make it possible to benefit from the numerous
>>>>> advantages of Cassandra (mainly in terms of scaling and replication). I am
>>>>> still designing and building many benchmarks to ensure that the project
>>>>> will really be efficient enough for production (as the Neo4j REST server
>>>>> would be). I will keep you informed of that.
>>>>>
>>>>> Still, I have one major problem. Ids of nodes and relationships are
>>>>> integers (longs) in Neo4j and nioneo. I perfectly understand that choice
>>>>> for Neo4j, but I cannot see how to adapt this to a distributed environment
>>>>> like Cassandra (at any given moment, you cannot guarantee that an id is
>>>>> really free on every node of the cluster, which could lead to a high
>>>>> number of failures and high latency when writing data). Therefore, I would
>>>>> prefer using a UUID, as it is much more common in Cassandra. The problem
>>>>> is that the type of ids is hardcoded. Would it be possible to replace it
>>>>> with an "Object" (or a String) in the kernel of Neo4j, which would not
>>>>> change the way nioneo handles its ids? I can provide a patch for that,
>>>>> but I really wanted to have your view on it.
>>>>>
>>>>> Best,
>>>>> Jawad
>>>>>
>>>>> On 20/05/10 14:16, Johan Svensson wrote:
>>>>>
>>>>>> On Wed, May 19, 2010 at 4:48 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>>>
>>>>>>> Hi Johan, and thanks for your answer.
>>>>>>>
>>>>>>> I think that I have figured out the major concepts behind
>>>>>>> PersistenceSource and I have a partially working prototype of Neo4j
>>>>>>> using Cassandra.
>>>>>>> As you stated, I had to make some minor modifications to the Neo4j
>>>>>>> core to handle my own PersistenceSource.
>>>>>>> I really want to keep my work compatible with future versions of Neo4j.
>>>>>>> Would it be possible to restore the option of choosing that source?
>>>>>>
>>>>>> Yes, we can certainly do that.
>>>>>>
>>>>>>> Some concepts remain unclear to me and I still have some unanswered
>>>>>>> questions.
>>>>>>>
>>>>>>> - Why do you use a property index? It seems to me that it is used to
>>>>>>> store an integer id / property key correspondence, which is then used
>>>>>>> to store/retrieve properties. Is it tightly coupled to the way nioneo
>>>>>>> handles properties, or am I missing something more important?
>>>>>>
>>>>>> The reason is that it is faster to read/write an integer from/to disk
>>>>>> than a string key. Typically you will have few unique property key names
>>>>>> in any given system, so it is an optimization to make add/remove/get
>>>>>> property faster.
>>>>>>
>>>>>>> - PersistenceSource, Transaction and Command have a clear role in the
>>>>>>> xaframework. But I don't really see the difference between XaDataSource
>>>>>>> and XaConnection.
>>>>>>
>>>>>> Yes, that could have been done differently, and I guess the reasons are
>>>>>> the old JTA and XA specifications. There are discussions in progress
>>>>>> on removing the dependency on JTA and writing something new that fits
>>>>>> better in today's containers/frameworks (with optional support
>>>>>> for JTA), which would likely result in a cleaner API and
>>>>>> implementation.
>>>>>>
>>>>>>> - I don't understand the LogicalLog and what this process is used for.
>>>>>>
>>>>>> To make sure every transaction that has been committed will be "there"
>>>>>> if the system crashes.
>>>>>> The logical log contains all operations
>>>>>> performed, and the data is forced to disk before each transaction
>>>>>> commits. The log can then be used to bring the normal store files back
>>>>>> to a consistent state after a crash.
>>>>>>
>>>>>> Regards,
>>>>>> Johan
>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Jawad
>>>>>>>
>>>>>>> On Tue, May 18, 2010 at 1:22 PM, Johan Svensson <jo...@neotechnology.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Have a look at the org.neo4j.kernel.impl.nioneo.xa package. To implement
>>>>>>>> a new persistence source, start by creating new implementations of the
>>>>>>>> NeoStoreXaDataSource and NeoStoreXaConnection classes. It is no longer
>>>>>>>> possible to swap in a different persistence source using configuration
>>>>>>>> (it used to be), but if you modify the code in the
>>>>>>>> org.neo4j.kernel.GraphDbInstance.start method to register
>>>>>>>> YourImplNeoStoreXaDataSource instead of the nioneo one (with the same
>>>>>>>> name), it should work.
>>>>>>>>
>>>>>>>> Back when we had Neo4j running on different relational databases
>>>>>>>> (Postgres, Informix, MySQL), one big problem was that as the total
>>>>>>>> number of relationships in the graph increased, the time to figure out
>>>>>>>> which relationships a specific node had also increased
>>>>>>>> (even if that node had only a few relationships). It is important to
>>>>>>>> have a getRelationships method whose execution time depends on the
>>>>>>>> number of relationships on that node, to maintain high traversal speed
>>>>>>>> as the graph grows.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Johan
>>>>>>>>
>>>>>>>> On Sat, May 15, 2010 at 8:03 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I would be very interested in getting more information
>>>>>>>>> that would help me implement new persistence sources.
>>>>>>>>> I have read (here:
>>>>>>>>> http://www.mail-archive.com/user@lists.neo4j.org/msg00006.html) that it
>>>>>>>>> should not be that difficult (or, at least, that it is possible), but I
>>>>>>>>> still have some difficulty navigating through the sources to understand
>>>>>>>>> exactly how it should be done.
>>>>>>>>>
>>>>>>>>> Also, I have read that using MySQL was
>>>>>>>>> less efficient than Nioneo. Was the difference really significant?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Jawad
>>>>>>
>>>>>> _______________________________________________
>>>>>> Neo mailing list
>>>>>> User@lists.neo4j.org
>>>>>> https://lists.neo4j.org/mailman/listinfo/user

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
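Johan's getRelationships requirement (in his May 18 message above) can be illustrated with a small Java sketch contrasting two lookups. `Rel` and `RelationshipStore` are hypothetical names, and this is not how nioneo or the Cassandra backend actually stores relationships: `relationshipsByScan` touches every relationship in the graph, so its cost grows with the total count, while `relationshipsByAdjacency` keeps a per-node list, so its cost grows only with that node's own relationships.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical relationship record: its id plus the two node ids it connects.
final class Rel {
    final long id, startNode, endNode;

    Rel(long id, long startNode, long endNode) {
        this.id = id;
        this.startNode = startNode;
        this.endNode = endNode;
    }
}

final class RelationshipStore {
    private final List<Rel> all = new ArrayList<>();
    private final Map<Long, List<Rel>> byNode = new HashMap<>();

    void add(Rel r) {
        all.add(r);
        byNode.computeIfAbsent(r.startNode, n -> new ArrayList<>()).add(r);
        if (r.endNode != r.startNode) { // avoid listing a self-loop twice
            byNode.computeIfAbsent(r.endNode, n -> new ArrayList<>()).add(r);
        }
    }

    // Full scan: cost grows with the total number of relationships in the
    // graph, even when the node itself has only a few.
    List<Rel> relationshipsByScan(long nodeId) {
        List<Rel> result = new ArrayList<>();
        for (Rel r : all) {
            if (r.startNode == nodeId || r.endNode == nodeId) {
                result.add(r);
            }
        }
        return result;
    }

    // Adjacency list: cost grows only with this node's number of relationships.
    List<Rel> relationshipsByAdjacency(long nodeId) {
        return Collections.unmodifiableList(
                byNode.getOrDefault(nodeId, Collections.emptyList()));
    }
}
```

In a Cassandra backend, the per-node list would plausibly map to one row per node whose columns are that node's relationship ids; that mapping is an assumption, not something this thread specifies.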