Cool Jawad,
nice to see your commits. I haven't tried it out yet, but do you have
any feedback on the characteristics of the Cassandra backend as
opposed to files so far? Pros, cons?

Cheers,

/peter neubauer

COO and Sales, Neo Technology

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Mon, Jun 14, 2010 at 7:11 PM, Jawad - CitizenPlace
<ja...@citizenplace.com> wrote:
> Hi,
> I have committed my current work here :
> https://trac.neo4j.org/browser/laboratory/users/jawad/cassandra
> You will have to use the modified kernel to get it working :
> https://trac.neo4j.org/browser/laboratory/users/jawad/neo4j-kernel
>
> Do not forget to read
> https://trac.neo4j.org/browser/laboratory/users/jawad/cassandra/README
> to learn more about how to use CassandraPersistenceSource and about the
> numerous problems of the current implementation.
>
> I will be very pleased to get your opinion on this.
>
> Best,
> Jawad
>
> On 07/06/10 13:35, Peter Neubauer wrote:
>> Jawad,
>> if you sign the CLA,
>> http://wiki.neo4j.org/content/About_Contributor_License_Agreement, we
>> might open a new branch in the laboratory to keep that code and sync
>> it with the kernel?
>>
>> Cheers,
>>
>> /peter neubauer
>>
>> On Mon, Jun 7, 2010 at 11:24 AM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>
>>> Peter,
>>> I can of course email you my current work in progress but, as I
>>> told you, I have some difficulties linked to the way Neo4j is designed.
>>>
>>> As I said in my previous mail, it would be much better for Cassandra
>>> to use non-long ids.
>>> Another main point is that Cassandra does not need ids on properties.
>>> Properties do not need to be indexed (we can simply store a list
>>> of property key/property value pairs under the nodeId); maybe this is a
>>> requirement of Lucene and the way you handled it (but Lucene can
>>> handle indexes directly in Cassandra). The direct consequence is that
>>> when nodeChangeProperty / nodeDeleteProperty / relChangeProperty
>>> ... are called in PersistenceSource, I only get the property id when I
>>> would rather have the property key id (otherwise, each property
>>> modification would cost one request to the database, plus many useless
>>> entries to keep the link between property id and node id).
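
To illustrate the storage layout described above: a minimal sketch in plain Python, assuming a toy in-memory stand-in for a Cassandra row (one row per node, one column per property key). The class and method names are hypothetical and are not part of the Neo4j kernel or of any Cassandra client.

```python
from collections import defaultdict


class WideRowPropertyStore:
    """Toy stand-in for a wide-row store: one row per node id,
    one column per property key. Hypothetical, for illustration only."""

    def __init__(self):
        self._rows = defaultdict(dict)  # node_id -> {property key: value}

    def change_property(self, node_id, key, value):
        # Addressed by (node id, property key): no separate property id
        # and no lookup is needed before the write.
        self._rows[node_id][key] = value

    def delete_property(self, node_id, key):
        # Removing a missing key is a no-op.
        self._rows[node_id].pop(key, None)

    def properties(self, node_id):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(node_id, {}))
```

With this layout a property change is a single write keyed by node id and property key, which is why receiving only a property id in the callback forces an extra lookup.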
>>>
>>> All those points are linked to the fact that Cassandra does not handle
>>> data the way Neo4j does and, for a really optimized solution, there are
>>> some modifications that I had to make in the kernel (I can provide patches
>>> for that). As I told you, I really want to keep my work compatible and
>>> open (I would be really happy if you wanted to work on it), and I would
>>> like to understand how we can reconcile the trunk and my work.
>>>
>>> Best,
>>> Jawad
>>>
>>> --
>>> CitizenPlace
>>> ja...@citizenplace.com
>>>
>>> On 07/06/10 07:15, Peter Neubauer wrote:
>>>
>>>> Jawad,
>>>> awesome work, is it possible to look at the source code somewhere?
>>>> Would be great if this is usable to start on some work distribution
>>>> between the graph engine and the scaling part of Cassandra.
>>>>
>>>> Cheers,
>>>>
>>>> /peter neubauer
>>>>
>>>> On Fri, Jun 4, 2010 at 3:26 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>
>>>>
>>>>> Hi everyone.
>>>>>
>>>>> I have been working on using Cassandra as a new PersistenceSource and I
>>>>> now have a fully working prototype. I still have many optimizations to do
>>>>> and I do not expect the final solution to be as efficient as embedded
>>>>> Neo4j, but it should make it possible to benefit from the numerous
>>>>> advantages of Cassandra (mainly in terms of scaling and replication). I
>>>>> am still designing and building many benchmarks to make sure that the
>>>>> project will really be efficient enough for production (as the Neo4j
>>>>> REST server would be). I will keep you informed.
>>>>>
>>>>> Still, I have one major problem. Ids of nodes and relationships are
>>>>> integers in Neo4j and nioneo. I perfectly understand that choice for
>>>>> Neo4j, but I cannot see how to adapt it to a distributed environment
>>>>> like Cassandra (at a given moment, you cannot ensure that an id is
>>>>> really free on every node of the cluster, which could lead to a high
>>>>> number of failures and high latency when writing data). Therefore, I would
>>>>> prefer using a UUID, as that is much more common in Cassandra. The problem
>>>>> is that the type of the ids is hardcoded. Would it be possible to replace
>>>>> it with an "Object" (or a String) in the Neo4j kernel, which would not
>>>>> change the way nioneo handles its ids? I can provide a patch for that,
>>>>> but I really wanted to have your view on it first.
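
The id-allocation problem above can be sketched as follows, in plain Python and under stated assumptions: two allocators that each keep only local state stand in for two cluster members. Both class names are hypothetical; only the standard `uuid` and `itertools` modules are real.

```python
import itertools
import uuid


class CounterAllocator:
    """Hands out 0, 1, 2, ... from purely local state (hypothetical)."""

    def __init__(self):
        self._next = itertools.count()

    def allocate(self):
        return next(self._next)


class UuidAllocator:
    """Hands out random UUIDs; needs no shared state (hypothetical)."""

    def allocate(self):
        return str(uuid.uuid4())


# Two independent "cluster members" using sequential integers both pick 0.
a, b = CounterAllocator(), CounterAllocator()
counter_clash = a.allocate() == b.allocate()

# Two independent members using random UUIDs do not need to coordinate;
# a collision is possible in principle but astronomically unlikely.
u, v = UuidAllocator(), UuidAllocator()
uuid_clash = u.allocate() == v.allocate()
```

Sequential integers stay unique only if every writer consults some shared counter, which is the coordination cost the message describes; random UUIDs trade that for larger keys.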
>>>>>
>>>>> Best,
>>>>> Jawad
>>>>>
>>>>> On 20/05/10 14:16, Johan Svensson wrote:
>>>>>
>>>>>
>>>>>> On Wed, May 19, 2010 at 4:48 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi Johan and thanks for your answer.
>>>>>>>
>>>>>>> I think that I have figured out the major concepts behind
>>>>>>> PersistenceSource and I have a partially working prototype of Neo4j
>>>>>>> using Cassandra. As you said, I had to make some minor modifications
>>>>>>> to the Neo4j core to handle my own PersistenceSource.
>>>>>>> I really want to keep my work compatible with future versions of Neo4j;
>>>>>>> would it be possible to bring back the option to choose that source?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Yes we can certainly do that.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Some concepts remain unclear to me and I still have some unanswered
>>>>>>> questions.
>>>>>>>
>>>>>>> - Why do you use a property index? It seems to me that it is used to
>>>>>>> store an integer id / property key correspondence, which is then used
>>>>>>> to store / retrieve properties. Is it tightly coupled to the way nioneo
>>>>>>> handles properties or am I missing something more important?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> The reason is that it is faster to read/write an integer from/to disk
>>>>>> than a string key. Typically you will have few unique property key
>>>>>> names in any given system, so it is an optimization to make
>>>>>> add/remove/get property faster.
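
The key-to-integer mapping described in this answer can be sketched as a small interning table. This is plain Python with hypothetical names, not the nioneo implementation: each distinct property key string gets a small integer once, and records then store the integer.

```python
class PropertyKeyIndex:
    """Maps each distinct property key string to a small integer id
    and back. Hypothetical sketch, not the Neo4j kernel class."""

    def __init__(self):
        self._id_by_key = {}   # "name" -> 0
        self._key_by_id = []   # 0 -> "name"

    def id_for(self, key):
        # Reuse the existing id, or assign the next free one.
        key_id = self._id_by_key.get(key)
        if key_id is None:
            key_id = len(self._key_by_id)
            self._id_by_key[key] = key_id
            self._key_by_id.append(key)
        return key_id

    def key_for(self, key_id):
        return self._key_by_id[key_id]
```

Because a system typically has few distinct key names, the table stays small while every stored property carries a fixed-size integer instead of a variable-length string.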
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> - PersistenceSource, Transaction and Command have a clear role in the
>>>>>>> xaframework. But I don't really see the difference between XaDataSource
>>>>>>> and XaConnection.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> Yes, that could have been done differently and I guess the reasons are
>>>>>> the old JTA and XA specifications. There are discussions in progress
>>>>>> about removing the dependency on JTA and writing something new that fits
>>>>>> better with today's containers/frameworks (with optional support
>>>>>> for JTA); that would likely result in a cleaner API and
>>>>>> implementation.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> - I don't understand the Logicallog and what this process is used for.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> To make sure every transaction that has been committed will be "there"
>>>>>> if the system crashes. The logical log contains all operations
>>>>>> performed and the data will be forced to disk before each transaction
>>>>>> commits. The log can then be used to put the normal store files in a
>>>>>> consistent state after a crash.
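
As an illustration of the logical log idea described in this answer: a minimal sketch in plain Python, assuming one JSON line per committed transaction and a key/value store as the "store files". The function names and the log format are hypothetical; the real logical log patches existing store files rather than rebuilding state from scratch as this simplification does.

```python
import json
import os


def commit(log_path, operations):
    """Append one transaction's operations to the log and force the
    bytes to disk before reporting the commit as done."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(operations) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(log_path):
    """Rebuild a consistent state after a crash by replaying every
    logged transaction in commit order."""
    store = {}
    if not os.path.exists(log_path):
        return store
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            for op, key, value in json.loads(line):
                if op == "set":
                    store[key] = value
                elif op == "delete":
                    store.pop(key, None)
    return store
```

The ordering is the point: the log entry reaches disk before the commit returns, so anything acknowledged as committed can be replayed even if the main store was only partly updated when the system went down.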
>>>>>>
>>>>>> Regards,
>>>>>> Johan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Jawad
>>>>>>>
>>>>>>> On Tue, May 18, 2010 at 1:22 PM, Johan Svensson <jo...@neotechnology.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Have a look at the org.neo4j.kernel.impl.nioneo.xa package. To
>>>>>>>> implement a new persistence source, start by creating new
>>>>>>>> implementations of the NeoStoreXaDataSource and NeoStoreXaConnection
>>>>>>>> classes. It is no longer possible to swap in a different persistence
>>>>>>>> source using configuration (it used to be), but if you modify the code
>>>>>>>> in the org.neo4j.kernel.GraphDbInstance.start method to register
>>>>>>>> YourImplNeoStoreXaDataSource instead of the nioneo one (with the same
>>>>>>>> name) it should work.
>>>>>>>>
>>>>>>>> Back when we had Neo4j running on different relational databases
>>>>>>>> (Postgres, Informix, MySQL), one big problem was that as the total
>>>>>>>> number of relationships in the graph increased, the time to figure out
>>>>>>>> which relationships a specific node had also increased
>>>>>>>> (even if that node had few relationships). It is important to
>>>>>>>> have a getRelationships method whose execution time depends on the
>>>>>>>> number of relationships on that node, to maintain high traversal speed
>>>>>>>> as the graph increases in size.
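
The difference in cost can be sketched with two toy layouts in plain Python. Both classes are hypothetical, and the table scan is deliberately naive: a real relational database would use indexes, so this only shows the direction of the effect, not measured behaviour.

```python
from collections import defaultdict


class RelationshipTable:
    """One global list of (start, end) rows, scanned on every lookup."""

    def __init__(self):
        self.rows = []

    def add(self, start, end):
        self.rows.append((start, end))

    def get_relationships(self, node):
        # Visits every row: work grows with the total relationship count.
        return [rel for rel in self.rows if node in rel]


class AdjacencyStore:
    """Each node keeps its own list of relationships."""

    def __init__(self):
        self.by_node = defaultdict(list)

    def add(self, start, end):
        rel = (start, end)
        self.by_node[start].append(rel)
        if end != start:
            self.by_node[end].append(rel)

    def get_relationships(self, node):
        # Visits only this node's list: work grows with the node's degree.
        return list(self.by_node.get(node, []))
```

Both return the same relationships for a node, but only the second keeps lookup work independent of how large the rest of the graph is, which is the property a Cassandra-backed getRelationships would also need.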
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Johan
>>>>>>>>
>>>>>>>>> On Sat, May 15, 2010 at 8:03 PM, Jawad Stouli <ja...@citizenplace.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I would be very interested in getting more information
>>>>>>>>> that would help me implement new persistence sources. I have read
>>>>>>>>> (here: http://www.mail-archive.com/user@lists.neo4j.org/msg00006.html)
>>>>>>>>> that it should not be that difficult (or, at least, that it is
>>>>>>>>> possible) but I still have some difficulties navigating through the
>>>>>>>>> sources to understand exactly how it should be done.
>>>>>>>>>
>>>>>>>>> Besides, I have read that using MySQL was
>>>>>>>>> less efficient than Nioneo. Was the difference really significant?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jawad
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>> _______________________________________________
>>>>>> Neo mailing list
>>>>>> User@lists.neo4j.org
>>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>>>
>>>>>>
>>>>>>
>>>>>>