Re: [Neo4j] [Neo] Implementing new persistence source

Jawad Stouli Mon, 07 Jun 2010 02:24:27 -0700

Peter,
I can certainly email you my current work in progress but, as I told 
you, I am running into some difficulties linked to the way Neo4j is 
designed.


As I said in my previous mail, it would be much better for Cassandra 
to use non-long ids.
Another main point is that Cassandra does not need ids on properties. 
Properties do not need to be indexed individually (we can simply store 
a list of property key/property value pairs under the nodeId); maybe 
this is a requirement of Lucene and the way you handled it (but Lucene 
indexes can be kept directly in Cassandra). The direct consequence is 
that when nodeChangeProperty / nodeDeleteProperty / relChangeProperty 
... are called in PersistenceSource, I only get the property id when I 
would rather have the property key id (otherwise, each property 
modification costs one extra request to the database, plus many useless 
entries to keep the link between property id and node id).
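
A minimal sketch of the layout described above, assuming an in-memory map 
stands in for one Cassandra row per node. The class and method names 
(NodePropertyRows, changeProperty, deleteProperty, getProperty) and the 
String node id are hypothetical and are not part of the Neo4j or Cassandra 
APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Cassandra column family: one row per node,
// one column per property key. Not part of the Neo4j or Cassandra APIs.
class NodePropertyRows {
    private final Map<String, Map<String, Object>> rows = new HashMap<>();

    // Addressing by (nodeId, key) is a single write; no property id is needed.
    void changeProperty(String nodeId, String key, Object value) {
        rows.computeIfAbsent(nodeId, id -> new HashMap<>()).put(key, value);
    }

    // Deleting by (nodeId, key) likewise needs no property-id lookup table.
    void deleteProperty(String nodeId, String key) {
        Map<String, Object> row = rows.get(nodeId);
        if (row != null) {
            row.remove(key);
        }
    }

    // Returns null when the node or the property does not exist.
    Object getProperty(String nodeId, String key) {
        Map<String, Object> row = rows.get(nodeId);
        return row == null ? null : row.get(key);
    }

    public static void main(String[] args) {
        NodePropertyRows store = new NodePropertyRows();
        store.changeProperty("node-1", "name", "first");
        store.changeProperty("node-1", "name", "second"); // overwrite in place
        System.out.println(store.getProperty("node-1", "name"));
        store.deleteProperty("node-1", "name");
        System.out.println(store.getProperty("node-1", "name"));
    }
}
```

With only a property id available, the store would first have to map that 
id back to a node and a key before it could touch the row, which is the 
extra request mentioned above.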

All these points come from the fact that Cassandra does not handle 
data the way Neo4j does and, for a really optimized solution, I had to 
make some modifications in the kernel (I can provide patches for 
that). As I said, I really want to keep my work compatible and open (I 
would be really happy if you wanted to work on it), and I would like 
to understand how we can reconcile the trunk and my work.

Best,
Jawad

--
CitizenPlace
ja...@citizenplace.com

On 07/06/10 07:15, Peter Neubauer wrote:
> Jawad,
> awesome work, is it possible to look at the source code somewhere?
> Would be great if this is usable to start on some work distribution
> between the graph engine and the scaling part of Cassandra.
>
> Cheers,
>
> /peter neubauer
>
> COO and Sales, Neo Technology
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Fri, Jun 4, 2010 at 3:26 PM, Jawad Stouli<ja...@citizenplace.com>  wrote:
>    
>> Hi everyone.
>>
>> I have been working on using Cassandra as a new PersistenceSource and I
>> now have a fully working prototype. I still have many optimizations to do
>> and I do not expect the final solution to be as efficient as embedded
>> Neo4j, but it should make it possible to benefit from the numerous
>> advantages of Cassandra (mainly in terms of scaling and replication). I am
>> still designing and building benchmarks to ensure that the project will
>> really be efficient enough for production (as the Neo4j REST server would
>> be). I will keep you informed.
>>
>> Still, I have one major problem. Ids of nodes and relationships are
>> integers in Neo4j and nioneo. I perfectly understand that choice for
>> Neo4j, but I cannot see how to adapt it to a distributed environment
>> like Cassandra (at a given moment, you cannot ensure that an id is
>> really free on every node of the cluster, which could lead to a high
>> number of failures and high latency when writing data). Therefore, I would
>> prefer to use a UUID, as is much more common in Cassandra. The problem
>> is that the type of ids is hardcoded. Would it be possible to replace it
>> with an "Object" (or a String) in the kernel of Neo4j, which would not
>> change the way nioneo handles its ids? I can provide a patch for that,
>> but I really wanted to have your view on it first.
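
A minimal sketch of the UUID idea, assuming random (version 4) UUIDs from 
java.util.UUID are acceptable as ids. The class name UuidIdGenerator and 
the method nextId are hypothetical and are not part of the Neo4j kernel.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical id generator; not part of the Neo4j kernel API.
class UuidIdGenerator {
    // Each cluster member can call this locally: there is no shared counter
    // and no round trip to check that the id is free on the other members.
    static UUID nextId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        Set<UUID> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(nextId());
        }
        System.out.println(seen.size() + " ids generated without coordination");
    }
}
```

The trade-off is that such ids are 128 bits wide and carry no ordering, 
unlike the sequential integer ids that nioneo assigns.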
>>
>> Best,
>> Jawad
>>
>> On 20/05/10 14:16, Johan Svensson wrote:
>>      
>>> On Wed, May 19, 2010 at 4:48 PM, Jawad Stouli<ja...@citizenplace.com> wrote:
>>>
>>>        
>>>> Hi Johan and thanks for your answer.
>>>>
>>>> I think that I have figured out the major concepts behind PersistenceSource
>>>> and I have a partially working prototype of Neo4j using Cassandra. As you
>>>> said, I had to make some minor modifications to the Neo4j core to handle
>>>> my own PersistenceSource.
>>>> I really want to keep my work compatible with future versions of Neo4j.
>>>> Would it be possible to bring back the ability to choose that source?
>>>>
>>>>          
>>> Yes we can certainly do that.
>>>
>>>
>>>        
>>>> Some concepts remain unclear to me and I still have some unanswered
>>>> questions.
>>>>
>>>> - Why do you use a property index? It seems to me that it is used to store
>>>> an integer id / property key correspondence, which is then used to store /
>>>> retrieve properties. Is it tightly coupled to the way nioneo handles
>>>> properties, or am I missing something more important?
>>>>
>>>>          
>>> The reason is that it is faster to read/write an integer from/to disk
>>> than a string key. Typically you will have few unique property key names
>>> in any given system, so it is an optimization to make add/remove/get
>>> property faster.
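
A minimal sketch of such a key-to-integer index, assuming each distinct 
property key is assigned the next free integer the first time it is seen. 
The class PropertyKeyIndex and its methods are hypothetical; the actual 
nioneo property index is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a key-to-integer index; the real nioneo
// property index is not shown in this thread.
class PropertyKeyIndex {
    private final Map<String, Integer> idByKey = new HashMap<>();
    private final List<String> keyById = new ArrayList<>();

    // Returns the existing id for a key, or assigns the next free one.
    int idFor(String key) {
        Integer id = idByKey.get(key);
        if (id == null) {
            id = keyById.size();
            idByKey.put(key, id);
            keyById.add(key);
        }
        return id;
    }

    // Reverse lookup: the key string that was assigned this id.
    String keyFor(int id) {
        return keyById.get(id);
    }

    public static void main(String[] args) {
        PropertyKeyIndex index = new PropertyKeyIndex();
        int nameId = index.idFor("name");
        int ageId = index.idFor("age");
        // A record can now store the small integer instead of the string.
        System.out.println(nameId + " -> " + index.keyFor(nameId));
        System.out.println(ageId + " -> " + index.keyFor(ageId));
    }
}
```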
>>>
>>>
>>>        
>>>> - PersistenceSource, Transaction and Command have a clear role in the
>>>> xaframework. But I don't really see the difference between XaDataSource and
>>>> XaConnection.
>>>>
>>>>          
>>> Yes, that could have been done differently, and I guess the reasons are
>>> the old JTA and XA specifications. There are discussions in progress
>>> on removing the dependency on JTA and writing something new that fits
>>> better in today's containers/frameworks (with optional support
>>> for JTA); that would likely result in a cleaner API and
>>> implementation.
>>>
>>>
>>>        
>>>> - I don't understand the logical log and what it is used for.
>>>>
>>>>          
>>> To make sure every transaction that has been committed will be "there"
>>> if the system crashes. The logical log contains all operations
>>> performed and the data will be forced to disk before each transaction
>>> commits. The log can then be used to put the normal store files in a
>>> consistent state after a crash.
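
A minimal in-memory sketch of that log-then-apply idea, assuming 
transactions are simple key/value writes. The class LogicalLogSketch and 
its methods are hypothetical and do not reflect Neo4j's actual log format 
or recovery code; in particular, recovery here rebuilds the store from an 
empty state, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a logical log; not Neo4j's implementation.
class LogicalLogSketch {
    // Stands in for the log file: one map of key/value writes per transaction.
    private final List<Map<String, String>> log = new ArrayList<>();
    // Stands in for the store files, which may lag behind the log.
    private Map<String, String> store = new HashMap<>();

    // The writes reach the log before the store is touched.
    void commit(Map<String, String> writes, boolean crashBeforeStoreUpdate) {
        log.add(new HashMap<>(writes)); // a real log would be forced to disk here
        if (crashBeforeStoreUpdate) {
            return; // simulated crash: the store never sees this transaction
        }
        store.putAll(writes);
    }

    // Recovery rebuilds the store by replaying every logged transaction in order.
    void recover() {
        store = new HashMap<>();
        for (Map<String, String> writes : log) {
            store.putAll(writes);
        }
    }

    String read(String key) {
        return store.get(key);
    }

    public static void main(String[] args) {
        LogicalLogSketch db = new LogicalLogSketch();
        Map<String, String> first = new HashMap<>();
        first.put("name", "a");
        db.commit(first, false);
        Map<String, String> second = new HashMap<>();
        second.put("name", "b");
        db.commit(second, true);             // logged, then a simulated crash
        System.out.println(db.read("name")); // store still holds the old value
        db.recover();
        System.out.println(db.read("name")); // replay brings the store up to date
    }
}
```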
>>>
>>> Regards,
>>> Johan
>>>
>>>
>>>        
>>>> Thanks in advance,
>>>> Jawad
>>>>
>>>> On Tue, May 18, 2010 at 1:22 PM, Johan Svensson<jo...@neotechnology.com> wrote:
>>>>
>>>>
>>>>          
>>>>> Hi,
>>>>>
>>>>> Have a look at the org.neo4j.kernel.impl.nioneo.xa package. To implement a
>>>>> new persistence source, start by creating new implementations of the
>>>>> NeoStoreXaDataSource and NeoStoreXaConnection classes. It is no longer
>>>>> possible to swap in a different persistence source using configuration
>>>>> (it used to be), but if you modify the code in the
>>>>> org.neo4j.kernel.GraphDbInstance.start method to register
>>>>> YourImplNeoStoreXaDataSource instead of the nioneo one (with the same
>>>>> name), it should work.
>>>>>
>>>>> Back when we had Neo4j running on different relational databases
>>>>> (Postgres, Informix, MySQL), one big problem was that as the total
>>>>> number of relationships in the graph increased, the time to figure out
>>>>> which relationships a specific node had also increased
>>>>> (even if that node had few relationships). It is important to
>>>>> have a getRelationships method where execution time depends on the
>>>>> number of relationships on that node, to maintain high traversal speed
>>>>> as the graph increases in size.
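
A minimal sketch of a store with that property, assuming each node id maps 
directly to the list of its own relationship ids. The class AdjacencyStore 
and its methods are hypothetical; only the name getRelationships comes from 
the message above, and its signature here is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-node adjacency store; not the nioneo implementation.
class AdjacencyStore {
    // Each node id maps directly to the ids of its own relationships.
    private final Map<Long, List<Long>> relsByNode = new HashMap<>();
    private long nextRelId = 0;

    long createRelationship(long startNode, long endNode) {
        long relId = nextRelId++;
        relsByNode.computeIfAbsent(startNode, n -> new ArrayList<>()).add(relId);
        if (endNode != startNode) { // avoid listing a self-loop twice
            relsByNode.computeIfAbsent(endNode, n -> new ArrayList<>()).add(relId);
        }
        return relId;
    }

    // Cost: one hash lookup plus the size of this node's own list,
    // independent of how many relationships exist elsewhere in the graph.
    List<Long> getRelationships(long node) {
        return relsByNode.getOrDefault(node, Collections.emptyList());
    }

    public static void main(String[] args) {
        AdjacencyStore graph = new AdjacencyStore();
        graph.createRelationship(1, 2);
        for (long n = 100; n < 1100; n++) {
            graph.createRelationship(n, n + 1); // unrelated part of the graph
        }
        // Node 1 still has a single relationship to scan.
        System.out.println(graph.getRelationships(1).size());
    }
}
```

A single relational table of all relationships, by contrast, has to be 
searched (or its index traversed) per lookup, so the cost tends to grow 
with the total number of rows.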
>>>>>
>>>>> Regards,
>>>>> Johan
>>>>>
>>>>> On Sat, May 15, 2010 at 8:03 PM, Jawad Stouli<ja...@citizenplace.com>
>>>>> wrote:
>>>>>
>>>>>            
>>>>>> Hi everyone,
>>>>>>
>>>>>> I would be very interested in getting more information
>>>>>> that would help me implement new persistence sources. I have read (here:
>>>>>> http://www.mail-archive.com/user@lists.neo4j.org/msg00006.html) that it
>>>>>> should not be that difficult (or, at least, that it is possible), but I
>>>>>> still have some difficulty navigating the sources to understand
>>>>>> exactly how it should be done.
>>>>>>
>>>>>> Besides, I have read that using MySQL was
>>>>>> less efficient than Nioneo. Was the difference really significant?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>>
>>>>>> Jawad
>>>>>>
>>>>>>              
>>> _______________________________________________
>>> Neo mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
