Re: Forming a cluster of embedded Cassandra instances

Jan Kesten Sun, 14 Feb 2016 21:57:07 -0800

Hi,

Embedded Cassandra may well work for developers as a way to speed up getting 
started with the project; we used it for JUnit tests. But with a simple clone 
and Maven build, I guess you will end up with a single-node Cassandra cluster. 
Remember that Cassandra is a distributed database: you need more than one node 
to get performance and fault tolerance. I would also not recommend adding and 
removing cluster nodes at high frequency as applications start and stop.


To help people get up and running, provide a small README for downloading and 
starting Cassandra. On Mac and Linux, unpacking the tar.gz and running 
bin/cassandra is not too complicated. Or point to the DataStax Community 
Edition installers. Apart from installing Java, that is a five-minute step to 
a single-node "TestCluster".

Configuring a distributed setup is somewhat to a lot more difficult and 
definitely needs more understanding and planning. 

Just as an off-topic hint: I have seen people use Cassandra as application 
glue for interprocess communication, where every app server started a node 
(for communication, sessions, queues and so on). If that is possibly your use 
case, have a look at Hazelcast. 

Jan

Sent from my iPhone

> On 14.02.2016 at 23:26, John Sanda <john.sa...@gmail.com> wrote:
> 
> The motivation was to make it easy for someone to get up and running quickly 
> with the project. Clone the git repo, run the maven build, and then you are 
> all set. It definitely does lower the learning curve for someone just getting 
> started with a project and who is not really thinking about Cassandra. It 
> also is convenient for non-devs who need to quickly get the project up and 
> running. For development, we have people working on Linux, Mac OS X, and 
> Windows. I am not a Windows user and not even sure if ccm works on Windows, 
> so ccm can't be the de facto standard for development.
> 
>> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky <jack.krupan...@gmail.com> 
>> wrote:
>> What motivated the use of an embedded instance for development - as opposed 
>> to simply spawning a process for Cassandra?
>> 
>> 
>> 
>> -- Jack Krupansky
>> 
>>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda <john.sa...@gmail.com> wrote:
>>> The project I work on day to day uses an embedded instance of Cassandra, 
>>> but it is intended primarily for development. We embed Cassandra in a 
>>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I 
>>> personally do not do this. I use and recommend ccm for development. If you 
>>> do use WildFly, there is also wildfly-cassandra which deploys Cassandra as 
>>> a custom WildFly extension. In other words it is deployed in WildFly like 
>>> other subsystems like EJB, web, etc, not like an application. There isn't a 
>>> whole lot of active development on this, but it could be another option.
>>> 
>>> For production, we have to support single node clusters (not embedded 
>>> though), and it has been challenging for pretty much all the reasons you 
>>> find people saying not to do so.
>>> 
>>> As for failure detection and cluster membership changes, are you using the 
>>> DataStax driver? You can register an event listener with the driver to 
>>> receive notifications for those things.
>>> 
>>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad <j...@jonhaddad.com> 
>>>> wrote:
>>>> +1 to what Jack said. Don't mess with embedded till you understand the 
>>>> basics of the db. You're not making your system any less complex, I'd say 
>>>> you're most likely going to shoot yourself in the foot. 
>>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky <jack.krupan...@gmail.com> 
>>>>> wrote:
>>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain can 
>>>>> be avoided. Two nodes would not support HA. You need to be able to reach 
>>>>> a quorum, which is defined as n/2+1 (with n/2 rounded down) where n is 
>>>>> the number of replicas. IOW, you cannot update the data if a quorum 
>>>>> cannot be reached. The data on any given node needs to be replicated on 
>>>>> at least two other nodes.
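
The quorum arithmetic above can be sketched in a few lines of Java. This is 
only an illustration, assuming n/2 is rounded down (integer division); the 
class and method names are made up for this example and are not part of 
Cassandra or any driver.

```java
class QuorumSketch {
    // Quorum as defined in the thread: n/2 + 1, using integer division.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Number of replicas that may be down while a quorum is still reachable.
    static int tolerableFailures(int replicas) {
        return replicas - quorum(replicas);
    }

    public static void main(String[] args) {
        // Print the quorum size and failure tolerance for 1 to 5 replicas.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("replicas=%d quorum=%d tolerable failures=%d%n",
                    n, quorum(n), tolerableFailures(n));
        }
    }
}
```

With two replicas the quorum is two, so losing either node blocks quorum 
operations; with three replicas one node can be down. That matches the point 
that two nodes would not support HA.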
>>>>> 
>>>>> Embedded Cassandra is only for extremely sophisticated developers - not 
>>>>> those who are new to Cassandra, with a "superficial understanding".
>>>>> 
>>>>> As a general proposition, you should not be running application code on 
>>>>> Cassandra nodes.
>>>>> 
>>>>> That said, if any of the senior Cassandra developers wish to personally 
>>>>> support your efforts towards embedded clusters, they are certainly free 
>>>>> to do so. We'll see if any of them step forward.
>>>>> 
>>>>> 
>>>>> -- Jack Krupansky
>>>>> 
>>>>>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas 
>>>>>> <binil.thomas.pub...@gmail.com> wrote:
>>>>>> Hi all,
>>>>>> 
>>>>>> TL;DR: I have a very superficial understanding of Cassandra and am 
>>>>>> currently evaluating it for a project. 
>>>>>> 
>>>>>> * Can Cassandra be embedded into another JVM application? 
>>>>>> * Can such embedded instances form a cluster? 
>>>>>> * Can the application use the failure detection and cluster 
>>>>>> membership dissemination infrastructure of embedded Cassandra?
>>>>>> 
>>>>>> ----  
>>>>>> 
>>>>>> I am in the process of re-packaging a SaaS system written in Java to be 
>>>>>> deployed on-premise by customers. The SaaS system currently uses AWS 
>>>>>> DynamoDB. The data storage needs for this application are modest, but I 
>>>>>> would like to keep the deployment complexity to a minimum. Here are 
>>>>>> three different usecases the on-premise system should support:
>>>>>> 
>>>>>> 1. single-node deployments with minimal complexity
>>>>>> 2. two-node HA deployments; the data and processing needs dictated by 
>>>>>> the load on the system are well under what a single node can do, but the 
>>>>>> second node is there to satisfy the HA requirement as a hot standby
>>>>>> 3. a multi-node clustered deployment, where higher operational 
>>>>>> complexity is justified
>>>>>> 
>>>>>> I am considering Cassandra for these usecases. 
>>>>>> 
>>>>>> For usecase #1, I hope to embed Cassandra into the same JVM as my 
>>>>>> application. I read on the web that CassandraDaemon can be used this 
>>>>>> way. Is that accurate? What other applications embed Cassandra this way? 
>>>>>> I *think* JetBrains Upsource does, but do you know other ones? 
>>>>>> (Incidentally, my Java application embeds Jetty webserver also). 
>>>>>> 
>>>>>> For usecase #2, I am hoping that I can deploy two instances of this 
>>>>>> ensemble and have the embedded Cassandra instances form a cluster. If I 
>>>>>> configure every write to be replicated on both nodes synchronously, then 
>>>>>> it will satisfy the HA needs of this usecase. Is it feasible to form 
>>>>>> clusters of embedded Cassandra instances?
>>>>>> 
>>>>>> For usecase #3, I can form a large cluster of the ensemble where all 
>>>>>> writes are replicated synchronously to a quorum of nodes. 
>>>>>> 
>>>>>> Finally, in usecase #2 and #3, I'd like to use the failure detection and 
>>>>>> cluster membership dissemination infrastructure of Cassandra from within 
>>>>>> my application. Is it possible to be notified of membership changes when 
>>>>>> embedding Cassandra? I could use a separate library to do this (say, 
>>>>>> with JGroups or Akka) but I fear that if this library and the embedded 
>>>>>> Cassandra instances disagree, it could lead to subtle bugs.
>>>>>> 
>>>>>> Thanks,
>>>>>> Binil
>>>>>> 
>>>>>> PS: Cross-posted at 
>>>>>> http://stackoverflow.com/questions/35384983/forming-a-cluster-of-embedded-cassandra-instances
>>> 
>>> 
>>> 
>>> -- 
>>> 
>>> - John
> 
> 
> 
> -- 
> 
> - John
