BTW, are the performance concerns with vnodes a big deal for Spark?  Or were
those more important for MapReduce?  Some of the DataStax videos that I
watched discussed how the Cassandra Spark connector has optimizations to
deal with vnodes.

I would imagine that Spark's ability to cache RDDs would mean that paying a
small efficiency cost when reading data out of Cassandra initially might
not be the end of the world (especially given the benefits of using vnodes).
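
To make the concern concrete, here is a back-of-the-envelope sketch. It assumes that a Hadoop-style job creates at least one input split per token range, and that a connector can reduce the task count by grouping several ranges into one split. The node count, the num_tokens values, and the grouping factor are made-up numbers for illustration, not measurements, and `grouped_splits` is a hypothetical stand-in, not the connector's actual logic.

```python
import math


def token_ranges(nodes: int, num_tokens: int) -> int:
    """Each node owns num_tokens ranges, so the ring has nodes * num_tokens ranges."""
    return nodes * num_tokens


def grouped_splits(ranges: int, ranges_per_split: int) -> int:
    """Hypothetical grouping: pack several token ranges into one input split."""
    return math.ceil(ranges / ranges_per_split)


# Illustrative 6-node cluster (assumed values, not from a real deployment).
single_token = token_ranges(nodes=6, num_tokens=1)    # one range per node
with_vnodes = token_ranges(nodes=6, num_tokens=256)   # 256 ranges per node
after_grouping = grouped_splits(with_vnodes, ranges_per_split=64)

print(single_token, with_vnodes, after_grouping)
```

If one task per range is the rule, the vnode cluster schedules far more (and far smaller) tasks than the single-token cluster, which is where the overhead would come from. Grouping ranges brings the count back down.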

On Fri, Feb 20, 2015 at 8:29 AM, Clint Kelly <clint.ke...@gmail.com> wrote:

> Hi Mark,
>
> Thanks for your reply.  That makes sense.  I recall looking at this
> back when we were going to run Hadoop against data in Cassandra tables
> at my previous company.
>
> Disabling virtual nodes seems unfortunate as it would make (as I
> understand it) scaling the cluster a lot trickier.  I assume there is
> a tradeoff between the performance of analytics jobs and the ease with
> which you can change cluster size.
>
> -Clint
>
>
>
> On Fri, Feb 20, 2015 at 1:01 AM, Mark Reddy <mark.l.re...@gmail.com> wrote:
> > Hey Clint,
> >
> > Someone from DataStax can correct me here, but I'm assuming that they have
> > disabled vnodes because the AMI is built to make it easy to set up a
> > pre-configured mixed workload cluster: a mixture of Real-Time/Transactional
> > (Cassandra), Analytics (Hadoop), or Search (Solr). If you take a look at the
> > getting started guide for both Hadoop and Solr you will see a paragraph
> > instructing the user to disable vnodes for a mixed workload cluster.
> >
> >
> > http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/srch/srchIntro.html
> >
> > http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaStrt.html
> >
> > This is specific to the example AMI and that type of workload. This is by no
> > means a warning for users to disable vnodes on their Real-Time/Transactional
> > Cassandra only clusters on EC2.
> >
> >
> > I've used vnodes on EC2 without issue.
> >
> > Regards,
> > Mark
> >
> > On 20 February 2015 at 05:08, Clint Kelly <clint.ke...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> The guide for installing Cassandra on EC2 says that
> >>
> >> "Note: The DataStax AMI does not install DataStax Enterprise nodes
> >> with virtual nodes enabled."
> >>
> >>
> >>
> >> http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/install/installAMI.html
> >>
> >> Just curious why this is the case.  It was my understanding that
> >> virtual nodes make taking Cassandra nodes on and offline an easier
> >> process, and that seems like something that an EC2 user would want to
> >> do quite frequently.
> >>
> >> -Clint
> >
> >
>
