Hi mck,

I'm not familiar with this ticket, but my understanding was that
performance of Hadoop jobs on C* clusters with vnodes was poor because a
given Hadoop input split has to run many individual scans (one per
vnode) rather than a single scan.  I've run C* and Hadoop in production
with a custom input format that supported vnodes (it simply combined
multiple vnode token ranges into a single input split) and didn't have
any issues; the jobs had many other performance bottlenecks besides
starting multiple scans against C*.
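To make the combining idea concrete, here's a minimal sketch of the
grouping step (names and the (start, end) tuple representation are my
own for illustration, not the actual Hadoop or Cassandra API -- a real
input format would work with Hadoop's InputSplit types and the token
ranges reported by the cluster):

```python
def combine_vnode_ranges(token_ranges, ranges_per_split):
    """Group a node's vnode token ranges into larger input splits.

    token_ranges: list of (start_token, end_token) tuples, one per vnode.
    ranges_per_split: how many vnode ranges to pack into each split.

    Returns a list of splits, each a list of token ranges.  The record
    reader for a split then iterates over its ranges, instead of Hadoop
    scheduling one tiny split (and one scan) per vnode.
    """
    splits = []
    for i in range(0, len(token_ranges), ranges_per_split):
        splits.append(token_ranges[i:i + ranges_per_split])
    return splits

# e.g. 8 vnode ranges packed 4 per split yields 2 splits
ranges = [(i * 100, (i + 1) * 100) for i in range(8)]
splits = combine_vnode_ranges(ranges, 4)
```

The per-split scan count drops from one-per-vnode to one sequential
pass over a handful of ranges, which was enough in our case.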

This is one of the videos where I recall an off-hand mention of the Spark
connector working with vnodes: https://www.youtube.com/watch?v=1NtnrdIUlg0

Best regards,
Clint

On Sat, Feb 21, 2015 at 2:58 PM, mck <m...@apache.org> wrote:

> At least the problem of hadoop and vnodes described in CASSANDRA-6091
> doesn't apply to spark.
>  (Spark already allows multiple token ranges per split).
>
> If this is the reason why DSE hasn't enabled vnodes then fingers crossed
> that'll change soon.
>
>
> > Some of the DataStax videos that I watched discussed how the Cassandra
> Spark connecter has
> > optimizations to deal with vnodes.
>
>
> Are these videos public? if so got any link to them?
>
> ~mck
>
