> … my understanding was that
> performance of Hadoop jobs on C* clusters with vnodes was poor because a
> given Hadoop input split has to run many individual scans (one for each
> vnode) rather than just a single scan.  I've run C* and Hadoop in
> production with a custom input format that used vnodes (and just combined
> multiple vnodes in a single input split) and didn't have any issues (the
> jobs had many other performance bottlenecks besides starting multiple
> scans from C*).

You've described the ticket, and how it has been solved :-)
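
For anyone following along, a minimal sketch of that idea: group the vnode
token ranges owned by one node into fewer input splits, so each split covers
several ranges that are scanned in turn. This is only an illustration, not
the code from the ticket or from any Cassandra input format; `TokenRange`,
`combine_vnode_ranges` and `ranges_per_split` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenRange:
    """Hypothetical stand-in for one vnode's token range on a node."""
    start: int
    end: int


def combine_vnode_ranges(
    ranges: List[TokenRange], ranges_per_split: int
) -> List[List[TokenRange]]:
    """Group one node's vnode ranges into splits of up to ranges_per_split.

    Each returned split still needs one scan per range it contains;
    the grouping only reduces the number of splits (and tasks).
    """
    if ranges_per_split < 1:
        raise ValueError("ranges_per_split must be at least 1")
    return [
        ranges[i:i + ranges_per_split]
        for i in range(0, len(ranges), ranges_per_split)
    ]
```

With 256 vnodes per node and `ranges_per_split=32`, this would give 8 splits
per node instead of 256.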

> This is one of the videos where I recall an off-hand mention of the Spark
> connector working with vnodes:
> https://www.youtube.com/watch?v=1NtnrdIUlg0

Thanks.

~mck
