> … my understanding was that performance of Hadoop jobs on C* clusters
> with vnodes was poor because a given Hadoop input split has to run many
> individual scans (one for each vnode) rather than just a single scan.
> I've run C* and Hadoop in production with a custom input format that
> used vnodes (and just combined multiple vnodes in a single input split)
> and didn't have any issues (the jobs had many other performance
> bottlenecks besides starting multiple scans from C*).
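
For readers unfamiliar with the idea in the quoted text, here is a minimal sketch of combining several vnode token ranges into one input split. It is an illustration only, not the code from the ticket or from any Cassandra input format: `TokenRange`, `InputSplit`, `combine_vnode_ranges` and `max_ranges_per_split` are all hypothetical names, and a real implementation would also have to account for replicas and split sizes.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TokenRange(NamedTuple):
    """One vnode's token range and the node owning it (hypothetical type)."""
    start: int
    end: int
    host: str


class InputSplit(NamedTuple):
    """Work for a single task: several token ranges that live on one host."""
    host: str
    ranges: List[TokenRange]


def combine_vnode_ranges(ranges: List[TokenRange],
                         max_ranges_per_split: int) -> List[InputSplit]:
    """Group token ranges by owning host, then pack them into splits.

    Assumption: each range has exactly one owner. Real clusters have
    several replicas per range, which this sketch ignores.
    """
    if max_ranges_per_split < 1:
        raise ValueError("max_ranges_per_split must be at least 1")

    # Bucket the ranges by the host that owns them.
    by_host: Dict[str, List[TokenRange]] = defaultdict(list)
    for token_range in ranges:
        by_host[token_range.host].append(token_range)

    # Pack each host's ranges into splits of bounded size.
    splits: List[InputSplit] = []
    for host in sorted(by_host):
        owned = sorted(by_host[host])
        for i in range(0, len(owned), max_ranges_per_split):
            splits.append(InputSplit(host, owned[i:i + max_ranges_per_split]))
    return splits
```

Each task still issues one scan per range in its split, so this reduces the number of splits and tasks rather than the number of scans, which matches the experience described in the quote.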

You've described the ticket, and how it has been solved :-)

> This is one of the videos where I recall an off-hand mention of the Spark
> connector working with vnodes:
> https://www.youtube.com/watch?v=1NtnrdIUlg0

Thanks.
~mck