Right, and subject to the techniques I listed for reducing that overhead. In fact, I would recommend simply picking the largest number of tokens for which the overhead is acceptable for your app, even if it is only 8 or 16 tokens, but 16, 32, or 64 may be sufficient for most apps.

-- Jack Krupansky

On Mon, Feb 23, 2015 at 3:01 PM, Eric Stevens <migh...@gmail.com> wrote:

> That link is the one from the 4.6 New Features page:
> http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/newFeatures.html
>
> - Ability to use virtual nodes (vnodes)
>   <http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/ana/anaNdeOps.html#anaNdeOps__implicationsVnodes>
>   in Solr nodes. Recommended range: 64 to 256 (overhead increases by
>   approximately 30%)
>
> Anyway, thanks for clearing this up Jack. This overhead is on queries
> only, right?
>
> On Mon, Feb 23, 2015 at 10:03 AM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>> Thanks for pointing out a mistake in the doc - that statement (for
>> Search/Solr) was simply a leftover from before 4.6. Besides, it's in the
>> Analytics section, which is not relevant for Search/Solr anyway.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 23, 2015 at 11:54 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> 30% overhead is pretty brutal. I think this is basic support for it,
>>> and not necessarily a recommendation to use it.
>>>
>>> From
>>> http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/ana/anaNdeOps.html?scroll=anaNdeOps__implicationsVnodes
>>>
>>> *DataStax does not recommend turning on vnodes *for other Hadoop use
>>> cases *or for Solr nodes*, but you can use vnodes for any
>>> Cassandra-only cluster, or a Cassandra-only data center in a mixed
>>> Hadoop/Solr/Cassandra deployment. If you have enabled virtual nodes on
>>> Hadoop nodes, disable virtual nodes before using the cluster.
>>>
>>> On Mon, Feb 23, 2015 at 9:34 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
>>>> DSE 4.6 improved Solr vnode performance dramatically, so that vnodes
>>>> for Search workloads is now no longer officially discouraged. As per the
>>>> official doc for improvements, : "*Ability to use virtual nodes
>>>> (vnodes) in Solr nodes. Recommended range: 64 to 256 (overhead increases by
>>>> approximately 30%)*". A vnode token count of 64 or 32 would reduce
>>>> that overhead further. And... the new 4.6 feature of being able to direct a
>>>> Solr query to a specific partition essentially eliminates that overhead
>>>> entirely.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Feb 23, 2015 at 11:23 AM, Eric Stevens <migh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Vnodes is officially disrecommended for DSE Solr integration (though a
>>>>> small number isn't ruinous). That might be why they still don't enable
>>>>> them by default.
>>>>>
>>>>> On Feb 21, 2015 3:58 PM, "mck" <m...@apache.org> wrote:
>>>>>
>>>>>> At least the problem of hadoop and vnodes described in CASSANDRA-6091
>>>>>> doesn't apply to spark.
>>>>>> (Spark already allows multiple token ranges per split).
>>>>>>
>>>>>> If this is the reason why DSE hasn't enabled vnodes then fingers
>>>>>> crossed that'll change soon.
>>>>>>
>>>>>> > Some of the DataStax videos that I watched discussed how the
>>>>>> > Cassandra Spark connecter has optimizations to deal with vnodes.
>>>>>>
>>>>>> Are these videos public? if so got any link to them?
>>>>>>
>>>>>> ~mck