Re: Why no virtual nodes for Cassandra on EC2?

Jack Krupansky Mon, 23 Feb 2015 12:13:46 -0800

Right, and subject to the techniques for reducing that overhead that I listed.
In fact, I would recommend simply picking the largest number of tokens for
which the overhead is acceptable for your app, even if it is only 8 or 16
tokens, but 16, 32, or 64 may be sufficient for most apps.
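
As a rough sketch of that selection rule (not an official tool): given
overhead measurements taken on your own workload, pick the largest token
count that stays within budget. The function name, the candidate token
counts, and every overhead figure below are hypothetical placeholders, not
benchmark results.

```python
from typing import Mapping, Optional


def pick_num_tokens(measured_overhead: Mapping[int, float],
                    max_overhead: float) -> Optional[int]:
    """Return the largest token count whose measured overhead is acceptable.

    measured_overhead maps a candidate token count to the query overhead
    observed for it (as a fraction, e.g. 0.10 for 10%).
    Returns None if no candidate fits within max_overhead.
    """
    acceptable = [tokens for tokens, overhead in measured_overhead.items()
                  if overhead <= max_overhead]
    return max(acceptable) if acceptable else None


# Hypothetical measurements: replace with numbers from your own tests.
example_measurements = {8: 0.03, 16: 0.05, 32: 0.09, 64: 0.16, 128: 0.24}

if __name__ == "__main__":
    print(pick_num_tokens(example_measurements, max_overhead=0.10))
```

The chosen value is what you would then set as num_tokens in
cassandra.yaml on each node.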


-- Jack Krupansky

On Mon, Feb 23, 2015 at 3:01 PM, Eric Stevens <migh...@gmail.com> wrote:

> That link is the one from the 4.6 New Features page:
> http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/newFeatures.html
>
>    - Ability to use virtual nodes (vnodes)
>      <http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/ana/anaNdeOps.html#anaNdeOps__implicationsVnodes>
>      in Solr nodes. Recommended range: 64 to 256 (overhead increases by
>      approximately 30%)
>
> Anyway, thanks for clearing this up Jack.  This overhead is on queries
> only, right?
>
>
>
> On Mon, Feb 23, 2015 at 10:03 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> Thanks for pointing out a mistake in the doc - that statement (for
>> Search/Solr) was simply a leftover from before 4.6. Besides, it's in the
>> Analytics section, which is not relevant for Search/Solr anyway.
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 23, 2015 at 11:54 AM, Eric Stevens <migh...@gmail.com> wrote:
>>
>>> 30% overhead is pretty brutal.  I think this is basic support for it,
>>> and not necessarily a recommendation to use it.
>>>
>>> From
>>>
>>> http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/ana/anaNdeOps.html?scroll=anaNdeOps__implicationsVnodes
>>>
>>> *DataStax does not recommend turning on vnodes *for other Hadoop use
>>> cases *or for Solr nodes*, but you can use vnodes for any
>>> Cassandra-only cluster, or a Cassandra-only data center in a mixed
>>> Hadoop/Solr/Cassandra deployment. If you have enabled virtual nodes on
>>> Hadoop nodes, disable virtual nodes before using the cluster.
>>>
>>>
>>> On Mon, Feb 23, 2015 at 9:34 AM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
>>>> DSE 4.6 improved Solr vnode performance dramatically, so vnodes for
>>>> Search workloads are no longer officially discouraged. As per the
>>>> official doc for improvements: "*Ability to use virtual nodes
>>>> (vnodes) in Solr nodes. Recommended range: 64 to 256 (overhead increases by
>>>> approximately 30%)*". A vnode token count of 64 or 32 would reduce
>>>> that overhead further. And... the new 4.6 feature of being able to direct a
>>>> Solr query to a specific partition essentially eliminates that overhead
>>>> entirely.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Feb 23, 2015 at 11:23 AM, Eric Stevens <migh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Vnodes are officially not recommended for DSE Solr integration (though
>>>>> a small number isn't ruinous). That might be why they still don't
>>>>> enable them by default.
>>>>> On Feb 21, 2015 3:58 PM, "mck" <m...@apache.org> wrote:
>>>>>
>>>>>> At least the problem with Hadoop and vnodes described in CASSANDRA-6091
>>>>>> doesn't apply to Spark.
>>>>>> (Spark already allows multiple token ranges per split.)
>>>>>>
>>>>>> If this is the reason why DSE hasn't enabled vnodes, then fingers
>>>>>> crossed that'll change soon.
>>>>>>
>>>>>>
>>>>>> > Some of the DataStax videos that I watched discussed how the
>>>>>> > Cassandra Spark connector has
>>>>>> > optimizations to deal with vnodes.
>>>>>>
>>>>>>
>>>>>> Are these videos public? If so, do you have a link to them?
>>>>>>
>>>>>> ~mck
>>>>>>
>>>>>
>>>>
>>>
>>
>
