If you are using a replication factor of 1 and 3 Cassandra nodes, I would
expect the 256 virtual nodes to be evenly distributed across the 3 nodes, so
there would be 256 virtual nodes in total. But in your experiment you saw
3*257 mappers. Is that because of the setting cassandra.input.split.size=3?
That should have nothing to do with the node count being 3. Otherwise, I am
confused about why there would be 256 virtual nodes on every Cassandra node.
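
For reference, the re-merging that mck describes in the quoted message below
could be sketched roughly as follows. This is only a self-contained
illustration and not the code from his patch: the SplitMerger class, its Split
type, and the targetRows parameter are hypothetical stand-ins for Hadoop's
input splits and for cassandra.input.split.size, and a real merged split would
also have to carry all of the token ranges it covers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SplitMerger {

    /** Hypothetical stand-in for an input split: replica hosts plus an estimated row count. */
    public static final class Split {
        public final Set<String> locations;
        public final long estimatedRows;
        public final int rangeCount; // number of original token ranges covered

        public Split(Set<String> locations, long estimatedRows, int rangeCount) {
            this.locations = new TreeSet<>(locations);
            this.estimatedRows = estimatedRows;
            this.rangeCount = rangeCount;
        }
    }

    /**
     * Merges splits that share the same location set, batching them until
     * adding one more split would push the batch past targetRows.
     */
    public static List<Split> merge(List<Split> splits, long targetRows) {
        // Group splits by their set of replica hosts, preserving input order.
        Map<Set<String>, List<Split>> byLocations = new LinkedHashMap<>();
        for (Split s : splits) {
            byLocations.computeIfAbsent(s.locations, k -> new ArrayList<>()).add(s);
        }

        List<Split> merged = new ArrayList<>();
        for (Map.Entry<Set<String>, List<Split>> entry : byLocations.entrySet()) {
            long rows = 0;
            int ranges = 0;
            for (Split s : entry.getValue()) {
                // Close the current batch if this split would exceed the target.
                if (ranges > 0 && rows + s.estimatedRows > targetRows) {
                    merged.add(new Split(entry.getKey(), rows, ranges));
                    rows = 0;
                    ranges = 0;
                }
                rows += s.estimatedRows;
                ranges++;
            }
            if (ranges > 0) {
                merged.add(new Split(entry.getKey(), rows, ranges));
            }
        }
        return merged;
    }
}
```

With 3 nodes, 256 single-range splits per node, and a target large enough to
hold one node's rows, this collapses the 768 splits into 3, one per location
set. A smaller target yields proportionally more splits per node.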

On Wed, Jan 28, 2015 at 12:29 AM, Shenghua(Daniel) Wan <
wansheng...@gmail.com> wrote:

> I did another experiment to verify that 3*257 mappers were indeed created
> (1 of the 257 ranges is effectively null).
>
> Thanks mck for the information!
>
> On Wed, Jan 28, 2015 at 12:17 AM, mck <m...@apache.org> wrote:
>
>> Shenghua,
>>
>> > The problem is the user might only want all the data via a "select *"
>> > like statement. It seems that 257 connections to query the rows are
>> > necessary.
>> > However, is there any way to prohibit 257 concurrent connections?
>>
>>
>> Your reasoning is correct.
>> The number of connections should be tunable via the
>> "cassandra.input.split.size" property. See
>> ConfigHelper.setInputSplitSize(..)
>>
>> The problem is that vnodes completely trash this, since the splits
>> returned don't span across vnodes.
>> There's an issue out for this –
>> https://issues.apache.org/jira/browse/CASSANDRA-6091
>>  but part of the problem is that the thrift stuff involved here is
>>  getting rewritten¹ to be pure cql.
>>
>> In the meantime you can override the CqlInputFormat and manually re-merge
>> splits together where their location sets match, so as to better honour
>> inputSplitSize and return to a more reasonable number of connections.
>> We do this, using code similar to this patch
>> https://github.com/michaelsembwever/cassandra/pull/2/files
>>
>> ~mck
>>
>> ¹ https://issues.apache.org/jira/browse/CASSANDRA-8358
>>
>
>
>
> --
>
> Regards,
> Shenghua (Daniel) Wan
>
