Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

Paulo Ricardo Motta Gomes Fri, 16 May 2014 12:31:35 -0700

Hello Anton,

What version of Cassandra are you using? Between 1.2.6 and 2.0.6,
setInputRange(startToken, endToken) does not work.


This was fixed in 2.0.7:
https://issues.apache.org/jira/browse/CASSANDRA-6436

If you can't upgrade, you can copy AbstractColumnFamilyInputFormat and
ColumnFamilyInputFormat into your project and apply the patch there.
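
Once the range is honored (2.0.7 or a patched copy), a start/end token pair
covering a predictable share of the ring can be derived from the partitioner's
token bounds. Below is a minimal sketch, assuming Murmur3Partitioner (tokens
from -2^63 to 2^63-1) and keys spread roughly evenly over the ring. The class
and method names are hypothetical, made up for illustration only.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class TokenRangeSample {
    // Assumption: the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit longs.
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Returns {startToken, endToken} as strings, covering about `fraction`
     * of the token ring, starting from the minimum token.
     */
    static String[] rangeForFraction(double fraction) {
        if (fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        // Width of the whole ring: MAX_TOKEN - MIN_TOKEN = 2^64 - 1.
        BigInteger ringWidth = MAX_TOKEN.subtract(MIN_TOKEN);
        // Scale the width by the requested fraction, truncating toward zero.
        BigInteger width = new BigDecimal(ringWidth)
                .multiply(BigDecimal.valueOf(fraction))
                .toBigInteger();
        BigInteger end = MIN_TOKEN.add(width);
        return new String[] { MIN_TOKEN.toString(), end.toString() };
    }

    public static void main(String[] args) {
        // Roughly 5% of the ring, as in the original question.
        String[] range = rangeForFraction(0.05);
        System.out.println("start=" + range[0] + " end=" + range[1]);
        // The two strings would then go to
        // ConfigHelper.setInputRange(jobConfiguration, range[0], range[1]).
    }
}
```

This gives a share of the token ring, not a random sample of rows, so the
fraction of data actually read is only as predictable as the key distribution
is even. With a different partitioner the token bounds differ and the two
constants would need to change.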

Cheers,

Paulo


On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk <anton.brazh...@genesys.com> wrote:

> Greetings,
>
> I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd
> like to read just part of it - something like Spark's sample() function.
> Cassandra's API seems to allow this with its
> ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method,
> but it doesn't work.
> The limit is just ignored and the entire column family is scanned. It
> seems this kind of feature is just not supported
> and the sources of AbstractColumnFamilyInputFormat.getSplits confirm that
> (IMO).
> Questions:
> 1. Am I right that there is no way to get some data limited by token range
> with ColumnFamilyInputFormat?
> 2. Is there another way to limit the amount of data read from Cassandra with
> Spark and ColumnFamilyInputFormat,
> so that this amount is predictable (like 5% of entire dataset)?
>
>
> WBR,
> Anton
>
>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
