Hello Anton,

What version of Cassandra are you using? If it is between 1.2.6 and 2.0.6, setInputRange(startToken, endToken) does not work.
This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436

If you can't upgrade, you can copy AbstractColumnFamilyInputFormat (AbstractCFIF) and ColumnFamilyInputFormat (CFIF) into your project and apply the patch there.

Cheers,
Paulo

On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk <anton.brazh...@genesys.com> wrote:

> Greetings,
>
> I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd
> like to read just part of it, something like Spark's sample() function.
> Cassandra's API seems to allow this with its
> ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method,
> but it doesn't work.
> The limit is just ignored and the entire column family is scanned. It
> seems this kind of feature is just not supported,
> and the sources of AbstractColumnFamilyInputFormat.getSplits confirm that
> (IMO).
>
> Questions:
> 1. Am I right that there is no way to get some data limited by token range
>    with ColumnFamilyInputFormat?
> 2. Is there another way to limit the amount of data read from Cassandra with
>    Spark and ColumnFamilyInputFormat,
>    so that this amount is predictable (like 5% of the entire dataset)?
>
> WBR,
> Anton

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br <http://www.chaordic.com.br/>
+55 48 3232.3200
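
For the "5% of the dataset" case in the quoted question, a rough sketch of how the two token strings could be computed on a version where setInputRange works. It assumes the cluster uses Murmur3Partitioner (signed 64-bit tokens) and that rows are spread about evenly over the ring, so the fraction is only approximate. The class and method names (TokenRangeSample, startToken, endTokenForFraction) are hypothetical and not part of Cassandra.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class TokenRangeSample {
    // Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
    // RandomPartitioner uses a different token space and would need other bounds.
    static final BigInteger MIN_TOKEN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX_TOKEN = BigInteger.valueOf(Long.MAX_VALUE);
    // Total number of positions on the ring: 2^64.
    static final BigDecimal RING_SIZE = new BigDecimal(BigInteger.ONE.shiftLeft(64));

    /** Start of the slice: the lowest token on the ring. */
    public static String startToken() {
        return MIN_TOKEN.toString();
    }

    /** End token of a slice covering roughly the given fraction of the ring. */
    public static String endTokenForFraction(double fraction) {
        if (!(fraction > 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1]");
        }
        // Width of the slice in tokens, truncated to a whole number.
        BigInteger width = RING_SIZE.multiply(BigDecimal.valueOf(fraction)).toBigInteger();
        // Clamp so that a fraction of 1.0 does not overflow past the highest token.
        return MIN_TOKEN.add(width).min(MAX_TOKEN).toString();
    }

    public static void main(String[] args) {
        String start = startToken();
        String end = endTokenForFraction(0.05);
        System.out.println("startToken=" + start + " endToken=" + end);
        // With the Cassandra Hadoop classes on the classpath (not included here),
        // these strings would be passed to the call from the quoted question:
        // ConfigHelper.setInputRange(jobConfiguration, start, end);
    }
}
```

The slice starts at the lowest token only for simplicity; any contiguous range of the same width should cover a similar share of the data under the same even-distribution assumption.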