You can iterate over them, just make sure to set a sensible page size (LIMIT) so the results are fetched in manageable chunks.
See http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
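
Something like this (an untested sketch; the table mytable (id text PRIMARY
KEY, value text), the JDBC URL, and the page size are all made up for
illustration):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class TokenPager {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection URL, keyspace and table.
            Connection conn = DriverManager.getConnection(
                "jdbc:cassandra://localhost:9160/mykeyspace");

            final int pageSize = 1000;
            String lastKey = null;

            while (true) {
                // The first page starts at the beginning of the ring;
                // later pages resume after the token of the last key seen.
                String cql = (lastKey == null)
                    ? "SELECT id, value FROM mytable LIMIT " + pageSize
                    : "SELECT id, value FROM mytable"
                      + " WHERE token(id) > token(?) LIMIT " + pageSize;

                PreparedStatement stmt = conn.prepareStatement(cql);
                if (lastKey != null) {
                    stmt.setString(1, lastKey);
                }

                ResultSet rs = stmt.executeQuery();
                int rows = 0;
                while (rs.next()) {
                    lastKey = rs.getString("id");
                    // ... process the row here ...
                    rows++;
                }
                rs.close();
                stmt.close();

                // Fewer rows than the page size means we have walked
                // the whole ring.
                if (rows < pageSize) break;
            }
            conn.close();
        }
    }

Paging on token(id) rather than on the key itself is what makes this work
with a non-ordered partitioner, since rows are not stored in key order.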

You can also break the processing up so that only one worker reads the token ranges 
for a given node. That lets you process the rows in parallel while making sure no 
two workers process the same rows. 
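
Roughly (another untested sketch; it splits the full Murmur3Partitioner
range, the 1.2 default, into equal slices rather than using the ranges each
node actually owns, which you would normally get from describe_ring so that
reads stay local; mytable and id are the same hypothetical names as above):

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    public class TokenRangeSplitter {
        // Murmur3Partitioner token range.
        static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
        static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

        /** Split the ring into numWorkers contiguous (start, end] slices. */
        static List<long[]> split(int numWorkers) {
            BigInteger step =
                MAX.subtract(MIN).divide(BigInteger.valueOf(numWorkers));
            List<long[]> ranges = new ArrayList<long[]>();
            BigInteger start = MIN;
            for (int i = 0; i < numWorkers; i++) {
                BigInteger end =
                    (i == numWorkers - 1) ? MAX : start.add(step);
                ranges.add(new long[] { start.longValue(), end.longValue() });
                start = end;
            }
            return ranges;
        }

        public static void main(String[] args) {
            // Each worker pages through its own slice with a query like:
            //   SELECT id, value FROM mytable
            //   WHERE token(id) > ? AND token(id) <= ? LIMIT 1000
            // so no two workers ever see the same row. (The first slice
            // should use >= so a row sitting exactly on MIN is not skipped.)
            for (long[] r : split(4)) {
                System.out.println("worker range: (" + r[0] + ", " + r[1] + "]");
            }
        }
    }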

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/05/2013, at 2:51 AM, Robert Wille <rwi...@footnote.com> wrote:

> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire
> contents of a column family is an interesting and useful exercise. Here
> are a few: removal of orphaned records, checking the integrity of a data
> set, and analytics.
> 
> On 5/12/13 3:41 AM, "Oleg Dulin" <oleg.du...@gmail.com> wrote:
> 
>> On 2013-05-11 14:42:32 +0000, Robert Wille said:
>> 
>>> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
>>> possible to iterate through a large number of records (e.g. to perform
>>> maintenance on a large column family). I tried calling
>>> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
>>> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
>>> cursors aren't supported. Is there another way to do this, or do I need
>>> to use a different API?
>>> 
>>> Thanks in advance
>>> 
>>> Robert
>> 
>> If you feel that you need to iterate through a large number of rows,
>> you are probably not using the correct data model.
>> 
>> Can you describe your use case?
>> 
>> -- 
>> Regards,
>> Oleg Dulin
>> NYC Java Big Data Engineer
>> http://www.olegdulin.com/
>> 
>> 
> 
> 
