You can iterate over them; just make sure to set a sensible row count to chunk things up. See http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
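The paging pattern from that link boils down to repeating `SELECT ... WHERE token(key) > token(?) LIMIT n`, resuming each page after the last token seen. A minimal sketch of that loop, using an in-memory `TreeMap` as a stand-in for rows stored in token order (`fetchPage` is a hypothetical helper for illustration, not a driver API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenPager {
    /**
     * Fetch up to pageSize rows whose token is strictly greater than
     * lastToken -- the in-memory analogue of
     *   SELECT k, v FROM cf WHERE token(k) > ? LIMIT n
     */
    static List<Map.Entry<Long, String>> fetchPage(
            TreeMap<Long, String> rowsByToken, long lastToken, int pageSize) {
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        // tailMap(..., false) excludes lastToken itself, so no row is seen twice
        for (Map.Entry<Long, String> e : rowsByToken.tailMap(lastToken, false).entrySet()) {
            page.add(e);
            if (page.size() == pageSize) break;
        }
        return page;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> rows = new TreeMap<>();
        for (long t = 0; t < 10; t++) rows.put(t * 100, "row-" + t);

        long last = Long.MIN_VALUE;   // start before the smallest token
        int seen = 0;
        List<Map.Entry<Long, String>> page;
        while (!(page = fetchPage(rows, last, 3)).isEmpty()) {
            seen += page.size();
            // resume the next page after the last token in this one
            last = page.get(page.size() - 1).getKey();
        }
        System.out.println(seen); // every row visited exactly once
    }
}
```

The key point is that paging is by token, not by key value, because a non-ordered partitioner stores rows in token order; an empty page means the scan is done.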
You can also break up the processing so that only one worker reads the token ranges for a node. That allows you to process the rows in parallel and avoids workers processing the same rows.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/05/2013, at 2:51 AM, Robert Wille <rwi...@footnote.com> wrote:

> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire contents
> of a column family is an interesting and useful exercise. Here are a few:
> removal of orphaned records, checking the integrity of a data set, and
> analytics.
> 
> On 5/12/13 3:41 AM, "Oleg Dulin" <oleg.du...@gmail.com> wrote:
> 
>> On 2013-05-11 14:42:32 +0000, Robert Wille said:
>> 
>>> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
>>> possible to iterate through a large number of records (e.g. to perform
>>> maintenance on a large column family). I tried calling
>>> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
>>> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
>>> cursors aren't supported. Is there another way to do this, or do I need
>>> to use a different API?
>>> 
>>> Thanks in advance
>>> 
>>> Robert
>> 
>> If you feel that you need to iterate through a large number of rows,
>> then you are probably not using a correct data model.
>> 
>> Can you describe your use case?
>> 
>> -- 
>> Regards,
>> Oleg Dulin
>> NYC Java Big Data Engineer
>> http://www.olegdulin.com/
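The parallel approach Aaron describes above, splitting the token space so each worker scans a disjoint range, could be sketched as follows. This assumes the Murmur3 partitioner (tokens are signed 64-bit longs, the default since 1.2); `splitRanges` is a hypothetical helper for illustration, not part of any driver API, and the arithmetic would differ for RandomPartitioner:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenRangeSplitter {
    // Murmur3Partitioner token space: all signed 64-bit values
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /**
     * Split the full token space into `workers` contiguous, non-overlapping
     * [start, end] ranges. BigInteger avoids overflow when computing the span.
     */
    static List<long[]> splitRanges(int workers) {
        BigInteger span = MAX.subtract(MIN).add(BigInteger.ONE);
        BigInteger step = span.divide(BigInteger.valueOf(workers));
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 0; i < workers; i++) {
            // last range absorbs any remainder so the union covers MIN..MAX
            BigInteger end = (i == workers - 1)
                    ? MAX
                    : start.add(step).subtract(BigInteger.ONE);
            ranges.add(new long[] { start.longValueExact(), end.longValueExact() });
            start = end.add(BigInteger.ONE);
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] r : splitRanges(4)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
    }
}
```

Each worker would then run something like `SELECT ... WHERE token(key) >= ? AND token(key) <= ?` over its own range; because the ranges are disjoint and cover the whole token space, every row is processed exactly once. Aligning the splits with the nodes' actual token ownership (so each worker reads one node's ranges) additionally keeps each scan local to a replica.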