Hello, I've just upgraded to Cassandra 1.2.16. I've also started using the CqlPagingInputFormat within my map/reduce tasks.
I have a question about using CqlPagingInputFormat to page through wide rows: I don't see a way to receive more than one column at a time in my Mapper. The easiest way to explain is to compare CqlPagingInputFormat with the ColumnFamilyInputFormat I used previously.

My mapper when using CFIF looks like this (just the relevant bits):

    @Override
    protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException {
        for (IColumn column : columns.values()) {
            String value = ByteBufferUtil.string(column.value());
            /* do interesting stuff with each column value */
        }
    }

My mapper when using CPIF looks like this (again, just the relevant bits):

    @Override
    protected void map(Map<String, ByteBuffer> key, Map<String, ByteBuffer> columns, Context context)
            throws IOException, InterruptedException {
        UUID name = UUIDSerializer.get().fromByteBuffer(columns.get("column1"));
        String value = ByteBufferUtil.string(columns.get("value"));
        /* do something interesting with the value */
    }

With CqlPagingInputFormat, the mapper receives each column of the wide row one at a time. Is there a way to receive a larger batch of columns, similar to using ColumnFamilyInputFormat with a column slice predicate? Perhaps I need to specify a WHERE clause when using CPIF?

Does it even matter that my mappers receive only one column at a time? I did notice that my map tasks take significantly longer to complete when using CqlPagingInputFormat (4 mappers receiving about 3 million input records each) than when using ColumnFamilyInputFormat with a large column slice predicate.

Thanks in advance.

Regards,
Paolo
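P.S. In case it helps, my job wiring looks roughly like the sketch below (host, port, keyspace, and table names here are placeholders, not my real cluster settings). As far as I can tell, CqlConfigHelper.setInputCQLPageRowSize only controls how many CQL rows are fetched per page from Cassandra; the mapper still sees one CQL row, i.e. one column of the wide row, per map() call.

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class WideRowJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wide-row-job");
        job.setInputFormatClass(CqlPagingInputFormat.class);

        // Cluster connection details -- placeholder values.
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "my_keyspace", "my_table");

        // How many CQL rows Cassandra returns per page; this batches the
        // transport, but map() is still invoked once per CQL row.
        CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "1000");

        // If a WHERE clause is the answer, I assume it would go here:
        // CqlConfigHelper.setInputWhereClauses(job.getConfiguration(), "...");
    }
}
```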