Hello,

I've just upgraded to Cassandra 1.2.16. I've also started using the
CqlPagingInputFormat within my map/reduce tasks.

I have a question with regard to using CqlPagingInputFormat for paging
through wide rows. I don't see a way to input more than one column at a
time into my Mapper.

I suppose a good way to explain is by comparing the CqlPagingInputFormat
with the ColumnFamilyInputFormat which I previously used.

My mapper when using CFIF looks like this (just the relevant bits):

@Override
protected void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
Context context) throws IOException, InterruptedException {
    for (IColumn column : columns.values()) {
        String value = ByteBufferUtil.string(column.value());
        /* do interesting stuff with each column value */
    }
}

My mapper when using CPIF looks like this (again, just the relevant bits):

@Override
protected void map(Map<String, ByteBuffer> key, Map<String, ByteBuffer> columns,
        Context context) throws IOException, InterruptedException {
    UUID name = UUIDSerializer.get().fromByteBuffer(columns.get("column1"));
    String value = ByteBufferUtil.string(columns.get("value"));
    /* do something interesting with the value */
}

In the case of CqlPagingInputFormat, the mapper receives each column (in
the wide row) one by one. Is there a way to receive a larger batch of
columns similar to using ColumnFamilyInputFormat with a column slice
predicate? Perhaps I need to specify a WHERE clause when using CPIF?
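In case it helps, this is roughly what I imagine the job setup would look like
if CqlConfigHelper is the right place to control paging and filtering -- the
page size value and the WHERE clause below are just placeholders, so please
correct me if these aren't the right knobs:

```java
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CpifJobSetup {
    static void configure(Job job) {
        Configuration conf = job.getConfiguration();

        // Number of CQL rows fetched per page when walking a wide row
        // (value is illustrative).
        CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");

        // Optional server-side filter appended to the generated SELECT
        // (hypothetical clause -- I'm not sure what restrictions apply here).
        CqlConfigHelper.setInputWhereClauses(conf, "column1 > 0");
    }
}
```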

Does it even matter that my mappers receive only one column at a
time? I did notice that my map tasks take significantly longer to
complete when using CqlPagingInputFormat (4 mappers receiving about 3
million input records each) than when using ColumnFamilyInputFormat with a
large column slice predicate.
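For reference, my CFIF setup looks roughly like this (the 10,000-column
count is illustrative -- the point is that one slice hands the mapper a large
batch of columns at once):

```java
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.mapreduce.Job;

public class CfifJobSetup {
    static void configure(Job job) {
        // Slice the entire wide row, up to 10,000 columns at a time.
        SliceRange sliceRange = new SliceRange(
                ByteBufferUtil.EMPTY_BYTE_BUFFER,  // start of row
                ByteBufferUtil.EMPTY_BYTE_BUFFER,  // end of row
                false,                             // not reversed
                10000);                            // columns per slice
        SlicePredicate predicate = new SlicePredicate().setSlice_range(sliceRange);
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
    }
}
```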


Thanks in advance.

Regards,
Paolo
