Hello!
I am using giraph-hbase and have written a custom input format, CustomHBaseTableInputFormat.
I want to apply HBase filters (such as o.a.h.hbase.filter.RowFilter,
FamilyFilter, etc.) so that the "query" returns only the data I need. For example, I want
to load only the vertex with a specified rowkey id. Is this possible?
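If the rowkey filter is built from a regular expression with RegexStringComparator, as in the attempt below, the pattern probably needs anchoring to select exactly one id. My understanding (not verified against a specific HBase version) is that RegexStringComparator accepts a row when the pattern occurs anywhere in the key, like java.util.regex Matcher.find(), rather than requiring a whole-key match. The sketch below uses only java.util.regex to show the difference; it does not call HBase, and RowKeyRegexDemo, matchingKeys and exactIdRegex are hypothetical names made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration (not HBase code): mimics the substring-style
// matching that RegexStringComparator is assumed to use (Matcher.find()).
public class RowKeyRegexDemo {

    // Returns the rowkeys that a find()-style regex comparator would accept.
    public static List<String> matchingKeys(List<String> rowKeys, String regex) {
        Pattern p = Pattern.compile(regex);
        return rowKeys.stream()
                .filter(k -> p.matcher(k).find()) // substring match, not full match
                .collect(Collectors.toList());
    }

    // Builds a pattern that accepts exactly one rowkey id: the id is quoted so
    // regex metacharacters are taken literally, and ^...$ anchor the whole key.
    public static String exactIdRegex(String id) {
        return "^" + Pattern.quote(id) + "$";
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("7", "17", "70", "a7b");
        System.out.println(matchingKeys(keys, "7"));               // [7, 17, 70, a7b]
        System.out.println(matchingKeys(keys, exactIdRegex("7"))); // [7]
    }
}
```

Under that assumption, passing a pattern such as exactIdRegex(id) to the comparator would select only that rowkey instead of every key containing it.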
I tried to do it like this:
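import java.io.IOException;

import org.apache.giraph.io.VertexReader;
import org.apache.giraph.io.hbase.HBaseVertexInputFormat;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class CustomHBaseTableInputFormat
    extends HBaseVertexInputFormat<Text, FloatWritable, FloatWritable> {

  @Override
  public VertexReader<Text, FloatWritable, FloatWritable> createVertexReader(
      InputSplit split, TaskAttemptContext context) throws IOException {
    return new CustomHBaseReader(split, context);
  }

  // other methods to implement

  public static class CustomHBaseReader
      extends HBaseVertexReader<Text, FloatWritable, FloatWritable> {

    public CustomHBaseReader(InputSplit split, TaskAttemptContext context)
        throws IOException {
      super(split, context);
    }

    @Override
    public void initialize(InputSplit inputSplit, TaskAttemptContext context)
        throws IOException, InterruptedException {
      super.initialize(inputSplit, context);
      String startIdsRegexp = getStartVertexRegexp();
      System.err.println("set row filter with regexp=" + startIdsRegexp);
      // Keep only rows whose rowkey matches the regexp.
      Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
          new RegexStringComparator(startIdsRegexp));
      // Attach the filter to the scan of the shared base TableInputFormat.
      Scan scan =
          HBaseVertexInputFormat.BASE_FORMAT.getScan().setFilter(rowFilter);
      System.err.println("scan=" + scan);
      //super.initialize(inputSplit, context);
    }

    // other methods to implement
  }

  // other methods to implement
}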
The log shows that the scan contains my filter, but the whole dataset is still
read (no filter is applied).
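A possible cause, stated as an assumption and not verified here, is the order of calls in initialize(): super.initialize() runs first, and if the base TableInputFormat copies its Scan into the record reader at the moment the reader is created (my assumption about TableInputFormatBase.createRecordReader), then calling setFilter() on BASE_FORMAT.getScan() afterwards changes only the format's template scan, not the reader that is already scanning. The self-contained model below shows that effect with made-up classes (ScanModel, RecordReaderModel, InputFormatModel); none of them are HBase or Giraph APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model, not HBase/Giraph code: an input format owns a template
// scan, and each record reader gets its own copy when it is created.
public class ScanOrderDemo {

    static final List<String> TABLE = Arrays.asList("v1", "v2", "v3");

    // Stand-in for a Scan; the default filter accepts every row.
    static class ScanModel {
        Predicate<String> filter = row -> true;

        ScanModel() {}

        // Copy constructor, standing in for copying the template scan.
        ScanModel(ScanModel other) {
            this.filter = other.filter;
        }

        ScanModel setFilter(Predicate<String> f) {
            this.filter = f;
            return this;
        }
    }

    static class RecordReaderModel {
        private final ScanModel scan; // private copy, fixed at creation time

        RecordReaderModel(ScanModel scan) {
            this.scan = scan;
        }

        List<String> readAll() {
            List<String> rows = new ArrayList<>();
            for (String row : TABLE) {
                if (scan.filter.test(row)) {
                    rows.add(row);
                }
            }
            return rows;
        }
    }

    static class InputFormatModel {
        private final ScanModel templateScan = new ScanModel();

        ScanModel getScan() {
            return templateScan;
        }

        // The reader receives a copy of the template, not the template itself.
        RecordReaderModel createRecordReader() {
            return new RecordReaderModel(new ScanModel(templateScan));
        }
    }

    // Order used in the question: reader first, filter second.
    public static List<String> filterAfterReaderCreated() {
        InputFormatModel format = new InputFormatModel();
        RecordReaderModel reader = format.createRecordReader();
        format.getScan().setFilter("v2"::equals); // only the template changes
        return reader.readAll();
    }

    // Filter first, reader second.
    public static List<String> filterBeforeReaderCreated() {
        InputFormatModel format = new InputFormatModel();
        format.getScan().setFilter("v2"::equals);
        RecordReaderModel reader = format.createRecordReader();
        return reader.readAll();
    }

    public static void main(String[] args) {
        System.out.println("filter after reader:  " + filterAfterReaderCreated());  // [v1, v2, v3]
        System.out.println("filter before reader: " + filterBeforeReaderCreated()); // [v2]
    }
}
```

If the real code behaves like this model, setting the filter before super.initialize() creates the reader would be the first thing to try; because BASE_FORMAT is a static field, the filter would then be shared by every reader in that JVM.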
I know about the vertexInputFilterClass property, but it filters vertices only
after they have been read, so a lot of unneeded data is still loaded.
What is the correct way to set filters? Can I use the o.a.h.hbase.filter
package for this? If so, what am I doing wrong?