Hello!

I use giraph-hbase and have written a custom input format, CustomHBaseTableInputFormat. I want to apply some filters (such as o.a.h.hbase.filter.RowFilter, FamilyFilter, etc.) so that the "query" returns only the data I need. For example, I want to get only the vertices with a specified row key. Is this possible? I tried to do it like this:

    public class CustomHBaseTableInputFormat extends HBaseVertexInputFormat {

        @Override
        public VertexReader<Text, FloatWritable, FloatWritable> createVertexReader(
                InputSplit split, TaskAttemptContext context) throws IOException {
            return new CustomHBaseReader(split, context);
        }

        // other methods to implement

        public static class CustomHBaseReader extends HBaseVertexReader {

            public CustomHBaseReader(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                super(split, context);
            }

            @Override
            public void initialize(InputSplit inputSplit, TaskAttemptContext context)
                    throws IOException, InterruptedException {
                super.initialize(inputSplit, context);
                String startIdsRegexp = getStartVertexRegexp();
                System.err.println("set row filter with regexp=" + startIdsRegexp);
                Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                        new RegexStringComparator(startIdsRegexp));
                Scan scan = HBaseVertexInputFormat.BASE_FORMAT.getScan().setFilter(rowFilter);
                System.err.println("scan=" + scan);
                //super.initialize(inputSplit, context);
            }

            // other methods to implement
        }
    }

The log shows that the scan contains my filter, but the whole dataset is still read (no filter is applied). I know about the vertexInputFilterClass property, but it filters after the query, once a lot of unneeded data has already been read. What is the correct way to set filters? Can I use the o.a.h.hbase.filter package for this? If so, what am I doing wrong?