Did you see the "Message to log?" line in the log? Can you pastebin the error / exception you got?
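One thing worth checking while waiting for the exception (this is an assumption, since the stack trace isn't shown): Hadoop instantiates the InputFormat class by reflection, which requires a no-argument constructor. In the code below, MyTableInputFormat is declared as a non-static inner class, so its only constructor implicitly takes the enclosing SimpleRowCounter instance, and reflective instantiation fails; declaring it `static` would avoid that. A minimal, HBase-free sketch of the difference:

```java
// Demonstrates why reflection-based instantiation (as Hadoop does for
// InputFormat classes) fails for non-static inner classes: their implicit
// constructor takes the enclosing instance, so no true no-arg ctor exists.
public class InnerClassReflectionDemo {
    class Inner {}          // implicit ctor: Inner(InnerClassReflectionDemo)
    static class Nested {}  // true no-arg constructor

    public static void main(String[] args) throws Exception {
        try {
            Inner.class.getDeclaredConstructor().newInstance();
            System.out.println("inner: ok");
        } catch (NoSuchMethodException e) {
            System.out.println("inner: no no-arg constructor");
        }
        Nested n = Nested.class.getDeclaredConstructor().newInstance();
        System.out.println("nested: ok");
    }
}
```

Running it prints "inner: no no-arg constructor" followed by "nested: ok", mirroring the failure a framework would hit when newInstance-ing the inner class.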
On Mon, Apr 18, 2016 at 1:54 AM, Ivan Cores gonzalez <ivan.co...@inria.fr> wrote:

> Hi Ted,
> So, if I understand the behaviour of getSplits(), I can create "virtual"
> splits by overriding the getSplits function. I was performing some tests,
> but my code crashes at runtime and I cannot find the problem. Any help?
> I didn't find examples.
>
> public class SimpleRowCounter extends Configured implements Tool {
>
>   static class RowCounterMapper extends TableMapper<ImmutableBytesWritable, Result> {
>     public static enum Counters { ROWS }
>
>     @Override
>     public void map(ImmutableBytesWritable row, Result value, Context context) {
>       context.getCounter(Counters.ROWS).increment(1);
>       try {
>         Thread.sleep(3000); // Simulates work
>       } catch (InterruptedException name) { }
>     }
>   }
>
>   public class MyTableInputFormat extends TableInputFormat {
>     @Override
>     public List<InputSplit> getSplits(JobContext context) throws IOException {
>       // Just to detect if this method is being called ...
>       List<InputSplit> splits = super.getSplits(context);
>       System.out.printf(" Message to log? \n");
>       return splits;
>     }
>   }
>
>   @Override
>   public int run(String[] args) throws Exception {
>     if (args.length != 1) {
>       System.err.println("Usage: SimpleRowCounter <tablename>");
>       return -1;
>     }
>     String tableName = args[0];
>
>     Scan scan = new Scan();
>     scan.setFilter(new FirstKeyOnlyFilter());
>     scan.setCaching(500);
>     scan.setCacheBlocks(false);
>
>     Job job = new Job(getConf(), getClass().getSimpleName());
>     job.setJarByClass(getClass());
>
>     TableMapReduceUtil.initTableMapperJob(tableName, scan, RowCounterMapper.class,
>         ImmutableBytesWritable.class, Result.class, job, true,
>         MyTableInputFormat.class);
>
>     job.setNumReduceTasks(0);
>     job.setOutputFormatClass(NullOutputFormat.class);
>     return job.waitForCompletion(true) ? 0 : 1;
>   }
>
>   public static void main(String[] args) throws Exception {
>     int exitCode = ToolRunner.run(HBaseConfiguration.create(),
>         new SimpleRowCounter(), args);
>     System.exit(exitCode);
>   }
> }
>
> Thanks so much,
> Iván.
>
> ----- Original Message -----
> > From: "Ted Yu" <yuzhih...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Tuesday, April 12, 2016 17:29:52
> > Subject: Re: Processing rows in parallel with MapReduce jobs.
> >
> > Please take a look at TableInputFormatBase#getSplits():
> >
> >   * Calculates the splits that will serve as input for the map tasks. The
> >   * number of splits matches the number of regions in a table.
> >
> > Each mapper would be reading one of the regions.
> >
> > On Tue, Apr 12, 2016 at 8:18 AM, Ivan Cores gonzalez <ivan.co...@inria.fr> wrote:
> >
> > > Hi Ted,
> > > Yes, I mean same region.
> > >
> > > I wasn't using the getSplits() function. I'm trying to add it to my code,
> > > but I'm not sure how I have to do it. Is there any example on the website?
> > > I cannot find anything. (By the way, I'm using TableInputFormat, not
> > > InputFormat.)
> > >
> > > But just to confirm: with the getSplits() function, are mappers processing
> > > rows in the same region executed in parallel? (Assuming that there are
> > > empty processors/cores.)
> > >
> > > Thanks,
> > > Ivan.
> > >
> > > ----- Original Message -----
> > > > From: "Ted Yu" <yuzhih...@gmail.com>
> > > > To: user@hbase.apache.org
> > > > Sent: Monday, April 11, 2016 15:10:29
> > > > Subject: Re: Processing rows in parallel with MapReduce jobs.
> > > >
> > > > bq. if they are located in the same split?
> > > >
> > > > Probably you meant same region.
> > > >
> > > > Can you show the getSplits() for the InputFormat of your MapReduce job?
> > > > Thanks
> > > >
> > > > On Mon, Apr 11, 2016 at 5:48 AM, Ivan Cores gonzalez <ivan.co...@inria.fr> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a small question regarding the behaviour of MapReduce jobs with HBase.
> > > > >
> > > > > I have an HBase test table with only 8 rows. I split the table with the
> > > > > hbase shell split command into 2 splits, so now there are 4 rows in
> > > > > every split.
> > > > >
> > > > > I created a MapReduce job that only prints the row key in the log files.
> > > > > When I run the MapReduce job, every row is processed by 1 mapper, but
> > > > > the mappers in the same split are executed sequentially (inside the
> > > > > same container). That means the first four rows are processed
> > > > > sequentially by 4 mappers. The system has cores that are free, so is
> > > > > it possible to process rows in parallel if they are located in the
> > > > > same split?
> > > > >
> > > > > The only way I found to have 8 mappers executed in parallel is to
> > > > > split the table into 8 splits (1 split per row). But obviously this
> > > > > is not the best solution for big tables ...
> > > > >
> > > > > Thanks,
> > > > > Ivan.
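The "virtual splits" idea discussed in this thread boils down to having getSplits() return more splits than regions by subdividing each region's row-key range, so the framework can schedule more mappers in parallel. A minimal, HBase-free sketch of that subdivision step (key ranges modeled here as single characters for simplicity; real row keys would need midpoint arithmetic on full byte arrays, e.g. something like HBase's Bytes.split):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: subdivide one split's key range [start, end) into n sub-ranges,
// the core of a "virtual splits" getSplits() override. In a real
// TableInputFormat subclass each sub-range would become its own TableSplit.
public class VirtualSplits {
    static List<String[]> subdivide(char start, char end, int n) {
        List<String[]> ranges = new ArrayList<>();
        int span = end - start;
        for (int i = 0; i < n; i++) {
            char s = (char) (start + span * i / n);
            char e = (char) (start + span * (i + 1) / n);
            ranges.add(new String[] { String.valueOf(s), String.valueOf(e) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One region covering row keys 'a'..'i', cut into 4 mapper ranges.
        for (String[] r : subdivide('a', 'i', 4)) {
            System.out.println(r[0] + " .. " + r[1]);
        }
        // prints: a .. c / c .. e / e .. g / g .. i
    }
}
```

With this shape of override, a 2-region table could still feed 8 mappers (4 sub-ranges per region) without physically splitting the table into 8 regions, which addresses the "1 split per row" objection above.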