Not sure what exactly is happening in your job, but in one of the delete jobs I wrote, I was creating an instance of HTable in the setup method of my mapper:
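A minimal sketch of that setup/map/cleanup pattern, assuming the 0.94-era HBase client API; the class name, the configuration key, and the delete predicate are illustrative, not from the original job:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class DeleteMapper extends TableMapper<NullWritable, NullWritable> {

  // Hypothetical configuration key naming the table to delete from.
  public static final String TABLE_NAME = "delete.job.table";

  private HTable delTab;

  @Override
  protected void setup(Context context) throws IOException {
    // Open the table once per task, not once per row.
    delTab = new HTable(context.getConfiguration(),
                        context.getConfiguration().get(TABLE_NAME));
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Inspect the scanned row and delete it when it matches.
    if (shouldDelete(value)) {
      delTab.delete(new Delete(row.get()));
    }
  }

  private boolean shouldDelete(Result value) {
    return false; // application-specific predicate
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    if (delTab != null) {
      delTab.close(); // flushes any buffered mutations
    }
  }
}
```

This runs one delete RPC per matching row; for high delete volumes the batching approach discussed later in the thread is cheaper.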
delTab = new HTable(conf, conf.get(TABLE_NAME));

and performing the delete in the map() call using delTab. So no, you do not have access to the table directly *usually*.

-Shrijeet

On Fri, Nov 2, 2012 at 12:47 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Sorry, one last question.
>
> In the map method, I have access to the row using the values
> parameter. Now, based on the value content, I might want to delete it.
> Do I have access to the table directly from one of the parameters? Or
> should I call the delete using an HTableInterface from my pool?
>
> Thanks,
>
> JM
>
> 2012/11/2, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
> > Yep, you perfectly got my question.
> >
> > I just tried and it's working perfectly!
> >
> > Thanks a lot! I now have a lot to play with.
> >
> > JM
> >
> > 2012/11/2, Shrijeet Paliwal <shrij...@rocketfuel.com>:
> >> JM,
> >>
> >> I personally would choose to put it in neither the hadoop libs nor
> >> the hbase libs. Have them go to your application's own install
> >> directory.
> >>
> >> Then you can set the variable HADOOP_CLASSPATH to include your jar
> >> (also include the hbase jars, hbase dependencies, and the
> >> dependencies your program needs), and fire the 'hadoop jar' command
> >> to execute.
> >>
> >> An example[1]:
> >>
> >> Set the classpath:
> >> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
> >>
> >> Fire the following to launch your job:
> >> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
> >> -Dmapred.running.map.limit=50
> >> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
> >>
> >> Did I get your question right?
> >>
> >> [1] In the example I gave, `hbase classpath` gets you set with all
> >> the hbase jars.
> >>
> >> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari
> >> <jean-m...@spaggiari.org> wrote:
> >>
> >>> Hi Shrijeet,
> >>>
> >>> Helped a lot! Thanks!
> >>>
> >>> Now, the only thing I need is to know where's the best place to put
> >>> my JAR on the server.
> >>> Should I put it in the hadoop lib directory? Or somewhere in the
> >>> HBase structure?
> >>>
> >>> Thanks,
> >>>
> >>> JM
> >>>
> >>> 2012/10/29, Shrijeet Paliwal <shrij...@rocketfuel.com>:
> >>> > In line.
> >>> >
> >>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari
> >>> > <jean-m...@spaggiari.org> wrote:
> >>> >
> >>> >> I'm replying to myself ;)
> >>> >>
> >>> >> I found the "cleanup" and "setup" methods on the TableMapper
> >>> >> class, so I think those are the methods I was looking for. I will
> >>> >> init the HTablePool there. Please let me know if I'm wrong.
> >>> >>
> >>> >> Now, I still have a few other questions.
> >>> >>
> >>> >> 1) context.getCurrentValue() can throw an InterruptedException,
> >>> >> but when can this occur? Is there a timeout on the Mapper side?
> >>> >> Or is it if the region is going down while the job is running?
> >>> >
> >>> > You do not need to call context.getCurrentValue(). The 'value'
> >>> > argument to the map method[1] has the information you are looking
> >>> > for.
> >>> >
> >>> >> 2) How can I pass parameters to the map method? Can I use
> >>> >> job.getConfiguration().set to add some properties there, and get
> >>> >> them back with context.getConfiguration().get?
> >>> >
> >>> > Yes, that's how it is done.
> >>> >
> >>> >> 3) What's the best way to log results/exceptions/traces from the
> >>> >> map method?
> >>> >
> >>> > In most cases, you'll have mapper and reducer classes as nested
> >>> > static classes within some enclosing class. You can get a handle
> >>> > to the Logger from the enclosing class and do your usual LOG.info,
> >>> > LOG.warn, yada yada.
> >>> >
> >>> > Hope it helps.
> >>> >
> >>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
> >>> >
> >>> >> I will search on my side, but some help will be welcome, because
> >>> >> it seems there is not much documentation when you start to dig a
> >>> >> bit :(
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2012/10/27, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
> >>> >> > Hi,
> >>> >> >
> >>> >> > I'm thinking about my first MapReduce class and I have some
> >>> >> > questions.
> >>> >> >
> >>> >> > The goal of it will be to move some rows from one table to
> >>> >> > another one based on the timestamp only.
> >>> >> >
> >>> >> > Since this is pretty new for me, I'm starting from the
> >>> >> > RowCounter class to have a baseline.
> >>> >> >
> >>> >> > There are a few things I will have to update. First, the
> >>> >> > createSubmittableJob method, to take a timestamp range instead
> >>> >> > of a key range, and "play" with the parameters. This part is
> >>> >> > fine.
> >>> >> >
> >>> >> > Next, I need to update the map method, and this is where I have
> >>> >> > some questions.
> >>> >> >
> >>> >> > I'm able to find the timestamp of all the cf:c cells from the
> >>> >> > context.getCurrentValue() method; that's fine. Now, my concern
> >>> >> > is about the way to get access to the table to store this
> >>> >> > field, and the table to delete it from. Should I instantiate an
> >>> >> > HTable for the source table and execute a delete on it, then do
> >>> >> > an insert on another HTable instance? Should I use an
> >>> >> > HTablePool? Also, since I'm already on the row, can't I just
> >>> >> > mark it as deleted instead of calling a new HTable?
> >>> >> >
> >>> >> > Also, instead of calling the delete and put one by one, I would
> >>> >> > like to put them on a list and execute it only when it's over
> >>> >> > 10 members. How can I make sure that at the end of the job,
> >>> >> > this is flushed?
> >>> >> > Else, I will lose some operations. Is there a kind of
> >>> >> > "dispose" method called on the region when the job is done?
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > JM
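The batching and flush question at the end of the thread is usually answered by buffering mutations in the mapper and flushing the remainder in cleanup(), which is the per-task "dispose" hook being asked about. A hedged sketch, again assuming the 0.94-era client API; the table name and batch size are illustrative:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class BatchingDeleteMapper extends TableMapper<NullWritable, NullWritable> {

  private static final int BATCH_SIZE = 10; // threshold from the question

  private HTable table;
  private final List<Row> buffer = new ArrayList<Row>();

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(context.getConfiguration(), "mytable"); // illustrative name
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Queue the operation instead of issuing it one by one.
    buffer.add(new Delete(row.get()));
    if (buffer.size() >= BATCH_SIZE) {
      flush();
    }
  }

  private void flush() throws IOException {
    try {
      table.batch(buffer); // executes the buffered operations in one call
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    }
    buffer.clear();
  }

  // cleanup() runs once per task after the last map() call, so the
  // final partial batch is not lost when the job ends.
  @Override
  protected void cleanup(Context context) throws IOException {
    if (!buffer.isEmpty()) {
      flush();
    }
    table.close();
  }
}
```

For puts only, an alternative in that API is table.setAutoFlush(false) with a write buffer, letting close() flush the remainder.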