Not sure what exactly is happening in your job, but in one of the delete jobs I wrote, I was creating an instance of HTable in the setup method of my mapper:
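A minimal sketch of that setup/map/cleanup pattern, assuming the 0.94-era HBase client API; the class name, the configuration key, and the delete predicate are illustrative, not from the original job:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class DeleteMapper extends TableMapper<NullWritable, NullWritable> {

  // Hypothetical configuration key naming the table to delete from.
  public static final String TABLE_NAME = "delete.job.table";

  private HTable delTab;

  @Override
  protected void setup(Context context) throws IOException {
    // Open the table once per task, not once per row.
    delTab = new HTable(context.getConfiguration(),
                        context.getConfiguration().get(TABLE_NAME));
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Inspect the scanned row and delete it when it matches.
    if (shouldDelete(value)) {
      delTab.delete(new Delete(row.get()));
    }
  }

  private boolean shouldDelete(Result value) {
    return false; // application-specific predicate
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    if (delTab != null) {
      delTab.close(); // flushes any buffered mutations
    }
  }
}
```

This runs one delete RPC per matching row; for high delete volumes the batching approach discussed later in the thread is cheaper.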
delTab = new HTable(conf, conf.get(TABLE_NAME));

and performing the delete in the map() call using delTab. So no, you do not have access to the table directly *usually*.

-Shrijeet

On Fri, Nov 2, 2012 at 12:47 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Sorry, one last question.
>
> In the map method, I have access to the row using the values
> parameter. Now, based on the value content, I might want to delete it.
> Do I have access to the table directly from one of the parameters? Or
> should I call the delete using an HTableInterface from my pool?
>
> Thanks,
>
> JM
>
> 2012/11/2, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
> > Yep, you perfectly got my question.
> >
> > I just tried and it's working perfectly!
> >
> > Thanks a lot! I now have a lot to play with.
> >
> > JM
> >
> > 2012/11/2, Shrijeet Paliwal <shrij...@rocketfuel.com>:
> >> JM,
> >>
> >> I personally would choose to put it in neither the hadoop libs nor
> >> the hbase libs. Have them go to your application's own install
> >> directory.
> >>
> >> Then you can set the variable HADOOP_CLASSPATH to include your jar
> >> (also include the hbase jars, hbase dependencies, and the
> >> dependencies your program needs), and fire the 'hadoop jar' command
> >> to execute.
> >>
> >> An example[1]:
> >>
> >> Set the classpath:
> >> export HADOOP_CLASSPATH=`hbase classpath`:mycool.jar:mycooldependency.jar
> >>
> >> Fire the following to launch your job:
> >> hadoop jar mycool.jar hbase.experiments.MyCoolProgram
> >> -Dmapred.running.map.limit=50
> >> -Dmapred.map.tasks.speculative.execution=false aCommandLineArg
> >>
> >> Did I get your question right?
> >>
> >> [1] In the example I gave, `hbase classpath` gets you set with all
> >> the hbase jars.
> >>
> >> On Fri, Nov 2, 2012 at 11:56 AM, Jean-Marc Spaggiari
> >> <jean-m...@spaggiari.org> wrote:
> >>
> >>> Hi Shrijeet,
> >>>
> >>> Helped a lot! Thanks!
> >>>
> >>> Now, the only thing I need is to know where's the best place to put
> >>> my JAR on the server.
> >>> Should I put it in the hadoop lib directory? Or somewhere in the
> >>> HBase structure?
> >>>
> >>> Thanks,
> >>>
> >>> JM
> >>>
> >>> 2012/10/29, Shrijeet Paliwal <shrij...@rocketfuel.com>:
> >>> > In line.
> >>> >
> >>> > On Mon, Oct 29, 2012 at 8:11 AM, Jean-Marc Spaggiari
> >>> > <jean-m...@spaggiari.org> wrote:
> >>> >
> >>> >> I'm replying to myself ;)
> >>> >>
> >>> >> I found the "cleanup" and "setup" methods on the TableMapper
> >>> >> class, so I think those are the methods I was looking for. I will
> >>> >> init the HTablePool there. Please let me know if I'm wrong.
> >>> >>
> >>> >> Now, I still have a few other questions.
> >>> >>
> >>> >> 1) context.getCurrentValue() can throw an InterruptedException,
> >>> >> but when can this occur? Is there a timeout on the Mapper side?
> >>> >> Or is it if the region is going down while the job is running?
> >>> >
> >>> > You do not need to call context.getCurrentValue(). The 'value'
> >>> > argument to the map method[1] has the information you are looking
> >>> > for.
> >>> >
> >>> >> 2) How can I pass parameters to the map method? Can I use
> >>> >> job.getConfiguration().set to add some properties there, and get
> >>> >> them back with context.getConfiguration().get?
> >>> >
> >>> > Yes, that's how it is done.
> >>> >
> >>> >> 3) What's the best way to log results/exceptions/traces from the
> >>> >> map method?
> >>> >
> >>> > In most cases, you'll have mapper and reducer classes as nested
> >>> > static classes within some enclosing class. You can get a handle
> >>> > to the Logger from the enclosing class and do your usual LOG.info,
> >>> > LOG.warn, yada yada.
> >>> >
> >>> > Hope it helps.
> >>> >
> >>> > [1] map(KEYIN key, *VALUEIN value*, Context context)
> >>> >
> >>> >> I will search on my side, but some help will be welcome, because
> >>> >> it seems there is not much documentation when you start to dig a
> >>> >> bit :(
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2012/10/27, Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
> >>> >> > Hi,
> >>> >> >
> >>> >> > I'm thinking about my first MapReduce class and I have some
> >>> >> > questions.
> >>> >> >
> >>> >> > The goal of it will be to move some rows from one table to
> >>> >> > another one based on the timestamp only.
> >>> >> >
> >>> >> > Since this is pretty new for me, I'm starting from the
> >>> >> > RowCounter class to have a baseline.
> >>> >> >
> >>> >> > There are a few things I will have to update. First, the
> >>> >> > createSubmittableJob method, to take a timestamp range instead
> >>> >> > of a key range, and "play" with the parameters. This part is
> >>> >> > fine.
> >>> >> >
> >>> >> > Next, I need to update the map method, and this is where I have
> >>> >> > some questions.
> >>> >> >
> >>> >> > I'm able to find the timestamp of all the cf:c cells from the
> >>> >> > context.getCurrentValue() method; that's fine. Now, my concern
> >>> >> > is about the way to get access to the table to store this
> >>> >> > field, and the table to delete it from. Should I instantiate an
> >>> >> > HTable for the source table and execute a delete on it, then do
> >>> >> > an insert on another HTable instance? Should I use an
> >>> >> > HTablePool? Also, since I'm already on the row, can't I just
> >>> >> > mark it as deleted instead of calling a new HTable?
> >>> >> >
> >>> >> > Also, instead of calling the delete and put one by one, I would
> >>> >> > like to put them on a list and execute it only when it's over
> >>> >> > 10 members. How can I make sure that at the end of the job,
> >>> >> > this is flushed?
> >>> >> > Else, I will lose some operations. Is there a kind of
> >>> >> > "dispose" method called on the region when the job is done?
> >>> >> >
> >>> >> > Thanks,
> >>> >> >
> >>> >> > JM
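The batching and flush question at the end of the thread is usually answered by buffering mutations in the mapper and flushing the remainder in cleanup(), which is the per-task "dispose" hook being asked about. A hedged sketch, again assuming the 0.94-era client API; the table name and batch size are illustrative:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;

public class BatchingDeleteMapper extends TableMapper<NullWritable, NullWritable> {

  private static final int BATCH_SIZE = 10; // threshold from the question

  private HTable table;
  private final List<Row> buffer = new ArrayList<Row>();

  @Override
  protected void setup(Context context) throws IOException {
    table = new HTable(context.getConfiguration(), "mytable"); // illustrative name
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Queue the operation instead of issuing it one by one.
    buffer.add(new Delete(row.get()));
    if (buffer.size() >= BATCH_SIZE) {
      flush();
    }
  }

  private void flush() throws IOException {
    try {
      table.batch(buffer); // executes the buffered operations in one call
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    }
    buffer.clear();
  }

  // cleanup() runs once per task after the last map() call, so the
  // final partial batch is not lost when the job ends.
  @Override
  protected void cleanup(Context context) throws IOException {
    if (!buffer.isEmpty()) {
      flush();
    }
    table.close();
  }
}
```

For puts only, an alternative in that API is table.setAutoFlush(false) with a write buffer, letting close() flush the remainder.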