A coprocessor is code running inside a server process. The available
resources and the rules of the road are different from client-side
programming. HTablePool (and HTable in general) is problematic for
server-side programming in my opinion:
http://search-hadoop.com/m/XtAi5Fogw32

Since this comes up now and again, it seems like a lightweight
alternative for server-side IPC could be useful.
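[Editor's note] As a sketch of what avoiding HTablePool inside a RegionObserver might look like against the coprocessor API current at the time (0.92/0.94, where `CoprocessorEnvironment` exposes `getTable(byte[])`); the class name is illustrative, the table and column names follow the thread, and the code is untested:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: the coprocessor environment hands out a managed table
// reference per invocation, so no client-side HTablePool is needed.
public class DocIndexObserver extends BaseRegionObserver {
    @Override
    public boolean postCheckAndPut(ObserverContext<RegionCoprocessorEnvironment> c,
            byte[] row, byte[] family, byte[] qualifier, CompareOp compareOp,
            WritableByteArrayComparable comparator, Put put, boolean result)
            throws IOException {
        if (!result) return true; // index only if the guarded put went through
        HTableInterface idx = c.getEnvironment().getTable(Bytes.toBytes("doc_idx"));
        try {
            for (KeyValue kv : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
                for (String word : Bytes.toString(kv.getValue()).split("\\s+")) {
                    Increment inc = new Increment(Bytes.toBytes(word));
                    inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
                    idx.increment(inc);
                }
            }
        } finally {
            idx.close(); // releases the environment-managed reference
        }
        return true;
    }
}
```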
On Tue, Feb 19, 2013 at 7:15 AM, Wei Tan <w...@us.ibm.com> wrote:

> A side question: if HTablePool is not encouraged to be used... how do we
> handle thread safety when using HTable? Is any replacement for
> HTablePool planned?
> Thanks,
>
> Best Regards,
> Wei
>
> From: Michel Segel <michael_se...@hotmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>,
> Date: 02/18/2013 09:23 AM
> Subject: Re: coprocessor enabled put very slow, help please~~~
>
> Why are you using an HTablePool?
> Why are you closing the table after each iteration through?
>
> Try using one HTable object. Turn off WAL.
> Initiate in start().
> Close in stop().
> Surround the use in a try/catch.
> If an exception is caught, re-instantiate a new HTable connection.
>
> You may want to flush the connection after puts.
>
> Again, I am not sure why you are using checkAndPut on the base table. Your
> count could be off.
>
> As an example, look at the nursery rhyme 'Mary had a little lamb'.
> Then check your word count.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 18, 2013, at 7:21 AM, prakash kadel <prakash.ka...@gmail.com>
> wrote:
>
> > Thank you guys for your replies,
> > Michael,
> > I think I didn't make it clear. Here is my use case:
> >
> > I have text documents to insert into HBase (with possible duplicates).
> > Suppose I have a document such as: "I am working. He is not working"
> >
> > I want to insert this document into a table in HBase, say table "doc"
> >
> > =doc table=
> > -----
> > rowKey : doc_id
> > cf: doc_content
> > value: "I am working.
He is not working"
> >
> > Now, I want to create another table that stores the word count, say
> > "doc_idx"
> >
> > doc_idx table
> > ---
> > rowKey : I, cf: count, value: 1
> > rowKey : am, cf: count, value: 1
> > rowKey : working, cf: count, value: 2
> > rowKey : He, cf: count, value: 1
> > rowKey : is, cf: count, value: 1
> > rowKey : not, cf: count, value: 1
> >
> > My MR job code:
> > ==============
> >
> > if (doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
> >     for (String word : doc_content.split("\\s+")) {
> >         Increment inc = new Increment(Bytes.toBytes(word));
> >         inc.addColumn("count", "", 1);
> >         table_idx.increment(inc); // send the increment to the index table
> >     }
> > }
> >
> > Now, I wanted to do some experiments with coprocessors, so I modified
> > the code as follows.
> >
> > My MR job code:
> > ===============
> >
> > doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
> >
> > Coprocessor code:
> > ===============
> >
> > public void start(CoprocessorEnvironment env) {
> >     pool = new HTablePool(conf, 100);
> > }
> >
> > public boolean postCheckAndPut(c, row, family, byte[] qualifier,
> >         compareOp, comparator, put, result) {
> >
> >     if (!result) return true; // proceed only if the put succeeded
> >
> >     HTableInterface table_idx = pool.getTable("doc_idx");
> >
> >     try {
> >         for (KeyValue contentKV : put.get("doc_content", "")) {
> >             for (String word :
> >                     Bytes.toString(contentKV.getValue()).split("\\s+")) {
> >                 Increment inc = new Increment(Bytes.toBytes(word));
> >                 inc.addColumn("count", "", 1);
> >                 table_idx.increment(inc);
> >             }
> >         }
> >     } finally {
> >         table_idx.close();
> >     }
> >     return true;
> > }
> >
> > public void stop(env) {
> >     pool.close();
> > }
> >
> > I am a newbie to HBase. I am not sure this is the right way to do it.
> > Given that, why is the coprocessor-enabled version much slower than
> > the one without?
> >
> > Sincerely,
> > Prakash Kadel
> >
> > On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> > <michael_se...@hotmail.com> wrote:
> >>
> >> The issue I was talking about was the use of a check and put.
> >> The OP wrote:
> >>>>>> each map inserts to doc table (checkAndPut)
> >>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some
> >>>>>> rows to an index table.
> >>
> >> My question is why does the OP use a checkAndPut, and the
> >> RegionObserver's postCheckAndPut?
> >>
> >> Here's a good example:
> >> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
> >>
> >> The OP doesn't really get into the use case, so we don't know why the
> >> checkAndPut is in the M/R job.
> >> He should just be using put() and then a postPut().
> >>
> >> Another issue... since he's writing to a different HTable... how? Does
> >> he create an HTable instance in the start() method of his RO object and
> >> then reference it later? Or does he create the instance of the HTable on
> >> the fly in each postCheckAndPut()?
> >> Without seeing his code, we don't know.
> >>
> >> Note that this is a synchronous set of writes. Your overall return from
> >> the M/R call to put will wait until the second row is inserted.
> >>
> >> Interestingly enough, you may want to consider disabling the WAL on the
> >> write to the index. You can always run a M/R job that rebuilds the index
> >> should something occur to the system where you might lose the data.
> >> Indexes *ARE* expendable. ;-)
> >>
> >> Does that explain it?
> >>
> >> -Mike
> >>
> >> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
> >>
> >>> Hi, Michael
> >>>
> >>> I don't quite understand what you mean by "round trip back to the
> >>> client". In my understanding, since the RegionServer and TaskTracker can
> >>> be on the same node, MR doesn't have to pull data into the client and
> >>> then process it. You also mention "unnecessary overhead"; can you
> >>> explain a little what operations or data processing can be seen as
> >>> "unnecessary overhead"?
> >>>
> >>> Thanks
> >>>
> >>> yong
> >>>
> >>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
> >>> <michael_se...@hotmail.com> wrote:
> >>>> Why?
> >>>>
> >>>> This seems like an unnecessary overhead.
> >>>>
> >>>> You are writing code within the coprocessor on the server.
> >>>> Pessimistic code really isn't recommended if you are worried about
> >>>> performance.
> >>>>
> >>>> I have to ask... by the time you have executed the code in your
> >>>> co-processor, what would cause the initial write to fail?
> >>>>
> >>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> It's a local read. I just check the last param of postCheckAndPut,
> >>>>> which indicates whether the Put succeeded. In case the put succeeds,
> >>>>> I insert a row in another table.
> >>>>>
> >>>>> Sincerely,
> >>>>> Prakash Kadel
> >>>>>
> >>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
> >>>>>
> >>>>>> Is your checkAndPut involving a local or a remote READ? Due to the
> >>>>>> nature of the LSM tree, a read is much slower than a write...
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Wei
> >>>>>>
> >>>>>> From: Prakash Kadel <prakash.ka...@gmail.com>
> >>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>,
> >>>>>> Date: 02/17/2013 07:49 PM
> >>>>>> Subject: coprocessor enabled put very slow, help please~~~
> >>>>>>
> >>>>>> hi,
> >>>>>> I am trying to insert a few million documents into HBase with
> >>>>>> MapReduce. To enable quick search of the docs I want to have some
> >>>>>> indexes, so I tried to use coprocessors, but they are slowing down my
> >>>>>> inserts. Aren't coprocessors supposed to not increase the latency?
> >>>>>> my settings:
> >>>>>> 3 region servers
> >>>>>> 60 maps
> >>>>>> each map inserts to doc table (checkAndPut)
> >>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some
> >>>>>> rows to an index table.
> >>>>>>
> >>>>>> Sincerely,
> >>>>>> Prakash
> >>>>
> >>>> Michael Segel | (m) 312.755.9623
> >>>>
> >>>> Segel and Associates

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
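[Editor's note] Mike's "check your word count" remark is easy to verify: a bare `split("\\s+")`, as used in the thread's code, keeps punctuation attached to tokens, so "working." and "working" are counted as different words and the expected count of 2 for "working" in the doc_idx table never materializes. A minimal, self-contained sketch (plain Java, no HBase; the class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCount {
    // Naive whitespace tokenization, exactly as in the thread's split("\\s+").
    static Map<String, Integer> count(String doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : doc.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The thread's sample document yields "working." and "working"
        // as two distinct tokens, so neither reaches a count of 2.
        System.out.println(count("I am working. He is not working"));
    }
}
```

Stripping punctuation (or splitting on `\\W+`) before incrementing would be needed for the counts in the doc_idx example to come out as listed.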