Re: coprocessor enabled put very slow, help please~~~

Michael Segel Tue, 19 Feb 2013 08:02:23 -0800

Good question.

You create a class MyRO.


How many instances of MyRO exist per RegionServer (RS)?

How many requests can access a MyRO instance at the same time?
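For reference, as I understand it (worth confirming for your HBase version): a RegionObserver is instantiated once per region it is loaded on, so one RS can hold several MyRO instances. The hooks of each instance run on the RS's RPC handler threads, so several requests can be inside the same instance at once. Any state kept in fields (a shared table handle, counters) therefore has to be thread-safe. A plain-Java sketch with no HBase classes; `ObserverModel` and `onPut` are hypothetical stand-ins for the observer and one of its hooks:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a RegionObserver: one instance, many handler threads.
final class ObserverModel {
    // Thread-safe per-word counters; a plain HashMap<String, Long> here could lose updates.
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Stand-in for a hook such as postPut(); may be entered by many threads at once.
    void onPut(String word) {
        counts.computeIfAbsent(word, w -> new LongAdder()).increment();
    }

    long count(String word) {
        LongAdder adder = counts.get(word);
        return adder == null ? 0L : adder.sum();
    }

    // Simulates `threads` handler threads, each calling the hook `callsPerThread` times.
    static ObserverModel simulate(int threads, int callsPerThread) {
        ObserverModel observer = new ObserverModel();
        ExecutorService handlers = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            handlers.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    observer.onPut("working");
                }
            });
        }
        handlers.shutdown();
        try {
            handlers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observer;
    }

    public static void main(String[] args) {
        ObserverModel observer = simulate(8, 10_000);
        System.out.println(observer.count("working")); // 80000
    }
}
```

Swapping the `ConcurrentHashMap`/`LongAdder` pair for an unsynchronized map is the kind of change that makes such a count come out wrong under load.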




On Feb 19, 2013, at 9:15 AM, Wei Tan <w...@us.ibm.com> wrote:

> A side question: if using HTablePool is discouraged, how do we
> handle thread safety when using HTable? Is a replacement for
> HTablePool planned?
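HTable instances are not thread-safe, so one common pattern is to give each thread its own instance instead of sharing one. (Later HBase releases deprecated HTablePool in favour of obtaining lightweight table handles from a shared connection; check the client API of your version.) A plain-Java sketch of the per-thread pattern using `ThreadLocal`; `TableHandle` is a hypothetical stand-in for HTable, not HBase's API:

```java
import java.util.function.Supplier;

// Gives every thread its own handle, created lazily on first use and reused afterwards.
final class PerThreadTables {
    private final ThreadLocal<TableHandle> local;

    PerThreadTables(Supplier<TableHandle> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    TableHandle get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadTables tables = new PerThreadTables(() -> new TableHandle("doc_idx"));
        TableHandle mine = tables.get();
        TableHandle[] other = new TableHandle[1];
        Thread worker = new Thread(() -> other[0] = tables.get());
        worker.start();
        worker.join();
        System.out.println(mine == tables.get()); // true: same thread, same handle
        System.out.println(mine == other[0]);     // false: the other thread got its own handle
    }
}

// Hypothetical stand-in for a client object that must not be shared across threads.
final class TableHandle {
    final String tableName;

    TableHandle(String tableName) {
        this.tableName = tableName;
    }
}
```

A real version would also have to close each thread's handle when that thread is done; `ThreadLocal` does not do that by itself.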
> Thanks,
> 
> 
> Best Regards,
> Wei
> 
> 
> 
> 
> From:   Michel Segel <michael_se...@hotmail.com>
> To:     "user@hbase.apache.org" <user@hbase.apache.org>, 
> Date:   02/18/2013 09:23 AM
> Subject:        Re: coprocessor enabled put very slow, help please~~~
> 
> 
> 
> Why are you using an HTablePool?
> Why are you closing the table after each call to the hook?
> 
> Try using one HTable object, and turn off the WAL:
> Instantiate it in start().
> Close it in stop().
> Surround its use in a try/catch.
> If an exception is caught, instantiate a new HTable connection.
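The lifecycle above (one long-lived handle opened in start(), closed in stop(), recreated after an error) can be sketched in plain Java. `IndexTable`, `FakeTable` and `IndexWriterLifecycle` are hypothetical names standing in for HTable and the observer; retrying the failed write once on the fresh handle is an assumption of this sketch, not something stated above:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Keeps one long-lived handle: opened in start(), closed in stop(), replaced after a failure.
final class IndexWriterLifecycle {
    private final Supplier<IndexTable> factory;
    private IndexTable table;
    private int opened;

    IndexWriterLifecycle(Supplier<IndexTable> factory) { this.factory = factory; }

    void start() { table = factory.get(); opened++; }

    void stop() {
        if (table != null) { table.close(); table = null; }
    }

    // On failure: drop the handle, re-instantiate it, and retry once (assumed policy).
    void increment(String row, long amount) throws IOException {
        try {
            table.increment(row, amount);
        } catch (IOException first) {
            table.close();
            start();                      // re-instantiate the handle
            table.increment(row, amount); // a second failure propagates to the caller
        }
    }

    int openedCount() { return opened; }

    public static void main(String[] args) throws IOException {
        Map<String, Long> store = new HashMap<>();
        int[] made = {0};
        // Demo factory: the first handle it makes is broken, later ones work.
        IndexWriterLifecycle writer =
                new IndexWriterLifecycle(() -> new FakeTable(made[0]++ == 0, store));
        writer.start();
        writer.increment("working", 2);
        writer.stop();
        System.out.println(store + ", handles opened: " + writer.openedCount());
        // {working=2}, handles opened: 2
    }
}

// Hypothetical stand-in for an HTable-like client whose writes may fail.
interface IndexTable extends AutoCloseable {
    void increment(String row, long amount) throws IOException;
    @Override void close();
}

// Demo-only fake: every write fails if `broken`, otherwise it is recorded in `store`.
final class FakeTable implements IndexTable {
    private final boolean broken;
    private final Map<String, Long> store;

    FakeTable(boolean broken, Map<String, Long> store) { this.broken = broken; this.store = store; }

    @Override public void increment(String row, long amount) throws IOException {
        if (broken) throw new IOException("connection lost");
        store.merge(row, amount, Long::sum);
    }

    @Override public void close() { }
}
```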
> 
> You may also want to flush the commits after the puts.
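In the 0.94 client, HTable.setAutoFlush(false) makes put() collect Puts in a client-side write buffer that is sent when it fills or when flushCommits() is called. As far as I know only Puts go through that buffer, so the per-word increments in the coprocessor would not benefit from it. The buffering idea itself, sketched in plain Java with a hypothetical `WriteBuffer` class and a `Consumer` standing in for one batched RPC:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer: rows are sent in batches, not one call per row.
final class WriteBuffer<T> {
    private final int capacity;
    private final Consumer<List<T>> sink; // stands in for one batched RPC
    private final List<T> pending = new ArrayList<>();
    private int batchesSent;

    WriteBuffer(int capacity, Consumer<List<T>> sink) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.sink = sink;
    }

    void add(T row) {
        pending.add(row);
        if (pending.size() >= capacity) flush(); // send automatically when the buffer is full
    }

    // Sends whatever is buffered; call it explicitly after the last write.
    void flush() {
        if (pending.isEmpty()) return;
        sink.accept(new ArrayList<>(pending));
        pending.clear();
        batchesSent++;
    }

    int batchesSent() { return batchesSent; }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        WriteBuffer<String> buffer = new WriteBuffer<>(3, sent::add);
        for (String word : "I am working He is not working".split(" ")) buffer.add(word);
        buffer.flush(); // without this, the last partial batch would never be sent
        System.out.println(sent); // [[I, am, working], [He, is, not], [working]]
    }
}
```

The explicit flush() at the end is the counterpart of calling flushCommits() after the last put.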
> 
> 
> Again, I'm not sure why you are using checkAndPut on the base table. Your
> count could be off.
> 
> As an example, look at the rhyme 'Mary Had a Little Lamb'.
> Then check your word count.
> 
> Sent from a remote device. Please excuse any typos...
> 
> Mike Segel
> 
> On Feb 18, 2013, at 7:21 AM, prakash kadel <prakash.ka...@gmail.com> wrote:
> 
>> Thank you all for your replies.
>> Michael,
>> I think I didn't make it clear. Here is my use case:
>> 
>> I have text documents (possibly with duplicates) to insert into HBase.
>> Suppose I have the document: "I am working. He is not working"
>> 
>> I want to insert this document into a table in HBase, say table "doc":
>> 
>> =doc table=
>> -----
>> rowKey : doc_id
>> cf: doc_content
>> value: "I am working. He is not working"
>> 
>> Now, I want to create another table that stores the word counts, say "doc_idx":
>> 
>> =doc_idx table=
>> ---
>> rowKey : I, cf: count, value: 1
>> rowKey : am, cf: count, value: 1
>> rowKey : working, cf: count, value: 2
>> rowKey : He, cf: count, value: 1
>> rowKey : is, cf: count, value: 1
>> rowKey : not, cf: count, value: 1
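Note that splitting on whitespace alone, as the job and coprocessor code in this message do with split("\\s+"), does not reproduce this table: the first occurrence becomes the token "working." (with the period), so "working" ends up with a count of 1, not 2. A minimal plain-Java sketch comparing that tokenizer with one that splits on anything that is not a letter or digit and keeps case (so "I" and "He" stay as in the table). The class name and the regular expression are assumptions; the rule is deliberately simple and would, for example, split "didn't" into two tokens:

```java
import java.util.Map;
import java.util.TreeMap;

final class WordCounts {
    // Whitespace-only tokenizer, as in the job code: punctuation stays attached to words.
    static Map<String, Integer> naive(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Splits on any run of characters that are not letters or digits; case is kept.
    static Map<String, Integer> normalized(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "I am working. He is not working";
        System.out.println(naive(doc));      // {He=1, I=1, am=1, is=1, not=1, working=1, working.=1}
        System.out.println(normalized(doc)); // {He=1, I=1, am=1, is=1, not=1, working=2}
    }
}
```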
>> 
>> My MR job code:
>> ==============
>> 
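>> // doc and docIdx: HTable handles for "doc" and "doc_idx"; content: the document text
>> if (doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc)) {
>>     // null expected value: the put succeeds only if the cell does not exist yet
>>     for (String word : content.split("\\s+")) {
>>         Increment inc = new Increment(Bytes.toBytes(word));
>>         inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>>         docIdx.increment(inc); // send the increment to doc_idx
>>     }
>> }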
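Since a word can repeat within one document ("working" appears twice here), the job could count occurrences locally and send one increment per distinct word with the total as the amount (in the 0.94 API, the third argument of Increment.addColumn(family, qualifier, amount)), instead of one +1 per occurrence. A plain-Java sketch of that aggregation step; `Inc` is a hypothetical stand-in for an HBase Increment, and the words are assumed to be already tokenized:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AggregatedIncrements {
    // Hypothetical stand-in for one HBase Increment: row = word, amount added to "count".
    static final class Inc {
        final String row;
        final long amount;

        Inc(String row, long amount) { this.row = row; this.amount = amount; }

        @Override public String toString() { return row + "+" + amount; }
    }

    // Collapses repeated words so each distinct word needs only one increment call.
    static List<Inc> forDocument(List<String> words) {
        Map<String, Long> totals = new LinkedHashMap<>(); // keeps first-seen order
        for (String word : words) totals.merge(word, 1L, Long::sum);
        List<Inc> incs = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) incs.add(new Inc(e.getKey(), e.getValue()));
        return incs;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I", "am", "working", "He", "is", "not", "working");
        List<Inc> incs = forDocument(words);
        System.out.println(words.size() + " occurrences -> " + incs.size() + " increments: " + incs);
        // 7 occurrences -> 6 increments: [I+1, am+1, working+2, He+1, is+1, not+1]
    }
}
```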
>> 
>> Now, I wanted to do some experiments with coprocessors, so I modified
>> the code as follows.
>> 
>> My MR job code:
>> ===============
>> 
>> doc.checkAndPut(rowKey, Bytes.toBytes("doc_content"), Bytes.toBytes(""), null, putDoc);
>> 
>> Coprocessor code:
>> ===============
>> 
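>> import java.io.IOException;
>> import org.apache.hadoop.hbase.CoprocessorEnvironment;
>> import org.apache.hadoop.hbase.KeyValue;
>> import org.apache.hadoop.hbase.client.HTableInterface;
>> import org.apache.hadoop.hbase.client.HTablePool;
>> import org.apache.hadoop.hbase.client.Increment;
>> import org.apache.hadoop.hbase.client.Put;
>> import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
>> import org.apache.hadoop.hbase.coprocessor.ObserverContext;
>> import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
>> import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
>> import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
>> import org.apache.hadoop.hbase.util.Bytes;
>> 
>> // HBase 0.94-era RegionObserver API
>> public class MyRO extends BaseRegionObserver {
>>   private HTablePool pool;
>> 
>>   @Override
>>   public void start(CoprocessorEnvironment env) throws IOException {
>>       pool = new HTablePool(env.getConfiguration(), 100);
>>   }
>> 
>>   @Override
>>   public boolean postCheckAndPut(ObserverContext<RegionCoprocessorEnvironment> c,
>>           byte[] row, byte[] family, byte[] qualifier, CompareOp compareOp,
>>           WritableByteArrayComparable comparator, Put put, boolean result)
>>           throws IOException {
>>       if (!result) return result; // the check failed, so the put was not applied
>> 
>>       HTableInterface tableIdx = pool.getTable("doc_idx");
>>       try {
>>           for (KeyValue contentKV : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
>>               for (String word : Bytes.toString(contentKV.getValue()).split("\\s+")) {
>>                   Increment inc = new Increment(Bytes.toBytes(word));
>>                   inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
>>                   tableIdx.increment(inc); // one synchronous RPC per word
>>               }
>>           }
>>       } finally {
>>           tableIdx.close(); // returns the table to the pool
>>       }
>>       return result; // this value is what the client's checkAndPut() returns
>>   }
>> 
>>   @Override
>>   public void stop(CoprocessorEnvironment env) throws IOException {
>>       pool.close();
>>   }
>> }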
>> 
>> I am a newbie to HBase and I am not sure this is the right way to do it.
>> Given that, why is the coprocessor-enabled version so much slower than
>> the one without?
>> 
>> 
>> Sincerely,
>> Prakash Kadel
>> 
>> 
>> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>>> 
>>> The issue I was talking about was the use of checkAndPut.
>>> The OP wrote:
>>>>>>> each map inserts into the doc table (checkAndPut)
>>>>>>> a RegionObserver coprocessor does a postCheckAndPut and inserts some
>>>>>>> rows into an index table.
>>> 
>>> My question is: why does the OP use a checkAndPut and the
>>> RegionObserver's postCheckAndPut?
>>> 
>>> 
>>> Here's a good example:
>>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>>> 
>>> The OP doesn't really go into the use case, so we don't know why there
>>> is a checkAndPut in the M/R job.
>>> He should just be using put() and then a postPut().
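The put() plus postPut() flow can be modelled without HBase: the region stores the row and then runs each registered observer's postPut hook, which updates the index before the put returns. `ToyRegion` and `PutObserver` are hypothetical names, not HBase classes. Note that, unlike the checkAndPut version, re-inserting the same document with a plain put counts its words again, which is one way the counts can drift when the input contains duplicates:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a region that runs observer hooks after each put.
final class ToyRegion {
    interface PutObserver {
        void postPut(String row, String content);
    }

    private final Map<String, String> rows = new HashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) { observers.add(observer); }

    // Store the row first, then run every postPut hook before returning to the caller.
    void put(String row, String content) {
        rows.put(row, content);
        for (PutObserver o : observers) o.postPut(row, content);
    }

    public static void main(String[] args) {
        Map<String, Long> docIdx = new HashMap<>();
        ToyRegion doc = new ToyRegion();
        doc.register((row, content) -> {
            for (String word : content.split("\\s+")) docIdx.merge(word, 1L, Long::sum);
        });
        doc.put("doc1", "Mary had a little lamb");
        doc.put("doc1", "Mary had a little lamb"); // duplicate document, plain put
        System.out.println(docIdx.get("lamb")); // 2: the duplicate was counted twice
    }
}
```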
>>> 
>>> Another issue: since he's writing to a different HTable, how? Does
>>> he create an HTable instance in the start() method of his RO object and
>>> then reference it later? Or does he create the HTable instance on
>>> the fly in each postCheckAndPut()?
>>> Without seeing his code, we don't know.
>>> 
>>> Note that this is a synchronous set of writes. The M/R job's call to
>>> put will not return until the second row is inserted.
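Because the hook runs inside the put, the client's latency for one document is roughly the base write plus every index call the hook makes, one after another. A back-of-the-envelope sketch; the timings are made-up inputs for illustration, not measurements from this thread, and treating a batched call as costing about one RPC is an optimistic simplification:

```java
// Rough latency model for one document write when the index is updated synchronously.
// All timings are hypothetical inputs for illustration, not measurements.
final class SyncIndexLatency {
    // One index RPC per word occurrence, issued one after another inside the hook.
    static double perWord(double baseWriteMs, double indexRpcMs, int wordOccurrences) {
        return baseWriteMs + indexRpcMs * wordOccurrences;
    }

    // Same write, but the hook sends all index updates in a single batched call.
    static double batched(double baseWriteMs, double indexRpcMs, int wordOccurrences) {
        return baseWriteMs + (wordOccurrences > 0 ? indexRpcMs : 0.0);
    }

    public static void main(String[] args) {
        int words = 7; // "I am working. He is not working"
        System.out.println(perWord(2.0, 1.0, words)); // 9.0
        System.out.println(batched(2.0, 1.0, words)); // 3.0
    }
}
```

With 60 maps all doing this, the per-word term is what grows with document length.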
>>> 
>>> Interestingly enough, you may want to consider disabling the WAL on the
>>> write to the index. You can always run an M/R job that rebuilds the index
>>> should something happen to the system that causes you to lose the data.
>>> Indexes *ARE* expendable. ;-)
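Because the index is derived entirely from the doc table, writes that skipped the WAL (in the 0.94 client, for example, Put.setWriteToWAL(false)) can be recovered by recomputing the index from scratch. In practice that would be an M/R job over the doc table; the sketch below does the same computation in memory, with a `Map` standing in for the doc table and whitespace tokenization as in the original job. `IndexRebuild` is a hypothetical name. Since the doc table is keyed by doc_id, each stored document is counted exactly once:

```java
import java.util.Map;
import java.util.TreeMap;

// Recomputes the word-count index from the base table contents (doc_id -> content).
final class IndexRebuild {
    static Map<String, Long> rebuild(Map<String, String> docTable) {
        Map<String, Long> index = new TreeMap<>();
        for (String content : docTable.values()) {
            for (String word : content.split("\\s+")) {
                if (!word.isEmpty()) index.merge(word, 1L, Long::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new TreeMap<>();
        doc.put("doc1", "I am working");
        doc.put("doc2", "He is not working");
        System.out.println(rebuild(doc)); // {He=1, I=1, am=1, is=1, not=1, working=2}
    }
}
```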
>>> 
>>> Does that explain it?
>>> 
>>> -Mike
>>> 
>>> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
>>> 
>>>> Hi, Michael
>>>> 
>>>> I don't quite understand what you mean by "round trip back to the
>>>> client". In my understanding, as the RegionServer and TaskTracker can
>>>> be on the same node, MR doesn't have to pull data to the client and then
>>>> process it. You also mention "unnecessary overhead"; can you
>>>> explain a little which operations or data processing count as
>>>> "unnecessary overhead"?
>>>> 
>>>> Thanks
>>>> 
>>>> yong
>>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>>> <michael_se...@hotmail.com> wrote:
>>>>> Why?
>>>>> 
>>>>> This seems like an unnecessary overhead.
>>>>> 
>>>>> You are writing code within the coprocessor on the server.
>>>>> Pessimistic code really isn't recommended if you are worried about performance.
>>>>> 
>>>>> I have to ask: by the time you have executed the code in your
>>>>> coprocessor, what would cause the initial write to fail?
>>>>> 
>>>>> 
>>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com> wrote:
>>>>> 
>>>>>> It's a local read. I just check the last parameter of postCheckAndPut,
>>>>>> which indicates whether the Put succeeded. If the put succeeds, I insert
>>>>>> a row into another table.
>>>>>> 
>>>>>> Sincerely,
>>>>>> Prakash Kadel
>>>>>> 
>>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
>>>>>> 
>>>>>>> Does your checkAndPut involve a local or a remote READ? Due to the
>>>>>>> nature of LSM trees, a read is much slower than a write...
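To illustrate why a read-before-write such as checkAndPut costs more than a blind put in an LSM store: a put only lands in the in-memory buffer, while a read may have to consult that buffer and then each flushed file, newest first. The toy below is a simplification with hypothetical names (`ToyLsm`, `putIfAbsent`); it has none of the Bloom filters, block cache, or compactions that HBase uses to reduce read cost:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy log-structured merge store: writes go to a memtable, reads may probe every flushed file.
final class ToyLsm {
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // oldest first
    private int lastReadProbes;

    void put(String key, String value) { memtable.put(key, value); } // no read needed

    // Freezes the memtable into an immutable "file" (a sorted map here).
    void flush() {
        files.add(memtable);
        memtable = new TreeMap<>();
    }

    // Checks the memtable, then files from newest to oldest, counting the structures probed.
    String get(String key) {
        lastReadProbes = 1;
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = files.size() - 1; i >= 0; i--) {
            lastReadProbes++;
            String v = files.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    // checkAndPut-style operation: read first, write only if the row is absent.
    boolean putIfAbsent(String key, String value) {
        if (get(key) != null) return false;
        put(key, value);
        return true;
    }

    int lastReadProbes() { return lastReadProbes; }

    public static void main(String[] args) {
        ToyLsm store = new ToyLsm();
        for (int f = 0; f < 3; f++) { store.put("doc" + f, "content " + f); store.flush(); }
        boolean inserted = store.putIfAbsent("doc9", "new document");
        System.out.println(inserted + " after probing " + store.lastReadProbes() + " structures");
        // true after probing 4 structures (memtable + 3 files)
    }
}
```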
>>>>>>> 
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Wei
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> From:   Prakash Kadel <prakash.ka...@gmail.com>
>>>>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>>>>> Date:   02/17/2013 07:49 PM
>>>>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> hi,
>>>>>>> I am trying to insert a few million documents into HBase with MapReduce.
>>>>>>> To enable quick search of the docs I want to have some indexes, so I tried
>>>>>>> to use coprocessors, but they are slowing down my inserts. Aren't
>>>>>>> coprocessors supposed to avoid increasing latency?
>>>>>>> my settings:
>>>>>>> 3 region servers
>>>>>>> 60 maps
>>>>>>> each map inserts into the doc table (checkAndPut)
>>>>>>> a RegionObserver coprocessor does a postCheckAndPut and inserts some
>>>>>>> rows into an index table.
>>>>>>> 
>>>>>>> 
>>>>>>> Sincerely,
>>>>>>> Prakash
>>>>> 
>>>>> Michael Segel  | (m) 312.755.9623
>>>>> 
>>>>> Segel and Associates
>> 
> 
>
