Thanks, I am going to run some tests and let you know.
On Mon, Feb 18, 2013 at 11:13 PM, Michel Segel <michael_se...@hotmail.com> wrote:
> Why are you using an HTablePool?
> Why are you closing the table after each iteration?
>
> Try using one HTable object. Turn off the WAL.
> Initialize it in start().
> Close it in stop().
> Surround the use in a try/catch.
> If an exception is caught, re-instantiate a new HTable connection.
>
> You may also want to flush the connection after the puts.
>
> Again, I am not sure why you are using checkAndPut on the base table. Your count
> could be off.
>
> As an example, look at the poem/rhyme 'Mary had a little lamb',
> then check your word count.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 18, 2013, at 7:21 AM, prakash kadel <prakash.ka...@gmail.com> wrote:
>
>> Thank you guys for your replies.
>> Michael,
>> I think I didn't make it clear. Here is my use case:
>>
>> I have text documents to insert into HBase (with possible duplicates).
>> Suppose I have a document such as: "I am working. He is not working"
>>
>> I want to insert this document into a table in HBase, say table "doc":
>>
>> =doc table=
>> -----
>> rowKey: doc_id
>> cf: doc_content
>> value: "I am working. He is not working"
>>
>> Now, I want to create another table that stores the word counts, say "doc_idx":
>>
>> =doc_idx table=
>> ---
>> rowKey: I,       cf: count, value: 1
>> rowKey: am,      cf: count, value: 1
>> rowKey: working, cf: count, value: 2
>> rowKey: He,      cf: count, value: 1
>> rowKey: is,      cf: count, value: 1
>> rowKey: not,     cf: count, value: 1
>>
>> My MR job code:
>> ==============
>>
>> if (doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>>     for (String word : doc_content.split("\\s+")) {
>>         Increment inc = new Increment(Bytes.toBytes(word));
>>         inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1);
>>     }
>> }
>>
>> Now, I wanted to do some experiments with coprocessors, so I modified
>> the code as follows.
>>
>> My MR job code:
>> ===============
>>
>> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
>>
>> Coprocessor code:
>> ===============
>>
>> public void start(CoprocessorEnvironment env) {
>>     pool = new HTablePool(conf, 100);
>> }
>>
>> public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>>         compareOp, comparator, put, result) {
>>
>>     if (!result) return true; // the checkAndPut failed, so skip indexing
>>
>>     HTableInterface table_idx = pool.getTable("doc_idx");
>>     try {
>>         for (KeyValue contentKV : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
>>             for (String word : Bytes.toString(contentKV.getValue()).split("\\s+")) {
>>                 Increment inc = new Increment(Bytes.toBytes(word));
>>                 inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1);
>>                 table_idx.increment(inc);
>>             }
>>         }
>>     } finally {
>>         table_idx.close();
>>     }
>>     return true;
>> }
>>
>> public void stop(env) {
>>     pool.close();
>> }
>>
>> I am a newbie to HBase. I am not sure this is the right way to do it.
>> Given that, why is the coprocessor-enabled version much slower than
>> the one without?
>>
>> Sincerely,
>> Prakash Kadel
>>
>> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
>> <michael_se...@hotmail.com> wrote:
>>>
>>> The issue I was talking about was the use of a checkAndPut.
>>> The OP wrote:
>>>>>> each map inserts into the doc table (checkAndPut);
>>>>>> a RegionObserver coprocessor does a postCheckAndPut and inserts some rows
>>>>>> into an index table.
>>>
>>> My question is why does the OP use a checkAndPut, and the RegionObserver's
>>> postCheckAndPut?
>>>
>>> Here's a good example:
>>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>>>
>>> The OP doesn't really get into the use case, so we don't know why there is a
>>> checkAndPut in the M/R job.
>>> He should just be using put() and then a postPut().
>>>
>>> Another issue... since he's writing to a different HTable... how? Does he
>>> create an HTable instance in the start() method of his RO object and then
>>> reference it later?
>>> Or does he create the instance of the HTable on the fly
>>> in each postCheckAndPut()?
>>> Without seeing his code, we don't know.
>>>
>>> Note that this is a synchronous set of writes. Your overall return from the
>>> M/R call to put will wait until the second row is inserted.
>>>
>>> Interestingly enough, you may want to consider disabling the WAL on the
>>> write to the index. You can always run a M/R job that rebuilds the index
>>> should something occur to the system where you might lose the data.
>>> Indexes *ARE* expendable. ;-)
>>>
>>> Does that explain it?
>>>
>>> -Mike
>>>
>>> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
>>>
>>>> Hi, Michael
>>>>
>>>> I don't quite understand what you mean by "round trip back to the
>>>> client". In my understanding, since the RegionServer and TaskTracker can
>>>> be the same node, MR doesn't have to pull data to the client and then
>>>> process it. You also mention "unnecessary overhead"; can you
>>>> explain a little what operations or data processing can be seen as
>>>> "unnecessary overhead"?
>>>>
>>>> Thanks
>>>>
>>>> yong
>>>>
>>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>>> <michael_se...@hotmail.com> wrote:
>>>>> Why?
>>>>>
>>>>> This seems like unnecessary overhead.
>>>>>
>>>>> You are writing code within the coprocessor on the server. Pessimistic
>>>>> code really isn't recommended if you are worried about performance.
>>>>>
>>>>> I have to ask... by the time you have executed the code in your
>>>>> coprocessor, what would cause the initial write to fail?
>>>>>
>>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> It's a local read. I just check the last param of postCheckAndPut,
>>>>>> indicating whether the Put succeeded.
>>>>>> In case the put succeeds, I insert a
>>>>>> row in another table.
>>>>>>
>>>>>> Sincerely,
>>>>>> Prakash Kadel
>>>>>>
>>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
>>>>>>
>>>>>>> Does your checkAndPut involve a local or a remote READ? Due to the nature of
>>>>>>> LSM trees, a read is much slower than a write...
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Wei
>>>>>>>
>>>>>>> From: Prakash Kadel <prakash.ka...@gmail.com>
>>>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>>>> Date: 02/17/2013 07:49 PM
>>>>>>> Subject: coprocessor enabled put very slow, help please~~~
>>>>>>>
>>>>>>> Hi,
>>>>>>> I am trying to insert a few million documents into HBase with MapReduce. To
>>>>>>> enable quick search of the docs I want to have some indexes, so I tried to use
>>>>>>> coprocessors, but they are slowing down my inserts. Aren't
>>>>>>> coprocessors supposed to avoid adding latency?
>>>>>>> My settings:
>>>>>>> 3 region servers
>>>>>>> 60 maps
>>>>>>> Each map inserts into the doc table (checkAndPut).
>>>>>>> A RegionObserver coprocessor does a postCheckAndPut and inserts some rows into
>>>>>>> an index table.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Prakash
>>>>>
>>>>> Michael Segel | (m) 312.755.9623
>>>>>
>>>>> Segel and Associates
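
As a side note on Mike's word-count caveat ('Mary had a little lamb'): here is a minimal, HBase-free Java sketch of the tokenize-and-count step discussed in the thread. The punctuation-stripping regex is an assumption on my part; the code in the thread only does split("\\s+"), which would treat "working." and "working" as two different row keys and so would not produce the doc_idx counts shown above.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Counts words the way the doc_idx table in the thread expects.
    // Assumption: punctuation is stripped before splitting on whitespace;
    // without this, "working." and "working" land under different row keys.
    static Map<String, Long> countWords(String docContent) {
        Map<String, Long> counts = new HashMap<>();
        String cleaned = docContent.replaceAll("[^\\p{Alnum}\\s]", "");
        for (String word : cleaned.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
                countWords("I am working. He is not working");
        System.out.println(counts.get("working")); // 2, matching the doc_idx example
        System.out.println(counts.size());         // 6 distinct words
    }
}
```

Aggregating counts into a local map like this before issuing Increments would also cut the coprocessor's RPC volume: one increment per distinct word instead of one per token.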