A coprocessor is code running inside a server process. The available
resources and the rules of the road are different from client-side
programming. HTablePool (and HTable in general) is problematic for
server-side programming in my opinion:
http://search-hadoop.com/m/XtAi5Fogw32

Since this comes up now and again, it seems like a lightweight
alternative for server-side IPC could be useful.
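[Editor's note] As a sketch of what avoiding HTablePool inside a RegionObserver might look like against the coprocessor API current at the time (0.92/0.94, where `CoprocessorEnvironment` exposes `getTable(byte[])`); the class name is illustrative, the table and column names follow the thread, and the code is untested:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch only: the coprocessor environment hands out a managed table
// reference per invocation, so no client-side HTablePool is needed.
public class DocIndexObserver extends BaseRegionObserver {
    @Override
    public boolean postCheckAndPut(ObserverContext<RegionCoprocessorEnvironment> c,
            byte[] row, byte[] family, byte[] qualifier, CompareOp compareOp,
            WritableByteArrayComparable comparator, Put put, boolean result)
            throws IOException {
        if (!result) return true; // index only if the guarded put went through
        HTableInterface idx = c.getEnvironment().getTable(Bytes.toBytes("doc_idx"));
        try {
            for (KeyValue kv : put.get(Bytes.toBytes("doc_content"), Bytes.toBytes(""))) {
                for (String word : Bytes.toString(kv.getValue()).split("\\s+")) {
                    Increment inc = new Increment(Bytes.toBytes(word));
                    inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1L);
                    idx.increment(inc);
                }
            }
        } finally {
            idx.close(); // releases the environment-managed reference
        }
        return true;
    }
}
```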
On Tue, Feb 19, 2013 at 7:15 AM, Wei Tan <w...@us.ibm.com> wrote:

> A side question: if HTablePool is not encouraged to be used... how do we
> handle thread safety when using HTable? Is any replacement for
> HTablePool planned?
> Thanks,
>
> Best Regards,
> Wei
>
> From: Michel Segel <michael_se...@hotmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>,
> Date: 02/18/2013 09:23 AM
> Subject: Re: coprocessor enabled put very slow, help please~~~
>
> Why are you using an HTablePool?
> Why are you closing the table after each iteration through?
>
> Try using one HTable object. Turn off WAL.
> Initiate in start().
> Close in stop().
> Surround the use in a try/catch.
> If an exception is caught, re-instantiate a new HTable connection.
>
> You may want to flush the connection after puts.
>
> Again, I am not sure why you are using checkAndPut on the base table. Your
> count could be off.
>
> As an example, look at the nursery rhyme 'Mary had a little lamb'.
> Then check your word count.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 18, 2013, at 7:21 AM, prakash kadel <prakash.ka...@gmail.com>
> wrote:
>
> > Thank you guys for your replies,
> > Michael,
> > I think I didn't make it clear. Here is my use case:
> >
> > I have text documents to insert into HBase (with possible duplicates).
> > Suppose I have a document such as: "I am working. He is not working"
> >
> > I want to insert this document into a table in HBase, say table "doc"
> >
> > =doc table=
> > -----
> > rowKey : doc_id
> > cf: doc_content
> > value: "I am working.
He is not working"
> >
> > Now, I want to create another table that stores the word count, say
> > "doc_idx"
> >
> > doc_idx table
> > ---
> > rowKey : I, cf: count, value: 1
> > rowKey : am, cf: count, value: 1
> > rowKey : working, cf: count, value: 2
> > rowKey : He, cf: count, value: 1
> > rowKey : is, cf: count, value: 1
> > rowKey : not, cf: count, value: 1
> >
> > My MR job code:
> > ==============
> >
> > if (doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
> >     for (String word : doc_content.split("\\s+")) {
> >         Increment inc = new Increment(Bytes.toBytes(word));
> >         inc.addColumn("count", "", 1);
> >         table_idx.increment(inc); // send the increment to the index table
> >     }
> > }
> >
> > Now, I wanted to do some experiments with coprocessors, so I modified
> > the code as follows.
> >
> > My MR job code:
> > ===============
> >
> > doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
> >
> > Coprocessor code:
> > ===============
> >
> > public void start(CoprocessorEnvironment env) {
> >     pool = new HTablePool(conf, 100);
> > }
> >
> > public boolean postCheckAndPut(c, row, family, byte[] qualifier,
> >         compareOp, comparator, put, result) {
> >
> >     if (!result) return true; // proceed only if the put succeeded
> >
> >     HTableInterface table_idx = pool.getTable("doc_idx");
> >
> >     try {
> >         for (KeyValue contentKV : put.get("doc_content", "")) {
> >             for (String word :
> >                     Bytes.toString(contentKV.getValue()).split("\\s+")) {
> >                 Increment inc = new Increment(Bytes.toBytes(word));
> >                 inc.addColumn("count", "", 1);
> >                 table_idx.increment(inc);
> >             }
> >         }
> >     } finally {
> >         table_idx.close();
> >     }
> >     return true;
> > }
> >
> > public void stop(env) {
> >     pool.close();
> > }
> >
> > I am a newbie to HBase. I am not sure this is the right way to do it.
> > Given that, why is the coprocessor-enabled version much slower than
> > the one without?
> >
> > Sincerely,
> > Prakash Kadel
> >
> > On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> > <michael_se...@hotmail.com> wrote:
> >>
> >> The issue I was talking about was the use of a check and put.
> >> The OP wrote:
> >>>>>> each map inserts to doc table (checkAndPut)
> >>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some
> >>>>>> rows to an index table.
> >>
> >> My question is why does the OP use a checkAndPut, and the
> >> RegionObserver's postCheckAndPut?
> >>
> >> Here's a good example:
> >> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
> >>
> >> The OP doesn't really get into the use case, so we don't know why the
> >> checkAndPut is in the M/R job.
> >> He should just be using put() and then a postPut().
> >>
> >> Another issue... since he's writing to a different HTable... how? Does
> >> he create an HTable instance in the start() method of his RO object and
> >> then reference it later? Or does he create the instance of the HTable on
> >> the fly in each postCheckAndPut()?
> >> Without seeing his code, we don't know.
> >>
> >> Note that this is a synchronous set of writes. Your overall return from
> >> the M/R call to put will wait until the second row is inserted.
> >>
> >> Interestingly enough, you may want to consider disabling the WAL on the
> >> write to the index. You can always run a M/R job that rebuilds the index
> >> should something occur to the system where you might lose the data.
> >> Indexes *ARE* expendable. ;-)
> >>
> >> Does that explain it?
> >>
> >> -Mike
> >>
> >> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
> >>
> >>> Hi, Michael
> >>>
> >>> I don't quite understand what you mean by "round trip back to the
> >>> client". In my understanding, since the RegionServer and TaskTracker can
> >>> be on the same node, MR doesn't have to pull data into the client and
> >>> then process it. You also mention "unnecessary overhead"; can you
> >>> explain a little what operations or data processing can be seen as
> >>> "unnecessary overhead"?
> >>>
> >>> Thanks
> >>>
> >>> yong
> >>>
> >>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
> >>> <michael_se...@hotmail.com> wrote:
> >>>> Why?
> >>>>
> >>>> This seems like an unnecessary overhead.
> >>>>
> >>>> You are writing code within the coprocessor on the server.
> >>>> Pessimistic code really isn't recommended if you are worried about
> >>>> performance.
> >>>>
> >>>> I have to ask... by the time you have executed the code in your
> >>>> co-processor, what would cause the initial write to fail?
> >>>>
> >>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> It's a local read. I just check the last param of postCheckAndPut,
> >>>>> which indicates whether the Put succeeded. In case the put succeeds,
> >>>>> I insert a row in another table.
> >>>>>
> >>>>> Sincerely,
> >>>>> Prakash Kadel
> >>>>>
> >>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
> >>>>>
> >>>>>> Is your checkAndPut involving a local or a remote READ? Due to the
> >>>>>> nature of the LSM tree, a read is much slower than a write...
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Wei
> >>>>>>
> >>>>>> From: Prakash Kadel <prakash.ka...@gmail.com>
> >>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>,
> >>>>>> Date: 02/17/2013 07:49 PM
> >>>>>> Subject: coprocessor enabled put very slow, help please~~~
> >>>>>>
> >>>>>> hi,
> >>>>>> I am trying to insert a few million documents into HBase with
> >>>>>> MapReduce. To enable quick search of the docs I want to have some
> >>>>>> indexes, so I tried to use coprocessors, but they are slowing down my
> >>>>>> inserts. Aren't coprocessors supposed to not increase the latency?
> >>>>>> my settings:
> >>>>>> 3 region servers
> >>>>>> 60 maps
> >>>>>> each map inserts to doc table (checkAndPut)
> >>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some
> >>>>>> rows to an index table.
> >>>>>>
> >>>>>> Sincerely,
> >>>>>> Prakash
> >>>>
> >>>> Michael Segel | (m) 312.755.9623
> >>>>
> >>>> Segel and Associates

--
Best regards,

- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
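[Editor's note] Mike's "check your word count" remark is easy to verify: a bare `split("\\s+")`, as used in the thread's code, keeps punctuation attached to tokens, so "working." and "working" are counted as different words and the expected count of 2 for "working" in the doc_idx table never materializes. A minimal, self-contained sketch (plain Java, no HBase; the class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCount {
    // Naive whitespace tokenization, exactly as in the thread's split("\\s+").
    static Map<String, Integer> count(String doc) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : doc.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The thread's sample document yields "working." and "working"
        // as two distinct tokens, so neither reaches a count of 2.
        System.out.println(count("I am working. He is not working"));
    }
}
```

Stripping punctuation (or splitting on `\\W+`) before incrementing would be needed for the counts in the doc_idx example to come out as listed.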