I have also considered this approach. But what about the other columns that have no default value, e.g. depth, insertTime, ...? (status defaults to 0, so I can treat its absence as 0.) Anyway, if using put instead of checkAndPut makes it much faster, I will consider this approach.
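A minimal in-memory sketch of the two approaches discussed here, assuming a row can be modeled as a map from column name to int. This is not the HBase client API. The class `CrawlTableSketch` and its methods are hypothetical. `ConcurrentHashMap.compute` stands in for a plain Put, which only overwrites the columns it names. `putIfAbsent` roughly approximates a checkAndPut whose expected value is null: in HBase that checks that one column is absent, while here the whole row must be absent. The sketch shows that the "absence means not read" scheme never resets status, but that a second insertion still overwrites depth.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the crawler table; NOT the HBase API. */
public class CrawlTableSketch {
    // rowkey (url) -> column name -> value
    private final ConcurrentHashMap<String, Map<String, Integer>> rows = new ConcurrentHashMap<>();

    /** Like a plain Put: atomically merges the given columns, overwriting only those columns. */
    public void put(String url, Map<String, Integer> columns) {
        rows.compute(url, (key, existing) -> {
            Map<String, Integer> merged = new HashMap<>();
            if (existing != null) {
                merged.putAll(existing);
            }
            merged.putAll(columns);
            return merged;
        });
    }

    /** Rough analogue of checkAndPut with a null expected value: writes only if the row is absent. */
    public boolean insertIfAbsent(String url, Map<String, Integer> columns) {
        return rows.putIfAbsent(url, new HashMap<>(columns)) == null;
    }

    /** Status as a worker would read it: a missing status column counts as 0 (not read). */
    public int status(String url) {
        Map<String, Integer> row = rows.get(url);
        return row == null ? 0 : row.getOrDefault("status", 0);
    }

    /** Returns a column value, or null if the row or column is missing. */
    public Integer column(String url, String name) {
        Map<String, Integer> row = rows.get(url);
        return row == null ? null : row.get(name);
    }

    public static void main(String[] args) {
        // Approach 1: plain puts that never write status on insertion.
        CrawlTableSketch table = new CrawlTableSketch();
        table.put("url", Map.of("depth", 1));   // inserted via url1 -> url
        table.put("url", Map.of("status", 1));  // worker marks url as read
        table.put("url", Map.of("depth", 2));   // inserted again via url2 -> url
        System.out.println(table.status("url") + " " + table.column("url", "depth")); // 1 2

        // Approach 2: conditional insert; the second insertion is rejected,
        // so neither status nor depth changes.
        CrawlTableSketch checked = new CrawlTableSketch();
        checked.insertIfAbsent("url", Map.of("status", 0, "depth", 1));
        checked.put("url", Map.of("status", 1));
        boolean second = checked.insertIfAbsent("url", Map.of("status", 0, "depth", 2));
        System.out.println(second + " " + checked.status("url") + " " + checked.column("url", "depth")); // false 1 1
    }
}
```

Which trade-off is acceptable depends on whether overwriting depth or insertTime on a re-insertion matters for the crawler.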
On Tue, Apr 29, 2014 at 9:44 AM, Jean-Marc Spaggiari wrote:
> Simply don't set your status to 0 when you first write the row.
>
> Absence means not read.
> 1 means read.
> So there is no risk that someone tries to set 0 while someone else tries to set 1.
>
> Would that be an option?
>
>
> 2014-04-28 21:23 GMT-04:00 Li Li:
>
>> I am using HBase to store information for a web spider.
>> I have a table that saves information about each web page; the rowkey is the url,
>> and there are other columns such as status (int) and depth (int).
>> In the beginning, the status is 0. A worker thread selects urls
>> whose status is 0, does something with them, and changes their status to 1.
>> More than one url can link to a given url,
>> e.g. url1->url and url2->url,
>> so url is inserted twice. If I do not use checkAndPut,
>> thread 1 inserts url, and the worker thread processes url
>> and changes its status to 1. Then thread 2 inserts url again and resets
>> the status to 0, so the worker thread processes it again. That's
>> not what I want.
>>
>> On Tue, Apr 29, 2014 at 8:56 AM, Jean-Marc Spaggiari wrote:
>> > Why do you want to make sure the row is only inserted once? If you insert
>> > the same row twice, the 2nd one will simply overwrite the first one and
>> > HBase will take care of the versions.
>> >
>> > Regarding the code fragments, I don't think autoflush is going to make a
>> > big difference compared to the cost of the check & put...
>> >
>> >
>> > 2014-04-28 20:50 GMT-04:00 Li Li:
>> >
>> >> I must use checkAndPut to ensure a row is only inserted once.
>> >> If I have 1000 checkAndPut calls, will setAutoFlush(false) be useful?
>> >> Is there any performance difference between the following two code fragments?
>> >> 1.
>> >> table.setAutoFlush(false);
>> >> for(int i=0;i<1000;i++){
>> >>     Put put=...
>> >>     table.checkAndPut(,....put);
>> >> }
>> >> 2.
>> >> table.setAutoFlush(true);
>> >> for(int i=0;i<1000;i++){
>> >>     Put put=...
>> >>     table.checkAndPut(,....put);
>> >> }
>> >>
>> >> On Tue, Apr 29, 2014 at 8:36 AM, Jean-Marc Spaggiari wrote:
>> >> > It depends. Batching a list of puts/gets will be way faster than checkAndPut,
>> >> > but the result will not be the same... a batch of puts will not do any
>> >> > check...
>> >> >
>> >> >
>> >> > 2014-04-28 20:17 GMT-04:00 Li Li:
>> >> >
>> >> >> But I have many checkAndPut operations.
>> >> >> Would using batch be a better solution?
>> >> >>
>> >> >> On Mon, Apr 28, 2014 at 8:01 PM, Jean-Marc Spaggiari wrote:
>> >> >> > Hi Li Li,
>> >> >> >
>> >> >> > Yes, threads will impact performance. If you send all your writes with
>> >> >> > a single thread, a single HBase handler will take care of them, etc.
>> >> >> > HBase does not provide a single handler for a single client connection.
>> >> >> > It's able to handle multiple threads and clients.
>> >> >> >
>> >> >> > However, it also all depends on the way you send your writes. If you
>> >> >> > send a single batch of 10,000 puts per second, it will not be better
>> >> >> > to use 10,000 threads with a single put each.
>> >> >> >
>> >> >> > I would recommend running some perf tests on your installation to find
>> >> >> > a good number for your configuration.
>> >> >> >
>> >> >> > JM
>> >> >> >
>> >> >> >
>> >> >> > 2014-04-28 6:27 GMT-04:00 Li Li:
>> >> >> >
>> >> >> >> Hi all,
>> >> >> >> With the same read/write data, will the thread count affect performance?
>> >> >> >> E.g. I have 10,000 write requests/second, and I don't care much about the order.
>> >> >> >> How many writer threads should I use to obtain maximum throughput?
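Following the suggestion to run perf tests on your own installation, a minimal harness sketch for comparing writer-thread counts, assuming each write can be wrapped in an `IntConsumer`. `WriterThreadBenchmark`, `measure`, and `fakeWrite` are hypothetical names. The placeholder write only sleeps about 1 ms to imitate a round trip, so the numbers it prints say nothing about a real cluster; replace it with the actual put or checkAndPut call before measuring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

/** Hypothetical harness for comparing writer-thread counts; plug a real write in as `write`. */
public class WriterThreadBenchmark {

    /** Runs `totalWrites` calls of `write`, split across `threads` workers; returns writes per second. */
    public static double measure(int threads, int totalWrites, IntConsumer write) throws Exception {
        if (threads < 1 || totalWrites < 1) {
            throw new IllegalArgumentException("threads and totalWrites must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                final int worker = t;
                futures.add(pool.submit(() -> {
                    // Worker t handles write indices t, t + threads, t + 2 * threads, ...
                    for (int i = worker; i < totalWrites; i += threads) {
                        write.accept(i);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for completion and propagate any exception from a write
            }
            long elapsedNanos = System.nanoTime() - start;
            return totalWrites / (elapsedNanos / 1e9);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder write: sleeps ~1 ms to mimic a round trip. Replace with a real put.
        IntConsumer fakeWrite = i -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int threads : new int[] {1, 2, 4, 8, 16}) {
            double rate = measure(threads, 2_000, fakeWrite);
            System.out.printf("%2d threads: %.0f writes/s%n", threads, rate);
        }
    }
}
```

Splitting indices round-robin keeps every write executed exactly once regardless of the thread count, so runs with different counts do the same total work.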
