Re: ANNOUNCE: Yu Li joins the Apache HBase PMC

2017-04-15 Thread hongbin ma
congrats, Yu! On Sat, Apr 15, 2017 at 9:54 AM, Jerry He wrote: > Congratulations and welcome, Yu! > > On Fri, Apr 14, 2017 at 6:47 PM, Andrew Purtell > wrote: > > > Congratulations and welcome! > > > > > > On Fri, Apr 14, 2017 at 7:22 AM, Anoop John

Re: Rows per second for RegionScanner

2016-05-19 Thread hongbin ma
Vlad > > > On Thu, Apr 21, 2016 at 7:22 PM, hongbin ma <mahong...@apache.org> wrote: > > > hi Thakrar > > > > Thanks for your reply. > > > > My settings for the RegionScanner Scan is > > > > scan.setCaching(1024) > > scan.setMaxResul

Re: Rows per second for RegionScanner

2016-04-21 Thread hongbin ma
> You can read up on it - https://hbase.apache.org/book.html > Another thing, are you just looking for pure scan-read performance optimization? > Depending upon the table size you can also look into caching the table or not caching at all. > > -Original Message-

Rows per second for RegionScanner

2016-04-21 Thread hongbin ma
Hi, experts, I'm trying to figure out how fast hbase can scan. I'm setting up the RegionScanner in an endpoint coprocessor so that no network overhead will be included. My average key length is 35 and average value length is 5. My test result is that if I warm all my interested blocks in the block
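A minimal sketch of how such an in-region benchmark scan might look (the caching value of 1024 is quoted elsewhere in this thread; the helper name, the use of `nextRaw`, and the HBase 1.x coprocessor API are assumptions):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.regionserver.Region;
import org.apache.hadoop.hbase.regionserver.RegionScanner;

// Hypothetical helper: drain a RegionScanner opened directly on the
// region (as an endpoint coprocessor would), counting rows scanned,
// with no client/network overhead involved.
public class RegionScanBenchmark {
  public static long countRows(Region region) throws IOException {
    Scan scan = new Scan();
    scan.setCaching(1024);      // value mentioned in the thread
    scan.setCacheBlocks(true);  // assumes the block cache is pre-warmed
    RegionScanner scanner = region.getScanner(scan);
    long rows = 0;
    try {
      List<Cell> cells = new ArrayList<>();
      boolean more;
      do {
        cells.clear();
        more = scanner.nextRaw(cells);  // raw next: skips extra client-side work
        if (!cells.isEmpty()) {
          rows++;
        }
      } while (more);
    } finally {
      scanner.close();
    }
    return rows;
  }
}
```

Timing `countRows` over a region with fully warmed blocks gives roughly the pure scan throughput the thread is asking about.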

Re: Fuzzy Row filter with ranges

2016-01-31 Thread hongbin ma
i suggest you read the current implementation of fuzzy row filter and modify it according to your requirements; it should be easy. On Mon, Feb 1, 2016 at 11:50 AM, Rajeshkumar J wrote: > Hi, > > Can any one guide me details regarding how to implement fuzzy row

performance difference between observer and endpoint

2015-12-27 Thread hongbin ma
hi experts, we're working on using observer/endpoint to visit hbase rows. During our experiments we found that for *small interval scans* (where the scan range is within a single region, and the scan range contains only tens/hundreds of rows), observer is times faster than endpoint. Our question

isolation level of put and scan

2015-11-17 Thread hongbin ma
hi, experts: i have two concurrent threads reading/writing the same htable row. this row has two columns A and B. Currently this row's value is (A:a1, B:b1). thread 1 wants to read the value of this row's columns A and B, and thread 2 wants to update this row to (A:a2, B:b2). if thread 1,2
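The key property here is HBase's row-level atomicity: a single Put covering both columns is applied atomically under MVCC, so a concurrent Get sees either (a1, b1) or (a2, b2), never a mixed pair. A sketch of the two threads (the column family "cf" is an assumption; the thread names only columns A and B):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch of the scenario: both columns are updated in ONE Put, so a
// concurrent Get observes either the old pair or the new pair, never
// a mix of a1/b2 or a2/b1.
public class RowAtomicityDemo {
  public static void writerThread(Table table) throws IOException {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("A"), Bytes.toBytes("a2"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("B"), Bytes.toBytes("b2"));
    table.put(put);  // applied atomically under MVCC
  }

  public static Result readerThread(Table table) throws IOException {
    Get get = new Get(Bytes.toBytes("row1"));
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("A"));
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("B"));
    return table.get(get);  // a consistent snapshot of the row
  }
}
```

Note this guarantee holds only when both columns go into the same Put; two separate Puts to A and B can be interleaved with a reader.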

Re: isolation level of put and scan

2015-11-17 Thread hongbin ma
PM, hongbin ma <mahong...@apache.org> wrote: > hi,experts: > > i have two concurrent threads to read/write same the same htable row. > this row has two columns A and B. > > Currently this rows value is (A:a1, B:b1) > thread 1 wants to read the value of this row's column

Re: toStringBinary output is painful

2015-04-13 Thread hongbin ma
+1 I found the mix of ascii/hex very painful when I need to compare two binary keys/values On Tue, Apr 14, 2015 at 6:16 AM, Dave Latham lat...@davelink.net wrote: Wish I had started this conversation 5 years ago... When we're using binary data, especially in row keys (and therefore region
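One way to sidestep the mixed ascii/hex output when diffing binary keys is to dump plain hex instead; a trivial sketch (the helper name is made up, not an HBase API):

```java
// Hypothetical alternative to Bytes.toStringBinary: render a byte
// array as plain lowercase hex, two characters per byte, so two keys
// can be compared column by column.
public class HexDump {
  public static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
```

For example, `toHex(new byte[]{0x0a, (byte) 0xff})` yields `"0aff"`.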

Does hbase WAL ensures no data loss?

2015-02-16 Thread hongbin ma
hi, all For WAL.append() in hbase, the javadoc says: "Append a set of edits to the WAL. The WAL is not flushed/sync'd after this transaction completes BUT on return this edit must have its region edit/sequence id assigned else it messes up our unification of mvcc and
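Whether an individual write waits for a WAL sync is controlled per mutation on the client side via the `Durability` enum; a sketch (table and column names are illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: per-mutation durability control. With SYNC_WAL the region
// server syncs the WAL before acknowledging the write, which is the
// usual no-data-loss setting; SKIP_WAL trades safety for speed.
public class DurablePut {
  public static void writeDurably(Table table) throws IOException {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    put.setDurability(Durability.SYNC_WAL);
    table.put(put);
  }
}
```

So the quoted javadoc describes the low-level append contract; the sync that actually protects against data loss happens before the client's write is acknowledged, according to the durability level chosen.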

Re: Help needed on choosing OCR software

2015-02-16 Thread hongbin ma
I once came across this: https://code.google.com/p/tesseract-ocr/ AFAIK, OCR requires training if you want high-quality recognition, and it's not easy to have a model that suits all styles of handwriting. On Mon, Feb 16, 2015 at 7:33 PM, N. Ramasubramanian

Re: Streaming data to htable

2015-02-16 Thread hongbin ma
) But you need to be sure that you actually need to do such micromanagement and not just stick with regular Puts. HBase can sustain quite a good amount of input data before you need to start worrying. Cheers. On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma mahong...@apache.org wrote

Streaming data to htable

2015-02-12 Thread hongbin ma
hi, I'm trying to use an htable to store data that comes in a streaming fashion. The streaming-in data is guaranteed to have a larger KEY than ANY existing keys in the table. And the data will be READONLY. The data is streaming in at a very high rate, I don't want to issue a PUT operation for
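One standard way to avoid one round trip per record is the client-side write buffer; a sketch with `BufferedMutator` (table name, column family, and buffer size are assumptions, not from the thread):

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: batch streaming writes client-side so each record does not
// cost its own RPC. Mutations accumulate in a local buffer and are
// shipped to the region servers when the buffer fills.
public class StreamWriter {
  public static void streamPuts() throws IOException {
    BufferedMutatorParams params =
        new BufferedMutatorParams(TableName.valueOf("stream_table"))
            .writeBufferSize(8 * 1024 * 1024);  // flush roughly every 8 MB
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         BufferedMutator mutator = conn.getBufferedMutator(params)) {
      for (long key = 0; key < 100_000; key++) {  // monotonically increasing keys
        Put put = new Put(Bytes.toBytes(key));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        mutator.mutate(put);  // buffered; sent when the buffer fills
      }
    }  // close() flushes whatever is still buffered
  }
}
```

One caveat worth noting for this workload: monotonically increasing keys concentrate all writes on the last region, so at very high rates bulk loading HFiles may be worth considering instead.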

Re: ColumnSuffixFilter in HBase

2015-02-10 Thread hongbin ma
Will there be any performance issues? I'm curious if there's an efficient way of implementing such a filter. On Wed, Feb 11, 2015 at 1:39 PM, Alok Singh aloksi...@gmail.com wrote: You could use a QualifierFilter with a RegexStringComparator to do the same. Alok On Tue, Feb 10, 2015 at
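The suggestion above can be sketched by anchoring the regex at the end of the qualifier to get suffix matching (the "_ts" suffix and helper shape are illustrative):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;

// Sketch: emulate a "ColumnSuffixFilter" by matching qualifiers that
// end with a given suffix, e.g. "_ts".
public class SuffixScan {
  public static Scan suffixScan(String suffix) {
    Scan scan = new Scan();
    scan.setFilter(new QualifierFilter(
        CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator(".*" + suffix + "$")));
    return scan;
  }
}
```

On the performance question: the filter runs server-side against every cell in the scan range, so it saves network transfer but cannot skip disk reads the way a prefix-based seek could.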

Re: Hbase table export and import

2015-02-07 Thread hongbin ma
before the import, right? You can obtain the region boundaries from the original table and split the target table accordingly. Cheers On Thu, Feb 5, 2015 at 11:10 PM, hongbin ma mahong...@apache.org wrote: hi, For test purpose, we're trying to export a test hbase table and import
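A sketch of that suggestion: read the source table's region start keys and pre-split the target table with them before running Import (table and family names are illustrative, and this assumes the HBase 1.x admin API):

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

// Sketch: copy the source table's region boundaries onto the target
// table so Import lands data into matching regions.
public class PreSplitTarget {
  public static void preSplit(Configuration srcConf, Configuration dstConf)
      throws IOException {
    byte[][] splitKeys;
    try (Connection src = ConnectionFactory.createConnection(srcConf);
         RegionLocator locator =
             src.getRegionLocator(TableName.valueOf("test_table"))) {
      byte[][] startKeys = locator.getStartKeys();
      // The first start key is empty; the remaining ones are the split points.
      splitKeys = Arrays.copyOfRange(startKeys, 1, startKeys.length);
    }
    try (Connection dst = ConnectionFactory.createConnection(dstConf);
         Admin admin = dst.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("test_table"));
      desc.addFamily(new HColumnDescriptor("cf"));
      admin.createTable(desc, splitKeys);
    }
  }
}
```

Without this step, Import writes into however many regions the freshly created target table happens to have, which explains the region-count mismatch described in the thread.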

Hbase table export and import

2015-02-05 Thread hongbin ma
hi, For test purposes, we're trying to export a test hbase table and import it into a minicluster using org.apache.hadoop.hbase.mapreduce.Export and org.apache.hadoop.hbase.mapreduce.Import. The thing is, there were four regions in our original htable, but after importing, the imported htable has