Re: Question on efficient, ordered composite keys

2014-01-15 Thread Henning Blohm
Thanks Ted. So in effect it is relevant but there is nothing available OOTB. On 01/14/2014 04:02 PM, Ted Yu wrote: Please take a look at HBASE-8089 which is an umbrella JIRA. Some of its subtasks are in 0.96 bq. claiming that short keys (as well as short column names) are relevant bq. Is that

Re: Question on efficient, ordered composite keys

2014-01-15 Thread Henning Blohm
I am definitely considering that (started looking into Phoenix anyway). Thanks, Henning On 01/14/2014 10:01 PM, James Taylor wrote: Hi Henning, My favorite implementation of efficient composite row keys is Phoenix. We support composite row keys whose byte representation sorts according to the

Re: Question on efficient, ordered composite keys

2014-01-15 Thread Henning Blohm
Until James' reply I wasn't aware of the possibility to use Phoenix Keys independently of the rest (which would be a requirement currently). So the best choice among existing implementations seemed to be Orderly but it looks a little abandoned. I will ping Nick on that. Thanks, Henning On
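A minimal sketch of what "ordered composite key" means in this thread: the encoded bytes of a multi-part key should compare, byte by byte and unsigned, in the same order as the parts themselves. This is not the Phoenix or Orderly encoding. The class name, the (tenant, timestamp) key layout, and the 0x00 terminator are assumptions chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not taken from Phoenix, Orderly, or HBase.
class CompositeKeySketch {

    // Encodes (tenant, timestamp) so that unsigned byte-wise comparison of the
    // result orders by tenant first, then by timestamp.
    // Assumption: tenant contains no 0x00 byte, since 0x00 is the terminator.
    static byte[] encode(String tenant, long timestamp) {
        byte[] t = tenant.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(t.length + 1 + Long.BYTES);
        buf.put(t);
        buf.put((byte) 0x00); // terminator makes "ab" sort before "abc"
        // Flipping the sign bit makes negative longs sort before positive ones
        // when the big-endian bytes are compared as unsigned values.
        buf.putLong(timestamp ^ Long.MIN_VALUE);
        return buf.array();
    }

    // Unsigned lexicographic comparison, the order HBase applies to row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] k1 = encode("ab", 5L);
        byte[] k2 = encode("abc", 1L);
        System.out.println(compareUnsigned(k1, k2) < 0);
    }
}
```

The fixed-width long keeps the key short, which is the concern raised earlier in the thread; a variable-length first component costs one extra terminator byte.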

Re: Fast scan with PrefixFilter?

2014-01-15 Thread Ted Yu
Take a look at this blog: http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ From your earlier description, the components of your rowkey have fixed length. Thus you can consider using fuzzy row filter. Cheers On Jan 14, 2014, at
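The matching rule behind a fuzzy row filter can be sketched in plain Java. This illustrates the idea only and is not the HBase FuzzyRowFilter implementation. It assumes the mask convention where 0 means "this byte must match" and 1 means "any byte is accepted", and it assumes a hypothetical fixed-length key of a 4-byte user id, an underscore, and a 2-byte action id.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of fuzzy matching on fixed-length row key parts.
class FuzzyMatchSketch {

    // mask[i] == 0: row[i] must equal pattern[i]; mask[i] == 1: wildcard.
    // Assumption: rows shorter than the pattern never match in this sketch.
    static boolean matches(byte[] row, byte[] pattern, byte[] mask) {
        if (row.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && row[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Any user id, followed by "_99".
        byte[] pattern = ascii("????_99");
        byte[] mask = {1, 1, 1, 1, 0, 0, 0};
        System.out.println(matches(ascii("0001_99"), pattern, mask));
        System.out.println(matches(ascii("0001_98"), pattern, mask));
    }
}
```

Because every component has a fixed length, the wildcard positions line up for every row, which is why the approach suits the rowkey described above.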

Is get a private case of scan ?

2014-01-15 Thread Amit Sela
Hi all, I was wondering if Get is implemented as a private case of scan ? In HRegion, I see that the get passed is used to construct a Scan object for the RegionScanner to use. I was wondering if executing Scan(Get) vs Get from client api should have any overhead ? Thanks, Amit.

KeyValue size in bytes compared to store files size

2014-01-15 Thread Amit Sela
Hi all, I'm trying to measure the size (in bytes) of the data I'm about to load into HBase. I'm using bulk load with PutSortReducer. All bulk load data is loaded into new regions and not added to existing ones. In order to count the size of all KeyValues in the Put object I iterate over the Put's
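For a rough sense of the per-cell byte count, the arithmetic can be sketched as below. It assumes the classic KeyValue layout without tags: a 4-byte key length, a 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, and a 1-byte type, followed by the value. The class and method names are hypothetical, and the figure ignores HFile block encoding, compression, and index overhead.

```java
// Hypothetical helper; computes sizes from lengths only, no HBase classes used.
class KeyValueSizeSketch {

    // Serialized length of one cell under the assumed tag-free layout.
    static long serializedLength(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        long keyLen = 2L + rowLen + 1L + familyLen + qualifierLen + 8L + 1L;
        return 4L + 4L + keyLen + valueLen;
    }

    public static void main(String[] args) {
        // A 10-byte row, 1-byte family, 5-byte qualifier and 8-byte value:
        // 24 bytes of user data plus 20 bytes of fixed overhead.
        System.out.println(serializedLength(10, 1, 5, 8)); // prints 44
    }
}
```

The row, family, and qualifier are repeated in every cell, so a count made this way can be far larger than a compressed store file.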

Re: KeyValue size in bytes compared to store files size

2014-01-15 Thread Ted Yu
See previous discussion: http://search-hadoop.com/m/85S3A1DgZHP1 On Wed, Jan 15, 2014 at 5:44 AM, Amit Sela am...@infolinks.com wrote: Hi all, I'm trying to measure the size (in bytes) of the data I'm about to load into HBase. I'm using bulk load with PutSortReducer. All bulk load data is

Re: KeyValue size in bytes compared to store files size

2014-01-15 Thread Amit Sela
I'm talking about the store files size and the ratio between store file size and the byte count as counted in PutSortReducer. On Wed, Jan 15, 2014 at 5:35 PM, Ted Yu yuzhih...@gmail.com wrote: See previous discussion: http://search-hadoop.com/m/85S3A1DgZHP1 On Wed, Jan 15, 2014 at 5:44 AM,

Filter seeking using Hint incase of INCLUDE

2014-01-15 Thread Sagar Naik
Hello All, I am writing a filter which skips using hint and includes KeyValue when conditions are met. However, when I include a KeyValue, the next call to filterKeyValue(KeyValue kv) is always the next lexicographical KeyValue. I would like to skip to the next hint when a KeyValue is included so

Re: Filter seeking using Hint incase of INCLUDE

2014-01-15 Thread Ted Yu
This is related: https://issues.apache.org/jira/browse/HBASE-5512 On Wed, Jan 15, 2014 at 10:33 AM, Sagar Naik sn...@splunk.com wrote: Hello All, I am writing a filter which skips using hint and includes KeyValue when conditions are met. However, when I include a KeyValue, the next call to

Re: KeyValue size in bytes compared to store files size

2014-01-15 Thread Stack
There can be a lot of duplication in what ends up in HFiles but 500MB -> 32MB does seem too good to be true. Could you try writing without GZIP or mess with the hfile reader[1] to see what your keys look like when at rest in an HFile (and maybe save the decompressed hfile to compare sizes?)

Re: Is get a private case of scan ?

2014-01-15 Thread Stack
On Wed, Jan 15, 2014 at 5:34 AM, Amit Sela am...@infolinks.com wrote: Hi all, I was wondering if Get is implemented as a private case of scan ? In HRegion, I see that the get passed is used to construct a Scan object for the RegionScanner to use. A Get is a Scan, yes. I was wondering if

Unable to find region for hello_world,,99999999999999 after 10 tries.

2014-01-15 Thread Fernando Iwamoto - Plannej
Versions of Hadoop 2.2, Hbase 0.96.1, Pig 0.12 that I'm using. My problem is that whenever I run this script raw_data = LOAD 'sample_data.csv' USING PigStorage( ',' ) AS ( listing_id: chararray, fname: chararray, lname: chararray ); STORE raw_data INTO 'hbase://hello_world' USING

Your conference is coming up fast: HBaseCon2014 is May 5th in San Francisco

2014-01-15 Thread Stack
On May 5th in San Francisco, the third HBaseCon, HBaseCon2014, THE HBase Community Conference, will take place. You should ALL come! The call for papers is out now and closes February 14th, so don't dally. If you need help w/ a submission, ping your program committee for help; we are all up to

0.96 Replication to Elasticsearch

2014-01-15 Thread Pradeep Gollakota
Hi All, I have a use case where I need to replicate data from HBase into Elasticsearch. I've found two implementations of an HBase River that accomplishes this. One uses timestamps to do a timerange scan of the table (since last sync) and replicates data across. For many reasons this is not

Re: Unable to find region for hello_world,,99999999999999 after 10 tries.

2014-01-15 Thread Ted Yu
Can you check master log to see what happened to hello_world,,99 ? BTW you should be using HBase 0.96.1.1 release. Cheers On Wed, Jan 15, 2014 at 12:56 PM, Fernando Iwamoto - Plannej fernando.iwam...@plannej.com.br wrote: Versions of Hadoop 2.2, Hbase 0.96.1, Pig 0.12 that I'm

Re: 0.96 Replication to Elasticsearch

2014-01-15 Thread Otis Gospodnetic
Why not copy the approach (+ some code) from https://github.com/NGDATA/hbase-indexer ? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Jan 15, 2014 at 9:03 PM, Pradeep Gollakota pradeep...@gmail.com wrote: Hi All, I

org.apache.hadoop.hbase.KeyValue not found in gnu.gcj.runtime.SystemClassLoader

2014-01-15 Thread le...@agrant.cn
Hi, I wrote a simple Java class to query the rowkey of an HBase table. It compiles successfully but hits errors when running. Exception in thread main java.lang.NoClassDefFoundError: HBaseT at java.lang.Class.initializeClass(libgcj.so.10) Caused by: java.lang.ClassNotFoundException:

Re: org.apache.hadoop.hbase.KeyValue not found in gnu.gcj.runtime.SystemClassLoader

2014-01-15 Thread Ted Yu
Which version of HBase are you using ? Can you tell us how you launched your program ? Cheers On Wed, Jan 15, 2014 at 6:48 PM, le...@agrant.cn le...@agrant.cn wrote: Hi, I wrote a simple Java class to query the rowkey of an HBase table. It compiles successfully but hits errors when running

Re: Re: org.apache.hadoop.hbase.KeyValue not found in gnu.gcj.runtime.SystemClassLoader

2014-01-15 Thread leiwang...@gmail.com
Got it. It is because I should use the full path of the java command (on my site it is /usr/java/default/bin/java). If I just use java, it actually points to /usr/bin/java. Thanks, Lei leiwang...@gmail.com From: Ted Yu Date: 2014-01-16 13:32 To: user@hbase.apache.org Subject: Re:

How to shrink HBase region size?

2014-01-15 Thread Ramon Wang
Hi All, I'm wondering whether there is a simple way to decrease the number of regions for a table? We are using HBase 0.94.6-cdh4.4.0, and one of our tables has more than 100 regions. Following is some information about the table: SPLIT_POLICY =

Re: How to shrink HBase region size?

2014-01-15 Thread Bharath Vissapragada
Your subject seems to be different from your actual question. If you want to reduce the number of regions, you can increase the hbase.hregion.max.filesize and do region merge (offline merge suggested, hbase org.apache.hadoop.hbase.util.Merge tbl_name region_1 region_2). You can use compression to

Re: Is get a private case of scan ?

2014-01-15 Thread Amit Sela
I have a case where I want to split rows with a lot of qualifiers (a very small amount of rows 1%, with an exceptional number of qualifiers), into a number of rows. Say like: row1. row1_DELIMITER_UUID row1_DELIMITER_UUID2 row2 I was thinking of using a postGet() RegionObserver (the split rows

Re: How to shrink HBase region size?

2014-01-15 Thread Ramon Wang
org.apache.hadoop.hbase.util.Merge is what I'm looking for, thanks Bharath. Cheers Ramon On Thu, Jan 16, 2014 at 3:10 PM, Bharath Vissapragada bhara...@cloudera.com wrote: Your subject seems to be different from your actual question. If you want to reduce the number of regions, you can