Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread Ted Yu
Can you use another string for the fake value? DOESNOTEXIST is a bit long. It shouldn't be difficult to come up with a single-character string that doesn't appear in the first two columns. Cheers On Jan 17, 2014, at 8:34 PM, "leiwang...@gmail.com" wrote: > Hi Lars, > > public class AggregationCoun

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi Ted, I have tried that. However, it does not return the expected result for two QualifierFilters. For example, String filterString = "QualifierFilter(=, 'binary:tags') AND QualifierFilter(=, 'binary:googleid')" returns 0, although there are rows that have both column 'tags' and column 'googl
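A likely explanation for the zero count, consistent with the Stack Overflow discussion linked elsewhere in this thread, is that filters combined with AND are evaluated against each cell individually, and a single cell has only one qualifier, so it can never match both 'tags' and 'googleid'. The following is a minimal plain-Java simulation of that behavior, not HBase code: the class PerCellFilterDemo and the method cellsPassingAll are hypothetical names, and a row is modeled simply as a list of qualifier strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation: models per-cell AND evaluation, does not use the HBase API.
final class PerCellFilterDemo {

    /** Counts the cells of one row that are accepted by every filter at once. */
    static long cellsPassingAll(List<String> qualifiers, List<Predicate<String>> filters) {
        return qualifiers.stream()
                // A cell survives only if all filters accept that same cell.
                .filter(q -> filters.stream().allMatch(f -> f.test(q)))
                .count();
    }

    public static void main(String[] args) {
        // One row that has both columns, plus an unrelated one.
        List<String> row = Arrays.asList("tags", "googleid", "name");
        Predicate<String> isTags = "tags"::equals;
        Predicate<String> isGoogleId = "googleid"::equals;

        // No single cell has both qualifiers, so nothing survives: prints 0.
        System.out.println(cellsPassingAll(row, Arrays.asList(isTags, isGoogleId)));
        // With one qualifier filter, the matching cell survives: prints 1.
        System.out.println(cellsPassingAll(row, Arrays.asList(isTags)));
    }
}
```

Under this model a row with no surviving cells is never returned, which would explain a count of 0 even when rows contain both columns.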

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi Lars, public class AggregationCountForMultiFilter { private static final byte[] TABLE_NAME = Bytes.toBytes("userdigest"); private static final byte[] CF = Bytes.toBytes("cf"); private static final byte[] FAKE_VLAUE = Bytes.toBytes("DOESNOTEXIST"); public static void main(String[] args) { Con

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread Ted Yu
Please take a look at TestParseFilter#testCompoundFilter2 You can construct compound filter which involves more than one QualifierFilter. Cheers On Fri, Jan 17, 2014 at 7:41 PM, leiwang...@gmail.com wrote: > Hi Ted, > > According to the initial email, I need two QualifierFilter and one > Sing

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi Ted, According to the initial email, I need two QualifierFilters and one SingleColumnValueFilter. But applying two QualifierFilters to a scan does not work, as described at http://stackoverflow.com/questions/13379350/how-to-apply-several-qualifierfilter-to-a-row-in-hbase so I converted them to Single

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread lars hofhansl
Offhand there is no reason for that. If you send some sample code that can seed the data and then run the filter that shows the problem, I'll offer to do some profiling. Which version of HBase are you using? -- Lars From: "leiwang...@gmail.com" To: user Cc

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread Ted Yu
Copying the last reply from the link you gave: @Henrik I don't know how much data you have but I'm afraid you are right. Another option would be to implement a custom filter which takes the qualifier list you are looking for. It was acknowledged that the proposed transformation may not give the be

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi Ted, I haven't tested the performance without the coprocessor. Actually, I converted the two QualifierFilters into two SingleColumnValueFilters according to the description at http://stackoverflow.com/questions/13379350/how-to-apply-several-qualifierfilter-to-a-row-in-hbase and then pass the scan to

Re: Balancer with replica affinity

2014-01-17 Thread Ted Yu
Please see this thread: http://search-hadoop.com/m/ZXnXG1tQu4s/Rolling+restart+retain+hbase&subj=Rolling+Restart+and+Load+Balancer+ Note the --reload option. Cheers On Jan 17, 2014, at 5:22 PM, Sagar Naik wrote: > Hi, > I am doing some perf testing using various filters. > I was wondering tha

Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread Ted Yu
Do you see the same slowness scanning regions with FilterList outside the coprocessor? Thanks On Jan 17, 2014, at 5:24 PM, "leiwang...@gmail.com" wrote: > Hi, > > I have tried. > For a table with about 600 million row keys, just pass a single > QualifierFilter, it can get the result quickly

Re: Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi, I have tried. For a table with about 600 million row keys, passing just a single QualifierFilter returns the result quickly. But when I add the SingleColumnValueFilter with a FilterList, it becomes unacceptably slow. I think I can write my own custom aggregation client.

Balancer with replica affinity

2014-01-17 Thread Sagar Naik
Hi, I am doing some perf testing using various filters. I was wondering whether, when I do a rolling restart, I might lose the datanode-regionserver block affinity. Is this a valid concern? If so, is there a balancer that will take into account the HDFS block replica locations and position the rs over

Re: KeyValue size in bytes compared to store files size

2014-01-17 Thread lars hofhansl
Somewhat unrelated, but you might benefit from block encoding in addition to compression in your case. Try to set DATA_BLOCK_ENCODING to FAST_DIFF in your column families. -- Lars - Original Message - From: Amit Sela To: user@hbase.apache.org Cc: Sent: Thursday, January 16, 2014 1:00

Re: HBase Standalone Error

2014-01-17 Thread Stack
On Tue, Jan 14, 2014 at 9:18 PM, Jeff Zhang wrote: > Hi Stack, > > here's the step I use, > > 1. Download hbase 0.96 and untar it > 2. Edit the hbase-site.xml as following > > > hbase.rootdir > /var/hbase > > > hbase.zookeeper.property.dataDir >

Re: KeyValue size in bytes compared to store files size

2014-01-17 Thread Stack
On Thu, Jan 16, 2014 at 1:00 AM, Amit Sela wrote: > ... > > Could such a compression ratio make sense in case of many qualifiers per > row in a table (avg is 16 but in practice there are some rows with much > more and even a small number of rows with hundreds of thousands...) ? If > each KeyValue

Re: Suggest that turn the msg "Request is a replay (34) - PROCESS_TGS" from logging level from ERROR to WARN

2014-01-17 Thread Stack
Let's change it in both places. Please file issues. Let's try to minimize the freakout incidents when running your hbase/hadoop cluster. Thanks Takeshi, St.Ack On Thu, Jan 16, 2014 at 9:57 PM, takeshi wrote: > Hi All, > > Recently we got the error msg "Request is a replay (34) - PROCESS_TGS" > while we

Re: How to remove duplicate data in HBase?

2014-01-17 Thread Michael Segel
First, you should define what you mean when you say duplicate data. Depending on your definition… it may already be handled. On Jan 17, 2014, at 7:39 AM, Ted Yu wrote: > Can you tell us where the duplicate data resides - between column families or > between columns in a single column family ?

Re: Region server logs not getting updated..

2014-01-17 Thread Ted Yu
On the machine where the region server runs, you can use the ps command to find the command line associated with the region server process. The log directory would be shown there. Cheers On Jan 17, 2014, at 4:00 AM, Sandeep B A wrote: > Hi All > Currently I'm facing the issue where region server not

Re: How to remove duplicate data in HBase?

2014-01-17 Thread Ted Yu
Can you tell us where the duplicate data resides - between column families or between columns in a single column family ? Cheers On Jan 17, 2014, at 4:46 AM, oc tsdb wrote: > Hi all, > > We want to know if there is any option to remove duplicate data in Hbase > based on column family dynamica

How to remove duplicate data in HBase?

2014-01-17 Thread oc tsdb
Hi all, We want to know if there is any option to remove duplicate data in Hbase based on column family dynamically? Thanks, OC

Re: Fine tunning

2014-01-17 Thread Michael Segel
In your MapReduce job, read the lookup tables into memory in mapper.setup() and then access them as needed in your Mapper.map() method. Do the same for the reducer... See mapper joins in Map/Reduce ... On Jan 6, 2014, at 3:23 AM, Ranjini Rathinam wrote: > Hi, > > I have an input File of 16 fields in

Region server logs not getting updated..

2014-01-17 Thread Sandeep B A
Hi All, Currently I'm facing an issue where the region server is not updating its logs. I don't know where to check or how to proceed. I'm using Cloudera Manager to manage the HBase cluster. The HBase version in use is 0.94.6-CDH4.5.0. If I check from the master status page, all the region servers

Re: Fast scan with PrefixFilter?

2014-01-17 Thread Asaf Mesika
Your start row key is the prefix itself. Your end row key is the prefix + 1: increase the last byte by 1. I can send you the increment function if you need it. No filter is needed. On Wednesday, January 15, 2014, Ramon Wang wrote: > Hi Folks > > We have a table with fixed pattern row key design, the for
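The "prefix + 1" stop row described above can be sketched in plain Java as follows. This is not the function offered in the message; the class PrefixScanBounds and the method stopRowForPrefix are hypothetical names, and the handling of trailing 0xFF bytes is an added assumption about how to deal with a last byte that cannot be incremented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: computes the exclusive stop row for a prefix scan.
final class PrefixScanBounds {

    /**
     * Returns the smallest row key that sorts after every key starting with
     * the given prefix. Returns an empty array when no such key exists
     * (empty prefix or all bytes 0xFF), meaning "scan to the end of the table".
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        // Trailing 0xFF bytes cannot be incremented, so skip past them.
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--;
        }
        if (i < 0) {
            return new byte[0];
        }
        // Keep the bytes up to position i and increase that last byte by 1.
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++;
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "user123|".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        // '|' (0x7C) becomes '}' (0x7D), so this prints: user123}
        System.out.println(new String(stop, StandardCharsets.UTF_8));
    }
}
```

With the HBase client, the two arrays would then be used as the start row (inclusive) and stop row (exclusive) of the scan, for example via new Scan(start, stop), so the region server seeks directly to the prefix range instead of filtering every row.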

Re: How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread Ted Yu
Take a look at http://hbase.apache.org/0.94/apidocs/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.html#rowCount(byte[],%20org.apache.hadoop.hbase.coprocessor.ColumnInterpreter,%20org.apache.hadoop.hbase.client.Scan) You can pass custom filter through Scan parameter. Cheers On Ja

How to quickly count the rows that meet several conditions using hbase coprocessor

2014-01-17 Thread leiwang...@gmail.com
Hi, I know that the HBase coprocessor provides a quick way to count the rows of a table. But how can I count the rows that meet several conditions? Take this for example: I have an HBase table with one column family and several columns. I want to calculate the number of rows that meet 3 conditions: has c