hbase-master-server slept

2013-02-08 Thread So Hibino
Our HBase master server was shut down with the following message. HBase is running in distributed mode on a single node. I checked that GC completed in a very short time at the time the WARN was output. In addition, another system running on the same architecture doesn't output the following

Re: Regions in transition

2013-02-08 Thread kiran
After some searching I found that the main reason for regions with the same keys is splitting, and in our case it is the culprit for the inconsistency. I set my filesize to a very large value, but I am unsure why splitting is still happening On Fri, Feb 8, 2013 at 1:20 PM, kiran kiran.sarvabho...@gmail.com
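One thing worth checking when raising the file size doesn't seem to stop splits: the setting only affects future split decisions, and regions already over the threshold may still split once. A hedged sketch of the relevant hbase-site.xml setting (the value shown is illustrative, not a recommendation):

```xml
<!-- hbase-site.xml: raise the split threshold (100 GB here, purely illustrative) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value>
</property>
```

On 0.94 and later there is also a pluggable split policy (hbase.regionserver.region.split.policy, e.g. ConstantSizeRegionSplitPolicy), which makes the size threshold the only split trigger.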

Re: Meetup in March in San Francisco. Any preference for 3/12 or 3/13 or 3/14?

2013-02-08 Thread Jean-Marc Spaggiari
If anyone has some spare plane tickets, feel free to send me one ;) JM 2013/2/8, Stack st...@duboce.net: I put up the meetup announcement here http://www.meetup.com/hbaseusergroup/events/103587852/ I'll push it out soon. If anyone wants to talk about their hbase usage or some aspect of

Re: How to use put command in Java for dynamic field name creation

2013-02-08 Thread Jean-Marc Spaggiari
Hi Rams, I think I understand the way you want to build your table, but what I'm not sure of is: what's the issue? Are you asking how to write the Java code? If so, your Put code should look like this: byte[] tableName = Bytes.toBytes("branch"); byte[] row = Bytes.toBytes("1"); byte[] columnFamily =
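A minimal sketch of the Put JM is describing (0.9x-era client API; the table, row, and column names are illustrative, and a running cluster is assumed):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DynamicFieldPut {
    public static void main(String[] args) throws Exception {
        byte[] tableName = Bytes.toBytes("branch");
        byte[] row = Bytes.toBytes("1");
        byte[] columnFamily = Bytes.toBytes("f");

        HTable table = new HTable(HBaseConfiguration.create(), tableName);
        Put put = new Put(row);
        // The qualifier is just bytes, so it can be computed at runtime --
        // this is all "dynamic field name creation" amounts to in HBase.
        String dynamicField = "field-" + System.currentTimeMillis();
        put.add(columnFamily, Bytes.toBytes(dynamicField), Bytes.toBytes("some value"));
        table.put(put);
        table.close();
    }
}
```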

Re: restrict clients

2013-02-08 Thread Stas Maksimov
Hi Rita, As far as I know ACL is on a user basis. Here's a link for you: http://hbase.apache.org/book/hbase.accesscontrol.configuration.html Thanks, Stas On 8 February 2013 15:20, Rita rmorgan...@gmail.com wrote: Hi, In an enterprise deployment, how can I restrict who can access the data?
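For reference, once the AccessController coprocessor is enabled as that page describes, access is restricted per user from the HBase shell; a hedged sketch (user and table names are illustrative):

```
hbase> grant 'alice', 'R', 'mytable'        # read-only access to one table
hbase> user_permission 'mytable'            # list current grants
hbase> revoke 'alice', 'mytable'            # remove the grant again
```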

Re: Acceptable CPU_WIO % ?

2013-02-08 Thread Jean-Marc Spaggiari
Hi Kevin, I think it will take time before I get a chance to have 5 drives in the same server, so I will see at that time to test RAID5. I'm going to add one drive per server today or tomorrow to try to improve that. What IOPs should I try to have? 100? Less? It will all be SATA3 drives and I

Re: column count guidelines

2013-02-08 Thread Asaf Mesika
Can you elaborate more on those features? I thought 4 was just for bug fixes. Sent from my iPhone On Feb 8, 2013, at 02:34, Ted Yu yuzhih...@gmail.com wrote: How many column families are involved ? Have you considered upgrading to 0.94.4 where you would be able to benefit from lazy seek, Data

Re: Acceptable CPU_WIO % ?

2013-02-08 Thread Kevin O'dell
JM, Basically, you will have to replace failed disk and rebuild RAID0 since the other half of the data is worthless. There is not a real recommended value, but anything under 150 - 200 would make me more comfortable. On Fri, Feb 8, 2013 at 10:43 AM, Jean-Marc Spaggiari

Re: column count guidelines

2013-02-08 Thread Dave Wang
Mike, CDH4.2 will be out shortly, will be based on HBase 0.94, and will include both of the features that Ted mentioned and more. - Dave On Thu, Feb 7, 2013 at 8:34 PM, Michael Ellery mell...@opendns.com wrote: thanks for reminding me of the HBASE version in CDH4 - that's something we'll

Re: Compressing data sent from HBase client

2013-02-08 Thread Ted Yu
I think the following JIRA is related to what you ask: HBASE-6966 Compressed RPCs for HBase (HBASE-5355) port to trunk Before that gets integrated, you can manage compression yourself. prePut() is given a reference to the Put: void prePut(final ObserverContext<RegionCoprocessorEnvironment> c,
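Until that JIRA is integrated, the simplest place to "manage compression yourself" is actually the client: compress each value before it goes into a Put and decompress it after a Get. A sketch using only the JDK's Deflater/Inflater (the surrounding HBase calls are omitted so the round trip stands alone):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ValueCodec {
    // Compress a value before handing it to Put.add(...)
    static byte[] compress(byte[] raw) throws Exception {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        return out.toByteArray();
    }

    // Decompress a value read back from a Result
    static byte[] decompress(byte[] packed) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        return out.toByteArray();
    }
}
```

On the server side, the prePut() hook quoted above could apply the same compress() step inside a RegionObserver instead, at the cost of deploying a coprocessor.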

Re: hbase-master-server slept

2013-02-08 Thread Jean-Daniel Cryans
On Fri, Feb 8, 2013 at 1:26 AM, Marcos Ortiz mlor...@uci.cu wrote: Regards, So, Can you provide more information about your setup? - HBase version - Hadoop version - Operating System - Java version This would be helpful. I would also like to see that GC log please. Did you check the

Re: column count guidelines

2013-02-08 Thread Ted Yu
The reason I mentioned 0.94.4 was that it is the most recent 0.94 release. For the features, you can refer to the following JIRAs: HBASE-4465 Lazy-seek optimization for StoreFile scanners HBASE-4218 Data Block Encoding of KeyValues (aka delta encoding / prefix compression) Cheers On Fri, Feb

Re: Getting RetriesExhaustedException while getting rows

2013-02-08 Thread Vidosh Sahu
Hi Ted, Thanks for the response. Hbase version - *0.90.5* * * Here is the RS log - ## 2013-02-08 23:15:23,457 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@80ecfd) from

Re: Getting RetriesExhaustedException while getting rows

2013-02-08 Thread Ted Yu
0.90.5 is so old. Can you upgrade (0.94.4) ? What did the log from 117.196.234.171:60293 look like ? Thanks On Fri, Feb 8, 2013 at 9:52 AM, Vidosh Sahu vid...@girnarsoft.com wrote: Hi Ted, Thanks for the response. Hbase version - *0.90.5* * * Here is the RS log

split table data into two or more tables

2013-02-08 Thread alxsss
Hello, I wondered if there is a way of splitting data from one table into two or more tables in hbase with identical schemas, i.e. if table A has 100M records, put 50M into table B and 50M into table C, and delete table A. Currently, I use hbase-0.92.1 and hadoop-1.4.0 Thanks. Alex.

Re: split table data into two or more tables

2013-02-08 Thread Ted Yu
May I ask the rationale behind this ? Were you aiming for higher write throughput ? Please also tell us how many regions you have in the current table. Thanks BTW please consider upgrading to 0.94.4 On Fri, Feb 8, 2013 at 10:36 AM, alx...@aim.com wrote: Hello, I wondered if there is a way

Re: split table data into two or more tables

2013-02-08 Thread Kevin O'dell
Alex, Your best bet would be to do this through either MapReduce or Happybase(python). There is not an innate way to handle that through the shell. On Fri, Feb 8, 2013 at 1:36 PM, alx...@aim.com wrote: Hello, I wondered if there is a way of splitting data from one table into two or more

Re: split table data into two or more tables

2013-02-08 Thread Ted Yu
BTW I think hadoop-1.4.0 was a typo: it should be 1.0.4 On Fri, Feb 8, 2013 at 10:40 AM, Ted Yu yuzhih...@gmail.com wrote: May I ask the rationale behind this ? Were you aiming for higher write throughput ? Please also tell us how many regions you have in the current table. Thanks BTW

Re: split table data into two or more tables

2013-02-08 Thread alxsss
Hi, The rationale is that I have a mapred job that adds new records to an hbase table, constantly. The next mapred job selects these new records, but it must iterate over all records and check whether each is a candidate for selection. Since there are too many old records, iterating through them in a

Re: split table data into two or more tables

2013-02-08 Thread Ted Yu
bq. in a cluster of 2 nodes +1 master I assume you're limited by hardware in that regard. bq. job selects these new records Have you used a time-range scan ? Cheers On Fri, Feb 8, 2013 at 10:59 AM, alx...@aim.com wrote: Hi, The rationale is that I have a mapred job that adds new records to an

Re: split table data into two or more tables

2013-02-08 Thread Ted Yu
In 0.94, there is an optimization in StoreFileScanner.requestSeek() where a real seek is only done when seekTimestamp <= maxTimestampInFile. I suggest upgrading to 0.94.4 so that you can utilize this facility. On Fri, Feb 8, 2013 at 11:04 AM, Ted Yu yuzhih...@gmail.com wrote: bq. in a cluster of 2

Re: split table data into two or more tables

2013-02-08 Thread Marcos Ortiz
On 02/08/2013 01:59 PM, alx...@aim.com wrote: Hi, The rationale is that I have a mapred job that adds new records to an hbase table, constantly. The next mapred job selects these new records, but it must iterate over all records and check if it is a candidate for selection. Since there are

Open Files Limits

2013-02-08 Thread Marco Gallotta
Hey guys I'm running hbase on Ubuntu, and I'm experiencing problems with too many open files. I've got the following in limits.conf: * - nofile 5 * - nproc 5 And added session required pam_limits.so to /etc/pam.d/common-session .
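A hedged sketch of the limits.conf entries, with two caveats worth checking: the `*` wildcard does not apply to the root user, and the limits only take effect in a fresh login session for whichever user actually runs the HBase daemons (user name and values are illustrative, not recommendations):

```
# /etc/security/limits.conf
hbase  -  nofile  32768
hbase  -  nproc   32768
```

Verify with `ulimit -n` as that user after re-logging in; the HBase reference guide also suggests raising the DataNode's dfs.datanode.max.xcievers alongside nofile.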

Re: Open Files Limits

2013-02-08 Thread Marcos Ortiz
On 02/08/2013 03:16 PM, Marco Gallotta wrote: Hey guys I'm running hbase on Ubuntu, and I'm experiencing problems with too many open files. I've got the following in limits.conf: *-nofile 5 *-nproc 5 I think that the correct

Re: Open Files Limits

2013-02-08 Thread Marco Gallotta
On Friday 08 February 2013 at 7:23 AM, Marcos Ortiz wrote: On 02/08/2013 03:16 PM, Marco Gallotta wrote: Hey guys I'm running hbase on Ubuntu, and I'm experiencing problems with too many open files. I've got the following in limits.conf: * - nofile 5 * - nproc 5 I think that the

Re: Open Files Limits

2013-02-08 Thread Marco Gallotta
On Friday 08 February 2013 at 12:27 PM, Marco Gallotta wrote: On Friday 08 February 2013 at 7:23 AM, Marcos Ortiz wrote: On 02/08/2013 03:16 PM, Marco Gallotta wrote: Hey guys I'm running hbase on Ubuntu, and I'm experiencing problems with too many open files. I've got the following in

Re: Getting RetriesExhaustedException while getting rows

2013-02-08 Thread Vidosh Sahu
Thanks Ted. Upgrading to your suggested version, with minor hbase-site.xml modifications, made it work. Thanks, Vidosh On Fri, Feb 8, 2013 at 11:30 PM, Ted Yu yuzhih...@gmail.com wrote: 0.90.5 is so old. Can you upgrade (0.94.4) ? What did the log from 117.196.234.171

Re: split table data into two or more tables

2013-02-08 Thread alxsss
Hi, here is the hbase-site.xml file. <property> <name>hbase.hregion.majorcompaction</name> <value>0</value> </property> <property> <name>hbase.regionserver.codecs</name> <value>snappy,gz</value> </property> <property> <name>hbase.rootdir</name> <value>hdfs://master:9000/hbase</value> </property> <property>

Re: Get on a row with multiple columns

2013-02-08 Thread Varun Sharma
+user On Fri, Feb 8, 2013 at 5:38 PM, Varun Sharma va...@pinterest.com wrote: Hi, When I do a Get on a row with multiple column qualifiers. Do we sort the column qualifers and make use of the sorted order when we get the results ? Thanks Varun

Re: split table data into two or more tables

2013-02-08 Thread alxsss
Hi, Thanks for the suggestions. How can a time-range scan be implemented in Java code? Is there any sample code or a tutorial? Also, is it possible to select by the value of a column? Let's say I know that records have family f and column m, and new records have m=5. I need to instruct hbase to send only
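Putting the two suggestions into code: a hedged sketch of a time-range scan combined with a value filter (0.92/0.94-era API; the family and qualifier come from the question, but the time window and the value encoding for m=5 are assumptions that must match how the cells were written):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class NewRecordScan {
    public static Scan build() throws Exception {
        long now = System.currentTimeMillis();
        Scan scan = new Scan();
        // Only cells written in the last hour (the window is illustrative)
        scan.setTimeRange(now - 3600 * 1000L, now);

        // Additionally keep only rows where f:m equals 5
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
            Bytes.toBytes("f"), Bytes.toBytes("m"),
            CompareOp.EQUAL, Bytes.toBytes(5L));
        filter.setFilterIfMissing(true); // drop rows that have no f:m at all
        scan.setFilter(filter);
        return scan;
    }
}
```

In a MapReduce job the same Scan object would then be passed to TableMapReduceUtil.initTableMapperJob(...).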

independent scans to same region processed serially

2013-02-08 Thread James Taylor
Wanted to check with folks and see if they've seen an issue around this before digging in deeper. I'm on 0.94.2. If I execute in parallel multiple scans to different parts of the same region, they appear to be processed serially. It's actually faster from the client side to execute a single

Re: Get on a row with multiple columns

2013-02-08 Thread lars hofhansl
Everything is stored as a KeyValue in HBase. The Key part of a KeyValue contains the row key, column family, column name, and timestamp in that order. Each column family has its own store and store files. So in a nutshell a get is executed by starting a scan at the row key (which is a prefix

Re: independent scans to same region processed serially

2013-02-08 Thread James Taylor
All data is in the blockcache and there are plenty of handlers. To repro, you could: - create a table pre-split into, for example, three regions - execute serially a scan on the middle region - execute two parallel scans, each on half of the middle region - you'd expect the parallel scans to execute
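The repro steps above could be sketched roughly like this (table name and split keys are illustrative; each task opens its own HTable, since HTable is not thread-safe):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ParallelRegionScan {
    // Count rows in [start, stop) using a dedicated HTable per task.
    static Callable<Long> scanTask(final Configuration conf, final byte[] start, final byte[] stop) {
        return new Callable<Long>() {
            public Long call() throws Exception {
                HTable table = new HTable(conf, "t");
                ResultScanner scanner = table.getScanner(new Scan(start, stop));
                long rows = 0;
                for (Result r : scanner) rows++;
                scanner.close();
                table.close();
                return rows;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long t0 = System.currentTimeMillis();
        // Two halves of the middle region, submitted concurrently; timing this
        // against one serial scan of the whole region shows the effect described.
        Future<Long> left = pool.submit(scanTask(conf, Bytes.toBytes("m"), Bytes.toBytes("p")));
        Future<Long> right = pool.submit(scanTask(conf, Bytes.toBytes("p"), Bytes.toBytes("r")));
        long rows = left.get() + right.get();
        System.out.println(rows + " rows in " + (System.currentTimeMillis() - t0) + " ms");
        pool.shutdown();
    }
}
```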

Re: Get on a row with multiple columns

2013-02-08 Thread Ted Yu
Which HBase version are you using ? Is there a way to place 10 delete markers from the application side instead of 300 ? Thanks On Fri, Feb 8, 2013 at 10:05 PM, Varun Sharma va...@pinterest.com wrote: We are given a set of 300 columns to delete. I tested two cases: 1) deleteColumns() - with the

Re: Get on a row with multiple columns

2013-02-08 Thread Varun Sharma
Using hbase 0.94.3. Tried that too; ran into performance issues with having to retrieve the entire row first (this was getting slow when one particular row is hammered), since a row can be big (a few megs, sometimes 10s of megs), and then finding the columns and then doing a delete. To me, it looks

Re: Get on a row with multiple columns

2013-02-08 Thread Ted
How often do you need to perform such a delete operation ? Is there a way to utilize TTL so that you can avoid deletions ? Pardon me for not knowing your use case very well. On Feb 8, 2013, at 10:16 PM, Varun Sharma va...@pinterest.com wrote: Using hbase 0.94.3. Tried that too, ran into

Re: Get on a row with multiple columns

2013-02-08 Thread lars hofhansl
Can you organize your columns and then delete by column family? deleteColumn without specifying a TS is expensive, since HBase first has to figure out what the latest TS is. Should be better in 0.94.1 or later since deletes are batched like Puts (still need to retrieve the latest version,

Re: Get on a row with multiple columns

2013-02-08 Thread lars hofhansl
You could use the KeyOnly filter to only retrieve the key part of the KVs. From: Varun Sharma va...@pinterest.com To: user@hbase.apache.org Sent: Friday, February 8, 2013 10:16 PM Subject: Re: Get on a row with multiple columns Using hbase 0.94.3. Tried that
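Combining the two suggestions in this thread, keys-only retrieval plus deletes with explicit timestamps, a hedged sketch (0.94-era API; the helper name is mine, not from the thread):

```java
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;

public class ColumnDeleter {
    // Delete the latest version of each listed column without ever
    // transferring the (potentially multi-MB) values to the client.
    static void deleteColumns(HTable table, byte[] row, byte[] family,
                              byte[][] qualifiers) throws Exception {
        Get get = new Get(row);
        get.setFilter(new KeyOnlyFilter()); // keys only: values come back empty
        for (byte[] q : qualifiers) {
            get.addColumn(family, q);
        }
        Result result = table.get(get);
        if (result.isEmpty()) return;

        Delete delete = new Delete(row);
        for (KeyValue kv : result.raw()) {
            // Passing the timestamp explicitly spares the server the
            // "find the latest version" read described above.
            delete.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
        }
        table.delete(delete);
    }
}
```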

Re: Get on a row with multiple columns

2013-02-08 Thread Varun Sharma
The use case is like your Twitter feed: tweets from people you follow. When someone unfollows, you need to delete a bunch of their tweets from the following feed. So it's frequent, and we are essentially running into some extreme corner cases like the one above. We need high write throughput for this,

Re: Get on a row with multiple columns

2013-02-08 Thread lars hofhansl
Sorry.. I meant set these two config parameters to true (not false as I state below). - Original Message - From: lars hofhansl la...@apache.org To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Friday, February 8, 2013 11:41 PM Subject: Re: Get on a row with multiple columns