Re: hbase master retries to RS/DN

2011-05-19 Thread Jack Levin
Thanks. Now, with that value set to "2", the master's log recovery after a DN death is still slow: 2011-05-19 23:34:55,109 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #3 from primary datanode 10.103.7.21:50010 java.net.ConnectException: Call to /10.103.7.21:50020 failed on connecti

Re: Storing XML object in Hbase

2011-05-19 Thread Stack
Yes. Render your xml snippet as a byte array and store that. St.Ack On Thu, May 19, 2011 at 9:28 PM, James Ram wrote: > Hi, > > I am a newbie to Hbase. Is it possible to store an XML object or XML File > directly in hbase? > > -- > With Regards, > Jr. >
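A minimal sketch of the suggestion above, assuming the XML is held as a Java String and encoded as UTF-8. The class and method names are hypothetical; the resulting byte array is what would be passed to the HBase client as a cell value (the HBase call itself is not shown here).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of HBase): round-trips an XML snippet
// through a byte array, which is the form a cell value takes.
class XmlBytes {
    // Encode the XML text as UTF-8 bytes, suitable for use as a cell value.
    static byte[] toBytes(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes read back from the store into the original XML text.
    static String fromBytes(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = "<bill><user>42</user></bill>";
        byte[] value = toBytes(xml);
        // Prints whether the decoded text matches the original snippet.
        System.out.println(fromBytes(value).equals(xml));
    }
}
```

Fixing the charset explicitly on both sides avoids depending on the platform default encoding.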

Storing XML object in Hbase

2011-05-19 Thread James Ram
Hi, I am a newbie to Hbase. Is it possible to store an XML object or XML File directly in hbase? -- With Regards, Jr.

Re: How to speedup Hbase query throughput

2011-05-19 Thread Weihua JIANG
Sorry for omitting the background. We assume a user is more interested in his latest bills than in his old ones. Thus, the query generator works as follows: 1. randomly generate a number and reverse it to form the user id. 2. randomly generate a prioritized month based on the above assumption. 3. ask HBase to
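Step 1 of the generator described above could look roughly like the sketch below. It assumes "reverse" means reversing the decimal digits of the number; the class name, method names, and the upper bound passed in are hypothetical, and the actual generator may differ.

```java
import java.util.Random;

// Hypothetical sketch of step 1: a random number whose decimal digits
// are reversed to form the user id.
class ReversedId {
    // Reverse the decimal digits of a non-negative number, keeping any
    // zeros that end up at the front (100 becomes "001").
    static String reverse(long n) {
        return new StringBuilder(Long.toString(n)).reverse().toString();
    }

    // Draw a random number below the given bound and reverse it.
    static String randomUserId(Random rng, int bound) {
        return reverse(rng.nextInt(bound));
    }

    public static void main(String[] args) {
        System.out.println(randomUserId(new Random(), 1_000_000));
    }
}
```

Returning a String rather than a number keeps the leading zeros that reversal produces, so two different inputs never collapse to the same id.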

Re: Hbase 2077 status?

2011-05-19 Thread Jean-Daniel Cryans
It is, but I'm trying to give you solutions until we get that fixed. In any case, you will hit your task timeout at some point anyway. J-D On Thu, May 19, 2011 at 4:09 PM, Vidhyashankar Venkataraman wrote: >>> maybe you could bump the timeouts high enough so that you don't >>> hit the issue at a

Re: Hbase 2077 status?

2011-05-19 Thread Himanshu Vashishtha
Hey, I am also facing a similar issue in a different context (related to coprocessors), but that's a totally different use case (I still need to look into it). In this case, would reducing the scanner cache size help (at least temporarily)? In a case when the scanner is busy collecting/computi

Re: Hbase 2077 status?

2011-05-19 Thread Vidhyashankar Venkataraman
>> maybe you could bump the timeouts high enough so that you don't >> hit the issue at all? Don't you think setting a high timeout might be a little ad hoc? This might just work except that it could lead to a really long delay during cases when there should be a timeout. Also we have non-homogen

Re: GC and High CPU

2011-05-19 Thread Jack Levin
So far it's going well: no more crazy GC, while the load remains the same. 2011-05-18T11:01:53.149-0700: 52288.388: [CMS-concurrent-preclean: 5.764/216.720 secs] [Times: user=103.71 sys=15.85, real=216.68 secs] 2011-05-18T11:01:53.149-0700: 52288.388: [CMS-concurrent-abortable-preclean-start] 2011-05-

Re: Hbase 2077 status?

2011-05-19 Thread Jean-Daniel Cryans
The latest patch would need some more work, I did more than what's really required. If you are really taking more than a minute to do a single next() call, maybe you could bump the timeouts high enough so that you don't hit the issue at all? The default is pretty arbitrary. J-D On Thu, May 19, 2

Hbase 2077 status?

2011-05-19 Thread Vidhyashankar Venkataraman
I wrote a while back about this problem (clients timing out when scanners have not yet returned a row; search for "A possible bug in the scanner."). I am trying to fix the problem in the next few days: our system is a little crippled without the fix (we use filters in scans and the bug crop

Re: Performance degrades on moving from desktop to blade environment

2011-05-19 Thread tsuna
On Thu, May 19, 2011 at 11:50 AM, Jack Levin wrote: > Himanish, it's hard to say without trend graphs. Set up Ganglia and get > fsReadLatency, as well as thread count graphs, to see what the issue > might be. You might want to set up OpenTSDB instead of Ganglia; it would give more fine-grained details

ANN: HBase 0.90.3 available for download

2011-05-19 Thread Stack
The Apache HBase team is happy to announce that HBase 0.90.3 is available from the Apache mirror of choice: http://www.apache.org/dyn/closer.cgi/hbase/ HBase 0.90.3 is a maintenance release that fixes several important bugs since version 0.90.2, while retaining API and data compatibility. The r

Re: hbase master retries to RS/DN

2011-05-19 Thread Jean-Daniel Cryans
The config and the retries you pasted are unrelated. The former controls the number of retries when regions are moving and the client must query .META. or -ROOT-. The latter is the Hadoop RPC client timeout and, looking at the code, the config is ipc.client.connect.max.retries from https://github.co

Re: IO Error when using multiple HBaseStorage in PIG

2011-05-19 Thread Jean-Daniel Cryans
Your attachment didn't make it; it rarely does on the mailing lists. I suggest you use a GitHub gist or a pastebin. Regarding the error, it looks like something closed the HCM and someone else is trying to use it. Since this is client side, it would point to a Pig problem. J-D On Thu, May 19, 2011

Re: How to speedup Hbase query throughput

2011-05-19 Thread Matt Corgan
I think I traced this to a bug in my compaction scheduler that would have missed scheduling about half the regions, hence the 240 GB vs 480 GB. To confirm: major compaction will always run when asked, even if the region is already major compacted, the table settings haven't changed, and it was last

IO Error when using multiple HBaseStorage in PIG

2011-05-19 Thread Keric Donnelly
To All, I'm running into IO issues when trying to write to an HBase table using multiple STORE commands in a Pig script. I can comment out any 2 of the STORE statements and run the script, and then the data inserts fine. If I try to run with all 3, I get the following: java.io.IOException: org.apache

Re: major compaction best practice

2011-05-19 Thread Lars George
You can also check the compactionQueue on all RegionServers through the metrics or JMX. On Thu, May 19, 2011 at 5:01 PM, Stack wrote: > On Thu, May 19, 2011 at 6:47 AM, Oleg Ruchovets wrote: >> --What is the way to see how the major compaction process is executing (log >> files or something else

Re: Max Table Count

2011-05-19 Thread Jean-Daniel Cryans
Yes, you will be wasting some IO; this is a well-known bug in HBase, but it's not because empty families would be flushed. In HBase, usually if something is empty it means it doesn't exist (that's why sparse columns are free). Now if you insert in 4 families in different rows but all in the same re

Re: Cannot Run HBase + Hadoop on a single Node Cluster - Hdfs Problem

2011-05-19 Thread Jean-Daniel Cryans
The master log doesn't contain anything special, if anything weird happened it would have been right after what you pasted. Regarding the versions you are using, I'm pretty sure that HBase 0.20.6 can't work with Hadoop 0.20.203.0 because of the security-related code. Don't be confused by the "0.20

Re: Performance degrades on moving from desktop to blade environment

2011-05-19 Thread Jack Levin
Himanish, it's hard to say without trend graphs. Set up Ganglia and get fsReadLatency, as well as thread count graphs, to see what the issue might be. -Jack On Thu, May 19, 2011 at 11:46 AM, Himanish Kushary wrote: > Hi, > > Could anybody suggest what may be the issue. I ran YCSB on both the > deve

hbase master retries to RS/DN

2011-05-19 Thread Jack Levin
Hello, we have a situation where, when an RS/DN crashes hard, the master is very slow to recover. We notice that it waits on these log lines: 2011-05-19 11:20:57,766 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.103.7.22:50020. Already tried 0 time(s). 2011-05-19 11:20:58,767 INFO org.a

Re: Performance degrades on moving from desktop to blade environment

2011-05-19 Thread Himanish Kushary
Hi, Could anybody suggest what the issue may be? I ran YCSB on both the development and production servers. The loading of data performs better on the production cluster, but the 50% read-50% write workloada performs better on development. The average latency for reads shoots up to 30-40 ms on p

Re: HBase API HRegionServer

2011-05-19 Thread Stack
On Thu, May 19, 2011 at 10:50 AM, Miguel Costa wrote: > Is it possible to start a region server from the HBase API if the region > server is down? > You'd need to have a running JVM to do the new HRegionServer. Do you have such a thing? Has something started the VM for you? St.Ack

HBase API HRegionServer

2011-05-19 Thread Miguel Costa
Hi, Is it possible to start a region server from the HBase API if the region server is down? Can you give me an example, please? I want to do something like this: Configuration config = HBaseConfiguration.create(); config.addResource(new Path(ZooKeeperPath)); HRegionServer sw = new

Re: Cannot Run HBase + Hadoop on a single Node Cluster - Hdfs Problem

2011-05-19 Thread Joey Echeverria
It looks like your region server isn't connected to zookeeper. Can you find a line in your region server log that looks like this: INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server localhost/127.0.0.1:2181 -Joey On Thu, May 19, 2011 at 9:18 AM, Florent G. wrote: > > Hi, > I

Cannot Run HBase + Hadoop on a single Node Cluster - Hdfs Problem

2011-05-19 Thread Florent G.
Hi, I am trying to run Hadoop (hadoop-0.20.203.0) with HBase (hbase-0.20.6) on a single node cluster on localhost. Hadoop starts without any errors, but when I start HBase, the Master is running but I cannot see any .ROOT. table. I think it is maybe a problem on the hdfs settings from any of the

Re: Port 0 being used when calling HBaseTestingUtility().startMiniCluster(1) on AWS

2011-05-19 Thread Ian Stevens
It doesn't look like a mislog. Here's the root cause that my test code was obscuring: 2011-05-19 15:20:09,194 DEBUG [main] regionserver.HLog(547): closing hlog writer in hdfs://localhost:37249/user/root/.META./1028785192/.logs Traceback (most recent call last): File "", line 1, in at

Re: How to speedup Hbase query throughput

2011-05-19 Thread Joey Echeverria
I'm surprised the major compactions didn't balance the cluster better. I wonder if you've stumbled upon a bug in HBase that's causing it to leak old HFiles. Is the total amount of data in HDFS what you expect? -Joey On Thu, May 19, 2011 at 8:35 AM, Matt Corgan wrote: > that's right > > > On Thu

Re: How to speedup Hbase query throughput

2011-05-19 Thread Matt Corgan
that's right On Thu, May 19, 2011 at 8:23 AM, Joey Echeverria wrote: > Am I right to assume that all of your data is in HBase, ie you don't > keep anything in just HDFS files? > > -Joey > > On Thu, May 19, 2011 at 8:15 AM, Matt Corgan wrote: > > I wanted to do some more investigation before po

Re: Max Table Count

2011-05-19 Thread Wayne
How about Column Families? We have 4 column families per table due to different settings (versions etc.). They are sparse in that a given row will only ever write to a single CF and even regions usually have only 1 CF's data/store file except at the border between row key naming conventions (each C

Re: How to speedup Hbase query throughput

2011-05-19 Thread Joey Echeverria
Am I right to assume that all of your data is in HBase, ie you don't keep anything in just HDFS files? -Joey On Thu, May 19, 2011 at 8:15 AM, Matt Corgan wrote: > I wanted to do some more investigation before posting to the list, but it > seems relevant to this conversation... > > Is it possible

Re: How to speedup Hbase query throughput

2011-05-19 Thread Matt Corgan
I wanted to do some more investigation before posting to the list, but it seems relevant to this conversation... Is it possible that major compactions don't always localize the data blocks? Our cluster had a bunch of regions full of historical analytics data that were already major compacted, the

Re: major compaction best practice

2011-05-19 Thread Stack
On Thu, May 19, 2011 at 6:47 AM, Oleg Ruchovets wrote: > --What is the way to see how the major compaction process is executing (log > files or something else ) > Currently yes, the only way to see the state of the compaction is by viewing logs (I added HBASE-3900 to expose it in the UI and shell). St.Ack

Re: How to speedup Hbase query throughput

2011-05-19 Thread Michel Segel
I had asked the question about how he created random keys... Hadn't seen a response. Sent from a remote device. Please excuse any typos... Mike Segel On May 18, 2011, at 11:27 PM, Stack wrote: > On Wed, May 18, 2011 at 5:11 PM, Weihua JIANG wrote: >> All the DNs almost have the same number o

Re: major compaction best practice

2011-05-19 Thread Oleg Ruchovets
Hi, I turned off major compaction: hbase.hregion.majorcompaction *0* The time (in milliseconds) between 'major' compactions of all HStoreFiles in a region. Default: 1 day. Then I ran from the hbase shell: hbase(main):004:0> major_compact 'MYTABLE' 0 row(s) in 0.1760 seconds --

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-19 Thread Weishung Chung
Awesome, I'm going to check it out and use it today. Thank you :) On Thu, May 19, 2011 at 8:14 AM, Alex Baranau wrote: > Implemented RowKeyDistributorByHashPrefix. From README: > > Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please > see > example below. It creates "distrib

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-19 Thread Alex Baranau
Implemented RowKeyDistributorByHashPrefix. From README: Another useful RowKeyDistributor is RowKeyDistributorByHashPrefix. Please see example below. It creates "distributed key" based on original key value so that later when you have original key and want to update the record you can calculate dis
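The idea of a hash-derived key prefix can be sketched as below. This is an illustrative reconstruction only, not HBaseWD's actual implementation: the class name, method names, one-byte prefix, and the choice of `Arrays.hashCode` as the hash are all assumptions, and the bucket count is assumed to be between 1 and 128 so the prefix fits in a non-negative byte.

```java
import java.util.Arrays;

// Hypothetical sketch (not the HBaseWD API): prepend a one-byte bucket
// prefix computed from the original key, so the same original key always
// maps to the same distributed key.
class HashPrefixSketch {
    // Build the distributed key: [bucket byte][original key bytes].
    // Assumes 1 <= buckets <= 128.
    static byte[] distributedKey(byte[] originalKey, int buckets) {
        int hash = Arrays.hashCode(originalKey);
        byte prefix = (byte) Math.floorMod(hash, buckets);
        byte[] out = new byte[originalKey.length + 1];
        out[0] = prefix;
        System.arraycopy(originalKey, 0, out, 1, originalKey.length);
        return out;
    }

    // Recover the original key by dropping the one-byte prefix.
    static byte[] originalKey(byte[] distributedKey) {
        return Arrays.copyOfRange(distributedKey, 1, distributedKey.length);
    }
}
```

Because the prefix is a pure function of the original key, a later update can recompute the distributed key directly instead of probing every bucket.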

RE: Mapping "Object-HBase data" Framework!

2011-05-19 Thread Vivek Mishra
Hi, Kundera is an open source ORM that currently supports Cassandra, HBase, and MongoDB. Support for Redis is planned. https://github.com/impetus-opensource/Kundera Blogs for reference are: http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/ http://mevivs

RE: HFiles created by MR Jobs and HBase Performance

2011-05-19 Thread Panayotis Antonopoulos
Thank you for your help! I hadn't understood the use of the TotalOrderPartitioner correctly. > Date: Tue, 17 May 2011 09:14:37 -0500 > Subject: Re: HFiles created by MR Jobs and HBase Performance > From: c...@email.com > To: user@hbase.apache.org > > If I understand hbase bulk loading correctly

Re: Regions count is not consistant between the WEBUI and LoaderBalancer

2011-05-19 Thread bijieshan
I didn't run hbck to check the system. The environment has been recovered now, so I need to recreate the problem and then run hbck; maybe it could give some helpful information. Thanks! Jieshan Bean -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans