Re: Deployment architecture for Hadoop, HBase & Hive recommendations?

2010-08-03 Thread Jeff Hammerbacher
Hey Maxim, Very cool stuff, and J-D definitely hit the high notes. For a cluster that's going to do real work, unless you're sold on AWS for all of your infrastructure, I'd definitely recommend real hardware from a vendor like Supermicro or moving to a "bare metal" cloud environment like SoftLaye

Re: Secondary Index versus Full Table Scan

2010-08-03 Thread Todd Lipcon
Hey Luke, A couple comments inline below: On Tue, Aug 3, 2010 at 8:40 AM, Luke Forehand < luke.foreh...@networkedinsights.com> wrote: > Thanks to the help of people on this mailing list and Cloudera, our team > has > managed to get our 3 data node cluster with HBase running like a top. Our > im

Re: Deployment architecture for Hadoop, HBase & Hive recommendations?

2010-08-03 Thread Jean-Daniel Cryans
Sorry took a day to answer, see inline. J-D On Mon, Aug 2, 2010 at 10:47 AM, Maxim Veksler wrote: > Hello, > > We're setting up a data warehouse environment that includes Hadoop, HBase, > Hive and our own in-house MR jobs. > I would like with your permission to discuss the architecture we should

Re: Question on changing block settings...

2010-08-03 Thread Jean-Daniel Cryans
Inline. J-D On Tue, Aug 3, 2010 at 2:44 PM, Vidhyashankar Venkataraman wrote: > A few issues I have been observing on changing block settings: > >  1.  What happens if we change the block size of a column family on an > already populated database? Will this not throw apps on db out of whack >

Re: redundancy testing

2010-08-03 Thread Jean-Daniel Cryans
On a small cluster like that I wouldn't bother giving 3 machines to zookeeper since your cluster is as reliable as your master node. Instead, make sure that your master has some redundant hardware and put a standalone zookeeper on it. J-D On Tue, Aug 3, 2010 at 3:41 PM, Justin Cohen wrote: > I ha

redundancy testing

2010-08-03 Thread Justin Cohen
I have an hbase setup with 1 master, 3 zookeepers and 10 region servers in distributed mode. What kind of stability should I expect? I can lose 1 zookeeper, right? What happens if I lose 2? What if a region server goes down, or if the master goes down? thanks, -justin
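
On the ZooKeeper part of this question only: a ZooKeeper ensemble stays available as long as a strict majority of its servers is up. The sketch below just works through that arithmetic for the 3-server case asked about. The function names are hypothetical helpers for illustration, not part of any ZooKeeper or HBase API, and it says nothing about what happens when the master or a region server goes down.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble.
# All function names here are illustrative, not a real ZooKeeper API.

def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, servers_lost: int) -> bool:
    """True if the surviving servers still form a majority."""
    return ensemble_size - servers_lost >= quorum_size(ensemble_size)


if __name__ == "__main__":
    ensemble = 3  # the setup described in the message above
    for lost in range(ensemble + 1):
        state = "quorum held" if has_quorum(ensemble, lost) else "quorum lost"
        print(f"{ensemble} servers, {lost} lost: {state}")
```

Under this rule a 3-server ensemble keeps working after losing 1 server and loses its quorum after losing 2. A single standalone server tolerates no failures at all.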

Question on changing block settings...

2010-08-03 Thread Vidhyashankar Venkataraman
A few issues I have been observing on changing block settings: 1. What happens if we change the block size of a column family on an already populated database? Will this not throw apps on db out of whack because of compression and HFile index which depend on block size? So, once the db is po

Re: Secondary Index versus Full Table Scan

2010-08-03 Thread Luke Forehand
Hegner, Travis writes: > > Going out on a limb, I think it will perform MUCH faster with multiple copies, as the data is already sitting > in each mapper's memory, ready to be accessed locally. The time to process per mapper should be very > dramatically reduced. With that in mind, you only have

RE: Secondary Index versus Full Table Scan

2010-08-03 Thread Hegner, Travis
Going out on a limb, I think it will perform MUCH faster with multiple copies, as the data is already sitting in each mapper's memory, ready to be accessed locally. The time to process per mapper should be very dramatically reduced. With that in mind, you only have to scale up as disk space requi

Re: Regionserver tanked, can't seem to get master back up fully

2010-08-03 Thread Jean-Daniel Cryans
We'll know for sure when we see those stack traces (both master and DNs). J-D On Tue, Aug 3, 2010 at 6:22 AM, Jamie Cockrill wrote: > Hi JD, > > The cluster is on a separated network, I'll see if any of the traces > remain. As for the ulimit and xceivers bit, those are set up correctly > as per t

Re: Secondary Index versus Full Table Scan

2010-08-03 Thread Luke Forehand
Edward Capriolo writes: > Generally speaking: If you are doing full range scans of a table > indexes will not help. Adding indexes will make the performance worse, > it will take longer to load your data and now fetching the data will > involve two lookups instead of one. > > If you are doing fu

Re: Secondary Index versus Full Table Scan

2010-08-03 Thread Edward Capriolo
On Tue, Aug 3, 2010 at 11:40 AM, Luke Forehand wrote: > Thanks to the help of people on this mailing list and Cloudera, our team has > managed to get our 3 data node cluster with HBase running like a top.  Our > import rate is now around 3 GB per job which takes about 10 minutes.  This is > great.

Secondary Index versus Full Table Scan

2010-08-03 Thread Luke Forehand
Thanks to the help of people on this mailing list and Cloudera, our team has managed to get our 3 data node cluster with HBase running like a top. Our import rate is now around 3 GB per job which takes about 10 minutes. This is great. Now we are trying to tackle reading. With our current setup,

Re: Regionserver tanked, can't seem to get master back up fully

2010-08-03 Thread Jamie Cockrill
PS, yes that was coming from master On 3 August 2010 14:22, Jamie Cockrill wrote: > Hi JD, > > The cluster is on a separated network, I'll see if any of the traces > remain. As for the ulimit and xceivers bit, those are set up correctly > as per the API doc you mention. > > Thanks > > Jamie > > On

Re: Regionserver tanked, can't seem to get master back up fully

2010-08-03 Thread Jamie Cockrill
Hi JD, The cluster is on a separated network, I'll see if any of the traces remain. As for the ulimit and xceivers bit, those are set up correctly as per the API doc you mention. Thanks Jamie On 2 August 2010 19:18, Jean-Daniel Cryans wrote: > Is that coming from the master? If so, it means tha

Re: [stargate] transaction?

2010-08-03 Thread Andrew Purtell
Thanks, confirmed. See https://issues.apache.org/jira/browse/HBASE-2897 and https://review.hbase.org/r/482/ - Andy > From: Sasha Maksimenko > Subject: Re: [stargate] transaction? > To: user@hbase.apache.org > Date: Monday, August 2, 2010, 2:32 AM > same error in hbase 0.20.6 > > On Mon,