Re: Confusing the retrieve result with a given timestamp

2010-11-19 Thread Lars George
Have a read here: http://outerthought.org/blog/417-ot.html Especially: "One interesting option that is missing is the ability to retrieve the latest version less than or equal to a given timestamp, thus giving the 'latest' state of the record at a certain point in time. Update: this is (obviously

Re: Ganglia website refuses connection despite proxy (Hbase EC2)

2010-11-19 Thread Lars George
Yeah, this will be superseded by WHIRR-25 over the next month or two. The "root" name was simply a choice, no reason not to change it. As for Ganglia, do you see the Ganglia daemon run on each node? If not, please have a look into the logs on the servers, the user scripts usually log their process

Why did Facebook prefer to HBase than Cassandra?

2010-11-19 Thread MauMau
Hello, (especially Mr. Jonathan Gray, Facebook folks), I'm sorry for mentioning particular people in a public ML. I saw the following note from Facebook that says Facebook chose HBase, not Cassandra, as the storage for the next messaging infrastructure. http://www.facebook.com/notes/facebook-

map task performance degradation - any idea why?

2010-11-19 Thread Henning Blohm
We have a Hadoop 0.20.2 + HBase 0.20.6 setup with three data nodes (12GB, 1.5TB each) and one master node (24GB, 1.5TB). We store a relatively simple table in HBase (1 column family, 5 columns, rowkey about 100 chars). In order to better understand the load behavior, I wanted to put 5*10^8 rows

Re: Confusing the retrieve result with a given timestamp

2010-11-19 Thread Pan W
Hi, Lars It's very nice of you to show the helpful blog to me. Now I see :-) -- Pan W

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
Hi Henning, Could you look at the Master UI while doing the import? The issue with a cold bulk import is that you are hitting one region server initially, and while it is filling up its in-memory structures all is nice and dandy. Then you start to tax the server as it has to flush data out and it b

Re: map task performance degradation - any idea why?

2010-11-19 Thread Henning Blohm
Hi Lars, thanks. Yes, this is just the first test setup. Eventually the data load will be significantly higher. At the moment (looking at the master after the run) the number of regions is well-distributed (684,685,685 regions). The overall HDFS use is ~700G. (replication factor is 3 btw). I

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
Hi Henning, And what you have seen is often difficult to explain. What I listed are the obvious contenders. But ideally you would do a post mortem on the master and slave logs for Hadoop and HBase, since that would give you a better insight into the events. For example, when did the system start

Re: map task performance degradation - any idea why?

2010-11-19 Thread Henning Blohm
Hi Lars, we do not have anything like ganglia up. Unfortunately. I use regular puts with autoflush turned off, with a buffer of 4MB (could be bigger right?). We write to WAL. I flush every 1000 recs. I will try again - maybe over the weekend - and see if I can find out more. Thanks,

Re: map task performance degradation - any idea why?

2010-11-19 Thread Lars George
Yeah, turning off the WAL would have been my next suggestion. Apart from that Ganglia is really easily set up - you might want to consider getting used to it now :) On Fri, Nov 19, 2010 at 4:29 PM, Henning Blohm wrote: > Hi Lars, > >  we do not have anything like ganglia up. Unfortunately. > >  I

Re: Ganglia website refuses connection despite proxy (Hbase EC2)

2010-11-19 Thread Saptarshi Guha
Thanks. On Fri, Nov 19, 2010 at 2:15 AM, Lars George wrote: > Yeah, this will be superseded by WHIRR-25 over the next month or two. > The "root" name was simply a choice, no reason not to change it. As > for Ganglia, do you see the Ganglia daemon run on each node? If not, > please have a look in

Re: Ganglia website refuses connection despite proxy (Hbase EC2)

2010-11-19 Thread Saptarshi Guha
Yes, messages is the right place. Saw this Nov 19 12:25:26 ip-10-98-154-214 /usr/sbin/gmetad[1293]: Unable to mkdir(/var/lib/ganglia/rrds/unspecified): No such file or directory Cheers J On Fri, Nov 19, 2010 at 2:15 AM, Lars George wrote: > Yeah, this will be superseded by WHIRR-25 over the n

Re: Why did Facebook prefer to HBase than Cassandra?

2010-11-19 Thread Jean-Daniel Cryans
This isn't the right forum for that kind of discussion. I recommend going on Quora which already has a few good threads on the subject, answered by FB folks, namely: http://www.quora.com/Why-did-Facebook-pick-HBase-instead-of-Cassandra-for-the-new-messaging-platform and http://www.quora.com/How

Adding packages to the HBase AMI (ami-fe698397 and ami-c86983a1) - no space on device

2010-11-19 Thread Saptarshi Guha
Hello, Both packages have HBase 0.89.20100726 installed (the former is c1.xlarge and the latter is medium). I'm trying to install some extra packages (see [1]) By the time I've come to install R, I'm almost out of space on the root device. I would like to add some packages to the task nodes (whic

Re: Adding packages to the HBase AMI (ami-fe698397 and ami-c86983a1) - no space on device

2010-11-19 Thread Saptarshi Guha
I think I found the answer. It appears this AMI was bundled with 3GB but there is no compelling reason to do so. I can recreate an AMI from the c1.xlarge AMI bundling it with e.g. 5GB. That should cover my needs. I honestly don't mind 5 minute start up times - coffee and a cigarette - or is there s

Re: Adding packages to the HBase AMI (ami-fe698397 and ami-c86983a1) - no space on device

2010-11-19 Thread Andrew Purtell
The root device on instance-store type instances is small, but there are several additional disk volumes supplied depending on the instance size, 2 420 GB volumes for m1.xlarge and 4 420 GB volumes for c1.xlarge. Our ec2 scripts mount them as /mnt, /mnt2, /mnt3, etc. and configure the Hadoop D

Re: Adding packages to the HBase AMI (ami-fe698397 and ami-c86983a1) - no space on device

2010-11-19 Thread Saptarshi Guha
True, and thanks for putting the scripts together. But doesn't yum demand free space on the / device? There is plenty of space on /mnt but then I would need to instruct yum to install packages elsewhere. I'd have to link /lib/ etc to /mnt or something. Cheers J On Fri, Nov 19, 2010 at 2:27 PM, Andrew Purtell

Re: Adding packages to the HBase AMI (ami-fe698397 and ami-c86983a1) - no space on device

2010-11-19 Thread Andrew Purtell
> I'd have to link /lib/ etc to /mnt or something. Yes. Copy first. Then use bind mounts ('mount --bind ...') to overlay the additional storage wherever you prefer in the filesystem hierarchy. Then invoke yum. I am sure there are other approaches but the above can be scripted easily enough.
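
The copy-then-bind-mount sequence described in the message above might be scripted roughly as follows. This is only a sketch under stated assumptions: the `relocate_dir` and `run` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the example paths are not taken from the thread. With `DRY_RUN=1` (the default here) the commands are only printed; running them for real requires root.

```shell
#!/bin/sh
# Sketch: move a directory onto a larger volume (e.g. under /mnt) and
# bind-mount it back over its original path so yum sees the extra space.
# DRY_RUN, run and relocate_dir are illustrative names, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# relocate_dir SRC DST: copy SRC to DST first, then overlay DST on SRC.
relocate_dir() {
    src="$1"
    dst="$2"
    run mkdir -p "$dst"
    run cp -a "$src/." "$dst/"        # copy first, preserving ownership and modes
    run mount --bind "$dst" "$src"    # overlay the larger volume on the original path
}
```

For example, `relocate_dir /usr/share /mnt/overlay/share` (hypothetical paths) prints the three commands it would run; once they look right, setting `DRY_RUN=0` executes them, after which yum can be invoked as usual.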

Re: Why did Facebook prefer to HBase than Cassandra?

2010-11-19 Thread MauMau
Hello, Jean-Daniel, Thank you for telling me good pointers. I was afraid I could get my question to be noticed by FB people and obtain reliable answers, so I asked here. One more reason is that I'm a fan of HBase (I'm sorry I thought this might not be the best place to ask this question.) An

Re: Ganglia website refuses connection despite proxy (Hbase EC2)

2010-11-19 Thread Saptarshi Guha
I logged into the master 1. In hbase-ec2-init-remote.sh, the block is evaluated (on master) if [ "$IS_MASTER" = "true" ]; then sed -i -e "s|\( *mcast_join *=.*\)|#\1|" \ -e "s|\( *bind *=.*\)|#\1|" \ -e "s|\( *mute *=.*\)| mute = yes|" \ -e "s|\( *location *=.*\)| l
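
To see what the visible part of that `sed` call does, the three complete expressions can be tried on a scratch file. The snippet above is cut off, so only those three expressions are reproduced here; the sample configuration lines and their values are assumptions made up for illustration, and `-i` is dropped so nothing is edited in place.

```shell
#!/bin/sh
# Scratch file standing in for a gmond.conf fragment (illustrative values only).
conf="$(mktemp)"
cat > "$conf" <<'EOF'
  mute = no
  mcast_join = 239.2.11.71
  bind = 239.2.11.71
EOF

# The three fully visible expressions from the quoted block, writing to stdout
# instead of editing the file in place.
result="$(sed -e "s|\( *mcast_join *=.*\)|#\1|" \
              -e "s|\( *bind *=.*\)|#\1|" \
              -e "s|\( *mute *=.*\)| mute = yes|" "$conf")"
printf '%s\n' "$result"
rm -f "$conf"
```

On this sample the `mcast_join` and `bind` lines come back prefixed with `#` (commented out), and the `mute` line is rewritten to ` mute = yes`.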