comparing hbase backed by HDFS versus S3

2008-04-30 Thread Clint Morgan
We are considering using S3 as the DFS impl for hbase. I ran some benchmarks to get an idea of the performance differences. We are particularly interested in being able to serve data to users from hbase, so we want low-latency responses when getting tens of rows. Each row ("transaction") has about 1K

Re: comparing hbase backed by HDFS versus S3

2008-04-30 Thread Chris K Wensel
Anything relating to S3 will be slower, so it probably shouldn't be used as the default FileSystem for Hadoop. It works great if you need to park data between cluster runs, assuming you do not need external (from Hadoop and the cluster) applications to be able to read the data, as data in

Re: comparing hbase backed by HDFS versus S3

2008-05-01 Thread Leon Mergen
On Thu, May 1, 2008 at 2:30 AM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
> Further, once support for appends is added to Hadoop/HDFS, I am unsure if
> it will be inherited by S3FS. I think this is a critical issue for HBase.
Does that mean that support for an S3-backed storage backend will be

Re: comparing hbase backed by HDFS versus S3

2008-05-01 Thread Chris K Wensel
No. It just means I have no idea how appends will be implemented or how they will affect the other FileSystems.
On May 1, 2008, at 2:59 AM, Leon Mergen wrote:
> On Thu, May 1, 2008 at 2:30 AM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
>> Further, once support for appends is added to Hadoop/HDFS, I

Re: comparing hbase backed by HDFS versus S3

2008-05-01 Thread Clint Morgan
Thanks for the input; it confirmed my suspicions. We were debating running off of S3 just to minimize moving parts, but it does not look feasible. We want the cluster to "live forever", in that once the app is live, hbase will always be needed to serve data. A primary concern is data los

Re: comparing hbase backed by HDFS versus S3

2008-05-01 Thread stack
Thanks for trying this "interesting" experiment, Clint. I'm a little surprised the thing worked at all (smile). What do you need for HBASE-50? Is it sufficient forcing the cluster to go read-only flushing all in memory while the copy runs? CopyFiles/distcp should be able to go between filesy

Re: comparing hbase backed by HDFS versus S3

2008-05-01 Thread Clint Morgan
> What do you need for HBASE-50? Is it sufficient forcing the cluster to go
> read-only flushing all in memory while the copy runs?
Hopefully we can minimize the time we are read-only. We'd like the system to behave as close to normally as possible while snapshotting. Is the only danger of allow

Re: comparing hbase backed by HDFS versus S3

2008-05-05 Thread Clint Morgan
Actually, I think a more simple approach will get what we want here: Give hbase a custom filesystem which writes to hdfs, then to s3, but reads just from hdfs. This way we maintain a fresh backup on s3. Then when hdfs crashes we can piece it back together from s3. Meanwhile, we could even try to ke
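The write-to-both, read-from-one idea above can be illustrated with a minimal sketch. This is not Hadoop code and not anything proposed in the thread: two local directories stand in for HDFS (primary) and S3 (backup), and the class and method names (MirroredStore, write, read, restore_primary) are hypothetical, chosen only for illustration. A real implementation would have to wrap Hadoop's FileSystem abstraction and deal with partial failures between the two writes.

```python
import os
import shutil
import tempfile


class MirroredStore:
    """Hypothetical write-through store.

    Writes go to the primary directory first, then to the backup.
    Reads are served from the primary only.
    """

    def __init__(self, primary_dir, backup_dir):
        self.primary_dir = primary_dir  # stand-in for HDFS
        self.backup_dir = backup_dir    # stand-in for S3

    def write(self, name, data):
        # Primary first, then the backup copy, mirroring the order in the post.
        for root in (self.primary_dir, self.backup_dir):
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)

    def read(self, name):
        # The backup is never consulted on the read path.
        with open(os.path.join(self.primary_dir, name), "rb") as f:
            return f.read()

    def restore_primary(self):
        # Rebuild a lost primary by copying every file back from the backup.
        os.makedirs(self.primary_dir, exist_ok=True)
        for name in os.listdir(self.backup_dir):
            shutil.copyfile(
                os.path.join(self.backup_dir, name),
                os.path.join(self.primary_dir, name),
            )


def demo():
    primary = tempfile.mkdtemp()
    backup = tempfile.mkdtemp()
    store = MirroredStore(primary, backup)
    store.write("row-0001", b"x" * 1024)  # roughly 1K per row, as in the thread
    shutil.rmtree(primary)                # simulate losing the primary
    store.restore_primary()
    return store.read("row-0001")
```

Calling demo() writes one row, deletes the primary directory, restores it from the backup, and reads the row back. Note that every write pays the latency of both stores in this sketch; whether the backup write could be made asynchronous is a design question the sketch does not address.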

Re: comparing hbase backed by HDFS versus S3

2008-05-05 Thread stack
Clint Morgan wrote:
> Actually, I think a more simple approach will get what we want here: Give hbase a custom filesystem which writes to hdfs, then to s3, but reads just from hdfs.
That's an interesting idea, Clint. What would you call it? (hdfs3?) ...
On Thu, May 1, 2008 at 5:14 PM, Clint

Re: comparing hbase backed by HDFS versus S3

2008-05-05 Thread Jim R. Wilson
> Actually, I think a more simple approach will get what we want here:
> Give hbase a custom filesystem which writes to hdfs, then to s3, but
> reads just from hdfs.
+1 :) That would be fantastic for making geographically distributed live-backups.
-- Jim R. Wilson (jimbojw)
On Mon, May 5, 2008

Re: comparing hbase backed by HDFS versus S3

2008-05-05 Thread Leon Mergen
On Mon, May 5, 2008 at 6:12 PM, Clint Morgan <[EMAIL PROTECTED]> wrote:
> Actually, I think a more simple approach will get what we want here:
> Give hbase a custom filesystem which writes to hdfs, then to s3, but
> reads just from hdfs.
Keep in mind that Amazon is about to release permanent fil