> Consistency issues: Since S3 has read-after-write consistency for new
> objects
Eventually. The problem is that reads for new objects may fail for some
arbitrary time first, with 404 or 500 responses, as I mentioned before.
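One common way to cope with those transient 404/500 responses is to wrap reads of newly written objects in a retry loop with backoff. A minimal sketch (the `fetch` callable and the backoff parameters are illustrative, not any S3 client API):

```python
import time

def read_with_retry(fetch, max_attempts=5, base_delay=0.5):
    """Retry a read that may fail transiently (e.g. a 404/500 on a
    just-written S3 key) using exponential backoff between attempts."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except IOError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))
```

This only papers over the window of inconsistency; it does not give the immediate read-after-write guarantee HBase assumes.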
Appends: An append in S3 can be modeled as a read / copy /
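A minimal sketch of that model, using a plain dict as a stand-in for an S3 bucket (the dict and function name are illustrative, not an S3 API): because objects are immutable, an "append" means reading the whole object and writing a new copy back.

```python
def s3_append(bucket, key, data):
    """Model an append on an immutable object store: read the whole
    object, concatenate the new bytes, and write the object back.
    `bucket` is a plain dict standing in for an S3 bucket; against
    real S3 this would be a full GET followed by a full PUT."""
    existing = bucket.get(key, b"")
    bucket[key] = existing + data  # full-object rewrite, not an in-place append
```

The cost of such an append is proportional to the current object size, which is why append-heavy write paths map poorly onto S3.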
HBase relies on .tmp directories to do a sort of "atomic" file creation
and avoid problems like half-written data when it crashes.
There is a JIRA open to solve that problem in one of the next major
releases:
https://issues.apache.org/jira/browse/HBASE-14090
There is a document in it, if you
Thanks Matteo. If I understand correctly, one example of how the .tmp
directories help prevent issues is as follows: if HBase were to crash
during a compaction, cleanup is much easier since these .tmp directories
are cleared out at startup, right?
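The write-to-temp-then-rename pattern being discussed can be sketched on a local filesystem, where rename within a directory is atomic on POSIX (unlike on S3, which has no atomic rename; the function name here is illustrative):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write to a temporary file in the target's directory, then rename
    it into place. Readers never observe a half-written file, and any
    leftover temp files from a crash can simply be deleted at startup."""
    dirname = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.rename(tmp_path, path)  # atomic within one POSIX filesystem
    except BaseException:
        os.unlink(tmp_path)  # crash/error: discard the partial temp file
        raise
```

This is exactly the guarantee HDFS gives HBase and S3 does not, which is why simply writing directly to the final destination is risky.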
On Wed, Sep 9, 2015 at 7:31 PM, Matteo Bertozzi
Hi all,
I'm investigating the use of S3 as a backing store for HBase. Would there
be any major issues with modifying HBase such that, when an S3 location is
set for the rootdir, writes to .tmp are removed or minimized, writing
instead directly to the final destination? The reason I'd
Using S3 as a backing store for HBase cannot work. This has been
attempted in the past (although not by me, so this isn't firsthand
knowledge). One basic problem is that HBase expects to be able to read what
it has written immediately after the write completes. For example, opening a
store file
Hi Anthony,
Just curious, you mention your access pattern is mostly reads. Is it random
reads, M/R jobs over a portion of the dataset, or other?
Best,
--
Iain Wright
This email message is confidential, intended only for the recipient(s)
named above and may contain information that is
I see -
If it were only M/R, Hive, or a similar reporting/analytics workload without
the low-latency gets/reads requirement, I was going to suggest writing to
S3 directly and using Spark + HiveContext.
Cheers,
--
Iain Wright
Hi Iain,
Random reads for now, with further MR work w/ Hive possibly down the line.
Thanks,
-t
On Wed, Sep 9, 2015 at 9:31 PM, iain wright wrote:
> Hi Anthony,
>
> Just curious, you mention your access pattern is mostly reads. Is it random
> reads, M/R jobs over a portion