Re: Completing a bulk load from HFiles stored in S3

Austin Heyne Tue, 12 Nov 2019 11:33:28 -0800

Yes, that's correct. I've never tried bulk loading from S3 on 2.x.


-Austin

On 11/12/19 1:32 PM, Josh Elser wrote:

Thanks for the info, Austin. I'm guessing that's how 1.x works since you mention EMR?
I think this code has changed in 2.x with the SecureBulkLoad stuff moving into "core" (instead of external as a coproc endpoint).
On 11/12/19 10:39 AM, Austin Heyne wrote:
Sorry for the late reply. You should be able to bulk load files from S3, as it will detect that they're not on the same filesystem and have the regionservers copy the files locally and then up to HDFS. This is related to a problem I reported a while ago when using HBase on S3 with EMR.
https://issues.apache.org/jira/browse/HBASE-20774
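A minimal sketch of the "not the same filesystem" check described above, assuming it can be approximated by comparing the scheme and authority of two URIs. The class `FsCompare`, the helper `sameFileSystem`, the bucket name, and the namenode address are all hypothetical; this is not the code HBase actually runs.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

final class FsCompare {
    /**
     * Hypothetical helper: treats two locations as the same filesystem
     * only when both scheme and authority match (case-insensitively).
     */
    static boolean sameFileSystem(URI a, URI b) {
        String schemeA = a.getScheme();
        String schemeB = b.getScheme();
        if (schemeA == null || schemeB == null) {
            return false; // cannot tell without an explicit scheme
        }
        return schemeA.equalsIgnoreCase(schemeB)
            && Objects.equals(lower(a.getAuthority()), lower(b.getAuthority()));
    }

    private static String lower(String s) {
        return s == null ? null : s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Assumed example locations, not taken from the thread.
        URI hfiles = URI.create("s3a://example-bucket/hfiles/table1");
        URI rootDir = URI.create("hdfs://namenode:8020/hbase");
        System.out.println(sameFileSystem(hfiles, rootDir)); // prints false
    }
}
```

When the check returns false, a loader would have to copy the files rather than rename them, which is the behaviour described for 1.x.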

-Austin

On 11/1/19 8:04 AM, Wellington Chevreuil wrote:
Ah yeah, didn't realise it would assume same FS, internally. Indeed, no way to have rename working between different FSes.
On Thu, Oct 31, 2019 at 16:25, Josh Elser <[email protected]> wrote:
Short answer: no, it will not work and you need to copy it to HDFS first.
IIRC, the bulk load code is ultimately calling a filesystem rename from the path you provided to the proper location in the hbase.rootdir's filesystem. I don't believe that an `fs.rename` is going to work across filesystems because you can't do this atomically, which HDFS guarantees for the rename method [1].

Additionally, for Kerberos-secured clusters, the server-side bulk load logic expects that the filesystem hosting your hfiles is HDFS (in order to read the files with the appropriate authentication). This fails right now, but is something our PeterS is looking at.

[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
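The atomicity point can be illustrated with a rough analogy using the JDK's `java.nio.file` API rather than Hadoop's `FileSystem`: an atomic move is only available within one file store, and anything else degrades to a copy followed by a delete. The class `MoveSketch` and the helper `moveInto` are hypothetical names, and this is not the bulk load code itself.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class MoveSketch {
    /**
     * Hypothetical helper: tries an atomic rename first and falls back to
     * copy + delete when the two paths cannot be renamed atomically.
     * Returns true if the atomic rename succeeded, false on fallback.
     */
    static boolean moveInto(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Not atomic: a reader could observe a partially copied file here.
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(src);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("bulkload-sketch");
        Path src = Files.write(dir.resolve("hfile.tmp"), new byte[] {1, 2, 3});
        Path dst = dir.resolve("hfile");
        boolean atomic = moveInto(src, dst);
        System.out.println("atomic=" + atomic + ", moved=" + Files.exists(dst));
    }
}
```

The fallback branch is the case that a rename-based loader cannot use safely, which is why copying the HFiles onto the target filesystem first is the suggested workaround.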
On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
I believe you can specify your s3 path for the hfiles directly, as the Hadoop FileSystem does support the s3a scheme, but you would need to add your s3 access and secret key to your completebulkload configuration.
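A minimal sketch of what that configuration might look like, assuming the tool accepts Hadoop's generic `-D` options. `fs.s3a.access.key` and `fs.s3a.secret.key` are the S3A connector's property names; the class `BulkLoadArgs`, the helper `build`, the bucket path, and the table name are hypothetical. Whether the load then succeeds depends on the HBase version, as the other replies in this thread discuss.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkLoadArgs {
    /**
     * Hypothetical helper: assembles the argument list for a bulk load run,
     * with the S3A credentials passed as generic -D options followed by the
     * HFile directory and the target table.
     */
    static List<String> build(String accessKey, String secretKey,
                              String hfileDir, String table) {
        List<String> args = new ArrayList<>();
        args.add("-Dfs.s3a.access.key=" + accessKey);
        args.add("-Dfs.s3a.secret.key=" + secretKey);
        args.add(hfileDir);
        args.add(table);
        return args;
    }

    public static void main(String[] argv) {
        // Placeholders only; real keys should not be printed or hard-coded.
        List<String> args = build("<access-key>", "<secret-key>",
                                  "s3a://example-bucket/hfiles/table1", "table1");
        System.out.println(String.join(" ", args));
    }
}
```

Credentials on a command line end up in shell history and process listings, so placing the same two properties in a protected configuration file is usually the safer choice.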

On Wed, Oct 30, 2019 at 19:43, Gautham Acharya <[email protected]> wrote:
If I have HFiles stored in S3, can I run CompleteBulkLoad and provide an S3 endpoint to run it as a single command, or do I need to copy the S3 HFiles to HDFS first? The documentation is not very clear.
