Ah yeah, I didn't realise it would assume the same FS internally. Indeed, there's no way to have rename work across different FSes.
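To make that concrete, here is a minimal sketch of the rename the server effectively attempts (class name and paths are hypothetical; only the Hadoop FileSystem calls are real):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CrossFsRenameSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical paths: hfiles on S3, hbase.rootdir on HDFS.
        Path src = new Path("s3a://my-bucket/hfiles/cf/hfile1");
        Path dst = new Path("hdfs://namenode:8020/hbase/data/default/t1/r1/cf/hfile1");

        // A FileSystem instance is bound to a single scheme/authority, so
        // rename() can only move files within that one filesystem. Handing
        // it a Path on another filesystem fails (HDFS rejects it with a
        // "Wrong FS" IllegalArgumentException), because there is no atomic
        // cross-filesystem rename.
        FileSystem fs = dst.getFileSystem(conf);
        boolean renamed = fs.rename(src, dst); // never succeeds across FSes
        System.out.println("renamed: " + renamed);
      }
    }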
On Thu, 31 Oct 2019 at 16:25, Josh Elser <els...@apache.org> wrote:

> Short answer: no, it will not work and you need to copy it to HDFS first.
>
> IIRC, the bulk load code is ultimately calling a filesystem rename from
> the path you provided to the proper location in the hbase.rootdir's
> filesystem. I don't believe that an `fs.rename` is going to work across
> filesystems, because you can't do this atomically, which HDFS guarantees
> for the rename method [1].
>
> Additionally, for Kerberos-secured clusters, the server-side bulk load
> logic expects that the filesystem hosting your hfiles is HDFS (in order
> to read the files with the appropriate authentication). This fails right
> now, but is something our PeterS is looking at.
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
>
> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
> > I believe you can specify your S3 path for the hfiles directly, as the
> > Hadoop FileSystem API does support the s3a scheme, but you would need
> > to add your S3 access and secret key to your completebulkload
> > configuration.
> >
> > On Wed, 30 Oct 2019 at 19:43, Gautham Acharya
> > <gauth...@alleninstitute.org> wrote:
> >
> >> If I have Hfiles stored in S3, can I run CompleteBulkLoad with an S3
> >> endpoint as a single command, or do I need to copy the S3 Hfiles to
> >> HDFS first? The documentation is not very clear.
> >>
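For anyone finding this thread later, here is a rough sketch of the copy-to-HDFS-first workflow Josh describes, as a single Java program (bucket, paths, table name, and the inline credentials are all hypothetical; in practice prefer core-site.xml or a Hadoop credential provider over hard-coding keys):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
    import org.apache.hadoop.util.ToolRunner;

    public class CopyThenBulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical credentials for the s3a connector.
        conf.set("fs.s3a.access.key", "MY_ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "MY_SECRET_KEY");

        Path src = new Path("s3a://my-bucket/hfiles"); // hypothetical bucket
        Path dst = new Path("hdfs:///tmp/hfiles");     // staging dir on HDFS

        // Step 1: copy the hfiles onto the cluster's HDFS (a plain copy
        // here; `hadoop distcp` would be the usual choice for large data).
        FileUtil.copy(src.getFileSystem(conf), src,
                      dst.getFileSystem(conf), dst,
                      false /* deleteSource */, conf);

        // Step 2: bulk load from HDFS. On HBase 1.x the class lives at
        // org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles instead.
        int rc = ToolRunner.run(conf,
            new LoadIncrementalHFiles(conf),
            new String[] { dst.toString(), "my_table" });
        System.exit(rc);
      }
    }

The CLI equivalent of step 2, once the hfiles are already on HDFS, is `hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmp/hfiles my_table`.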