Ah yeah, I didn't realise it would assume the same FS internally. Indeed, there's no way to have rename work across different FSes.
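To make that concrete, here is a minimal sketch of the rename the server effectively attempts (class name and paths are hypothetical; only the Hadoop FileSystem calls are real):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CrossFsRenameSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical paths: hfiles on S3, hbase.rootdir on HDFS.
        Path src = new Path("s3a://my-bucket/hfiles/cf/hfile1");
        Path dst = new Path("hdfs://namenode:8020/hbase/data/default/t1/r1/cf/hfile1");

        // A FileSystem instance is bound to a single scheme/authority, so
        // rename() can only move files within that one filesystem. Handing
        // it a Path on another filesystem fails (HDFS rejects it with a
        // "Wrong FS" IllegalArgumentException), because there is no atomic
        // cross-filesystem rename.
        FileSystem fs = dst.getFileSystem(conf);
        boolean renamed = fs.rename(src, dst); // never succeeds across FSes
        System.out.println("renamed: " + renamed);
      }
    }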
On Thu, 31 Oct 2019 at 16:25, Josh Elser <els...@apache.org> wrote:

> Short answer: no, it will not work and you need to copy it to HDFS first.
>
> IIRC, the bulk load code is ultimately calling a filesystem rename from
> the path you provided to the proper location in the hbase.rootdir's
> filesystem. I don't believe that an `fs.rename` is going to work across
> filesystems, because you can't do this atomically, which HDFS guarantees
> for the rename method [1].
>
> Additionally, for Kerberos-secured clusters, the server-side bulk load
> logic expects that the filesystem hosting your hfiles is HDFS (in order
> to read the files with the appropriate authentication). This fails right
> now, but is something our PeterS is looking at.
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29
>
> On 10/31/19 6:55 AM, Wellington Chevreuil wrote:
> > I believe you can specify your S3 path for the hfiles directly, as the
> > Hadoop FileSystem API does support the s3a scheme, but you would need
> > to add your S3 access and secret key to your completebulkload
> > configuration.
> >
> > On Wed, 30 Oct 2019 at 19:43, Gautham Acharya
> > <gauth...@alleninstitute.org> wrote:
> >
> >> If I have Hfiles stored in S3, can I run CompleteBulkLoad with an S3
> >> endpoint as a single command, or do I need to copy the S3 Hfiles to
> >> HDFS first? The documentation is not very clear.
> >>
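For anyone finding this thread later, here is a rough sketch of the copy-to-HDFS-first workflow Josh describes, as a single Java program (bucket, paths, table name, and the inline credentials are all hypothetical; in practice prefer core-site.xml or a Hadoop credential provider over hard-coding keys):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.tool.LoadIncrementalHFiles;
    import org.apache.hadoop.util.ToolRunner;

    public class CopyThenBulkLoadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical credentials for the s3a connector.
        conf.set("fs.s3a.access.key", "MY_ACCESS_KEY");
        conf.set("fs.s3a.secret.key", "MY_SECRET_KEY");

        Path src = new Path("s3a://my-bucket/hfiles"); // hypothetical bucket
        Path dst = new Path("hdfs:///tmp/hfiles");     // staging dir on HDFS

        // Step 1: copy the hfiles onto the cluster's HDFS (a plain copy
        // here; `hadoop distcp` would be the usual choice for large data).
        FileUtil.copy(src.getFileSystem(conf), src,
                      dst.getFileSystem(conf), dst,
                      false /* deleteSource */, conf);

        // Step 2: bulk load from HDFS. On HBase 1.x the class lives at
        // org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles instead.
        int rc = ToolRunner.run(conf,
            new LoadIncrementalHFiles(conf),
            new String[] { dst.toString(), "my_table" });
        System.exit(rc);
      }
    }

The CLI equivalent of step 2, once the hfiles are already on HDFS, is `hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles /tmp/hfiles my_table`.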