Not sure what your backup approach is. One option could be archiving[1] the
files, as was done for YARN logs[2].
To speed this up, you can write a MapReduce job for the archiving.
Please refer to the document for a sample MapReduce job[3].
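For reference, archiving many small files can also be done with the built-in
hadoop archive tool, which itself runs a MapReduce job under the hood. A rough
sketch (all paths here are placeholders, not from the original thread):

```shell
# Pack everything under /data/small-files into a single HAR file;
# this launches a MapReduce job behind the scenes.
hadoop archive -archiveName backup.har -p /data small-files /backup

# Files inside the archive stay readable through the har:// scheme.
hdfs dfs -ls har:///backup/backup.har
```

Collapsing ~1 million small files into one archive cuts both NameNode load and
the per-file request overhead that makes WebHDFS transfers slow.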
Using DistCp is the only option AFAIK. DistCp does support webhdfs, so try
playing with the number of mappers and similar settings to tune it for better
performance.
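As an illustration of the tuning above (cluster addresses and paths are
placeholders, not from the original thread), a DistCp run over WebHDFS with a
raised mapper count might look like:

```shell
# Copy via the WebHDFS endpoint with 50 mappers instead of the default 20;
# -strategy dynamic helps when file sizes are heavily skewed, as with many
# small files.
hadoop distcp -m 50 -strategy dynamic \
  hdfs://source-nn:8020/data \
  webhdfs://backup-nn:9870/backup/data
```

With ~1 million small files, the mapper count matters more than bandwidth:
each file costs a fixed per-request overhead on WebHDFS, so more parallel
mappers is the main lever.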
-Ayush
> On 09-Oct-2022, at 8:56 AM, Abhishek wrote:
> Hi,
> We want to back up a large number of small Hadoop files (~1 million) with
> the WebHDFS API.
> We are hitting a performance bottleneck here and it's taking days to back
> it up.
> Does anyone know of a solution where performance could be improved using
> any XML settings?
> This would really help us.
> v 3.1.1
> Appreciate your help.
There’s no way to do that.
Once YARN launches containers, it doesn’t communicate with them for anything
after that. The tasks / containers can obviously always reach out to YARN
services, but even that is not helpful in this case, because YARN never
exposes through its APIs what it is doing with