Re: Performance with large no of files

2022-10-08 Thread Brahma Reddy Battula
Not sure what your backup approach is. One option is archiving[1] the files, as was done for YARN logs[2]. To speed this up, you can write a MapReduce job to do the archiving. Please refer to the documentation for a sample MapReduce job[3].
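The archiving approach above usually means Hadoop Archives (HAR). A minimal sketch, assuming hypothetical paths (/user/abhishek/small-files, /user/abhishek/archives) stand in for your actual directories:

```shell
# Pack a directory of small files into a single Hadoop Archive (HAR).
# The archive tool itself runs as a MapReduce job, so it parallelizes
# across the cluster; paths below are placeholders.
hadoop archive -archiveName backup.har \
  -p /user/abhishek/small-files \
  /user/abhishek/archives

# Contents remain accessible through the har:// scheme:
hdfs dfs -ls har:///user/abhishek/archives/backup.har
```

Packing ~1mn small files into a handful of HARs cuts both NameNode metadata pressure and the per-file request overhead that makes a file-by-file webhdfs backup slow.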

Re: Performance with large no of files

2022-10-08 Thread Ayush Saxena
Using DistCp is the only option AFAIK. DistCp does support webhdfs, so try playing with the number of mappers and similar settings to tune it for better performance.
-Ayush

> On 09-Oct-2022, at 8:56 AM, Abhishek wrote:
>
> Hi,
> We want to backup large no of hadoop small files (~1mn) with webhdfs API
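A minimal sketch of the DistCp-over-webhdfs tuning Ayush describes; the hostnames, ports, and paths are placeholders, and 50 mappers is just an illustrative starting point to adjust against your cluster's capacity:

```shell
# Bulk-copy the source tree to a backup cluster.
# -m sets the number of map tasks, i.e. how many files are
# copied in parallel; raising it is the main performance lever
# for many small files. All endpoints below are placeholders.
hadoop distcp -m 50 \
  webhdfs://source-nn:9870/user/abhishek/small-files \
  hdfs://backup-nn:8020/backups/small-files
```

If both clusters run compatible Hadoop versions, using the native hdfs:// scheme on both sides instead of webhdfs typically copies faster, since it avoids the HTTP layer.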

Performance with large no of files

2022-10-08 Thread Abhishek
Hi, We want to back up a large number of small Hadoop files (~1mn) with the webhdfs API. We are hitting a performance bottleneck here and the backup is taking days. Does anyone know of a solution, e.g. any XML settings, that could improve performance? This would really help us. v 3.1.1 Appreciate your

Re: Communicating between yarn and tasks after delegation token renewal

2022-10-08 Thread Vinod Kumar Vavilapalli
There’s no way to do that. Once YARN launches containers, it doesn’t communicate with them for anything after that. The tasks / containers can obviously always reach out to YARN services. But even that in this case is not helpful because YARN never exposes through APIs what it is doing with