Hi,

We have been using a commercial file transfer (FT) and scheduler tool in clustered mode. It was a traditional active-active cluster that supported multiple protocols such as FTPS.
Now I am interested in evaluating a distributed way of crawling FTP sites and downloading files using Hadoop. Since we have to process thousands of files, I thought Hadoop jobs could handle it. Are Hadoop jobs used for this type of file transfer? There is also a requirement for a scheduler. What does the forum recommend?

Thanks,
Mohan
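To make the question concrete, here is a minimal sketch (in plain Python, not an actual Hadoop job) of the kind of per-file work I am thinking of parallelizing: a list of FTP URLs gets partitioned across worker tasks, and each task downloads its share of files. The host name, paths, and anonymous login below are hypothetical placeholders, not a real site.

```python
# Sketch only: the per-file unit of work a distributed crawl would
# parallelize, plus how URLs might be split across worker tasks.
from ftplib import FTP
from urllib.parse import urlparse


def split_work(urls, num_tasks):
    """Partition FTP URLs round-robin across worker tasks,
    similar to how input splits assign records to mappers."""
    return [urls[i::num_tasks] for i in range(num_tasks)]


def fetch(url, dest_path):
    """Download one file over FTP: one 'map' unit of work."""
    parts = urlparse(url)
    with FTP(parts.hostname) as ftp:
        ftp.login()  # anonymous login; real sites need credentials
        with open(dest_path, "wb") as out:
            ftp.retrbinary(f"RETR {parts.path}", out.write)


if __name__ == "__main__":
    # Hypothetical URL list standing in for a real crawl manifest.
    urls = [f"ftp://example.com/pub/file{i}.dat" for i in range(10)]
    shards = split_work(urls, 3)
    print([len(s) for s in shards])  # prints [4, 3, 3]
```

My question is essentially whether Hadoop is a sensible framework for driving this kind of work at scale, or whether it is the wrong tool since the tasks are I/O-bound transfers rather than computation.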