Hi, thanks for the code snippet. If the executable inside the map process needs to access directories and files in the local file system, is that possible? I know the tasks run on slave nodes in a temporary working directory, and I can think of the distributed cache, but I would still like to know whether the map process can access the local file system.
Regards
Bala

On 01-Jul-2016 7:46 am, "Sun Rui" <sunrise_...@163.com> wrote:

> Say you have got all of your folder paths into a val folders: Seq[String]
>
> val add = sc.parallelize(folders, folders.size).mapPartitions { iter =>
>   val folder = iter.next
>   val status: Int = <call your executable with the folder path string>
>   Seq(status).toIterator
> }
>
> On Jun 30, 2016, at 16:42, Balachandar R.A. <balachandar...@gmail.com> wrote:
>
> Hello,
>
> I have some 100 folders. Each folder contains 5 files. I have an
> executable that processes one folder. The executable is a black box and
> hence it cannot be modified. I would like to process the 100 folders in
> parallel using Apache Spark, so that I can spawn one map task per folder.
> Can anyone give me an idea? I have come across similar questions, but they
> concerned Hadoop, and the answer was to use CombineFileInputFormat and a
> PathFilter. However, as I said, I want to use Apache Spark. Any idea?
>
> Regards
> Bala
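For what it's worth, the pattern from the earlier reply can be fleshed out into a runnable sketch. This is only a sketch under assumptions: the executable path `/usr/local/bin/process_folder` and the folder paths are hypothetical placeholders, and it assumes the folders exist on the local file system of every worker node where a task may be scheduled (a task can read the executor's local file system, but only paths that actually exist on that node).

```scala
import java.io.File
import scala.sys.process._
import org.apache.spark.{SparkConf, SparkContext}

object RunPerFolder {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("run-per-folder"))

    // Hypothetical folder paths; must be present on each worker's local FS.
    val folders: Seq[String] = Seq("/data/f1", "/data/f2")

    // One partition per folder, so each folder gets its own map task.
    val statuses = sc.parallelize(folders, folders.size).map { folder =>
      // This closure runs on a worker node: verify the folder exists on
      // that node's local file system before invoking the black box.
      if (!new File(folder).isDirectory) (folder, -1)
      else {
        // scala.sys.process runs the external executable and returns
        // its exit code. "process_folder" is a hypothetical binary.
        val exitCode = Seq("/usr/local/bin/process_folder", folder).!
        (folder, exitCode)
      }
    }.collect()

    statuses.foreach { case (f, s) => println(s"$f -> exit status $s") }
    sc.stop()
  }
}
```

Note the trade-off: if the data only exists on the driver's machine, the workers will not see it; in that case the folders must be copied to every node, placed on a shared mount, or distributed via `SparkContext.addFile` before the tasks run.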