Enumerate the file locations (map) and put them in a queue such as RabbitMQ or
Kafka (this persists the map). Then have a bunch of threads, workers, containers,
or whatever pop items off the queue and process each one (reduce).
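
Here is a rough sketch of that pattern, in case it helps. It is only an illustration under a few assumptions: Python's in-process `queue.Queue` stands in for RabbitMQ or Kafka (so nothing is persisted here), and `index_file` is a hypothetical placeholder that just records the file size where your real indexing call would go. The names `enumerate_files`, `worker`, and `index_tree` are made up for the example.

```python
import os
import queue
import threading


def enumerate_files(root):
    """Map step: yield the path of every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def index_file(path):
    """Hypothetical placeholder: replace with the real indexing call."""
    return path, os.path.getsize(path)


def worker(tasks, results):
    """Pop paths off the queue until a None sentinel arrives."""
    while True:
        path = tasks.get()
        if path is None:
            break
        try:
            results.put(index_file(path))
        except OSError as exc:
            # Record the failure instead of killing the worker.
            results.put((path, exc))


def index_tree(root, num_workers=4):
    """Enumerate files under root and process them with worker threads."""
    tasks = queue.Queue()    # stand-in for a persistent broker (assumption)
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for path in enumerate_files(root):
        tasks.put(path)
    for _ in threads:
        tasks.put(None)      # one stop sentinel per worker
    for t in threads:
        t.join()
    # Reduce step: collect the per-file results into one dict.
    indexed = {}
    while not results.empty():
        path, value = results.get()
        indexed[path] = value
    return indexed
```

With a real broker, the enumeration and the workers would run as separate processes or containers, and the queue would survive a crash, which is the point of persisting the map. Threads mainly help when the work is I/O bound; for CPU-heavy indexing, separate processes are likely a better fit.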


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208...@gmail.com>, wrote:
> I know how to do indexing on file system like single file or folder, but
> how do I do that in a parallel way? The data I need to index is of huge
> volume and can't be put on HDFS.
>
> Thank you
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
