Re: How to do parallel indexing on files (not on HDFS)

Rahul Singh Wed, 23 May 2018 05:16:49 -0700

Enumerate the file locations (map) and put them in a queue such as RabbitMQ or
Kafka (persist the map). Then have a pool of threads, workers, containers, or
whatever you prefer pop items off the queue and process each one (reduce).
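
As a rough illustration of that pattern, here is a minimal single-machine
sketch in Python. It is only a sketch under stated assumptions: an in-process
queue.Queue stands in for RabbitMQ or Kafka, so nothing is persisted, and
index_file is a hypothetical placeholder for whatever actually sends a file to
your indexer. The names enumerate_files, worker, and index_in_parallel are
made up for this example.

```python
import os
import queue
import threading


def enumerate_files(root):
    """Map step: yield the path of every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def index_file(path):
    """Hypothetical placeholder: swap in the real indexing call here.

    For the sketch it only returns the file size in bytes.
    """
    return os.path.getsize(path)


def worker(tasks, results):
    """Pop paths off the queue until a None sentinel arrives."""
    while True:
        path = tasks.get()
        if path is None:  # sentinel: no more work for this worker
            tasks.task_done()
            break
        try:
            results.put((path, index_file(path)))
        except OSError as exc:
            # Record the failure instead of killing the worker.
            results.put((path, exc))
        finally:
            tasks.task_done()


def index_in_parallel(root, num_workers=4):
    """Enumerate files under root and process them with a thread pool."""
    tasks = queue.Queue()    # assumption: stands in for RabbitMQ/Kafka
    results = queue.Queue()

    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()

    for path in enumerate_files(root):
        tasks.put(path)
    for _ in threads:
        tasks.put(None)      # one sentinel per worker

    for t in threads:
        t.join()

    # All workers have finished, so draining the results queue is safe.
    outcome = {}
    while not results.empty():
        path, value = results.get()
        outcome[path] = value
    return outcome
```

Calling index_in_parallel("/some/dir", num_workers=8) would return a dict
mapping each file path to whatever index_file returned for it (or to the
exception it raised). Threads are a reasonable fit when the work is mostly
I/O; with an external broker in place of queue.Queue, the same worker loop
could run in separate processes or containers on several machines.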



--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 20, 2018, 7:24 AM -0400, Raymond Xie <xie3208...@gmail.com>, wrote:
> I know how to index from the file system, such as a single file or folder,
> but how do I do that in parallel? The data I need to index is huge in
> volume and can't be put on HDFS.
>
> Thank you
>
> *------------------------------------------------*
> *Sincerely yours,*
>
>
> *Raymond*
