On Jan 25, 2008 6:30 AM, roger dimitri <[EMAIL PROTECTED]> wrote:

> Hi,
>   I am very new to Hadoop, and I have a project where I need to use Lucene
> to index some input given either as a a huge collection of Java objects or
> one huge java object.
>  I read about Hadoop's MapReduce utilities and I want to leverage that
> feature in my case described above.
>  Can some one please tell me how I can approach the problem described
> above. Because all the Hadoop's MapReduce examples out there show only File
> based input and don't explicitly deal with data coming in as a huge Java
> object or so to speak.


Off the top of my head: when your input is a collection of smaller objects,
each independent of the others, you could serialize all the objects and write
them to a file, specify a custom RecordReader that splits the file back into
one record per object, and have the reducer deserialize each object and
perform the indexing. I'll have to look into java.io.Serializable and the
Lucene API in more detail before I can comment further on it.
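A minimal sketch of the serialize/write/read-back step I have in mind, leaving
out the Hadoop wiring (InputFormat/RecordReader) and the Lucene IndexWriter
call. The `Doc` class and field names are hypothetical stand-ins for your
actual domain objects; any Serializable class would work the same way:

```java
import java.io.*;
import java.util.*;

public class SerializeSketch {
    // Hypothetical record type standing in for your domain objects.
    static class Doc implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final String body;
        Doc(String id, String body) { this.id = id; this.body = body; }
    }

    // Serialize each object to a file, then read them back one by one --
    // the same per-record framing a custom RecordReader would perform.
    static List<Doc> roundTrip(List<Doc> docs) throws Exception {
        File file = File.createTempFile("docs", ".ser");
        file.deleteOnExit();
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(file))) {
            for (Doc d : docs) out.writeObject(d);
        }
        List<Doc> result = new ArrayList<>();
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(file))) {
            for (int i = 0; i < docs.size(); i++) {
                result.add((Doc) in.readObject());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Doc> back = roundTrip(Arrays.asList(
                new Doc("1", "hello"), new Doc("2", "world")));
        // In the real job, each deserialized object would be handed to a
        // Lucene IndexWriter here instead of being printed.
        for (Doc d : back) System.out.println(d.id + ":" + d.body);
    }
}
```

The map/reduce split on top of this is up to you; one option is to emit each
deserialized object as a map output value and do the actual Lucene indexing
in the reducer, as suggested above.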

-- 
N. Rajagopal,
Visit me at http://www.raja-gopal.com
