indexing documents that arrive in pieces

Taher H. Haveliwala Sun, 13 Oct 2002 08:26:08 -0700

What is the cleanest way in Lucene to add documents to
an index, if the entire document is not readily
available at one time?


E.g., I want to index the text as well as the
anchor text of a stream of HTML pages, where the
anchor-text terms get associated with the page _being
pointed to_.  For a document d_i, I don't know all the
terms that should be added to its "anchor" field
until I've seen all documents d_j that link to d_i.

Of course I can make a pass over the web pages, and
gather up the relevant terms myself, but if Lucene has
the necessary machinery to add portions of a document
at different times, it would save me work. 
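For reference, the "gather up the terms myself" first pass amounts to inverting the link graph: map each target URL to the anchor text of every page that links to it. The following is a minimal sketch in plain Java under stated assumptions. `Page` and `Link` are hypothetical stand-ins for whatever an HTML parser produces, and the code makes no Lucene calls. The second pass, which builds one Lucene Document per page with an "anchor" field, is only described in a comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnchorTextGatherer {

    /** Hypothetical link record: one <a href> seen on a linking page d_j. */
    public static final class Link {
        final String targetUrl;   // URL of the page being pointed to (d_i)
        final String anchorText;  // text between <a> and </a> on d_j

        public Link(String targetUrl, String anchorText) {
            this.targetUrl = targetUrl;
            this.anchorText = anchorText;
        }
    }

    /** Hypothetical parsed page: its URL, body text, and outgoing links. */
    public static final class Page {
        final String url;
        final String bodyText;
        final List<Link> outLinks;

        public Page(String url, String bodyText, List<Link> outLinks) {
            this.url = url;
            this.bodyText = bodyText;
            this.outLinks = outLinks;
        }
    }

    /**
     * Pass 1: invert the link graph. Each target URL maps to the
     * space-joined anchor text of every link pointing at it, in the
     * order the linking pages were seen.
     */
    public static Map<String, String> gatherAnchorText(List<Page> pages) {
        Map<String, StringBuilder> acc = new HashMap<>();
        for (Page source : pages) {
            for (Link link : source.outLinks) {
                StringBuilder sb = acc.computeIfAbsent(link.targetUrl, k -> new StringBuilder());
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(link.anchorText);
            }
        }
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, StringBuilder> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Page> pages = new ArrayList<>();
        pages.add(new Page("a.html", "page a", List.of(new Link("c.html", "fast search"))));
        pages.add(new Page("b.html", "page b", List.of(new Link("c.html", "lucene"))));
        pages.add(new Page("c.html", "page c", List.of()));

        Map<String, String> anchors = gatherAnchorText(pages);
        // Pass 2 (not shown): for each page, build one Document holding its
        // body text plus anchors.getOrDefault(page.url, "") as the "anchor" field.
        for (Page p : pages) {
            System.out.println(p.url + " -> [" + anchors.getOrDefault(p.url, "") + "]");
        }
    }
}
```

Running `main` prints an empty anchor field for a.html and b.html and `[fast search lucene]` for c.html. The cost of this approach is holding the whole inverted map in memory (or on disk) until every page has been seen, which is exactly the bookkeeping the question hopes Lucene could take over.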

Thanks
Taher



