Otis Gospodnetic wrote:
Maybe I'm not following your situation 100%, but it sounded like
pulling the values of purely stored fields is the slow part.
*Perhaps* using a non-Lucene data store just for the saved fields
would be faster.

For this purpose Nutch uses external files in Hadoop MapFile format. MapFiles offer fast lookup by key (using binary search over an in-memory index of keys).
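
To make the lookup idea concrete, here is a minimal Python sketch of the same technique. It is not the Hadoop MapFile API or its on-disk format: the record layout, INDEX_INTERVAL, and the function names write_store and get are all made up for illustration. Records are written sorted by key, every Nth key is kept in memory together with its file offset, and a lookup does a binary search over those keys followed by a short forward scan in the data file.

```python
import bisect
import struct

# Hypothetical setting: keep every 4th key in the in-memory index.
INDEX_INTERVAL = 4


def _write_record(f, key, value):
    # Length-prefixed record: 4-byte key length, 4-byte value length, then bytes.
    f.write(struct.pack(">II", len(key), len(value)))
    f.write(key)
    f.write(value)


def _read_record(f):
    header = f.read(8)
    if len(header) < 8:
        return None  # end of file
    klen, vlen = struct.unpack(">II", header)
    return f.read(klen), f.read(vlen)


def write_store(path, items):
    """Write (key, value) byte pairs sorted by key; return the sparse index."""
    index_keys, index_offsets = [], []
    with open(path, "wb") as f:
        for i, (key, value) in enumerate(sorted(items)):
            if i % INDEX_INTERVAL == 0:
                index_keys.append(key)
                index_offsets.append(f.tell())
            _write_record(f, key, value)
    return index_keys, index_offsets


def get(path, index, key):
    """Return the value stored for key, or None if the key is absent."""
    index_keys, index_offsets = index
    # Binary search for the last indexed key that is <= the wanted key.
    pos = bisect.bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key sorts before everything in the file
    with open(path, "rb") as f:
        f.seek(index_offsets[pos])
        # Scan forward; keys are sorted, so stop once we pass the target.
        while True:
            record = _read_record(f)
            if record is None:
                return None
            k, v = record
            if k == key:
                return v
            if k > key:
                return None
```

With something like this, the Lucene document only needs to carry the key (for example a document id), and the bulky content is fetched with get(path, index, key) after the search has returned its hits.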

The benefit of this solution is that the bulky content is decoupled from Lucene indexes, and it can be put in a physically different location (e.g. a dedicated page content server).

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
