Otis Gospodnetic wrote:
Maybe I'm not following your situation 100%, but it sounded like pulling the values of purely stored fields is the slow part. *Perhaps* using a non-Lucene data store just for the saved fields would be faster.
For this purpose, Nutch uses external files in Hadoop MapFile format. MapFiles offer fast lookup by key (a binary search over an in-memory index of keys, followed by a read from the data file).
The benefit of this solution is that the bulky content is decoupled from Lucene indexes, and it can be put in a physically different location (e.g. a dedicated page content server).
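To make the keyed lookup concrete, here is a rough sketch of the idea in Python. It is not Hadoop's MapFile API or its on-disk format, just an illustration of the same principle: records are written sorted by key, every few keys are kept in memory with their file offsets, and a lookup does a binary search over that small index followed by a short forward scan. The names `write_sorted`, `get` and `interval` are made up for this example.

```python
import bisect
import io
import struct


def write_sorted(records, data, interval=4):
    """Write (key, value) byte pairs, already sorted by key, to `data`.

    Returns a sparse in-memory index: the key and file offset of every
    `interval`-th record. (Hypothetical helper, not a Hadoop call.)
    """
    index_keys, index_offsets = [], []
    for i, (key, value) in enumerate(records):
        if i % interval == 0:
            index_keys.append(key)
            index_offsets.append(data.tell())
        # Each record: two 4-byte big-endian lengths, then key, then value.
        data.write(struct.pack(">II", len(key), len(value)))
        data.write(key)
        data.write(value)
    return index_keys, index_offsets


def get(data, index_keys, index_offsets, key):
    """Return the value stored under `key`, or None if it is absent."""
    # Binary search the in-memory index for the last indexed key <= key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    data.seek(index_offsets[slot])
    # Scan forward from that offset until the key is found or passed.
    while True:
        header = data.read(8)
        if len(header) < 8:
            return None  # reached end of data
        key_len, value_len = struct.unpack(">II", header)
        current = data.read(key_len)
        value = data.read(value_len)
        if current == key:
            return value
        if current > key:
            return None  # passed the position where key would be


# Usage: page content keyed by a document id, kept outside the Lucene index.
pages = sorted((b"doc%03d" % i, b"content of page %d" % i) for i in range(10))
store = io.BytesIO()  # stands in for a file, possibly on another machine
keys, offsets = write_sorted(pages, store)
print(get(store, keys, offsets, b"doc007"))
```

The search index then only needs to store the key for each hit; the bulky content is fetched from the separate store when a result is actually displayed.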
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com