Hi,

I'm working on a news crawler with continuous indexing. As a result, indexes
are merged frequently, and older documents aren't as important as recent ones.

Therefore I'd like to store the full text of documents in external storage
(HBase?) so that merging indexes isn't as I/O intensive. This would give me
the additional benefit that I could selectively delete the full text of older
articles when running out of disk space, while keeping the URL of the document
in the index.
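
To make the idea concrete, here is a rough sketch in Python of the split I
have in mind. It is only an illustration under stated assumptions: the
in-memory dict stands in for the external store (HBase or similar), and the
names IndexEntry, ExternalTextStore and expire_fulltext are hypothetical, not
part of any existing API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    # What stays in the search index: small fields only (hypothetical layout).
    doc_id: str
    url: str
    fetched_at: int  # Unix timestamp of the crawl


@dataclass
class ExternalTextStore:
    # Stand-in for an external store such as HBase, keyed by doc_id.
    _texts: dict = field(default_factory=dict)

    def put(self, doc_id, text):
        self._texts[doc_id] = text

    def get(self, doc_id):
        # Returns None once the full text has been expired.
        return self._texts.get(doc_id)

    def delete(self, doc_id):
        self._texts.pop(doc_id, None)


def expire_fulltext(index, store, cutoff):
    """Delete the full text of entries fetched before cutoff.

    The index entries themselves (and so the URLs) are left untouched.
    Returns the doc_ids whose full text was removed.
    """
    expired = [e.doc_id for e in index if e.fetched_at < cutoff]
    for doc_id in expired:
        store.delete(doc_id)
    return expired
```

The point of the split is that the index only carries small fields, so a
merge never has to copy the full text, and expiring old articles touches only
the external store.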

Do you know whether something like this would be possible?

Best regards,

Thomas Koch, http://www.koch.ro
