Hi,

I'm working on a news crawler with continuous indexing, so indexes are merged frequently, and older documents aren't as important as recent ones.
Therefore I'd like to store the full text of documents in external storage (HBase?) so that merging indexes isn't as I/O-intensive. This would have the additional benefit that I could selectively delete the full text of older articles when running out of disk space, while keeping the URL of each document in the index. Do you know whether something like this would be possible?

Best regards,
Thomas Koch, http://www.koch.ro