It wouldn't be that bad to merge the index externally and then reindex
the results, if it's as simple as your example. Search for id:[1 TO *]
with an fq for the category, and increment the slice of the results you
process until you have covered all of the docs in the category.
Request the content of each slice and feed it to the new index.
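The slicing idea above could be sketched roughly like this (the Solr URL
and the `category` field name are assumptions; adjust them to your
schema):

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/select"  # assumed Solr endpoint


def slice_offsets(num_found, rows):
    """Start offsets needed to cover num_found docs, rows at a time."""
    return list(range(0, num_found, rows))


def fetch_slice(category, start, rows=100):
    """Fetch one slice of the category's docs via id range + fq."""
    params = urllib.parse.urlencode({
        "q": "id:[1 TO *]",
        "fq": "category:%s" % category,
        "start": start,
        "rows": rows,
        "wt": "json",
    })
    with urllib.request.urlopen("%s?%s" % (SOLR, params)) as resp:
        return json.load(resp)["response"]


def all_docs(category, rows=100):
    """Walk every slice of the category until no docs come back."""
    start = 0
    while True:
        docs = fetch_slice(category, start, rows)["docs"]
        if not docs:
            break
        for doc in docs:
            yield doc
        start += rows
```

Just a sketch, not tested against a real core; the point is the
start/rows increment loop, which covers the whole category without ever
loading it all at once.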
You *might* be able to reconstruct enough of the "original" documents
from your indexes to create another without recrawling. I know Luke
can reconstruct documents from an index, but for unstored data it's
slow and may be lossy.
But it may suit your needs, given how long it takes to build your index.
Is it possible to query out the stored data as, uh, tokens I suppose?
Then index those tokens into the next index?
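If the fields you need are stored, the "query it out, push it back in"
idea might look something like this (core names, URLs, and the batch
size are all hypothetical):

```python
import json
import urllib.request

OLD_CORE = "http://localhost:8983/solr/old/select"   # assumed source
NEW_CORE = "http://localhost:8983/solr/new/update"   # assumed target


def batches(docs, size):
    """Split fetched docs into update batches of at most `size`."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def post_batch(docs):
    """POST one batch of stored documents to the new index."""
    payload = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        NEW_CORE + "?commit=true",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

The catch, as noted above, is that anything unstored is gone: you can
only round-trip fields that were stored in the first place, so this is
no substitute for keeping the raw crawl data somewhere.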
thanks
gene
On Wed, Sep 17, 2008 at 1:14 PM, Gene Campbell <[EMAIL PROTECTED]> wrote:
> I was pretty sure you'd say that. But it means lots that you take the
> time to confirm it.
I was pretty sure you'd say that. But it means lots that you take the
time to confirm it. Thanks Otis.
I don't want to give details, but we crawl for our data, and we don't
save it in a DB or on disk; it goes straight from download to index. It
seemed a good idea at the time, when we thought our designs were
You can't copy+merge+flatten indices like that. Reindexing would be the
easiest. Indexing taking weeks sounds suspicious. How much data are you
reindexing and how big are your indices?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch