6 okt 2009 kl. 18.54 skrev David Causse:

David, your timing couldn't be better. Just the other day I proposed that we deprecate InstantiatedIndexWriter. The sum of the reasons to this is that I'm a bit lazy. Your mail makes me reconsider.

https://issues.apache.org/jira/browse/LUCENE-1948

On the index time InstantiatedIndex is behind RAMDirectory, but the time

Would you mind benchmarking some for me using your corpora? The issue suggests that people use the InstantiatedIndex(IndexReader) constructor to create the index rather than using InstantiatedIndexWriter. Is it way slower for you to produce the index using RAMDirectory/IndexWriter and pass an IndexReader to InstantiatedIndex?


This is what the package level javadocs says about InstantiatedIndexWriter:

"Hardly any effort has been put in to optimizing the InstantiatedIndexWriter, only minimizing the amount of time needed to write-lock the index has been considered."

I'm sure there are ways to speed it up, I just never managed to find the time to look in to it. I never really used IIW.


It might be worth mentioning that when InstantiatedIndex#commit returns it has yeilded an optimized "single segment" index. This is not quite how a Directory/IndexWriter acts.

gained over queries make it better (for what I see it can be 2 times
faster).

InstantiatedIndex will be our default volatile mini index store for our
next production release.

Very cool!!

Whe should have other needs of this index but the lack of addIndexes
support make it impossible for us to use it in other situations. So we
continue to use RAMDirectory in such situations.

Have you considered using multiple InstantiatedIndex and a MultiReader? That would pretty much be the same thing, just that the store wouldn't be quite as optimized. It would definitly use more RAM than if it was the same index. You could of course also pass this MultiReader to a new InstantiatedIndex. I have no real clue about the difference in speed and RAM consuption between these solutions so you should benchmark all solutions.

Do you think we could reach RAMDirectory index time by tweaking some initialCap
stuff inside java.util.Collections you use?


Maybe. But I think it would be a relatively small gain. But don't take my words for granted, benchmark it.

Using the InstantiatedIndex(IndexReader) constructor will create rather optimal size of the collections.

As for InstantiatedIndexWriter I think it's pretty much only the transient collections in #commit that will help you, my guess is that you should expemient with the dirtyTerms and termsByText attributes. Count the number of terms in your complete index and see how much it speeds thing up by creating the collections with this size from the start.




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to