Re: moving nontrivial application from lucene to solr

jm Tue, 30 Oct 2007 00:53:49 -0800

Thanks Otis, yes I noticed it was a bit long after writing it (trying
to explain all I guess).


About opening the new index: that would be time based and configurable,
I guess. So, starting with indexSmall and indexBig, every N units of
time the following would happen:

1. stop all indexing/searching processes
2. merge indexSmall to indexBig
3. optimize
4. empty indexSmall
5. restart all indexing/searching processes

That would work I think.
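
The five steps above could be sketched roughly like this. It is plain
Java with no Lucene or Solr classes: SketchIndex, the Processes hook and
runCycle are made-up names for illustration only, the "optimize" step is
just a flag, and a real version would go through whatever API the
indexes are actually managed with.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an index; not a Lucene or Solr class. */
final class SketchIndex {
    final List<String> docs = new ArrayList<>();
    boolean optimized = false;
}

final class MergeCycleSketch {
    /** Hypothetical hook for pausing and resuming indexing/searching. */
    interface Processes {
        void stopAll();
        void restartAll();
    }

    /** Runs one merge cycle: stop, merge, optimize, empty, restart. */
    static void runCycle(SketchIndex indexSmall, SketchIndex indexBig,
                         Processes processes) {
        processes.stopAll();                        // step 1
        try {
            indexBig.docs.addAll(indexSmall.docs);  // step 2: merge small into big
            indexBig.optimized = true;              // step 3: placeholder for optimize
            indexSmall.docs.clear();                // step 4: empty the small index
        } finally {
            processes.restartAll();                 // step 5: always restart
        }
    }
}
```

The restart sits in a finally block so that a failure during the merge
does not leave everything stopped.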

For 3, I understand that optimizing takes longer as the index gets
bigger, so it might take a while when indexBig is huge. I think I
remember seeing something on the lucene list about optimizing not to
the best case but only to a less than optimal state, using less time.
Am I right? Does somebody know if the optimization could be bounded by
a given time? I mean something like this:

indexWriter.optimize(300);
300 being seconds, for instance, meaning: optimize for up to 300
seconds, and stop at that point even if the optimization has not
finished. That way I could be sure any optimize call I make won't
take extremely long. I should probably ask this on the lucene list...
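
To be clear, the optimize(300) call above is a wish, not an API that I
know to exist. A rough sketch of the behaviour I have in mind, in plain
Java with no Lucene classes: segments are modelled only as sizes in a
queue, a "merge" just adds two sizes, and the class name, the method
and the injected clock are all made up for illustration.

```java
import java.util.PriorityQueue;
import java.util.function.LongSupplier;

final class TimeBoundedOptimizeSketch {
    /**
     * Repeatedly merges the two smallest segments until a single segment
     * is left or the time budget is spent.
     *
     * @param segmentSizes hypothetical segment sizes, smallest first
     * @param budgetMillis time budget in milliseconds
     * @param clockMillis  clock source, e.g. System::currentTimeMillis
     * @return the number of segments remaining
     */
    static int optimize(PriorityQueue<Long> segmentSizes, long budgetMillis,
                        LongSupplier clockMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        // The deadline is checked only between merges.
        while (segmentSizes.size() > 1 && clockMillis.getAsLong() < deadline) {
            long a = segmentSizes.poll();
            long b = segmentSizes.poll();
            segmentSizes.add(a + b); // stands in for a real segment merge
        }
        return segmentSizes.size();
    }
}
```

Because the deadline is only checked between merges, a single long
merge could still overrun the budget, so this bounds the time only
approximately.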

thanks again
javi

On 10/30/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi,
> Short answer to your long email: I didn't see anything Lucene-specific in 
> your description that would prevent you from using Solr for this.
> Figuring out when to open/start a new index would have to be done by your 
> application, but with SOLR-215, Solr can now host several indices.  Yes, you 
> can minimize all caches and essentially disable them, though that likely 
> won't affect your indexing performance, unless you are really low on RAM, in 
> which case you better run to the store. ;)
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: jm <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, October 29, 2007 2:01:07 PM
> Subject: moving nontrivial application from lucene to solr
>
>
> Hello all,
>
> I have an application running based on lucene 2.2. Maybe it is not the
> most typical usage of Lucene, the main features regarding lucene are:
>
> - I use many indexes, tens or hundreds of them (all contain the same
> structure of fields etc.), and the number of indexes grows with time.
> The indexes are separated on a time-of-document basis, so I can have
> monthly indexes, or every day, 5 days etc. I keep them all under a
> given directory, what we call an index location (contains many
> indexes)
> - When I need more than one process indexing at the same time, I have
> to use one index location per process (as writing is exclusive). So
> that effectively doubles the number of indexes if I use 2 processes,
> or even worse if I want more processes. I know I could merge indexes
> but right now I don't do that, too much trouble.
> - I am mostly worried about indexing performance, I don't mind if
> searching takes longer. For speeding indexing I keep a certain amount
> of indexes open etc.
> - While searching, I always need all docs (I use a HitCollector
> extension). I use a multisearcher to search over many indexes.
> - I use my custom Analyzer
> - documents have 6 fields, all but one non stored. The stored one is
> small, around 100 bytes. The others can vary from 1 byte to huge (only
> two can be really huge)
>
> I got to know about Solr when my code was already working so it was
> not possible to change. I have played around with Solr, and even built
> a smaller project with it. And now I am thinking that porting to solr
> would have many advantages:
> - would allow N processes to index at the same time and it would keep
> only one index (or more see later). I could get rid of maintaining the
> index locations etc. A big plus for me.
> - I would reduce my lucene related code.
> - deleting docs would work easily, now I have to do some special
> treatment to delete docs in some cases because I have to wait until
> nobody is writing to the index etc
> - I would take the opportunity to refactor the way I use the documents
> to get even more advantages for my application unrelated to
> lucene/solr
> - might even be faster indexing?
>
> Before committing to do the change I would like to request the experts'
> opinion on the following:
> 1. For starters I could live with having only a single (potentially
> huge) index that is being constantly written into, and to a lesser
> extent searched. But in case the index is too big, or I want better
> indexing performance, I might want to have two indexes.
> One would be smaller, to index all new content (and searching), and
> the second one would be for searching only. At some point the small
> one would be merged to the big one and emptied to start again.
> I would like to know whether this:
> -a. is already being done in solr core (or there is a ticket open
> with similar functionality)
> -b. could be done with some scripting without modifying solr src.
> -c. should be implemented by modifying the solr src and using
>  https://issues.apache.org/jira/browse/SOLR-215 or
> https://issues.apache.org/jira/browse/SOLR-255. In this case, which of
> the two patches would be more appropriate? I see SOLR-215 is already
> committed to trunk but SOLR-255 is not done yet.
>
> 2. As I said I am not worried about search performance, my queries are
> all batch queries (well, sort of). All I need is maximum indexing
> performance, and without too much memory if possible. From what I read
> solr caches stuff heavily at various levels to provide fast searches.
> As said in the wiki, I can comment out the caching sections in the
> solrconfig.xml. Will this totally disable all the caching, warming
> etc.? If not, do I need to modify the source?
> I would like to do that so less memory is used, and I suppose trying to
> make search faster is not beneficial for indexing performance.
>
> thanks for your thoughts.
>
>
>
>
