Hi,

Short answer to your long email: I didn't see anything Lucene-specific in your description that would prevent you from using Solr for this. Figuring out when to open/start a new index would have to be done by your application, but with SOLR-215, Solr can now host several indices. Yes, you can minimize all caches and essentially disable them, though that likely won't affect your indexing performance unless you are really low on RAM, in which case you'd better run to the store. ;)
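
To illustrate the application-side part (deciding which index a document belongs to, and therefore when a new index has to be started), here is a minimal sketch. It assumes indexes are partitioned by document date, as described in the original message. The class name, the "idx-" naming scheme, and the reference date are hypothetical, and nothing here touches the Lucene or Solr APIs.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: maps a document date to the name of the index that should hold it. */
final class IndexBuckets {
    // Assumed reference date from which fixed-width windows are counted.
    private static final LocalDate EPOCH = LocalDate.of(2000, 1, 1);

    private IndexBuckets() {}

    /** One index per calendar month, e.g. "idx-2007-10". */
    static String monthlyBucket(LocalDate docDate) {
        return String.format("idx-%04d-%02d", docDate.getYear(), docDate.getMonthValue());
    }

    /** One index per window of `days` days, named after the first day of the window. */
    static String fixedWidthBucket(LocalDate docDate, int days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        long offset = ChronoUnit.DAYS.between(EPOCH, docDate);
        // floorDiv keeps windows aligned even for dates before the reference date.
        long windowStart = Math.floorDiv(offset, (long) days) * days;
        return "idx-" + EPOCH.plusDays(windowStart);
    }
}
```

The application would compute the bucket name for each incoming document and open or create the corresponding index whenever the name changes. With `days` set to 1 this gives daily indexes, with 5 it gives five-day indexes.
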
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: jm <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, October 29, 2007 2:01:07 PM
Subject: moving nontrivial application from lucene to solr

Hello all,

I have an application running on Lucene 2.2. It may not be the most typical usage of Lucene; the main features regarding Lucene are:

- I use many indexes, tens or hundreds of them (all with the same field structure), and the number of indexes grows with time. The indexes are separated on a time-of-document basis, so I can have monthly indexes, or one every day, every 5 days, etc. I keep them all under a given directory, which we call an index location (it contains many indexes).
- When I need more than one process indexing at the same time, I have to use one index location per process (as writing is exclusive). That effectively doubles the number of indexes if I use 2 processes, and it gets even worse if I want more processes. I know I could merge indexes, but right now I don't do that; too much trouble.
- I am mostly worried about indexing performance; I don't mind if searching takes longer. To speed up indexing I keep a certain number of indexes open, etc.
- While searching, I always need all docs (I use a HitCollector extension). I use a MultiSearcher to search over many indexes.
- I use my own custom Analyzer.
- Documents have 6 fields, all but one non-stored. The stored one is small, around 100 bytes. The others can vary from 1 byte to huge (only two can be really huge).

I got to know about Solr when my code was already working, so it was not possible to change. I have played around with Solr, and even built a smaller project with it. And now I am thinking that porting to Solr would have many advantages:

- It would allow N processes to index at the same time and it would keep only one index (or more, see later). I could get rid of maintaining the index locations etc. A big plus for me.
- I would reduce my Lucene-related code.
- Deleting docs would work easily; now I need special treatment to delete docs in some cases, because I have to wait until nobody is writing to the index, etc.
- I would take the opportunity to refactor the way I use the documents, to get even more advantages for my application unrelated to Lucene/Solr.
- Indexing might even be faster?

Before committing to the change I would like to request the experts' opinion on the following:

1. For starters, I could live with having only a single (potentially huge) index that is constantly written to and, to a lesser extent, searched. But in case the index gets too big, or I want better indexing performance, I might want to have two indexes. One would be smaller, for indexing all new content (and for searching), and the second one would be for searching only. At some point the small one would be merged into the big one and emptied to start again. I would like to know whether this:
   -a. is already being done in Solr core (or there is a ticket open with similar functionality),
   -b. could be done with some scripting, without modifying the Solr source, or
   -c. should be implemented by modifying the Solr source and using https://issues.apache.org/jira/browse/SOLR-215 or https://issues.apache.org/jira/browse/SOLR-255. In this case, which of the two patches would be more appropriate? I see SOLR-215 is already committed to trunk but SOLR-255 is not done yet.

2. As I said, I am not worried about search performance; my queries are all batch queries (well, sort of). All I need is maximum indexing performance, without too much memory if possible. From what I read, Solr caches heavily at various levels to provide fast searches. As the wiki says, I can comment out the caching sections in solrconfig.xml. Will this totally disable all the caching, warming, etc.? If not, do I need to modify the source?
I would like to do that so that less memory is used, and I suppose that trying to make search faster does not benefit indexing performance.

Thanks for your thoughts.