Re: solr boosting

2009-02-10 Thread Marc Sturlese
Thanks Hoss, that was really useful information. hossman wrote: : As I understood lucene's boost, if you search for John Le Carre it will : give better score to the results that contain just the searched string than : results that have, for example, 50 words and the search is contained

Re: Moving from single core to multicore

2009-02-10 Thread Michael Lackhoff
On 10.02.2009 02:39 Chris Hostetter wrote: : Now all that is left is a more cosmetic change I would like to make: : I tried to place the solr.xml in the example dir to get rid of the : -Dsolr.solr.home=multicore for the start and changed the first entry : from core0 to solr and moved the

Feedback needed on sharding/distributed search

2009-02-10 Thread Rajiv2
Hello, we’re currently in the midst of re-designing our search hardware architecture and I have some questions about sharding and distributed search. 1. What is the benefit of using sharding/distributed search over keeping the index intact? 2. What is the best approach to

Re: Feedback needed on sharding/distributed search

2009-02-10 Thread Yonik Seeley
On Tue, Feb 10, 2009 at 10:13 AM, Rajiv2 rajiv.roo...@gmail.com wrote: 1. What is the benefit of using sharding/distributed search over keeping the index intact? Primarily response time of single requests. If your response times are fast enough with a single index, then simply
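For readers unfamiliar with how a distributed request is addressed, here is a minimal sketch. It assumes Solr's `shards` request parameter, which takes a comma-separated list of shard addresses. The host names, ports, and the helper name `distributed_query_url` are hypothetical and only illustrate the shape of the request.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, shards, query):
    """Build a select URL that fans one query out over several shards.

    base_url and the shard addresses are hypothetical placeholders.
    """
    params = {
        "q": query,
        # Solr expects the shard list as a single comma-separated value.
        "shards": ",".join(shards),
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"{base_url}/select?{urlencode(params, safe=':/,')}"


url = distributed_query_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "title:solr",
)
print(url)
```

The node that receives this request queries each listed shard and merges the results, which is why a single request can finish sooner than it would against one large index.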

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Grant Ingersoll
OK, I have reproduced this. Let me debug for a moment and then we can likely file a JIRA On Feb 9, 2009, at 10:17 PM, Erik Hatcher wrote: One other person has reported this to me off-list, and I just encountered it myself. ExtractingRequestHandler does not handle plain text files

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Grant Ingersoll
So, this seems to be an issue with Tika and its mime type detection of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files and also https://issues.apache.org/jira/browse/TIKA-154, which has

Re: Solr Cell (ExtractingRequestHandler) and plain text files

2009-02-10 Thread Erik Hatcher
On Feb 10, 2009, at 10:57 AM, Grant Ingersoll wrote: So, this seems to be an issue with Tika and its mime type detection of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files and also

optimization failed

2009-02-10 Thread Qingdi
Our index size is about 60G. Most of the time, the optimization works fine. But this morning, the optimization kept creating new segment files until all the free disk space (300G) was used up. Here is what the files generated during optimization look like:

Re: Vertical Partitioning advice

2009-02-10 Thread Mark Kranz
I ended up pursuing the ParallelWriter http://issues.apache.org/jira/browse/LUCENE-600 , so we can map different fields to different indexes. This appears to keep the indexes in sync, although I still need to do more testing. However, some ugly hackery was needed to get it to extend

Recent Paging Change?

2009-02-10 Thread wojtekpia
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm seeing much worse performance (75% drop in throughput) when I request 20 records starting at record 180 (page 10 in my application). Thanks. Wojtek -- View this message in context:
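The paging arithmetic in the question (20 records starting at record 180 is page 10) can be sketched as follows. The sketch assumes Solr's `start` and `rows` request parameters; the helper name `page_params` and the match-all query are hypothetical.

```python
from urllib.parse import urlencode


def page_params(page, rows=20):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or rows < 1:
        raise ValueError("page and rows must both be at least 1")
    # Page 1 starts at offset 0, page 2 at offset `rows`, and so on.
    return {"start": (page - 1) * rows, "rows": rows}


params = page_params(10)
# The query value is a placeholder; urlencode percent-escapes it.
query_string = urlencode({"q": "*:*", **params})
print(params, query_string)
```

Note that a deeper page generally costs more than the first one, because the searcher has to collect and rank `start + rows` documents before it can return the last `rows` of them.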

Is there a way to query for this value?

2009-02-10 Thread Ian Connor
I have tried to escape the characters as best I can, but cannot seem to find one that works. The value is: 10.1002/(SICI)1096-9136(199604)13:4<390::AID-DIA121>3.0.CO;2-4 It is a doi (see http://doi.org), so is a valid value to search on. However, when I query this through ruby or even the

score filter

2009-02-10 Thread Cheng Zhang
Hello, Is there a way to set a score filter? I tried +score:[1.2 TO *] but it did not work. Many thanks, Kevin

Re: Is there a way to query for this value?

2009-02-10 Thread Otis Gospodnetic
Hi Ian, I'll assume this actually did get indexed as a single token, so there is no problem there. As for query string escaping, perhaps this method from Lucene's QueryParser will help: /** * Returns a String where those characters that QueryParser * expects to be escaped are escaped
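As a rough illustration of the escaping approach described above, here is a minimal Python sketch. It is not Lucene's code: the set of special characters is an assumption based on the classic query parser syntax and may differ between Lucene versions, and both the function name `escape_query` and the sample value are hypothetical.

```python
# Characters assumed to be special to Lucene's classic query parser.
# This list is an assumption; check it against the Lucene version in use.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&')


def escape_query(text):
    """Return text with each assumed-special character preceded by a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


# Hypothetical identifier containing parentheses, a hyphen, and colons.
sample = "10.1000/(ABC)1234-5678:9"
print(escape_query(sample))
```

Escaping only helps if the field was indexed as a single token. If the analyzer split the value at index time, an escaped query still will not match it as a whole.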

Re: optimization failed

2009-02-10 Thread Otis Gospodnetic
Hi Qingdi, Hm, I've never encountered this problem. You didn't mention your Solr version. If I were you I would grab the nightly build tomorrow, because tonight's Solr nightly build should include the very latest Lucene jars. Of course, this means running Solr 1.4-dev. Otis -- Sematext --

Re: Performance degradation caused by choice of range fields

2009-02-10 Thread Otis Gospodnetic
Hi, Did you commit (reopen the searcher) during the performance degradation period and did any of your queries use sort? If so, perhaps your JVM is accumulating those thrown-away FieldCache objects and then GC has more and more garbage to clean up, causing pauses and lowering your overall

Re: Recent Paging Change?

2009-02-10 Thread Otis Gospodnetic
Hi, I don't recall any paging changes. Perhaps you can run things with something like YourKit and see where queries were spending the most time in the old version and where they are spending time in the new version? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch