Thanks Hoss, that was really useful information.
hossman wrote:
: As I understood Lucene's boost, if you search for John Le Carre it will
: give a better score to results that contain just the searched string than
: to results that have, for example, 50 words and the search is contained
On 10.02.2009 02:39 Chris Hostetter wrote:
: Now all that is left is a more cosmetic change I would like to make:
: I tried to place the solr.xml in the example dir to get rid of the
: -Dsolr.solr.home=multicore at startup, and changed the first entry
: from core0 to solr and moved the
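For readers following along, a minimal Solr 1.3-style multicore solr.xml looks something like the sketch below (core names and instanceDir values here are just placeholders from the example distribution, not the poster's actual config):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- "name" is what appears in the request URL (/solr/core0/select);
         "instanceDir" is the directory holding that core's conf/ and data/ -->
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```

Renaming a core to "solr" changes only the URL path; Solr still locates solr.xml via solr.solr.home (or the default ./solr), which is why the -D flag matters independently of the core names.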
Hello, we’re currently in the midst of re-designing our search hardware
architecture and I have some questions about sharding and distributed
search.
1. What is the benefit of using sharding/distributed search over keeping the index intact?
2. What is the best approach to
On Tue, Feb 10, 2009 at 10:13 AM, Rajiv2 rajiv.roo...@gmail.com wrote:
1. What is the benefit of using sharding/distributed search over keeping the index intact?
Primarily the response time of single requests. If your response times
are fast enough with a single index, then simply
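For context, a distributed request in Solr 1.3+ is just an ordinary query with a shards parameter listing the sub-indexes to fan out to; a sketch (hostnames and ports are made up):

```
http://host1:8983/solr/select?q=ipod&shards=host1:8983/solr,host2:8983/solr
```

The receiving node queries each shard, merges the per-shard results by score, and returns one response, so the client sees the same interface as with a single index.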
OK, I have reproduced this. Let me debug for a moment and then we can
likely file a JIRA
On Feb 9, 2009, at 10:17 PM, Erik Hatcher wrote:
One other person has reported this to me off-list, and I just
encountered it myself. ExtractingRequestHandler does not handle
plain text files
So, this seems to be an issue with Tika and its MIME type detection
of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files
and also https://issues.apache.org/jira/browse/TIKA-154, which has
On Feb 10, 2009, at 10:57 AM, Grant Ingersoll wrote:
So, this seems to be an issue with Tika and its MIME type detection
of plain text. For some discussion on it, see http://www.lucidimagination.com/search/document/64e27546d23e67b9/mime_type_identification_of_plain_text_files
and also
Our index size is about 60G. Most of the time, the optimization works fine.
But this morning, the optimization kept creating new segment files until all
the free disk space (300G) was used up.
Here is what the files generated during optimization look like:
I ended up pursuing the ParallelWriter
(http://issues.apache.org/jira/browse/LUCENE-600), so we can map different
fields to different indexes. This appears to keep the indexes in sync,
although I still need to do more testing.
However, some ugly hackery was needed to get it to extend
Has there been a recent change (since Dec 2/08) in the paging algorithm? I'm
seeing much worse performance (75% drop in throughput) when I request 20
records starting at record 180 (page 10 in my application).
Thanks.
Wojtek
I have tried to escape the characters as best I can, but cannot seem to find
one that works.
The value is:
10.1002/(SICI)1096-9136(199604)13:4<390::AID-DIA121>3.0.CO;2-4
It is a doi (see http://doi.org), so is a valid value to search on. However,
when I query this through Ruby or even the
Hello,
Is there a way to set a score filter? I tried +score:[1.2 TO *] but it did
not work.
Many thanks,
Kevin
Hi Ian,
I'll assume this actually did get indexed as a single token, so there is no
problem there.
As for query string escaping, perhaps this method from Lucene's QueryParser
will help:
/**
 * Returns a String where those characters that QueryParser
 * expects to be escaped are escaped by a preceding \.
 */
public static String escape(String s)
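If calling into Lucene's QueryParser directly is inconvenient, the escaping logic itself is small; below is a self-contained sketch of the same idea. The special-character list mirrors what the 2.x-era QueryParser.escape handles as far as I recall, so treat it as an approximation rather than the canonical list:

```java
public class QueryEscapeDemo {
    // Characters the Lucene query parser treats as syntax (approximate,
    // 2.x-era list; verify against your Lucene version's QueryParser.escape).
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&";

    /** Escapes query-syntax characters with a preceding backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A DOI-like value with parentheses and colons, as in the earlier mail
        System.out.println(escape("10.1002/(SICI)13:4"));
        // -> 10.1002/\(SICI\)13\:4
    }
}
```

Note that '/' is left unescaped here, which matches 2.x behavior; later Lucene versions also treat '/' as syntax.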
Hi Qingdi,
Hm, I've never encountered this problem. You didn't mention your Solr version.
If I were you I would grab the nightly build tomorrow, because tonight's Solr
nightly build should include the very latest Lucene jars. Of course, this
means running Solr 1.4-dev.
Otis
--
Sematext --
Hi,
Did you commit (reopen the searcher) during the performance degradation period
and did any of your queries use sort? If so, perhaps your JVM is accumulating
those thrown-away FieldCache objects and then GC has more and more garbage to
clean up, causing pauses and lowering your overall
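One way to test that GC theory is to restart Solr with verbose GC logging and see whether long collections line up with the slow periods. A sketch with standard Sun/HotSpot flags of that era (start.jar and the log path are placeholders for however you launch Solr):

```
java -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar start.jar
```

Long "Full GC" pauses appearing in gc.log right after commits with sorted queries would support the FieldCache-garbage explanation.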
Hi,
I don't recall any paging changes. Perhaps you can run things with something
like YourKit and see where queries were spending the most time in the old
version and where they are spending time in the new version?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch