Re: Broken attachment link on Wiki

2011-07-11 Thread Simon Wistow
Bump? On Mon, Jun 27, 2011 at 06:17:42PM +0100, me said: On the SolrJetty page http://wiki.apache.org/solr/SolrJetty there's a link to a tar ball http://wiki.apache.org/solr/SolrJetty?action=AttachFile&do=view&target=DEMO_multiple_webapps_jetty_6.1.3.tgz which fails with the error

Broken attachment link on Wiki

2011-06-27 Thread Simon Wistow
On the SolrJetty page http://wiki.apache.org/solr/SolrJetty there's a link to a tar ball http://wiki.apache.org/solr/SolrJetty?action=AttachFile&do=view&target=DEMO_multiple_webapps_jetty_6.1.3.tgz which fails with the error "You are not allowed to do AttachFile on this page." Can someone fix

Multiple Solrs on the same box

2011-06-20 Thread Simon Wistow
First, a couple of assumptions. We have boxes with a large amount (~70Gb) of memory which we're running Solr under. We've currently set -Xmx to 25Gb with the GC settings -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing We're reluctant to up the -Xmx

Expunging deletes from a very large index

2011-06-06 Thread Simon Wistow
Due to some emergency maintenance I needed to run delete on a large number of documents in a 200Gb index. The problem is that it's taking an inordinately long amount of time (2+ hours so far and counting) and is steadily eating up disk space - presumably up to 2x index size which is getting

Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
I have a field 'type' that has several values. If it's type 'foo' then it also has a field 'restriction_id'. What I want is a filter query which says either it's not a 'foo' or, if it is, then it has the restriction '1'. I expect two matches - one of type 'bar' and one of type 'foo'. Neither

Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 04:34:05PM -0400, Jonathan Rochkind said: This is what I do instead, to rewrite the query to mean the same thing but not give the lucene query parser trouble: fq=( (*:* AND -type:foo) OR restriction_id:1) *:* means everything, so (*:* AND -type:foo) means the same
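The rewrite quoted above can be sketched in Python as follows. This is only an illustration: the helper name build_fq and the q=*:* main query are assumptions, and only the filter expression itself comes from the thread. The idea is that a purely negative clause inside parentheses has nothing to subtract from, so it is anchored to *:* (all documents) first.

```python
from urllib.parse import urlencode


def build_fq(excluded_type, restriction_id):
    """Build a filter query matching documents that are NOT `excluded_type`,
    or that carry the given restriction_id (hypothetical helper)."""
    # (*:* AND -type:X) means "everything, minus type X", which gives the
    # nested negative clause a set of documents to subtract from.
    return f"((*:* AND -type:{excluded_type}) OR restriction_id:{restriction_id})"


fq = build_fq("foo", 1)
# q=*:* is an assumed placeholder main query, not taken from the thread.
params = urlencode({"q": "*:*", "fq": fq})

print(fq)
print(params)
```

The URL-encoded form is what would be appended to a select request; the unencoded fq is the string quoted in the reply.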

Re: Negative OR in fq field not working as expected

2011-04-25 Thread Simon Wistow
On Mon, Apr 25, 2011 at 05:02:12PM -0400, Yonik Seeley said: It really shouldn't be that slow... how many documents are in your index, and how many match -type:foo? Total number of docs is 161,000,000; type:foo 39,000,000; -type:foo 122,200,000; type:bar 90,000,000. We're aware it's large and

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-04-05 Thread Simon Wistow
On Wed, Apr 06, 2011 at 12:05:57AM +0200, Jan Høydahl said: Just curious, was there any resolution to this? Not really. We tuned the GC pretty aggressively - we use these options -server -Xmx20G -Xms20G -Xss10M -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-02-07 Thread Simon Wistow
On Mon, Feb 07, 2011 at 02:06:00PM +0100, Markus Jelsma said: Heap usage can spike after a commit. Existing caches are still in use and new caches are being generated and/or auto warmed. Can you confirm this is the case? We see spikes after replication which I suspect is, as you say, because

Re: Searching for negative numbers very slow

2011-02-07 Thread Simon Wistow
On Fri, Jan 28, 2011 at 12:29:18PM -0500, Yonik Seeley said: That's odd - there should be nothing special about negative numbers. Here are a couple of ideas: - if you have a really big index and querying by a negative number is much more rare, it could just be that part of the index wasn't

Searching for negative numbers very slow

2011-01-27 Thread Simon Wistow
If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. However, any of fq=uid:\-1 or fq=uid:[* TO -1] or fq=uid:[-1 to -1] or fq=-uid:[0 TO *] then queries are incredibly slow - in the 9
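To compare the variants side by side, the requests could be generated with a short Python sketch like the one below. The base URL and the main query term are hypothetical placeholders, not taken from the thread, and the range operator is written in upper case (TO) throughout, which is an assumption about what was intended.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path for a real install.
BASE_URL = "http://localhost:8983/solr/select"

# The fast positive filter followed by the slow negative-number variants.
FILTERS = [
    "uid:1",
    r"uid:\-1",
    "uid:[* TO -1]",
    "uid:[-1 TO -1]",
    "-uid:[0 TO *]",
]


def query_url(fq, q="example"):
    """Return a dismax request URL for one filter query.

    `q` is an assumed placeholder search term."""
    return BASE_URL + "?" + urlencode({"q": q, "qt": "dismax", "fq": fq})


for fq in FILTERS:
    print(query_url(fq))
```

Each printed URL can then be timed separately, which keeps the filter as the only difference between requests.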

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-27 Thread Simon Wistow
On Tue, Jan 25, 2011 at 01:28:16PM +0100, Markus Jelsma said: Are you sure you need CMS incremental mode? It's only advised when running on a machine with one or two processors. If you have more you should consider disabling the incremental flags. I'll test again but we added those to get

Re: Searching for negative numbers very slow

2011-01-27 Thread Simon Wistow
On Thu, Jan 27, 2011 at 11:32:26PM +0000, me said: If I do qt=dismax fq=uid:1 (or any other positive number) then queries are as quick as normal - in the 20ms range. For what it's worth uid is a TrieIntField with precisionStep=0, omitNorms=true, positionIncrementGap=0

Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
We have two slaves replicating off one master every 2 minutes. Both using the CMS + ParNew Garbage collector. Specifically -server -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing but periodically they both get into a GC storm and just keel over.

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 08:00:53PM +0100, Markus Jelsma said: Are you using 3rd-party plugins? No third party plugins - this is actually pretty much stock tomcat6 + solr from Ubuntu. The only difference is that we've adapted the directory layout to fit in with our house style

Re: Possible Memory Leaks / Upgrading to a Later Version of Solr or Lucene

2011-01-24 Thread Simon Wistow
On Mon, Jan 24, 2011 at 10:55:59AM -0800, Em said: Could it be possible that your slaves have not finished replicating before the new replication process starts? If so, there you've got the OOM :). This was one of my thoughts as well - we're currently running a slave which has no queries in it

Box occasionally pegs one cpu at 100%

2011-01-10 Thread Simon Wistow
I have a fairly classic master/slave set up. Response times on the slave are generally good with blips periodically, apparently when replication is happening. Occasionally however the process will have one incredibly slow query and will peg the CPU at 100%. The weird thing is that it will

Re: Box occasionally pegs one cpu at 100%

2011-01-10 Thread Simon Wistow
On Mon, Jan 10, 2011 at 01:56:27PM -0500, Brian Burke said: This sounds like it could be garbage collection related, especially with a heap that large. Depending on your jvm tuning, a FGC could take quite a while, effectively 'pausing' the JVM. Have you looked at something like jstat

Re: Box occasionally pegs one cpu at 100%

2011-01-10 Thread Simon Wistow
On Mon, Jan 10, 2011 at 05:58:42PM -0500, François Schiettecatte said: http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html (you need to read this one) http://java.sun.com/performance/reference/whitepapers/tuning.html (and this one). Yeah, I have these two pages

Very slow sorting, even on small result sets

2010-11-30 Thread Simon Wistow
We've got a largish corpus (~94 million documents). We'd like to be able to sort on one of the string fields. However this takes an incredibly long time. A warming query for that field takes about 20 minutes. However most of the time the result sets are small since we use filters heavily -

Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
We currently have a 30G index with 73M of .tii files running on a machine with 4 Intel 2.27GHz Xeons with 15G of memory. About once a second a process indexes ~10-20 smallish documents using the XML Update Handler. A commit happens after every update. However we see this behaviour even if the

Re: Experiencing lots of full GC runs

2010-11-18 Thread Simon Wistow
On Fri, Nov 19, 2010 at 12:01:09AM +0000, me said: I'm baffled - I've had way bigger indexes than this before with no performance problems. At first I thought it was the frequent updates but the fact that it happens even when the indexer isn't running seems to put paid to that. More information: -

Re: Possible memory leaks with frequent replication

2010-11-02 Thread Simon Wistow
On Mon, Nov 01, 2010 at 05:42:51PM -0700, Lance Norskog said: You should query against the indexer. I'm impressed that you got 5s replication to work reliably. That's our current solution - I was just wondering if there was anything I was missing. Thanks!

Possible memory leaks with frequent replication

2010-11-01 Thread Simon Wistow
We've been trying to get a setup in which a slave replicates from a master every few seconds (ideally every second but currently we have it set at every 5s). Everything seems to work fine until, periodically, the slave just stops responding, apparently because it is running out of memory:

Re: Sorting on arbitary 'custom' fields

2010-10-15 Thread Simon Wistow
On Mon, Oct 11, 2010 at 07:17:43PM +0100, me said: It was just an idea though and I was hoping that there would be a simpler more orthodox way of doing it. In the end, for anyone who cares, we used dynamic fields. There are a lot of them but we haven't seen performance impacted that badly so

Re: Sorting on arbitary 'custom' fields

2010-10-11 Thread Simon Wistow
On Sat, Oct 09, 2010 at 06:31:19PM -0400, Erick Erickson said: I'm confused. What do you mean that a user can set any number of arbitrarily named fields on a document? It sounds like you are talking about a user adding arbitrarily many entries to a multi-valued field? Or is it some kind of

Problems indexing spatial field - undefined subField

2010-08-31 Thread Simon Wistow
I'm trying to index a latLon field. I have a fieldType in my schema.xml that looks like <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/> and a field that looks like <field name="location" type="latLon" indexed="true" stored="true"/> I'm trying to upload via the JSON update handler but

Re: Problems indexing spatial field - undefined subField

2010-08-31 Thread Simon Wistow
On Wed, Sep 01, 2010 at 01:05:47AM +0100, me said: I'm trying to index a latLon field. <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/> <field name="location" type="latLon" indexed="true" stored="true"/> Turns out changing it to <fieldType name="latLon" class="solr.LatLonType"

Re: Getting the character offset from highlighted fragments

2010-04-22 Thread Simon Wistow
On Thu, Apr 22, 2010 at 02:15:08AM +0100, me said: It looks like org.apache.lucene.search.highlight.TextFragment has the right information to do this (i.e. textStartPos) Turns out that it doesn't seem to have the right information, in that textStartPos always seems to be 0 (and textEndPos just

Getting the character offset from highlighted fragments

2010-04-21 Thread Simon Wistow
Having poked around a little, it doesn't look like there's a query param to turn this on, but it'd be really useful if highlighted fragments could have a character offset returned somehow - maybe something like <lst name="highlighting"> <lst name="27314523"> <arr name="content"> <str offset="600"

Re: Slow QueryComponent.process() when queries have numbers in them

2010-02-05 Thread Simon Wistow
On Wed, Feb 03, 2010 at 07:38:13PM -0800, Lance Norskog said: The debugQuery parameter shows you how the query is parsed into a tree of Lucene query objects. Well, that's kind of what I'm asking - I know how the query is being parsed: <str name="rawquerystring">myers 8e psychology chapter 9</str>

Slow QueryComponent.process() when queries have numbers in them

2010-02-03 Thread Simon Wistow
According to my logs org.apache.solr.handler.component.QueryComponent.process() takes a significant amount of time (5s but I've seen up to 15s) when a query has an odd pattern of numbers in it, e.g. neodymium megagauss-oersteds (MGOe) (1 MG·Oe = 7,958·10³ T·A/m = 7,958 kJ/m³ myers 8e psychology

Problems with spellchecker

2010-01-20 Thread Simon Wistow
The spellchecker in my 1.4 install started behaving increasingly erratically and suggestions would only be returned some of the time with the same query. I tried to force a rebuild using spellcheck.build=yes The full request being /select/?q=alexandr the great&indent=on&fl=title

Oddly slow replication

2009-12-07 Thread Simon Wistow
I have a Master server with two Slaves populated via Solr 1.4 native replication. Slave1 syncs at a respectable speed, i.e. around 100MB/s, but Slave2 runs much, much slower - the peak I've seen is 56KB/s. Both are running off the same hardware with the same config - compression is set to

Re: Oddness with Phrase Query

2009-11-23 Thread Simon Wistow
On Mon, Nov 23, 2009 at 12:10:42PM -0800, Chris Hostetter said: ...hmm, you shouldn't have to reindex everything. are you sure you restarted solr after making the enablePositionIncrements=true change to the query analyzer? Yup - definitely restarted. what do the offsets look like when you

Re: Oddness with Phrase Query

2009-11-17 Thread Simon Wistow
On Tue, Nov 17, 2009 at 11:09:38AM -0800, Chris Hostetter said: Several things about your message don't make sense... Hmm, sorry - a byproduct of building up the mail over time I think. The query ?q=Here there be dragons&fl=id,title,score&debugQuery=on&qt=dismax&qf=title gets echoed as

Oddness with Phrase Query

2009-11-09 Thread Simon Wistow
I have a document with the title Here, there be dragons and a body. When I search for q = Here, there be dragons qf = title^2.0 body^0.8 qt = dismax Which is parsed as +DisjunctionMaxQuery((content:here dragon^0.8 | title:here dragon^2.0)~0.01) () I get the document as the first hit which
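For anyone wanting to reproduce the request, the parameters listed above could be assembled with a small Python sketch. The helper name dismax_params is hypothetical, and debugQuery=on is added here only so that the parsed query is echoed back; the q, qf and qt values are the ones given in the message.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts):
    """Assemble dismax request parameters (hypothetical helper).

    field_boosts maps a field name to its boost, e.g. {"title": 2.0}."""
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_query, "qt": "dismax", "qf": qf, "debugQuery": "on"}


params = dismax_params("Here, there be dragons", {"title": 2.0, "body": 0.8})
query_string = urlencode(params)

print(params["qf"])
print(query_string)
```

Encoding the parameters this way avoids problems with the comma, spaces and ^ characters when the request is pasted into a URL.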

Re: Issues with SolrJ and IndexReader reopening

2009-11-04 Thread Simon Wistow
On Fri, Oct 30, 2009 at 11:20:19AM +0530, Shalin Shekhar Mangar said: That is very strange. IndexReaders do get re-opened after commits. Do you see a commit message in the Solr logs? Sorry for the delay - I've been trying to puzzle over this some more. The code looks like

Issues with SolrJ and IndexReader reopening

2009-10-29 Thread Simon Wistow
We've been trying to build an indexing pipeline using SolrJ but we've run into a couple of issues - namely that IndexReaders don't seem to get reopened after a commit(). After an index or delete the change doesn't show up until I restart solr. I've tried commit() and commit(true, true) just

Index Corruption (possibly during commit)

2009-10-19 Thread Simon Wistow
We have an indexing script which has been running for a couple of weeks now without problems. It indexes documents and then periodically commits (which is a tad redundant I suppose), both via the HTTP interface. All documents are indexed to a master and a slave rsyncs them off using the standard

'Down' boosting shorter docs

2009-10-14 Thread Simon Wistow
Our index has some items in it which basically contain a title and a single word body. If the user searches for a word in the title (especially if the title is itself only one word) then that doc will get scored quite highly, despite the fact that, in this case, it's not really relevant. I've

Advantages of different Servlet Containers

2009-10-02 Thread Simon Wistow
I know that the Solr FAQ says "Users should decide for themselves which Servlet Container they consider the easiest/best for their use cases based on their needs/experience. For high traffic scenarios, investing time for tuning the servlet container can often make a big difference." but is