Interesting... I had assumed that type="index" meant the analyzer wasn't
used at query time, but I see why that's not the case.  Here's the thing
though: we had one field defined with this fieldtype, and it was when we
deployed the new schema to Solr that we started seeing the issue.  However,
we had not yet released the application code that uses the new field (we
obviously have to make the change on the Solr side before the code, so we
stagger the two deployments by a few days).  So the field of that fieldtype
wasn't even being queried against.
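
Just so I'm reading that right: to keep the custom factories out of the
query path entirely, we'd presumably have to declare an explicit query
analyzer as well.  A rough sketch of what I mean (the query-side chain here
is just illustrative, not something we actually run):

  <fieldtype name="text_lc" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="my.custom.TokenizerFactory"/>
      <filter class="my.custom.FilterFactory" words="stopwords.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <!-- explicit query analyzer: only stock factories are used at query
         time, so the custom code is only exercised on the indexing side -->
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>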

The problem would be pretty easy for us to reproduce, but I don't think our
sysadmins would appreciate experimenting on our production Solr servers.
We can really only reproduce it in our live environment, because that's the
only environment that gets steady traffic (around 100 qps), so I guess you
could say it is traffic related.
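
If we do manage to carve out a quiet box to test on, I'm assuming the
heap-dump-on-OOM option Hoss mentions below is the standard HotSpot flag,
something along these lines (start.jar is just the example Jetty launcher,
substitute however you actually start Solr):

  java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/solr-dumps \
       -jar start.jar

That would give us a dump automatically at the moment of the OOM, and we
could grab an earlier one by hand with jmap for comparison.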

Some other notes: we have a distributed index across 3 shards.  We also
pull snapshots from the master server about once per hour, so whatever
commits happen during snapinstalling could be a factor, but the timeline of
the memory growth doesn't really line up with those commits.

Anyway, I know it all seems like a mystery and I apologize if it seems like
I'm being vague, but the issue really is that simple.  Hopefully if someone
else ever runs into it they can come up with a better explanation.  Until
then, we decided to just deploy our custom classes "the old way" by
exploding the war and placing the jars in there (see the sketch below) -
not nearly as convenient, but we haven't had any problems doing it this way
(same code and config, by the way, so since the only difference is using
the lib directory vs. not, the lib directory is most likely the problem).
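
For anyone who ends up doing the same thing, "the old way" amounts to
roughly this (the war and jar names/paths are just examples, not our actual
artifacts):

  # unpack the stock war, drop the custom jar into WEB-INF/lib, repack
  mkdir exploded && cd exploded
  unzip -q ../solr.war
  cp /path/to/custom.jar WEB-INF/lib/
  zip -qr ../solr-custom.war .

The custom classes then load from the webapp classloader instead of the
separate classloader Solr builds for solrHome/lib, which as best we can
tell is the difference that matters.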

Thanks for your help


hossman wrote:
> 
> 
> : <fieldtype name="text_lc" class="solr.TextField" tokenized="false">
> :   <analyzer type="index">
> :     <tokenizer class="my.custom.TokenizerFactory"/>
> :     <filter class="my.custom.FilterFactory" words="stopwords.txt"/>
> :     <filter class="solr.LowerCaseFilterFactory"/>
> :     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> :   </analyzer>
> : </fieldtype>
>       ...
> : only do indexing on the master server.  However, with this schema in
> : place on the slaves, as well as our custom.jar in the solrHome/lib
> : directory, we run into these issues where the memory usage grows and
> : grows without explanation.
> 
> ...even if you only do indexing on the master, having a single analyzer 
> defined for a field means it's used at both index and query time (even 
> though you say 'type="index"') so a memory leak in either of your custom 
> factories could cause a problem on a query box.
> 
> This however concerns me...
> 
> : fact, in a previous try, we had simply dropped one of our custom plugin
> : jars into the lib directory but forgot to deploy the new solrconfig or
> : schema files that referenced the classes in there, and the issue still
> : occurred.
> 
> ...this I can't think of a rational explanation for.  Can you elaborate on 
> what you do to create this problem .. ie: does the memory usage grow 
> even when Solr doesn't get any requests? or does it happen when searches are 
> executed? or when commits happen? etc...
> 
> If the problem is as easy to reproduce as you describe, can you please 
> generate some heap dumps against a server that isn't processing any 
> queries -- one from when the server first starts up, and one from when the 
> server crashes from an OOM (there's a JVM option for generating heap dumps 
> on OOM that I can't think of off the top of my head)
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Plugin-Performance-Issues-tp24295010p26201123.html
Sent from the Solr - User mailing list archive at Nabble.com.
