As pointed out in this thread...

http://www.nabble.com/NullPointerException-at-lucene.analysis.StopFilter-with-1.3-to17564627.html#a17564627

3 of the TokenFilterFactories currently on the trunk are not actually backwards compatible with Solr 1.2...
  - StopFilterFactory
  - SynonymFilterFactory
  - EnglishPorterFilterFactory

This is because they were changed so that "setup" code formerly in the init(Map) method was moved to the new inform(ResourceLoader) method in order to make them no longer depend on the SolrCore.getSolrCore() singleton.

Introducing the ResourceLoaderAware interface allowed us to ensure that any existing custom analysis Factories people might have registered in their schema would continue to work as they did -- but by changing these 3 factory implementations we broke the somewhat unexpected use case of people whose code explicitly constructs a StopFilterFactory and calls init on it, expecting it to then be ready for use.

Three possible ways of dealing with this incompatibility come to mind...

1) Delayed Initialization on First Use.
We can add code to the create method of each of these factories that does a quick check to see if the inform method was ever called, and if it wasn't then use the SolrCore singleton to do so...
  if (null == stopWords) { // :TODO: remove when singleton is removed
    this.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());
  }
...this could be made more robust using various lazy initialization techniques (fun fact I learned from Josh Bloch last week: double-checked locking does work in Java 1.5 if you use volatile and cut/paste it exactly from his book so that you use the appropriate temporary variable)
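
For reference, a rough sketch of the double-checked locking idiom mentioned above, in plain Java with no Solr dependencies. The class LazyStopWordsHolder, its loadStopWords() method and the hard-coded word list are all made up for illustration; in the real factories the expensive step would be the inform(ResourceLoader) call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a factory whose stop words are loaded on first use.
class LazyStopWordsHolder {
    // volatile is what makes double-checked locking safe under the Java 5+ memory model.
    private volatile Set<String> stopWords;
    private int loadCount = 0; // demo only: counts how often the expensive load ran

    // Hypothetical loader standing in for the work done by inform(ResourceLoader).
    // Only ever called while holding the lock on this object.
    private Set<String> loadStopWords() {
        loadCount++;
        return Collections.unmodifiableSet(
                new HashSet<String>(Arrays.asList("a", "an", "the")));
    }

    Set<String> getStopWords() {
        Set<String> result = stopWords;     // temporary variable: one volatile read on the fast path
        if (result == null) {               // first check, without locking
            synchronized (this) {
                result = stopWords;
                if (result == null) {       // second check, with the lock held
                    stopWords = result = loadStopWords();
                }
            }
        }
        return result;
    }

    synchronized int getLoadCount() {
        return loadCount;
    }
}
```

The local variable "result" is the temporary variable in question: it keeps the already-initialized path down to a single volatile read.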

2) Superclass Insertion
Rename the current factory implementations using new names, and create new (deprecated) subclasses using the existing names that make the same "this.inform(...)" call as mentioned above, but during the init method. Change the example schema to advocate the new class names, and advocate in CHANGES.txt that existing users change their schemas to refer to the new names.
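
To make the shape of this option concrete, here is a minimal sketch using stand-in types rather than the real Solr classes. Loader, GlobalLoader, NewStopFactory and OldStopFactory are all hypothetical names (Loader plays the role of ResourceLoader, GlobalLoader the role of the SolrCore singleton), and the "words" argument and word list are assumptions for the demo.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for ResourceLoader.
interface Loader {
    List<String> getLines(String resource);
}

// Hypothetical stand-in for the SolrCore.getSolrCore() singleton lookup.
final class GlobalLoader {
    private static final Loader INSTANCE = new Loader() {
        public List<String> getLines(String resource) {
            return Arrays.asList("a", "an", "the"); // fixed demo data
        }
    };
    static Loader get() { return INSTANCE; }
}

// The renamed factory: init only records args, inform does the real setup.
class NewStopFactory {
    protected Map<String, String> args;
    protected Set<String> stopWords;

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public void inform(Loader loader) {
        stopWords = new HashSet<String>(loader.getLines(args.get("words")));
    }

    public boolean isStopWord(String token) {
        return stopWords.contains(token);
    }
}

// Keeps the old class name and the old contract: ready for use right after init.
@Deprecated
class OldStopFactory extends NewStopFactory {
    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        inform(GlobalLoader.get()); // falls back to the singleton, as in option 1
    }
}
```

Code that constructs the old name and only calls init keeps working, while anything referring to the new name has to be given a loader through inform.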

3) Documentation and Education
Since this wasn't exactly a use case we ever advertised, we could punt on the problem by putting a disclaimer in CHANGES.txt that anyone directly constructing those 3 classes should explicitly call inform() on the instances after calling init.


#3 is obviously the simplest approach for us as developers and, to be quite honest, probably impacts the fewest total number of people (since there are probably very few people constructing Factory instances themselves). Compare that with the potential performance impacts of #1, or the need for many people to change their schemas in order to benefit from MultiCores if we go with #2. With option #2 in particular, users who don't change their existing schemas but do start using multicores will silently get the stopwords and synonyms from the "last" core loaded in all other cores.



Opinions?




-Hoss
