As pointed out in this thread...
http://www.nabble.com/NullPointerException-at-lucene.analysis.StopFilter-with-1.3-to17564627.html#a17564627
3 of the TokenFilterFactories currently on the trunk are not actually
backwards compatible with Solr 1.2...
- StopFilterFactory
- SynonymFilterFactory
- EnglishPorterFilterFactory
This is because they were changed so that "setup" code formerly in the
init(Map) method was moved to the new inform(ResourceLoader) method in
order to make them no longer depend on the SolrCore.getSolrCore()
singleton.
Introducing the ResourceLoaderAware interface allowed us to ensure that
any existing custom analysis Factories people might have registered in
their schema could still continue to work as they did -- but by changing
these 3 factory implementations we broke the somewhat unexpected use case
of people whose code explicitly constructs a StopFilterFactory and
calls init on it, expecting it to be ready for use at that point.
Three possible ways of dealing with this incompatibility come to mind...
1) Delayed Initialization on First Use.
We can add code to the create method of each of these factories that does
a quick check to see if the inform method was ever called, and if it
wasn't then use the SolrCore singleton to do so...
    if (null == stopWords) { // :TODO: remove when singleton is removed
      this.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());
    }
...this could be made more robust using various lazy initialization
techniques (fun fact I learned from Josh Bloch last week: double checked
locking does work in Java 1.5 if you use volatile and cut/paste it exactly
from his book so that you use the appropriate temporary variable).
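
For reference, the double checked locking idiom mentioned above looks roughly like the sketch below. LazyStopWords and its Supplier argument are hypothetical stand-ins (the Supplier plays the role of the inform(...) work, and it needs a newer Java than 1.5); none of this is actual Solr code.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical holder for a stop word set that is loaded on first use.
class LazyStopWords {
    // Stand-in for the setup work that inform(...) would normally do.
    private final Supplier<Set<String>> loader;

    // volatile is what makes the idiom safe under the Java 1.5+ memory model.
    private volatile Set<String> stopWords;

    LazyStopWords(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    Set<String> get() {
        // The temporary variable means the fast path does one volatile read.
        Set<String> result = stopWords;
        if (result == null) {
            synchronized (this) {
                // Re-check under the lock: another thread may have loaded it.
                result = stopWords;
                if (result == null) {
                    stopWords = result = loader.get();
                }
            }
        }
        return result;
    }
}
```

The loader runs at most once no matter how many threads call get(), and every call after the first skips the lock entirely.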
2) Superclass Insertion
Rename the current factory implementations using new names, and create new
(deprecated) subclasses using the existing names that make the same
"this.inform(...)" call as mentioned above, but during the init method.
Change the example schema to advocate the new class names, and advocate
in CHANGES.txt that existing users change their schemas to refer to the
new names.
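
A rough sketch of what option 2 could look like follows. Everything in it is a simplified stand-in: the ResourceLoader interface is cut down to one method, GlobalLoader plays the part of the SolrCore.getSolrCore() lookup, and CoreAwareStopFilterFactory is a made-up new name, not a proposal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Solr's ResourceLoader; only what the sketch needs.
interface ResourceLoader {
    List<String> getLines(String resource);
}

// Stand-in for the SolrCore.getSolrCore() singleton lookup (assumption).
final class GlobalLoader {
    static ResourceLoader instance = resource -> Arrays.asList("a", "the");
    private GlobalLoader() {}
}

// The renamed implementation (hypothetical name): init only stores the
// args, and the real setup happens in inform.
class CoreAwareStopFilterFactory {
    protected Map<String, String> args;
    protected Set<String> stopWords;

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public void inform(ResourceLoader loader) {
        stopWords = new HashSet<>(loader.getLines(args.get("words")));
    }

    public boolean isStopWord(String term) {
        return stopWords.contains(term);
    }
}

// Deprecated subclass that keeps the old name and the old "ready after
// init" behavior by calling inform itself with the global loader.
@Deprecated
class StopFilterFactory extends CoreAwareStopFilterFactory {
    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        inform(GlobalLoader.instance);
    }
}
```

Code that constructs the old class name and only calls init keeps working, but it always reads from the one global loader.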
3) Documentation and Education
Since this wasn't exactly a use case we ever advertised, we could punt on
the problem by putting a disclaimer in CHANGES.txt that anyone directly
constructing those 3 classes should explicitly call inform() on the
instances after calling init.
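
To make the call order concrete, here is a minimal sketch of what such client code would need to do. DemoStopFactory and WordSource are hypothetical stand-ins for one of the 3 factories and for ResourceLoader; the real classes have more to them.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Solr's ResourceLoader (assumption).
interface WordSource {
    List<String> getLines(String resource);
}

// Simplified stand-in for one of the 3 affected factories.
class DemoStopFactory {
    private Map<String, String> args;
    private Set<String> stopWords; // stays null until inform() runs

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public void inform(WordSource loader) {
        stopWords = new HashSet<>(loader.getLines(args.get("words")));
    }

    public boolean isReady() {
        return stopWords != null;
    }
}

// What directly constructing code would have to do under option 3.
class DirectConstructionExample {
    static DemoStopFactory create(WordSource loader) {
        DemoStopFactory factory = new DemoStopFactory();
        factory.init(Collections.singletonMap("words", "stopwords.txt"));
        // New requirement: without this call the stop word set is still null.
        factory.inform(loader);
        return factory;
    }
}
```

Skipping the inform call leaves the factory half initialized, which is the kind of state behind the NullPointerException reported in the thread.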
#3 is obviously the simplest approach for us as developers, and to be
quite honest it probably impacts the fewest total number of people, since
there are probably very few people constructing Factory instances
themselves. Compare that to the potential performance impacts of #1, or
the need for many people to change their schemas in order to benefit from
MultiCores if we go with #2. That last point matters because with option
#2, users with existing schemas who don't change them, but do start using
multicores, will silently get stopwords and synonyms from the "last" core
loaded in all other cores.
Opinions?
-Hoss