3 TokenFilter factories not compatible with 1.2

Chris Hostetter Wed, 04 Jun 2008 16:04:05 -0700


As pointed out in this thread...


http://www.nabble.com/NullPointerException-at-lucene.analysis.StopFilter-with-1.3-to17564627.html#a17564627

3 of the TokenFilterFactories currently on the trunk are not actually backwards compatible with Solr 1.2...

  - StopFilterFactory
  - SynonymFilterFactory
  - EnglishPorterFilterFactory

This is because they were changed so that "setup" code formerly in the init(Map) method was moved to the new inform(ResourceLoader) method in order to make them no longer depend on the SolrCore.getSolrCore() singleton.

Introducing the ResourceLoaderAware interface allowed us to ensure that any existing custom analysis Factories people might have registered in their schema could still continue to work as they did -- but by changing these 3 factory implementations we broke the somewhat unexpected use case of people with code that explicitly constructs a StopFilterFactory and calls init on it, expecting it to then be ready for use.


Three possible ways of dealing with this incompatibility come to mind...

1) Delayed Initialization on First Use.

We can add code to the create method of each of these factories that does a quick check to see if the inform method was ever called, and if it wasn't then use the SolrCore singleton to do so...

  if (null == stopWords) { // :TODO: remove when singleton is removed
    this.inform(SolrCore.getSolrCore().getSolrConfig().getResourceLoader());
  }

...this could be made more robust using various lazy initialization techniques (fun fact I learned from Josh Bloch last week: double checked locking does work in Java 1.5 if you use volatile and cut/paste it exactly from his book so that you use the appropriate temporary variable)
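
A minimal sketch of the double-checked locking idiom mentioned above, assuming a hypothetical LazyStopWords class in place of the real factory. The load() method and the loadCount counter are illustrative placeholders only; under option 1 the expensive step would instead be the inform(...) call made through the SolrCore singleton.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a factory whose stop words were never loaded via inform().
class LazyStopWords {
    // volatile is what makes double-checked locking safe under the Java 5 memory model
    private volatile Set<String> stopWords;

    // Counts how often the expensive load runs (demonstration only).
    final AtomicInteger loadCount = new AtomicInteger();

    Set<String> getStopWords() {
        Set<String> result = stopWords;   // temporary variable: one volatile read on the fast path
        if (result == null) {             // first check, without the lock
            synchronized (this) {
                result = stopWords;
                if (result == null) {     // second check, with the lock held
                    stopWords = result = load();
                }
            }
        }
        return result;
    }

    // Placeholder for the real work (assumption: reading the stop word resource).
    private Set<String> load() {
        loadCount.incrementAndGet();
        return new HashSet<String>(Arrays.asList("a", "an", "the"));
    }
}
```

Repeated calls to getStopWords() should trigger load() only once, and after the first call the common path costs a single volatile read with no locking.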


2) Superclass Insertion

Rename the current factory implementations using new names, and create new (deprecated) subclasses using the existing names that make the same "this.inform(...)" call as mentioned above, but during the init method. Change the example schema to advocate the new class names, and advocate in CHANGES.txt that existing users change their schemas to refer to the new names.
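
A rough sketch of what option 2 could look like, using simplified stand-ins rather than the real Solr classes. The ResourceLoader interface here is a cut-down assumption, LegacySingleton stands in for the SolrCore.getSolrCore() lookup, and the name NewStopFilterFactory is purely hypothetical (no new names have been chosen).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Stand-in for Solr's ResourceLoader; the real interface differs (assumption).
interface ResourceLoader {
    List<String> getLines(String resource);
}

// Stand-in for the SolrCore.getSolrCore() singleton lookup (assumption).
final class LegacySingleton {
    static final ResourceLoader LOADER = new ResourceLoader() {
        public List<String> getLines(String resource) {
            return Arrays.asList("a", "an", "the");
        }
    };
}

// The renamed implementation (hypothetical name): init only stores the args,
// and the real setup happens in inform.
class NewStopFilterFactory {
    protected Map<String, String> args;
    protected List<String> stopWords;

    public void init(Map<String, String> args) {
        this.args = args;
    }

    public void inform(ResourceLoader loader) {
        stopWords = loader.getLines(args.get("words"));
    }

    public boolean isReady() {
        return stopWords != null;
    }
}

// Deprecated subclass under the old name: init also calls inform via the singleton,
// so code written against 1.2 finds the factory ready after init.
@Deprecated
class StopFilterFactory extends NewStopFilterFactory {
    @Override
    public void init(Map<String, String> args) {
        super.init(args);
        inform(LegacySingleton.LOADER);
    }
}
```

In this sketch, code that only calls init on the old class name gets a ready factory, while the renamed class stays unready until something calls inform on it.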


3) Documentation and Education

Since this wasn't exactly a use case we ever advertised, we could punt on the problem by putting a disclaimer in CHANGES.txt that anyone directly constructing those 3 classes should explicitly call inform() on the instances after calling init.
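
To illustrate the call order such a disclaimer would describe, here is a small sketch built on stand-ins. SketchStopFactory and WordSource are hypothetical simplifications of an affected factory and of ResourceLoader, not the Solr API.

```java
import java.util.Map;
import java.util.Set;

// Simplified stand-in for Solr's ResourceLoader (assumption).
interface WordSource {
    Set<String> load(String name);
}

// Simplified stand-in for one of the three affected factories (assumption).
class SketchStopFactory {
    private Map<String, String> args;
    private Set<String> stopWords;   // stays null until inform() runs

    void init(Map<String, String> args) {
        this.args = args;
    }

    void inform(WordSource source) {
        stopWords = source.load(args.get("words"));
    }

    boolean isStopWord(String token) {
        // Throws NullPointerException if inform() was skipped after init().
        return stopWords.contains(token);
    }
}
```

A caller constructing the factory directly would call init(args) and then inform(source) before first use. Skipping the second call reproduces the kind of NullPointerException reported in the thread.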

#3 is obviously the simplest approach for us as developers, and to be quite honest: it probably impacts the fewest total number of people (since there are probably very few people constructing Factory instances themselves) compared to the potential performance impacts of #1, or the need for many people to change their schemas in order to benefit from MultiCores if we go with #2 (particularly since with option #2, users with existing schemas that don't change them, but do start using multicores, will silently get stopwords and synonyms from the "last" core loaded in all other cores).




Opinions?




-Hoss
