[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Uwe Schindler (JIRA) Mon, 11 Jan 2010 15:53:20 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798937#action_12798937
 ]


Uwe Schindler commented on SOLR-1677:
-------------------------------------

{quote}
My suggestion for how to implement this would be...

# Add a new "luceneMatchVersion" attribute to the existing <schema/> tag.
# Add a new getLuceneMatchVersion() to the IndexSchema class ... SolrCore can 
use this to get the default.
# When init()ing new objects, include the key=>value pair of 
{{"luceneMatchVersion"=>schema.getLuceneMatchVersion()}} to the init method of 
the object if it's not already an init param for that particular instance.

This would eliminate the need to make any of the Analysis Factories 
SolrCoreAware (or even ResourceLoaderAware) just to know what the 
luceneMatchVersion should be -- the Base*Factories could still contain a 
{{protected Version luceneMatchVersion}} set by the base init() method that 
subclasses could use as needed.

NOTE: This still doesn't solve the "Analyzers must have no-arg 
constructors" part of the issue -- but it doesn't make it worse.  We can make 
IndexSchema pass this.getLuceneMatchVersion() to any Analyzer with a single arg 
"Version" constructor fairly easily.  If/When we provide a more general 
mechanism for passing constructor args to Analyzers, any Version params could 
be defaulted just like with the factory init() methods.
{quote}
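
A minimal sketch of the init-param defaulting described in step 3 of the quote, not the actual Solr code. The class MatchVersionDefaults, the method withDefault, the simplified BaseFactory, and the nested Version enum (a stand-in for o.a.lucene.util.Version) are all hypothetical names used only for illustration:

```java
import java.util.HashMap;
import java.util.Map;

final class MatchVersionDefaults {
    /** Stand-in for org.apache.lucene.util.Version (assumption, keeps the sketch self-contained). */
    enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    static final String KEY = "luceneMatchVersion";

    /**
     * Returns a copy of the init args that contains the schema-wide default
     * unless this particular instance already sets luceneMatchVersion itself.
     */
    static Map<String, String> withDefault(Map<String, String> args, Version schemaDefault) {
        Map<String, String> copy = new HashMap<>(args);
        copy.putIfAbsent(KEY, schemaDefault.name());
        return copy;
    }

    /** Simplified base factory: init() decodes the version once for subclasses to use. */
    static class BaseFactory {
        protected Version luceneMatchVersion;

        void init(Map<String, String> args) {
            luceneMatchVersion = Version.valueOf(args.get(KEY));
        }

        Version getLuceneMatchVersion() {
            return luceneMatchVersion;
        }
    }
}
```

With this shape the factories need no reference to the core or the resource loader; whoever calls init() is responsible for merging in the default first.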

That was my proposal a few comments above. But: I still do not want it in 
schema.xml, as Version is a global Lucene thing! But the behaviour would be the 
same: The schema code can get the version from somewhere and pass it down to 
all schema components as you propose.

The "Analyzers must have no-arg ctor" part is easy: use reflection and look 
first for a ctor that takes a Version. If one exists, use it and pass the 
version from the init/schema/config arg; if not, fall back to the no-arg ctor. 
We already have this in Lucene's benchmark contrib since 3.0.
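
Roughly, the reflection lookup could look like the sketch below. This is not the benchmark contrib code; AnalyzerLoader, newInstance, the two sample analyzer classes, and the nested Version enum (a stand-in for o.a.lucene.util.Version) are hypothetical names for illustration only:

```java
import java.lang.reflect.Constructor;

final class AnalyzerLoader {
    /** Stand-in for org.apache.lucene.util.Version (assumption, keeps the sketch self-contained). */
    public enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Sample class that only offers a no-arg ctor. */
    public static class LegacyAnalyzer {
        public LegacyAnalyzer() {}
    }

    /** Sample class that offers a ctor taking a Version. */
    public static class VersionedAnalyzer {
        public final Version matchVersion;

        public VersionedAnalyzer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }
    }

    /** Prefers a public (Version) ctor; falls back to the public no-arg ctor. */
    static <T> T newInstance(Class<T> clazz, Version matchVersion) {
        try {
            try {
                Constructor<T> ctor = clazz.getConstructor(Version.class);
                return ctor.newInstance(matchVersion);
            } catch (NoSuchMethodException noVersionCtor) {
                return clazz.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz.getName(), e);
        }
    }
}
```

A class with neither ctor ends up in the outer catch and fails with an IllegalStateException, so a misconfigured analyzer is reported at load time.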

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum 
> there. The default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.
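
For illustration only, the 3.0-style decoding with the LUCENE_24 default might be sketched as follows. VersionParser and parse are hypothetical names, the nested enum is a stand-in for o.a.lucene.util.Version, and the trimming and upper-casing are assumptions of this sketch, not something the patch is stated to do:

```java
import java.util.Locale;

final class VersionParser {
    /** Stand-in for org.apache.lucene.util.Version (assumption, keeps the sketch self-contained). */
    enum Version { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Decodes a version string; a missing or blank value falls back to LUCENE_24. */
    static Version parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Version.LUCENE_24; // same default as the no-version ctors
        }
        try {
            return Version.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown luceneMatchVersion: " + value, e);
        }
    }
}
```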

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
