[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Uwe Schindler (JIRA) Mon, 21 Dec 2009 06:50:41 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated SOLR-1677:
--------------------------------

    Attachment: SOLR-1677.patch

New patch with some schema and config hacking. Also new test:

- As a first hack the solrConfig schema has a new element <luceneMatchVersion> 
that contains a solr-wide default luceneMatchVersion value that is used as 
default for QueryParser, Analyzers if not specified different
- On the analyzer side, BaseTokenizerFactory and BaseTokenFilterFactory now 
extend SolrCoreAware (and I also allowed these classes to be SolrCoreAware) and 
get the SolrConfig.
- Both classes now use the default, if not local set as a param (like in the 
last patch), but the default is the one got from SolrConfig
- The parser for config strings was moved to Config
- Other components like QueryParserFactories can get the default matchVersion 
in the same way
- The default is LUCENE_24 as before.

This is a first idea, how it would work. Open points:
- should the default be in SolrConfig or in IndexConfig?
- I did not change the config.xsd file to reflect my change as open discussion
- all other example config files and schemas should use the default Lucene 
version shipped with the solr release (currently 2.9). So user that upgrade get 
their last lucene version their index is compatible with, and new users get the 
latest config.
- If users upgrade the default luceneMatchVersion, they have to possibly 
reindex (esp. when upgrading to LUCENE_31 soon, as new Unicode features in all 
Tokenizers/Filters)

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
> 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 is can be 
> replaced by Version.valueOf(String), as the Version is a subclass of Java5 
> enums. The default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

Reply via email to