[
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800471#action_12800471
]
Hoss Man commented on SOLR-1677:
--------------------------------
bq. And I also can't see anyone really spending time to aggressively ensure that the example schema etc is all up to date
I think you are vastly underestimating how much work goes into reviewing the
example schema.xml prior to releases. It would be trivial to search/replace
luceneMatchVersion="X" with luceneMatchVersion="Y" any time the "current"
Version constant is updated in Lucene-Java.
bq. the hardcoded 2.4 behavior is the action at a distance, because if i do not specify Version in my configuration file, then i get this very old behavior.
I don't follow you at all: you have identified neither an action nor a distance
in your example.
When I say I'm worried about scary action at a distance, I'm talking about
editing something A in a config file and having it result in changed behavior
(action) in things B, C and D that do not directly refer to A in any way
(distance). Furthermore, these changes in behavior are silent (thus scary).
If I have {{<fieldType name="A"/>}} and, much later in the config,
{{<field name="B" type="A"/>}}, then editing A results in an action on B at a
distance, but this should not surprise me at all because B explicitly
references A.
Having a global {{<luceneMatchVersion/>}} tag that affects the behavior of a
variety of different things when it's modified leads to situations where people
might change that value, triggering changes in many components, without a clear
idea of what might have changed. So they don't even know which things they
should focus on testing for correctness after making that change.
The existing {{<schema version="X"/>}} property also leads to action at a
distance, but that is a lot less scary to me because with it there is at least
a uniform set of changes to *all* schema objects between any two versions, so
it's easy to document what changes when you go from 1.1 to 1.2, or from 1.2 to
1.3. With luceneMatchVersion, the potential changes are unique to every
individual class that cares about it.
{quote}
If this is really your concern, then i have an alternative i propose.
* No default anywhere, not even in the code
* Version is mandatory if the thing requires it
{quote}
This is something Uwe and I both discussed in previous comments...
https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796872#action_12796872
https://issues.apache.org/jira/browse/SOLR-1677?focusedCommentId=12796937#action_12796937
...as I said: I'm fine with this idea in theory, as a long-term plan, but
there has to be a gradual migration process for people. In other words: it can
be required on certain objects in a future release, but for at least the next
release it needs to be possible to omit the luceneMatchVersion on all of these
objects. When people use them without specifying it, the objects can log big
fat warnings on init saying that they are defaulting to 2.4, and that users
should set the property explicitly if that's what they want.
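A minimal sketch of that migration behavior, assuming a factory receives its init args as a {{Map<String, String>}}. {{MatchVersion}} is a hypothetical stand-in for o.a.lucene.util.Version (so the sketch runs without Lucene on the classpath), and {{resolve}} is an illustrative helper, not an existing Solr method:

```java
import java.util.Map;
import java.util.logging.Logger;

public class VersionDefaultingSketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Version (not the real class). */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    private static final Logger LOG =
        Logger.getLogger(VersionDefaultingSketch.class.getName());

    /** Fallback used only during the migration period. */
    static final MatchVersion MIGRATION_DEFAULT = MatchVersion.LUCENE_24;

    /**
     * Resolves the luceneMatchVersion init arg for a factory. When it is absent,
     * fall back to 2.4 behavior but log a loud warning so the default is not silent.
     */
    static MatchVersion resolve(String factoryName, Map<String, String> initArgs) {
        String raw = initArgs.get("luceneMatchVersion");
        if (raw == null) {
            LOG.warning(factoryName + ": no luceneMatchVersion specified, defaulting to "
                + MIGRATION_DEFAULT + "; set it explicitly if that is what you want.");
            return MIGRATION_DEFAULT;
        }
        // Explicit value: enum lookup, throws IllegalArgumentException if unknown.
        return MatchVersion.valueOf(raw);
    }

    public static void main(String[] args) {
        // Logs a warning (to stderr) and prints LUCENE_24.
        System.out.println(resolve("StandardTokenizerFactory", Map.of()));
        // No warning; prints LUCENE_30.
        System.out.println(resolve("StandardTokenizerFactory",
            Map.of("luceneMatchVersion", "LUCENE_30")));
    }
}
```

Once the migration period ends, the {{raw == null}} branch could throw instead of warning, which gives the "no default anywhere" behavior proposed above.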
----
bq. I still do not want it in schema.xml, as Version is a global Lucene thing!
Uwe: I think you are misunderstanding the reason for the distinction between
solrconfig.xml and schema.xml in Solr. If (for the sake of argument)
luceneMatchVersion really should be a "global Lucene thing", then that is
precisely why it should be in schema.xml.
schema.xml is for configuration that is inherently part of the index, and must
be consistent regardless of who/how/why that index is being used.
solrconfig.xml is where settings are put that are specific to how a
particular instance of an index is being used. If a setting is in
solrconfig.xml, then it should be possible for that setting to be completely
different on different Solr instances that use the exact same schema.xml,
even if they use cloned copies of the same index directory (e.g.: master/slave
distinctions in replication; peer slaves with distinct handler/cache settings
to serve distinct use cases; etc.).
That's the reason why nothing that hangs off of IndexSchema is currently
allowed to be SolrCoreAware or to get access to the SolrConfig object (and why
the SolrResourceLoader abstraction was created): nothing about the SolrCore
"instance" should be allowed to influence the resulting index, because that
index may later be used on a different instance with a different config.
As I mentioned before: solrconfig.xml can depend on schema.xml, but schema.xml
cannot depend on solrconfig.xml.
So if a global luceneMatchVersion can affect the behavior of an analyzer or
FieldType in a way that is "persisted" as part of the index, and other
classes (like QueryParser in Robert's example) need to use the same
luceneMatchVersion to behave correctly with that index, then that setting
needs to be in schema.xml so that it is consistent no matter how/where that
index and schema.xml file are used.
Does that make sense?
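To make the dependency direction concrete, here is a rough model that assumes nothing about Solr's real {{IndexSchema}}/{{SolrConfig}} APIs: {{SchemaSettings}} and {{InstanceConfig}} are hypothetical classes standing in for schema.xml and solrconfig.xml, and {{MatchVersion}} is a hypothetical stand-in for Lucene's Version. The per-instance config may read the schema, but the schema never sees the config, so index-time analysis and query-time parsing always resolve the same version:

```java
public class SchemaOwnedVersionSketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Version. */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Models schema.xml: settings that are inherently part of the index. */
    static final class SchemaSettings {
        final MatchVersion luceneMatchVersion;

        SchemaSettings(MatchVersion luceneMatchVersion) {
            this.luceneMatchVersion = luceneMatchVersion;
        }
    }

    /** Models solrconfig.xml: per-instance settings that may depend on the schema. */
    static final class InstanceConfig {
        final SchemaSettings schema;   // config -> schema dependency is allowed
        final int filterCacheSize;     // instance-specific, may differ per instance

        InstanceConfig(SchemaSettings schema, int filterCacheSize) {
            this.schema = schema;
            this.filterCacheSize = filterCacheSize;
        }

        /** Query-time components read the version from the schema, not from this config. */
        MatchVersion queryParserVersion() {
            return schema.luceneMatchVersion;
        }
    }

    /** Index-time analysis only ever sees the schema; it has no InstanceConfig. */
    static MatchVersion analyzerVersion(SchemaSettings schema) {
        return schema.luceneMatchVersion;
    }

    public static void main(String[] args) {
        SchemaSettings schema = new SchemaSettings(MatchVersion.LUCENE_29);
        InstanceConfig master = new InstanceConfig(schema, 512);
        InstanceConfig slave = new InstanceConfig(schema, 16384);
        // Different instance settings, same index-time and query-time version.
        System.out.println(analyzerVersion(schema) + " " + master.queryParserVersion()
            + " " + slave.queryParserVersion());
    }
}
```

Two instances with different cache settings still agree on the version, which is the property that would be lost if the setting lived only in solrconfig.xml.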
----
I'd still like to clarify this whole issue of whether "Lucene-Java", as a
project, has an expectation that client applications will always use a
consistent value for Version when constructing objects that interact with an
index, as Robert alluded to in a previous comment...
bq. I don't think Version is intended so you can use X.Y on this part and Y.Z on this part
This was not my impression when Version was added, but I freely admit I wasn't
paying that much attention.
In Uwe's comment he implied (but didn't actually state) that he concurred with
Robert...
bq. ...Version is a global Lucene thing...
*Iff* that expectation really is true in Lucene-Java, and *iff* there really is
an expectation that using multiple Version values within Solr is likely to
cause people problems as objects interact, then it seems to me that it would be
a very bad idea to offer any sort of out-of-the-box support for per-object
overriding of luceneMatchVersion in our solrconfig.xml/schema.xml.
I know, I know ... this is a complete 180 from my previous claim that we should
_only_ have per-object configuration, a claim that I still stand behind if
Lucene-Java "supports" applications using multiple values of Version. But if
that is not considered "supported", and if changes are actively being made in
Lucene-Java that explicitly assume consistent Version usage, then I'm not
convinced it would be a good idea to enable people to tweak things in that way.
Anyone who understands the underlying Java code well enough to appreciate the
nuances of using A.B in one place and X.Y in another can write their own
Factory that looks at a luceneMatchVersion init param; the out-of-the-box ones
should stick with the global setting.
BUT!!!!! ... those are Big "IFFs" ...
* Uwe: do you concur with Robert?
* Are there any threads/docs about the expectations of Version homogeneity/heterogeneity in Lucene-Java?
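The "write your own Factory" escape hatch described above could look roughly like this. It is a sketch only: {{MatchVersion}} is a hypothetical stand-in for Lucene's Version, and {{ExpertTokenizerFactory}} is a hypothetical class, not something Solr ships. Only this hand-written factory honors a per-object luceneMatchVersion init param; out-of-the-box factories would simply use the global value:

```java
import java.util.HashMap;
import java.util.Map;

public class LocalOverrideFactorySketch {

    /** Hypothetical stand-in for org.apache.lucene.util.Version. */
    enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30 }

    /** Hypothetical expert-only factory; not an out-of-the-box Solr class. */
    static final class ExpertTokenizerFactory {
        private final MatchVersion global;
        private final MatchVersion effective;

        ExpertTokenizerFactory(MatchVersion global, Map<String, String> initArgs) {
            this.global = global;
            String local = initArgs.get("luceneMatchVersion");
            // Use the local init param if present; otherwise stick with the global setting.
            this.effective = (local == null) ? global : MatchVersion.valueOf(local);
        }

        MatchVersion effectiveVersion() {
            return effective;
        }

        /** True when this object deliberately diverges from the global setting. */
        boolean overridesGlobal() {
            return effective != global;
        }
    }

    public static void main(String[] args) {
        Map<String, String> noOverride = new HashMap<>();
        Map<String, String> override = new HashMap<>();
        override.put("luceneMatchVersion", "LUCENE_24");

        ExpertTokenizerFactory a = new ExpertTokenizerFactory(MatchVersion.LUCENE_30, noOverride);
        ExpertTokenizerFactory b = new ExpertTokenizerFactory(MatchVersion.LUCENE_30, override);
        System.out.println(a.effectiveVersion() + " " + a.overridesGlobal()); // LUCENE_30 false
        System.out.println(b.effectiveVersion() + " " + b.overridesGlobal()); // LUCENE_24 true
    }
}
```

Keeping the override logic in a user-written class means the mixed-Version risk is opted into explicitly, rather than offered as a generic knob on every stock factory.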
> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
> Key: SOLR-1677
> URL: https://issues.apache.org/jira/browse/SOLR-1677
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Uwe Schindler
> Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch,
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards
> compatibility with old indexes created using older versions of Lucene. The
> most important example is StandardTokenizer, which changed its behaviour with
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with
> much more Unicode support, almost every Tokenizer/TokenFilter needs this
> Version parameter. In 2.9, the deprecated old ctors without Version take
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base
> factories. Subclasses can then use the decoded luceneMatchVersion enum (in
> 3.0) / Parameter (in 2.9) when constructing TokenStreams. The code currently
> contains a helper map to decode the version strings, but in 3.0 it can be
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The
> default value is Version.LUCENE_24 (as this is the default for the
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed
> to match Lucene 3.0.
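For the string decoding mentioned in the issue description, here is a sketch contrasting the 2.9-style helper map with the enum lookup that becomes possible once Version is an enum. {{MatchVersion}} is a hypothetical stand-in whose declaration order is assumed to match release order; the accepted string formats ("2.9" or "LUCENE_29"), the {{onOrAfter}} helper, and the position-increment rule in {{positionIncrementsByDefault}} are all illustrative assumptions, not Lucene's or Solr's actual code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class VersionParsingSketch {

    /** Hypothetical stand-in for Lucene 3.0's Version enum; order = release order. */
    enum MatchVersion {
        LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30;

        /** Enum constants compare by declaration order. */
        boolean onOrAfter(MatchVersion other) {
            return compareTo(other) >= 0;
        }
    }

    /** 2.9-style: explicit lookup table from config strings to constants. */
    private static final Map<String, MatchVersion> HELPER_MAP = new HashMap<>();
    static {
        HELPER_MAP.put("2.4", MatchVersion.LUCENE_24);
        HELPER_MAP.put("2.9", MatchVersion.LUCENE_29);
        HELPER_MAP.put("3.0", MatchVersion.LUCENE_30);
    }

    static MatchVersion parseWithMap(String s) {
        MatchVersion v = HELPER_MAP.get(s);
        if (v == null) {
            throw new IllegalArgumentException("Unknown luceneMatchVersion: " + s);
        }
        return v;
    }

    /** 3.0-style: normalize "2.9" to "LUCENE_29", then use Enum.valueOf. */
    static MatchVersion parseWithValueOf(String s) {
        String name = s.toUpperCase(Locale.ROOT);
        if (!name.startsWith("LUCENE_")) {
            name = "LUCENE_" + name.replace(".", "");
        }
        return MatchVersion.valueOf(name); // throws IllegalArgumentException if unknown
    }

    /** Hypothetical version-gated default, in the spirit of posIncr behavior changes. */
    static boolean positionIncrementsByDefault(MatchVersion v) {
        return v.onOrAfter(MatchVersion.LUCENE_29);
    }

    public static void main(String[] args) {
        System.out.println(parseWithMap("2.4"));           // LUCENE_24
        System.out.println(parseWithValueOf("2.9"));       // LUCENE_29
        System.out.println(parseWithValueOf("LUCENE_30")); // LUCENE_30
        System.out.println(positionIncrementsByDefault(MatchVersion.LUCENE_24)); // false
        System.out.println(positionIncrementsByDefault(MatchVersion.LUCENE_30)); // true
    }
}
```

The enum lookup removes the need to keep a hand-maintained table in sync with each new Lucene release, since new constants become parseable automatically.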