On 6/14/2018 12:10 PM, Terry Steichen wrote:
> I don't disagree at all, but have a basic question: How do you easily
> transition from a system using a dynamic schema to one using a fixed one?

Not sure you need to actually transition.  Just remove the config in
solrconfig.xml that causes Solr to invoke the update chain where the
unknown fields are added, upload the new config to zookeeper, and reload
the collection.  When you do that, indexing with unknown fields will
fail, and if the indexing program has good error handling, somebody is
going to notice the failure.
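
For what it's worth, the upload and reload steps could be sketched in
Python roughly like this.  The ZooKeeper address, config name, config
path, Solr URL, and collection name are placeholders, and the helper
names are hypothetical.  The sketch only builds the bin/solr command and
the Collections API URL; it does not run anything.  In the stock 6.x
data-driven config I believe the chain to remove is named
add-unknown-fields-to-the-schema, but check your own solrconfig.xml.

```python
from urllib.parse import urlencode


def upconfig_command(zk_host, config_name, config_dir):
    """Build the bin/solr command that uploads a config to ZooKeeper."""
    return ["bin/solr", "zk", "upconfig",
            "-z", zk_host, "-n", config_name, "-d", config_dir]


def reload_url(solr_base, collection):
    """Build the Collections API URL that reloads a collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base}/admin/collections?{query}"


if __name__ == "__main__":
    # All values below are placeholders (assumptions), not real hosts.
    print(" ".join(upconfig_command("localhost:2181", "myconfig",
                                    "/path/to/configset")))
    print(reload_url("http://localhost:8983/solr", "mycollection"))
```

Run the printed command from the Solr install directory, then request
the printed URL (with credentials if authentication is enabled).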

The major difficulty with this will be more of a people problem than a
technical problem.  You have to convince people who use the Solr install
that it's a lot better that they get an indexing error and ask you to
fix it.  They may not care that you've got a major problem on your hands
when the system makes a mistake adding a field.

> I'm running 6.6.0 in cloud mode (only because it's necessary, as I
> understand it, to be in cloud mode for the authentication/authorization
> to work).  In my server/solr/configsets subdirectory there are
> directories "data_driven_schema_configs" and "basic_configs".  Both
> contain a file named "managed_schema."  Which one is the active one?

As of Solr 6.5.0, the basic authentication plugin also works in
non-cloud (standalone) mode.

https://issues.apache.org/jira/browse/SOLR-9481

I will typically recommend cloud mode to anyone setting up a brand new
Solr installation, mostly because it automates a lot of the steps of
setting up high availability.  I don't use cloud mode myself, because it
didn't exist when I set up my systems.  Converting to cloud mode would
require rewriting all of the tools I've written that keep the indexes up
to date.  I might do that one day, but not today.

In cloud mode, neither of the managed-schema files you have mentioned is
active.  The active config (solrconfig.xml, the schema, and all files
mentioned in either of those) is in zookeeper, not on the disk.

> From the AdminUI, each collection has an associated "managed_schema"
> (under the "Files" option).  I'm guessing that this collection-specific
> managed_schema is the result of the automated field discovery process,
> presumably using some baseline version (in configsets) to start with.

If you create a collection with "bin/solr create", the config that you
give it is usually uploaded to zookeeper and all shard replicas in the
collection use that uploaded config.  In older versions like 6.6.0,
data_driven_schema_configs is used if no source config is named.  In
newer versions, _default is used.

When the update processor adds an unknown field, it is added to the
managed-schema file in zookeeper and the collection is reloaded.  The
source configset on disk is not touched.
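
If you want to see which fields the live schema currently has, the
Schema API will report them.  A minimal Python sketch, assuming a
placeholder Solr URL and collection name and hypothetical helper names;
with authentication enabled, the plain urlopen() call here would need
credentials added:

```python
import json
from urllib.request import urlopen


def schema_fields_url(solr_base, collection):
    """Build the Schema API URL that lists the fields of a collection."""
    return f"{solr_base}/{collection}/schema/fields?wt=json"


def field_names(response_text):
    """Extract the sorted field names from a schema/fields JSON response."""
    data = json.loads(response_text)
    return sorted(field["name"] for field in data.get("fields", []))


def fetch_field_names(solr_base, collection):
    """Ask Solr for the field list (no authentication handled here)."""
    with urlopen(schema_fields_url(solr_base, collection)) as resp:
        return field_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder URL and collection name (assumptions).
    print(fetch_field_names("http://localhost:8983/solr", "mycollection"))
```

Comparing that list before and after an indexing run shows which fields
the update processor added.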

> If that's true, then it would presumably make sense to save this
> collection-specific managed_schema to disk as schema.xml.  I further
> presume I'd create a config subdirectory for each of said collections
> and put schema.xml there.  Is that right?

As long as you're in cloud mode, all your index configs are in
zookeeper.  Any config you have on disk is NOT what is actually being used.

https://lucene.apache.org/solr/guide/6_6/using-zookeeper-to-manage-configuration-files.html
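
If you do want a copy of the active config on disk, for editing or
backup, you can download it from zookeeper.  A rough Python sketch that
builds the bin/solr command; the ZooKeeper address, config name, and
target directory are placeholders, and the helper name is hypothetical.
With "bin/solr create" the config name usually matches the collection
name unless a different one was given.

```python
def downconfig_command(zk_host, config_name, target_dir):
    """Build the bin/solr command that copies a config out of ZooKeeper."""
    return ["bin/solr", "zk", "downconfig",
            "-z", zk_host, "-n", config_name, "-d", target_dir]


if __name__ == "__main__":
    # Placeholder values (assumptions); run the printed command from
    # the Solr install directory.
    print(" ".join(downconfig_command("localhost:2181", "mycollection",
                                      "/path/to/copy")))
```

The downloaded copy is only a snapshot.  Changes to it have no effect
until the config is uploaded again and the collection is reloaded.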

> Every time I read (and reread, and reread, ...) the Solr docs they seem
> to be making certain (very basic) assumptions that I'm unclear about, so
> your help in the preceding would be most appreciated.

The Solr documentation is not very friendly to novices.  Writing
documentation that an expert can use is sometimes difficult, but most
developers can manage it.  Writing documentation that a novice can use
is much harder, because it's not easy for someone who has intimate
knowledge of the system to step back and look at it from a place where
that knowledge isn't available.  Some success has been achieved in later
documentation versions.  It's going to take a lot of time and effort
before most of Solr's documentation is novice-friendly.

Thanks,
Shawn
