On 6/14/2018 12:10 PM, Terry Steichen wrote:
> I don't disagree at all, but have a basic question: How do you easily
> transition from a system using a dynamic schema to one using a fixed one?

I'm not sure you actually need to transition. Just remove the config in solrconfig.xml that causes Solr to invoke the update chain where the unknown fields are added, upload the new config to ZooKeeper, and reload the collection. When you do that, indexing with unknown fields will fail, and if the indexing program has good error handling, somebody is going to notice the failure.

The major difficulty with this will be more of a people problem than a technical problem. You have to convince the people who use the Solr install that it's a lot better for them to get an indexing error and ask you to fix it. They may not care that you've got a major problem on your hands when the system makes a mistake adding a field.

> I'm running 6.6.0 in cloud mode (only because it's necessary, as I
> understand it, to be in cloud mode for the authentication/authorization
> to work). In my server/solr/configsets subdirectory there are
> directories "data_driven_schema_configs" and "basic_configs". Both
> contain a file named "managed_schema." Which one is the active one?

As of Solr 6.5.0, the basic authentication plugin also works in non-cloud (standalone) mode.

https://issues.apache.org/jira/browse/SOLR-9481

I will typically recommend cloud mode to anyone setting up a brand new Solr installation, mostly because it automates a lot of the steps of setting up high availability. I don't use cloud mode myself, because it didn't exist when I set up my systems. Converting to cloud mode would require rewriting all of the tools I've written that keep the indexes up to date. I might do that one day, but not today.

In cloud mode, neither of the managed-schema files you have mentioned is active. The active config (solrconfig.xml, the schema, and all files mentioned in either of those) is in ZooKeeper, not on the disk.

> From the AdminUI, each collection has an associated "managed_schema"
> (under the "Files" option).
> I'm guessing that this collection-specific
> managed_schema is the result of the automated field discovery process,
> presumably using some baseline version (in configsets) to start with.

If you create a collection with "bin/solr create", the config that you give it is usually uploaded to ZooKeeper, and all shard replicas in the collection use that uploaded config. In older versions like 6.6.0, data_driven_schema_configs is used if no source config is named. In newer versions, _default is used.

When the update processor adds an unknown field, it is added to the managed-schema file in ZooKeeper and the collection is reloaded. The source configset on disk is not touched.

> If that's true, then it would presumably make sense to save this
> collection-specific managed_schema to disk as schema.xml. I further
> presume I'd create a config subdirectory for each of said collections
> and put schema.xml there. Is that right?

As long as you're in cloud mode, all your index configs are in ZooKeeper. Any config you have on disk is NOT what is actually being used.

https://lucene.apache.org/solr/guide/6_6/using-zookeeper-to-manage-configuration-files.html

> Every time I read (and reread, and reread, ...) the Solr docs they seem
> to be making certain (very basic) assumptions that I'm unclear about, so
> your help in the preceding would be most appreciated.

The Solr documentation is not very friendly to novices. Writing documentation that an expert can use is sometimes difficult, but most developers can manage it. Writing documentation that a novice can use is much harder, because it's not easy for someone who has intimate knowledge of the system to step back and look at it from a place where that knowledge isn't available.

Some success has been achieved in later documentation versions. It's going to take a lot of time and effort before most of Solr's documentation is novice-friendly.

Thanks,
Shawn
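
For reference, the download/edit/upload/reload cycle described above could be scripted roughly as follows. This is only a sketch under assumptions: the ZooKeeper address, Solr base URL, config name, collection name, and local directory are all placeholders, and the helpers only build the "bin/solr zk downconfig" / "bin/solr zk upconfig" commands and the Collections API RELOAD URL. They do not run anything against a cluster.

```python
from urllib.parse import urlencode


def downconfig_command(zk_host, config_name, local_dir, solr_bin="bin/solr"):
    """Command that copies the active config from ZooKeeper to a local directory."""
    return [solr_bin, "zk", "downconfig", "-z", zk_host, "-n", config_name, "-d", local_dir]


def upconfig_command(zk_host, config_name, local_dir, solr_bin="bin/solr"):
    """Command that uploads an edited local config directory back to ZooKeeper."""
    return [solr_bin, "zk", "upconfig", "-z", zk_host, "-n", config_name, "-d", local_dir]


def reload_url(solr_base_url, collection):
    """Collections API URL that reloads a collection so it picks up the new config."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{solr_base_url.rstrip('/')}/admin/collections?{query}"


if __name__ == "__main__":
    # All of these values are hypothetical placeholders; substitute your own.
    ZK_HOST = "localhost:2181"
    CONFIG_NAME = "mycollection"
    LOCAL_DIR = "/tmp/mycollection-conf"
    SOLR_URL = "http://localhost:8983/solr"

    # 1. Pull the config that is actually in use out of ZooKeeper.
    print(" ".join(downconfig_command(ZK_HOST, CONFIG_NAME, LOCAL_DIR)))
    # 2. Edit solrconfig.xml in LOCAL_DIR by hand, then push it back.
    print(" ".join(upconfig_command(ZK_HOST, CONFIG_NAME, LOCAL_DIR)))
    # 3. Reload the collection (for example with curl or a browser).
    print(reload_url(SOLR_URL, CONFIG_NAME))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere. In a real script the two lists could be handed to subprocess.run() and the URL fetched with any HTTP client.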