RE: Manage schema.xml via Solrj?

Davis, Daniel (NIH/NLM) [C] Fri, 08 Jan 2016 07:02:45 -0800

Bob,

XY problem means that you are presenting the imagined solution without 
presenting the problem to solve.   In other words, you are presenting X (solve 
for X), without a full statement of the equation to be solved for X.

My guess at your problem is the same as my problem: editing Solr configuration 
(schema and solrconfig.xml) as files is very flexible and agile compared to a 
form-based solution, but the downside is that anyone can "crash" a Solr 
collection with a bad schema edit.   Catching that takes more than XML syntax 
checking, obviously.    But only Solr is the authority on what a good schema 
(and other configuration) should look like.
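
To illustrate what "beyond XML syntax checking" can mean, here is a minimal 
sketch of a local pre-check in Python.   It is not the tool described below and 
it is no substitute for loading the configuration into Solr; the function name 
check_schema and the particular checks are assumptions chosen for illustration.   
It only verifies that the file parses, that every field refers to a declared 
fieldType, and that uniqueKey names a declared field.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_schema(schema_xml: str) -> List[str]:
    """Return a list of problems found in a schema.xml string (empty if none).

    Hypothetical helper: a cheap sanity check, not a replacement for Solr.
    """
    try:
        root = ET.fromstring(schema_xml)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    # fieldType elements may sit directly under <schema> or inside <types>,
    # so search the whole tree.
    known_types = {ft.get("name") for ft in root.iter("fieldType")}
    field_names = set()
    problems = []

    for tag in ("field", "dynamicField"):
        for field in root.iter(tag):
            name, ftype = field.get("name"), field.get("type")
            if tag == "field":
                field_names.add(name)
            if ftype not in known_types:
                problems.append(
                    "%s '%s' uses undefined type '%s'" % (tag, name, ftype))

    unique_key = (root.findtext("uniqueKey") or "").strip()
    if unique_key and unique_key not in field_names:
        problems.append("uniqueKey '%s' is not a declared field" % unique_key)
    return problems
```

A schema that passes this check can still fail in Solr (a bad analyzer class, 
for example), which is why the workflow below loads it into a real temporary 
collection.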

I'm working on a tool that can provide a bit of "smoke testing" on a Solr 
configuration directory.   The workflow I envision is like this:

1. DEVELOPER, TEAM LEAD, or SOLR ADMIN MAKES CHANGES TO CONFIGURATION DIRECTORY

     In the beginning, they may need to make lots of changes.   Eventually, 
     they are only making small changes, but we don't want those small changes 
     to crash anything.

2. DEVELOPER, TEAM LEAD, or SOLR ADMIN TRIGGERS CONTINUOUS INTEGRATION

     When they push or merge to a git branch, that may trigger a CI workflow.  
     The workflow works like this:

         2a.  Run the "smoke test" tool to (a) create a temporary configset in 
              ZooKeeper, (b) create a temporary collection in SolrCloud, and 
              (c) do simple indexing.
         2b.  Use zkCli.sh and solr.sh to update the actual configset and 
              collection in SolrCloud.

3. ITERATE

     This can happen again and again with a "staging", "QA", "Production" set 
of branches.    Other checks can be put into the CI workflow as well.
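
Step 2a could be sketched roughly as follows.   This is not the actual tool; it 
only builds the list of actions such a smoke test might take, using Solr's 
zkcli.sh upconfig command, the Collections API, and the update handler.   The 
function name smoke_test_plan, the "smoke_" naming scheme, the zkcli.sh 
location, and the cleanup steps are all assumptions made for illustration.

```python
import uuid
from urllib.parse import urlencode


def smoke_test_plan(solr_url, zk_host, conf_dir, suffix=None):
    """Build (label, action) pairs for a throwaway smoke test.

    An action is either an argv list for a zkcli.sh call or a Solr URL.
    Nothing is executed here; a CI job would run each step in order and
    stop at the first failure.
    """
    # A random suffix keeps concurrent CI runs from colliding (assumption).
    name = "smoke_%s" % (suffix or uuid.uuid4().hex[:8])
    zkcli = ["zkcli.sh", "-zkhost", zk_host, "-cmd"]  # script path assumed on PATH

    def collections_api(**params):
        return "%s/admin/collections?%s" % (solr_url, urlencode(params))

    return [
        # (a) temporary configset in ZooKeeper
        ("upload configset",
         zkcli + ["upconfig", "-confdir", conf_dir, "-confname", name]),
        # (b) temporary collection that uses the configset
        ("create collection",
         collections_api(action="CREATE", name=name, numShards=1,
                         **{"collection.configName": name})),
        # (c) simple indexing: POST a few JSON documents to this URL
        ("index documents", "%s/%s/update?commit=true" % (solr_url, name)),
        # cleanup, so failed experiments do not pile up
        ("delete collection", collections_api(action="DELETE", name=name)),
        ("delete configset", zkcli + ["clear", "/configs/%s" % name]),
    ]
```

For example, smoke_test_plan("http://localhost:8983/solr", "localhost:2181", 
"conf") yields five steps; only when the first three succeed would the CI job 
go on to step 2b.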

So, along the way to having this vision (of my solution), I also considered the 
advantage of schemaless systems.   I don't want to throw stones, but I think 
schemaless is mostly a marketing term for a couple of reasons:

 - I do Linked Data/RDF - it is different from SQL, but not schemaless.   If 
your "vocabulary" is badly designed, then your users will have problems.
 - Elasticsearch is not really schemaless.   Any Elasticsearch conference is 
filled with tracks/sessions on how to get your "field mappings" right, and what 
happens if you don't (oversized indexes, having to re-index to fix things, etc.)
 - IBM Watson Explorer is not really schemaless - your update document has to 
specify the type and treatment of each field, or your XSLT must transform your 
document into a structure that does so.

Many of us have also seen what happens with non-denormalized SQL or fully 
normalized SQL.   "Schemafull" ought to be a marketing term as well.

-----Original Message-----
From: Bob Lawson [mailto:bwlawson...@gmail.com] 
Sent: Friday, January 08, 2016 8:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Manage schema.xml via Solrj?

Thanks for the replies.  The problem I'm trying to solve is to automate 
whatever steps I can in configuring Solr for our customer.  Rather than having 
an admin edit schema.xml, I thought it would be easier and less 
error-prone to do it programmatically.  But I'm a novice, so if there is a 
better, more standard way, please let me know.  Thanks!!!

PS:  What do you mean by "XY problem"?

On Thu, Jan 7, 2016 at 11:20 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I'd ask first what the high-level problem you're trying to solve is, 
> this could be an XY problem.
>
> That said, there's the Schema API you can use, see:
> https://cwiki.apache.org/confluence/display/solr/Schema+API
>
> You can access it from the SolrJ library, see SchemaRequest.java. For 
> examples of using this, see:
> SchemaTest.java
>
> to _get_ the Solr source code to see these, see:
> https://wiki.apache.org/solr/HowToContribute
>
> Best,
> Erick
>
> On Thu, Jan 7, 2016 at 7:01 PM, Binoy Dalal <binoydala...@gmail.com>
> wrote:
> > I am not sure about solrj but you can use any XML parsing library to 
> > achieve this.
> > Take a look here:
> > http://www.tutorialspoint.com/java_xml/java_xml_parsers.htm
> >
> > On Fri, 8 Jan 2016, 08:06 Bob Lawson <bwlawson...@gmail.com> wrote:
> >
> >> I want to programmatically make changes to schema.xml using java to 
> >> do it.  Should I use Solrj to do this or is there a better way?  
> >> Can I use Solrj to make the rest calls that make up the schema API?  
> >> Whatever the answer, can anyone point me to an example showing how to do 
> >> it?  Thanks!
> >>
> >> --
> > Regards,
> > Binoy Dalal
>
