Re: Namespaces in response (SOLR-1586)

Chris Hostetter Fri, 11 Dec 2009 11:58:34 -0800

: > : I think the initial geosearch feature can start off with
: > : <str>10,20</str> for a point.
: > 
: > +1.
: 
: Fundamentally, how is a string a point?

Fundamentally a string is not a point, and a point is not a string -- but
if you want to express the concept of a point in a manner that only uses very
simple primitive types, then a string containing comma separated numbers
is a pretty decent way to do it. If you'd prefer, a pair of numbers would
work just as well...
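
To illustrate how little a client needs in order to recover a point from a
plain <str> value, here is a minimal Python sketch. The field name
"location", the Point type, the parse_point helper and the x/y ordering are
all assumptions made up for the example; none of them come from Solr or from
any existing client library.

```python
from typing import NamedTuple
import xml.etree.ElementTree as ET


class Point(NamedTuple):
    # Hypothetical type: the x/y ordering is an assumption for this sketch.
    x: float
    y: float


def parse_point(text: str) -> Point:
    """Parse 'x,y' into a Point; raises ValueError on malformed input."""
    parts = text.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated numbers, got {text!r}")
    return Point(float(parts[0]), float(parts[1]))


# A <str> element as it might appear in a response; the field name is made up.
elem = ET.fromstring('<str name="location">10,20</str>')
point = parse_point(elem.text)
```

A client that knows nothing about points still sees an ordinary string here,
and a client that does care can opt in to the extra parsing step.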

: > The current XML format Solr uses was designed to be extremely simple, very
: > JSON-esque, and easily parsable by *anyone* in any language, without
: > needing special knowledge of types.
:
: Whoah. I'm totally confused now. Why have FieldTypes then? Why not just use
: Lucene? The use case for FieldTypes is _not_ just for indexing, or querying.
: It's also for representation?

No, actually the use case for FieldTypes is entirely about the internal
logic of how Solr should deal with those fields, and how various
operations should work on them. FieldTypes can dictate the internal
representation within the confines of a Lucene index, but they should not
circumvent the contracts of the response writers in dictating what
is/isn't a legal response.

XMLWriter.writePrim may be public, which means there is a loophole that
plugin writers can exploit to add new tag names to the Solr XML response that
violate the contract (and no we don't have a formal XSD or DTD for our
XML response format, but we still have a very well advertised contract) --
but that doesn't mean that code which ships with Solr should exploit those
loopholes to violate that contract. People should expect that if they use
Solr as is without any custom code that the XMLResponseWriter won't all of
a sudden start including new, non-primitive-ish, XML tags/attributes
that weren't there before.

That's the entire point of the format as it was designed: break down
whatever complex data might be involved in a response into easily
digestible maps/lists of maps/lists of very primitive types that can
easily be used in any programming language.

: allowed for a while I think), why prevent it? Allowing namespaces does _not_
: break anything.
...
: > introducing a new "point" concept, whether as <point> or as
: > <georss:point/>, is going to break things for people.
:
: Show me an example, I fundamentally disagree with this.

Ok. Let's start with SolrJ then: take a look at the KnownType enum (line
151) in XMLResponseParser...

http://svn.apache.org/viewvc/lucene/solr/trunk/src/solrj/org/apache/solr/client/solrj/impl/XMLResponseParser.java?revision=819403&view=markup

...or let's do a random google code search for "solr xml lst" -- check out
ResponseContentHandler in solrpy...

http://code.google.com/p/solrpy/source/browse/trunk/solr/core.py#841

...I can't write Python code to save my life, but I have a pretty good idea
what that code will do if it sees an unexpected tag.

This is how a *LOT* of Solr client libraries are implemented ... it's not
an issue of broken XML parsers freaking out about namespaces, it's an
issue of having a long standing, heavily advertised "schema" for the XML
response that promises to only ever use a handful of types. Adding any
new tags to this format (regardless of how easy it may be because of that
stupid fucking "public" modifier on XMLWriter.writePrim) will absolutely
break things for people.
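
The failure mode can be sketched in a few lines of Python. This is not the
actual code from SolrJ or solrpy; it is a hypothetical client written in the
same spirit, with a fixed table of tags it knows how to convert. The
PRIMITIVES table, the convert function, the sample documents and the <point>
tag are all assumptions for the sake of illustration, and the table covers
only a subset of the tags the response format uses.

```python
import xml.etree.ElementTree as ET

# Converters for the primitive tags; this table is an illustrative
# assumption, not a copy of any real client's implementation.
PRIMITIVES = {
    "str": str,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bool": lambda s: s == "true",
}


def convert(elem):
    """Recursively turn a response element into dicts, lists and primitives."""
    if elem.tag == "lst":
        return {child.get("name"): convert(child) for child in elem}
    if elem.tag == "arr":
        return [convert(child) for child in elem]
    if elem.tag in PRIMITIVES:
        return PRIMITIVES[elem.tag](elem.text or "")
    # Anything outside the known set is treated as a broken response.
    raise ValueError(f"unexpected tag in response: <{elem.tag}>")


# The same made-up document, once with a plain <str> and once with a new tag.
old_style = ET.fromstring(
    '<lst name="doc"><str name="id">1</str><str name="loc">10,20</str></lst>'
)
new_style = ET.fromstring(
    '<lst name="doc"><str name="id">1</str><point name="loc">10 20</point></lst>'
)

parsed = convert(old_style)  # plain dict of strings

try:
    convert(new_style)
    failure = None
except ValueError as exc:
    failure = str(exc)  # the client rejects the whole response
```

The XML in both cases is perfectly well formed; the second document fails
only because the client was written against the advertised set of tags.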

: And why is that? Isn't the point of SOLR to expand to use cases brought up
: by users of the system? As long as those use cases can be principally
: supported, without breaking backwards compatibility (or in that case, if
: they do, with large blinking red text that says it), then you're shutting
: people out for 0 benefit? It's aesthetics we're talking about here.

I don't know if I'd say that's the point of Solr, but yes we should
absolutely try to grow the capabilities of the system as new use cases
come along.

I am 100% in agreement that the existing "simple" XMLResponseWriter is
not for everyone -- Historically we've tried to maintain a sense of equality
between all of the response writers, so that they all contained the same
data just with different markup -- but there are clearly cases where it
would be nice to have a response writer that is allowed to "know more"
about the real structure of the data and represent it in a manner that
more closely represents its purpose. This was the entire point behind
adding FieldType.toObject, and UUIDField w/the BinaryResponseWriter is a
good example of the model we should follow in the future.

There is a clear push for Solr to natively be able to generate responses that
incorporate more "industry standard" XML schemas, and I would love to see
us start adding functionality to do that, but bastardizing the existing
XMLResponseWriter format is not the way to do it.

Bottom Line: I am a big fat -1 on any patch to Solr that adds new xml tags
to the output generated by the XMLResponseWriter. Feel free to call me
stubborn, call me obstinate, call me pedantic -- but there is no way in
hell I'm going to support a patch that does that.

-Hoss
