[
https://issues.apache.org/jira/browse/SOLR-344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525231
]
Hoss Man commented on SOLR-344:
-------------------------------
I've only had a chance to skim the attached PDF ... I've printed it out in the
hopes that I'll find some time to read in depth your specific ideas about what
the ideal Solr API should be; but there are a few things that jumped out at me
that I wanted to address while they were on my mind...
-- Motivation --
- Direct Java is "better" -
A key assumption in this proposal seems to be that "if you are writing a Java
app, and you want to use Solr, you should not use the HTTP interface". I would
argue strongly against this assumption. there are *lots* of reasons why it
makes sense to treat Solr as a webservice and interact with it over HTTP
instead of having a tight coupling with your Java application: redundancy, load
balancing, ... Even if someone had a situation where they only had one machine
in their entire operation, and all of their applications ran on that machine i
would still suggest installing a servlet container and using Solr that way
because it's likely they will have more than one application that will want to
deal with their index. Solr can make a lot of good optimizations and
assumptions that go right out the window if you try to embed Solr in 2
different apps reading and writing to the same physical index directory.
Even if compelling stats can be presented that the HTTP+XML/JSON overhead is in
fact a bottleneck, i would still think that pursuing something like an RMI
based client/server API in addition to the HTTP API would make more sense than
encouraging people to use Solr directly in the JVM of their other applications.
Even the Plugin model (for embedding your custom Java code into Solr) is
something i only recommend in situations where it makes a lot of sense for that
logic to be tied closely with the Solr or Lucene internals (ie: as part of the
TokenStream, or dealing with the DocSets before they are cached, etc...)
The #1 "Value Add" that Solr has over Lucene is the Client/Server abstraction
... there are certainly other value adds -- some small (like added
TokenFilters) and some big (like the IndexSchema concept) -- and many of these
could probably be refactored into the Lucene core (or a Lucene contrib) so they
could be reused by other Lucene applications in addition to Solr ... but Solr
*is* an application.
Arguing that you shouldn't bother using a client/server relationship to deal
with Solr if your application is written in Java is like arguing that you
shouldn't bother using a client/server relationship to deal with MySQL if your
application is written in C.
- Demand for direct access -
the statement "a significant proportion of questions on the mailing lists are
clearly from people who are attempting such integrations right now." does not
serve as a clear call to action ... even if a significant number of recent
questions have related to embedded Solr (and I'm not convinced the number is
that significant) that one data point alone does not clearly indicate that it
is important/urgent to make this easier to do. It just indicates that the
people who are attempting to do this have questions about how to do it ...
which isn't that surprising considering it's a relatively new concept that
hasn't really been documented. Some of these people may just be assuming that
they *need* to embed Solr in their existing Java applications because they
don't realize it's intended to be used as a server.
The [EMAIL PROTECTED] list gets lots of questions from people who misunderstand
the demo code in the Lucene distribution and think Lucene is an application
that they can run on the command line to index files and search them -- that
doesn't mean that the Lucene-Java project should revamp itself to focus on
producing an application instead of a Library, it means the Lucene-Java
community has to help educate users about: A) how they can use the Lucene
library to build their own apps; and B) what apps are built on top of the
Lucene library that might be useful to them.
I think it would probably be more beneficial for the community as a whole if
people spent more time/energy documenting the benefits/mechanisms of using Solr
as a server, or improving the client APIs to make communicating with a Solr
server faster/easier than it would to dedicate a lot of resources solely
towards making Solr more of a library and less of an application.
-- Strategy for making changes --
All that said -- i agree with you that a lot of improvements can and should be
made to the internal APIs. Not because i think we need to make it easier to
embed Solr, but to make it easier for new developers to work on the Solr
internals (or to write plugins). if embedding Solr gets easier as a result --
great, but I don't see that as a compelling reason for change.
Somewhere in your doc, you advocated the importance of a top down complete API
overhaul instead of approaching things piecemeal (forgive me for not
remembering exactly how you put it, I'm not trying to put words in your mouth i
just remember there being a sentiment like this) ... while i think it would
definitely make sense to have some discussions on solr-dev about what the big
problems are with the internal APIs and come up with a high level picture of
what the ideal API might be so we can aim for it, the best way to get there is
with small patches that each focus on a single area.
I say this from experience as someone who has submitted patches to projects,
and as a committer who has to review patches: Big patches that change a lot of
things take a lot more work/discussion/thought to review and generally spend a
lot longer sitting in Jira than shorter, more focused patches (some day I'll sit
down and do the math and write out "Hoss's Patch Size Theorem" but for now
take my word for it that there's an exponential factor in there somewhere).
The best way to proceed is probably to start by tackling individual pieces of
functionality, adding the API you think there should be, and refactoring the
current code to implement/use that API (leaving the old one around as
deprecated).
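As a rough illustration of that pattern (this is only a sketch: the class and
method names below are hypothetical and are not taken from the Solr codebase),
the new, more specific method is added, and the old entry point is kept but
marked deprecated and reduced to a thin wrapper around the new one:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical utility, invented only to illustrate the refactoring pattern. */
final class HighlightUtilSketch {

    /** New, more specific API: returns a typed list of highlighted words. */
    static List<String> highlightTerms(String text, String term) {
        List<String> hits = new ArrayList<>();
        for (String word : text.split("\\s+")) {
            if (word.equalsIgnoreCase(term)) {
                hits.add("<em>" + word + "</em>");
            }
        }
        return hits;
    }

    /**
     * Old entry point, left in place so existing callers keep compiling.
     * It now delegates to the new method instead of duplicating the logic.
     */
    @Deprecated
    static Object[] highlight(String text, String term) {
        return highlightTerms(text, term).toArray();
    }

    public static void main(String[] args) {
        System.out.println(highlightTerms("Solr is built on Lucene", "solr"));
    }
}
```

Each such change stays small enough to review on its own, and callers can
migrate from the deprecated method at their own pace.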
-- Loose APIs vs tight APIs --
While i agree there are a lot of places where things like NamedList are
overused, don't discount the value add that this kind of "pass through" API
allows ... the decision to use things like the SolrParams class in some utility
classes was made consciously in a lot of cases, in order to make it easier for
these utilities to grow and evolve without their callers needing to be aware of
these new changes ... SimpleFacets for example takes in a generic SolrParams
and returns a NamedList so that as new functionality is added and new params
are added to control that functionality existing request handlers don't have to
be specifically aware of all those param names in order to get that
functionality. They can be if they want: they can construct a SolrParams
instance just for driving SimpleFacets behavior instead of passing through the
main request params, it's their choice ... but a very specific API, where every
query param was mapped to a constructor arg or a setter method or a command
pattern object or something else that had a tighter coupling would require
changes in RequestHandlers anytime something like Date faceting was added (or
even facet.mincount).
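A minimal sketch of the "pass through" idea, assuming a plain Map in place of
SolrParams and NamedList (the classes and methods below are hypothetical
stand-ins, not the real Solr ones). The handler forwards the request params
untouched, so when the facet utility later learns to read facet.mincount the
handler does not change at all:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-ins; these are NOT the real SolrParams/SimpleFacets classes. */
final class PassThroughSketch {

    /** Loose utility: pulls whatever params it understands out of a generic map. */
    static Map<String, Object> looseFacets(Map<String, String> params) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("field", params.getOrDefault("facet.field", ""));
        // Imagine this param was added later: callers that pass their params
        // through get the new behavior without any change on their side.
        out.put("mincount", Integer.parseInt(params.getOrDefault("facet.mincount", "0")));
        return out;
    }

    /** A handler that names no facet params; it only forwards the request. */
    static Map<String, Object> handle(Map<String, String> requestParams) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("facets", looseFacets(requestParams));
        return response;
    }

    public static void main(String[] args) {
        Map<String, String> req = new LinkedHashMap<>();
        req.put("q", "solr");
        req.put("facet.field", "category");
        req.put("facet.mincount", "2");
        System.out.println(handle(req));
    }
}
```

With a tighter signature such as looseFacets(String field), adding the
mincount option would have meant changing that signature and every handler
that calls it.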
if i remember correctly, you pointed out in the mailing list that things like
SimpleFacets or the Highlighting utils shouldn't return NamedLists -- they should
return more specific FacetResults/HighlightResults objects ... i would
definitely be on board with patches like that. Refactoring the code to use a well
typed response object certainly would make the code easier to understand, and
new getters can always be added for pulling out new types of information as
they are added -- the important thing is that Result objects like this would need to be
able to translate themselves back into simple objects that can be understood by
ResponseWriters so that the various RequestHandlers/ResponseWriters don't
*need* to be aware of their details.
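To make that last requirement concrete, here is a small sketch. The name
FacetResults comes from the discussion above, but its shape, the
toSimpleObjects() method and the toy SimpleWriter are all assumptions made
for illustration, not existing Solr APIs. Java callers get typed getters,
while the writer only ever sees plain maps, strings and numbers:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical typed result object; the shape shown here is assumed. */
final class FacetResults {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    void add(String value, int count) {
        counts.put(value, count);
    }

    /** Typed getter for Java callers; returns 0 for values never added. */
    int getCount(String value) {
        return counts.getOrDefault(value, 0);
    }

    /** Translates this result back into plain objects a generic writer understands. */
    Map<String, Object> toSimpleObjects() {
        Map<String, Object> simple = new LinkedHashMap<>();
        simple.put("facet_counts", new LinkedHashMap<String, Object>(counts));
        return simple;
    }

    public static void main(String[] args) {
        FacetResults results = new FacetResults();
        results.add("books", 3);
        System.out.println(SimpleWriter.write(results.toSimpleObjects()));
    }
}

/** Toy writer (no string escaping): knows maps, numbers and strings, nothing about facets. */
final class SimpleWriter {
    static String write(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append('"').append(e.getKey()).append("\":")
                  .append(write(e.getValue()));
                sep = ",";
            }
            return sb.append('}').toString();
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value + "\"";
    }
}
```

New getters can be added to the result class over time without touching the
writer, as long as toSimpleObjects() keeps emitting only simple types.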
> New Java API
> ------------
>
> Key: SOLR-344
> URL: https://issues.apache.org/jira/browse/SOLR-344
> Project: Solr
> Issue Type: Improvement
> Components: clients - java, search, update
> Affects Versions: 1.3
> Reporter: Jonathan Woods
> Attachments: New Java API for Solr.pdf
>
>
> The core Solr codebase urgently needs to expose a new Java API designed for
> use by Java running in Solr's JVM and ultimately by core Solr code itself.
> This API must be (i) object-oriented ('typesafe'), (ii) self-documenting,
> (iii) at the right level of granularity, (iv) designed specifically to expose
> the value which Solr adds over and above Lucene.
> This is an urgent issue for two reasons:
> - Java-Solr integrations represent a use-case which is nearly as important as
> the core Solr use-case in which non-Java clients interact with Solr over HTTP
> - a significant proportion of questions on the mailing lists are clearly from
> people who are attempting such integrations right now.
> This point in Solr development - some way out from the 1.3 release - might be
> the right time to do the development and refactoring necessary to produce
> this API. We can do this without breaking any backward compatibility from
> the point of view of XML/HTTP and JSON-like clients, and without altering the
> core Solr algorithms which make it so efficient. If we do this work now, we
> can significantly speed up the spread of Solr.
> Eventually, this API should be part of core Solr code, not hived off into
> some separate project nor in a non-first-class package space. It should be
> capable of forming the foundation of any new Solr development which doesn't
> need to delve into low level constructs like DocSet and so on - and any new
> development which does need to do just that should be a candidate for
> incorporation into the API at the some level. Whether or not it will ever be
> worth re-writing existing code is a matter of opinion; but the Java API
> should be such that if it had existed before core plug-ins were written, it
> would have been natural to use it when writing them.
> I've attached a PDF which makes the case for this API. Apologies for
> delivering it as an attachment, but I wanted to embed pics and a bit of
> formatting.
> I'll update this issue in the next few days to give a prototype of this API
> to suggest what it might look like at present. This will build on the work
> already done in Solrj and SearchComponents
> (https://issues.apache.org/jira/browse/SOLR-281), and will be a patch on an
> up-to-date revision of Solr trunk.
> [PS:
> 1. Having written most of this, I then properly looked at
> SearchComponents/SOLR-281 and read
> http://www.nabble.com/forum/ViewPost.jtp?post=11050274&framed=y, which says
> much the same thing albeit more quickly! And weeks ago, too. But this
> proposal is angled slightly differently:
> - it focusses on the value of creating an API not only for internal Solr
> consumption, but for local Java clients
> - it focusses on designing a Java API without constantly being hobbled by
> HTTP-Java
> - it's suggesting that the SearchComponents work should result in a Java API
> which can be used as much by third party Java as by ResponseBuilder.
> 2. I've made some attempt to address Hoss's point
> (http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#6551097579454875774)
> - that an API like this would need to maintain enough state e.g. to allow an
> initial search to later be faceted, highlighted etc without going back to the
> start each time - but clearly the proof of the pudding will be in the
> prototype.
> 3. Again, I've just discovered SOLR-212 (DirectSolrConnection). I think all
> my comments about Solrj apply to this, useful though it clearly is.]