[
https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770756#action_12770756
]
Chris A. Mattmann edited comment on SOLR-1516 at 10/28/09 2:27 AM:
-------------------------------------------------------------------
I haven't really heard any comments on this issue, and I've got the impression
that not many folks write these QueryResponseWriters. To me, writing one was
invaluable. The use case was:
* I make the choice to make SOLR the gold source for search index data (I'm
dealing with planetary science and earth science data on 4-5 projects)
* I want to drive search but _also_ met output from SOLR (treating SOLR as a
search web service, with customizable output [2])
* the default SOLR XML and the 5-7 output formats didn't do it for me since I
have some specialized earth and planetary science use cases. E.g., on a few
different projects, I need to be able to:
  * output FGDC XML (yes it's a standard for earth science metadata, and also
    relevant for the GeoSOLR stuff)
  * output custom RDF metadata
  * output a particular style of JSON to plug in to some external web client,
    e.g., an auto-suggest that requires its own JSON format, not SOLR's
To illustrate the reason that the 5-7 output formats didn't do it for me
either, I'll use an example. There may be the sense of, "well, why didn't you
write some Java/Ruby/PHP/Python client that called SOLR with one of its
existing wt's and then output a custom format from your favorite programming
language (PL)?" The reasons are threefold:
1. SOLR advertises that the QueryResponseWriter interface is an official SOLR
plugin and interface, at least according to:
  * the Wiki documentation [1]
  * the advertised published book on SOLR [3]
  * Chris Hostetter's ApacheCon08 slides, which show it as part of the core
    SOLR architecture in his 50K foot view diagram [2]
2. If SOLR is truly a search web service, and allows for changeable output
formats (evidenced by exposing the wt parameter), then why force people to use
one of the existing wt's and then ask them to transform (either via a PL, or
via XSLT) instead of allowing them to natively generate the specific output
format type?
3. Why make o.a.s.r.QueryResponseWriter an interface and not a concrete class
if it is never intended to be implemented by others, or, more importantly, if
it is kind of non-intuitive to implement?
Besides 1-3, I have external COTS and OTS tools that cannot be changed and
that expect data to be loaded into them in a particular format. I'd like to
plug them into SOLR, and the easiest way for me to do that is a curl/wget type
operation piped into the COTS/OTS tool; wt's are the way to go for that.
So, given the above, when I went to write a "wt" I was surprised how hard it
was for me to understand the NamedList structure, which is just a bag of
objects that you have to unpack with unfriendly instanceof checks and recursive
unmarshalling (walking the NamedList tree). All I wanted for my wt was to be
able to format the output Document List, either as a whole or on a doc-by-doc
basis.
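To give a feel for the kind of unpacking I mean, here is a rough sketch. It
does not use SOLR's actual classes: Pair is a hypothetical stand-in for a
NamedList entry, a Map stands in for a document, and ResponseWalker is an
illustrative name, so treat this as the shape of the problem rather than real
SOLR code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one NamedList entry: a name plus an untyped value.
final class Pair {
    final String name;
    final Object value;

    Pair(String name, Object value) {
        this.name = name;
        this.value = value;
    }
}

final class ResponseWalker {
    /** Recursively collects every Map (our stand-in "document") in the tree. */
    static List<Map<String, Object>> collectDocs(Object node) {
        List<Map<String, Object>> docs = new ArrayList<>();
        walk(node, docs);
        return docs;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Object node, List<Map<String, Object>> out) {
        if (node instanceof Map) {
            // Assumption for this sketch: any Map is one document.
            out.add((Map<String, Object>) node);
        } else if (node instanceof Pair) {
            // Named entry: descend into its value.
            walk(((Pair) node).value, out);
        } else if (node instanceof List) {
            // Nested list: descend into every element.
            for (Object child : (List<?>) node) {
                walk(child, out);
            }
        }
        // Scalars (strings, numbers, ...) carry no documents and are skipped.
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "a");

        List<Object> header = new ArrayList<>();
        header.add(new Pair("status", 0));

        List<Object> docList = new ArrayList<>();
        docList.add(doc);

        List<Object> response = new ArrayList<>();
        response.add(new Pair("responseHeader", header));
        response.add(new Pair("response", docList));

        System.out.println(collectDocs(response));
    }
}
```

Every writer that only cares about the documents ends up repeating some
version of this type-checking walk before it can format anything.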
Anyway, I just wanted to provide some further fodder and discussion for this
issue. To me this is important, and based on [1-3], QueryResponseWriters seem
to be a big piece of the SOLR architecture.
Chris
---
[1] http://wiki.apache.org/solr/QueryResponseWriter
[2] http://people.apache.org/~hossman/apachecon2008us/btb/apache-solr-beyond-the-box.pdf
[3] SOLR 1.4 Enterprise Search Server, Packt Publishing, 2009.
> DocumentList and Document QueryResponseWriter
> ---------------------------------------------
>
> Key: SOLR-1516
> URL: https://issues.apache.org/jira/browse/SOLR-1516
> Project: Solr
> Issue Type: New Feature
> Components: search
> Affects Versions: 1.3
> Environment: My MacBook Pro laptop.
> Reporter: Chris A. Mattmann
> Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1516.Mattmann.101809.patch.txt
>
>
> I tried to implement a custom QueryResponseWriter the other day and was
> amazed at the level of unmarshalling and weeding through objects that was
> necessary just to format the output o.a.l.Document list. As a user, I wanted
> to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate
> calculation is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently
> generic and reusable. The first is o.a.s.request.DocumentResponseWriter -- it
> handles the first bullet above. The second is
> o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and
> require the user to implement either an #emitDoc function (in the case of
> bullet 1), or an #emitDocList function (in the case of bullet 2). Both
> classes provide an #emitHeader and #emitFooter function set that handles
> formatting and output before and after the Document list is processed.
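
The doc-at-a-time design in the description could look roughly like the sketch
below. This is not the code in the attached patch: the class names are
hypothetical, a Map stands in for a document, output goes to a StringBuilder
to keep the example self-contained, and nothing here implements SOLR's real
QueryResponseWriter interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base class: subclasses only say how to format one document.
abstract class DocAtATimeWriterSketch {
    /** Optional hook, called once before any document is emitted. */
    protected void emitHeader(StringBuilder out) {
    }

    /** Required hook, called once per document, in order. */
    protected abstract void emitDoc(Map<String, Object> doc, StringBuilder out);

    /** Optional hook, called once after the last document is emitted. */
    protected void emitFooter(StringBuilder out) {
    }

    /** Fixed sequence: header, each document, footer. */
    public final String write(List<Map<String, Object>> docs) {
        StringBuilder out = new StringBuilder();
        emitHeader(out);
        for (Map<String, Object> doc : docs) {
            emitDoc(doc, out);
        }
        emitFooter(out);
        return out.toString();
    }
}

// Example subclass: emits a small XML list of the "id" field of each document.
final class IdListWriterSketch extends DocAtATimeWriterSketch {
    @Override
    protected void emitHeader(StringBuilder out) {
        out.append("<ids>\n");
    }

    @Override
    protected void emitDoc(Map<String, Object> doc, StringBuilder out) {
        out.append("  <id>").append(doc.get("id")).append("</id>\n");
    }

    @Override
    protected void emitFooter(StringBuilder out) {
        out.append("</ids>\n");
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "a");
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc);
        System.out.print(new IdListWriterSketch().write(docs));
    }
}
```

The point of the base class is that the subclass never sees the surrounding
response structure; it only receives documents that have already been
unpacked.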