[jira] Issue Comment Edited: (SOLR-1516) DocumentList and Document QueryResponseWriter

Chris A. Mattmann (JIRA) Tue, 27 Oct 2009 19:28:25 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770756#action_12770756
 ]


Chris A. Mattmann edited comment on SOLR-1516 at 10/28/09 2:27 AM:
-------------------------------------------------------------------

I haven't really heard any comments on this issue, and I've got the impression 
that not many folks write these QueryResponseWriters. To me, writing one was 
invaluable. The use case was:

* I chose to make SOLR the gold source for search index data (I'm 
dealing with planetary science and earth science data on 4-5 projects)
* I want to drive search but _also_ metadata output from SOLR (treating SOLR 
as a search web service, with customizable output [2])
* the default SOLR XML and the 5-7 output formats didn't do it for me since I 
have some specialized earth and planetary science use cases. E.g., on a few 
different projects, I need to be able to:
   * output FGDC XML (yes it's a standard for earth science metadata, and also 
relevant for the GeoSOLR stuff)
   * output custom RDF metadata 
   * output a particular style of JSON to plug in to some external web client, 
e.g., an auto-suggest that requires its own JSON format, not SOLR's

  To illustrate why the 5-7 output formats didn't do it for me either, consider 
the obvious question: "Well, why didn't you write some Java/Ruby/PHP/Python 
client that called SOLR with one of its existing wt's and then output a custom 
format from your favorite programming language (PL)?" The reasons are 
threefold:

1. SOLR advertises that the QueryResponseWriter interface is an official SOLR 
plugin and interface, at least according to:
      * the Wiki documentation [1]
      * the published book on SOLR [3]
      * Chris Hostetter's ApacheCon08 slides, which show it as part of the core 
SOLR architecture in his 50K foot view diagram [2]

2. If SOLR is truly a search web service, and allows for changeable output 
formats (evidenced by exposing the wt parameter), then why force people to use 
one of the existing wt's and then ask them to transform (either via a PL, or 
via XSLT) instead of allowing them to natively generate the specific output 
format type?

3. Why make o.a.s.r.QueryResponseWriter an interface and not a concrete class 
if it is never intended to be implemented by others, or, more importantly, if 
it is so non-intuitive to implement?

Beyond 1-3, I have external COTS and OTS tools that cannot be changed and that 
expect data to be loaded in a particular format. I'd like to plug them into 
SOLR, and the easiest way for me to do that is a curl/wget type operation piped 
into the COTS/OTS tool. wt's are the way to go for that.

So, given the above, when I went to write a "wt" I was surprised at how hard it 
was to understand the NamedList structure, which is just a bag of objects that 
you have to unpack with unfriendly instanceof checks and recursive 
unmarshalling (walking the NamedList tree). All I wanted for my wt was to be 
able to format the output either as a whole Document List or on a doc-by-doc 
basis.
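
To make the "bag of objects" complaint concrete, here is a minimal sketch of the kind of recursive walk involved. It does not use SOLR itself: SimpleNamedList is a hypothetical stand-in written for this example (an ordered list of name/value pairs whose values may be scalars, lists, or further nested pairs), and NamedListWalker and flatten are likewise illustrative names, not SOLR API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a NamedList-like structure (not the SOLR class):
// an ordered list of (name, value) pairs with untyped values.
class SimpleNamedList {
    private final List<String> names = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    SimpleNamedList add(String name, Object value) {
        names.add(name);
        values.add(value);
        return this; // allow chained add() calls
    }

    int size() { return names.size(); }
    String getName(int i) { return names.get(i); }
    Object getVal(int i) { return values.get(i); }
}

class NamedListWalker {
    // Flattens the whole tree into "path=value" lines, in traversal order.
    static List<String> flatten(SimpleNamedList root) {
        List<String> out = new ArrayList<>();
        walk("", root, out);
        return out;
    }

    // Every value is an Object, so each level needs an instanceof check
    // before the walker knows whether to recurse or to emit a leaf.
    private static void walk(String path, Object node, List<String> out) {
        if (node instanceof SimpleNamedList) {
            SimpleNamedList nl = (SimpleNamedList) node;
            for (int i = 0; i < nl.size(); i++) {
                walk(path + "/" + nl.getName(i), nl.getVal(i), out);
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(path + "[" + i + "]", items.get(i), out);
            }
        } else {
            out.add(path + "=" + node);
        }
    }

    public static void main(String[] args) {
        SimpleNamedList header = new SimpleNamedList().add("status", 0).add("QTime", 3);
        List<Object> docs = new ArrayList<>();
        docs.add(new SimpleNamedList().add("id", "doc-1").add("title", "Mars"));
        SimpleNamedList response = new SimpleNamedList()
            .add("responseHeader", header)
            .add("response", docs);
        for (String line : flatten(response)) {
            System.out.println(line);
        }
    }
}
```

Even in this toy version the writer has to discover the shape of the data at runtime, which is the part a document-oriented base class could hide.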

Anyway, I just wanted to provide some further fodder for discussion on this 
issue. To me this is important, and clearly, based on [1-3], 
QueryResponseWriters by definition seem to be a big piece of the SOLR 
architecture.


Chris

---
[1] http://wiki.apache.org/solr/QueryResponseWriter
[2] http://people.apache.org/~hossman/apachecon2008us/btb/apache-solr-beyond-the-box.pdf
[3] SOLR 1.4 Enterprise Search Server, Packt Publishing, 2009.



> DocumentList and Document QueryResponseWriter
> ---------------------------------------------
>
>                 Key: SOLR-1516
>                 URL: https://issues.apache.org/jira/browse/SOLR-1516
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>         Environment: My MacBook Pro laptop.
>            Reporter: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1516.Mattmann.101809.patch.txt
>
>
> I tried to implement a custom QueryResponseWriter the other day and was 
> amazed at the level of unmarshalling and weeding through objects that was 
> necessary just to format the output o.a.l.Document list. As a user, I wanted 
> to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate 
> calculation is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently 
> generic and reusable. The first is o.a.s.request.DocumentResponseWriter -- it 
> handles the first bullet above. The second is 
> o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and 
> require the user to implement either an #emitDoc function (in the case of 
> bullet 1), or an #emitDocList function (in the case of bullet 2). Both 
> classes provide an #emitHeader and #emitFooter function set that handles 
> formatting and output before and after the Document list is processed.
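
As a rough illustration of the per-document base class described in the issue, the sketch below shows the general shape of such a design. It is not the code from the attached patch: the method names emitDoc, emitHeader and emitFooter come from the description, but the signatures, the use of a plain Map as the document type, and the class names PerDocWriterSketch and TabSeparatedWriter are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the patch's classes. A document is modelled as a
// plain Map of field name to value so the example has no SOLR dependency.
abstract class PerDocWriterSketch {
    // Optional hooks; by default nothing is written around the documents.
    protected void emitHeader(Writer out) throws IOException { }
    protected void emitFooter(Writer out) throws IOException { }

    // The one method a subclass must supply: format a single document.
    protected abstract void emitDoc(Map<String, Object> doc, Writer out) throws IOException;

    // Fixed driver: header, then each document in order, then footer.
    final void write(List<Map<String, Object>> docs, Writer out) throws IOException {
        emitHeader(out);
        for (Map<String, Object> doc : docs) {
            emitDoc(doc, out);
        }
        emitFooter(out);
    }

    // Convenience wrapper that renders to a String.
    final String render(List<Map<String, Object>> docs) {
        StringWriter out = new StringWriter();
        try {
            write(docs, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}

// Example subclass: one tab-separated line per document.
class TabSeparatedWriter extends PerDocWriterSketch {
    @Override
    protected void emitHeader(Writer out) throws IOException {
        out.write("id\ttitle\n");
    }

    @Override
    protected void emitDoc(Map<String, Object> doc, Writer out) throws IOException {
        out.write(doc.get("id") + "\t" + doc.get("title") + "\n");
    }

    @Override
    protected void emitFooter(Writer out) throws IOException {
        out.write("# end\n");
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "doc-1");
        doc.put("title", "Mars");
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc);
        System.out.print(new TabSeparatedWriter().render(docs));
    }
}
```

A list-oriented variant would replace the loop in write with a single call to an emitDocList hook, which is where an aggregate calculation over all documents could go.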

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
