[jira] Commented: (SOLR-1516) DocumentList and Document QueryResponseWriter

Chris A. Mattmann (JIRA) Mon, 16 Nov 2009 21:18:06 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778726#action_12778726
 ]


Chris A. Mattmann commented on SOLR-1516:
-----------------------------------------

bq. This does not help the user of the API much because the real difficulty is 
in unmarshalling various types of objects. This patch does nothing to read the 
stored fields from the Document .

I agree with your statement above regarding "the real difficulty". That's 
precisely what this patch addresses. This patch deals with that real difficulty 
for users (of which there are plenty, please see my comment above RE: use 
cases, e.g., FGDC, RDF, etc.) that are mostly concerned with spitting out (for 
format compatibility) the resultant Documents from searches in a particular XML 
format. This patch isn't intended to do anything with the stored fields -- 
that's left up to the user who extends the abstract base classes by 
implementing #emitDoc or #emitDocList, where the user deals with Lucene 
Documents. As I stated above numerous times, it took me quite a bit of printing 
out and deducing the structure of the resultant SolrResponse to determine where 
in that list Documents were stored (and in fact they weren't; it is just the 
IDs). This isn't really documented anywhere per se (at least from what I could 
find with the online Javadocs or Wiki).
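
To make the extension point concrete, here is a minimal sketch of the per-document callback pattern. It is not the code from the patch. Doc is a hypothetical stand-in for a Lucene Document so the example has no dependencies, and the names PerDocWriterSketch, RecordXmlWriter, the write driver method, the single-Writer signatures of emitHeader and emitFooter, and the <records> output format are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Lucene Document: field name -> stored value.
final class Doc {
    final Map<String, String> fields = new LinkedHashMap<>();

    Doc with(String name, String value) {
        fields.put(name, value);
        return this;
    }
}

// Assumed shape of the per-document base class: header, one callback per
// document, footer. The subclass never sees the surrounding response tree.
abstract class PerDocWriterSketch {
    void emitHeader(Writer w) throws IOException {}

    void emitFooter(Writer w) throws IOException {}

    abstract void emitDoc(Doc doc, Writer w) throws IOException;

    // Hypothetical driver: walks the documents and invokes the callbacks.
    final void write(List<Doc> docs, Writer w) {
        try {
            emitHeader(w);
            for (Doc d : docs) {
                emitDoc(d, w);
            }
            emitFooter(w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Example subclass emitting a made-up XML format (no escaping, for brevity).
final class RecordXmlWriter extends PerDocWriterSketch {
    @Override
    void emitHeader(Writer w) throws IOException {
        w.write("<records>");
    }

    @Override
    void emitFooter(Writer w) throws IOException {
        w.write("</records>");
    }

    @Override
    void emitDoc(Doc doc, Writer w) throws IOException {
        w.write("<record>");
        for (Map.Entry<String, String> f : doc.fields.entrySet()) {
            w.write("<" + f.getKey() + ">" + f.getValue() + "</" + f.getKey() + ">");
        }
        w.write("</record>");
    }
}
```

In this sketch the user-facing work is confined to emitDoc; everything about locating the documents stays in the base class.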

bq. That is really difficult. A lot of components write their output in a very 
arbitrary Object tree. The output is largely designed like a JSON object tree 
(with more primitives). The producer decides what the tree contains. The good 
thing about this approach is that we don't need to build custom classes for 
every type of output.

Why is this difficult? It would amount to components declaring what type of 
schema they return. Typed, bags of objects, coupled with sparse documentation 
isn't exactly the answer. I think we both agree, though, that there is a larger 
issue to look at in terms of the SolrResponse and QueryResponseWriters. My 
point is that I don't think using this issue to solve those bigger-picture 
questions is the right answer. I'd be happy to create further issues to discuss 
this.

bq. There is no reason why a GenericResponseWriter can't do that. I am not 
happy about putting these classes in and leading users to believe that this is 
all that they have to do.

How are we telling users that this is all they have to do? The patch 
specifically states (taken from the included Javadoc):

bq. This {@link QueryResponseWriter} allows a user to implement the 
{@link #emitDoc(Document, Writer)} function which acts as a callback 
function to process one Lucene {@link Document} returned from the SOLR Query 
at a time. Sub-classes should keep track of any global state as this class does 
not provide a means to access the entire set of returned {@link Document}s. 
If that functionality is required, see {@link DocumentListResponseWriter}.

bq. This {@link QueryResponseWriter} allows a user to implement the 
{@link #emitDocList(List, Writer)} function which acts as a callback 
function to process the entire {@link List} of Lucene {@link Document} 
returned from the SOLR Query at once. To process the {@link Document}s 
one-at-a-time (to conserve resources, or to speed up the processing/etc.), see 
{@link DocumentResponseWriter}.
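
As a rough illustration of the whole-list variant described in the Javadoc above, the sketch below shows why receiving every document at once matters: the output can carry an aggregate (here, a count) before any document is written. This is not the patch's code. Each document is represented as a plain Map rather than a Lucene Document, and DocListWriterSketch, CountedListingWriter, the write driver method, and the <listing> format are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

// Assumed shape of the list-based base class: a single callback that
// receives every returned document at once.
abstract class DocListWriterSketch {
    abstract void emitDocList(List<Map<String, String>> docs, Writer w) throws IOException;

    // Hypothetical driver that hands the full list to the callback.
    final void write(List<Map<String, String>> docs, Writer w) {
        try {
            emitDocList(docs, w);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Example subclass: the count attribute needs the full list up front,
// which a one-document-at-a-time callback could not supply.
final class CountedListingWriter extends DocListWriterSketch {
    @Override
    void emitDocList(List<Map<String, String>> docs, Writer w) throws IOException {
        w.write("<listing count=\"" + docs.size() + "\">");
        for (Map<String, String> d : docs) {
            w.write("<id>" + d.get("id") + "</id>");
        }
        w.write("</listing>");
    }
}
```

The trade-off is the one the Javadoc names: the list form holds all documents in memory, while the per-document form can stream them.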

I'm not sure I see the concern behind this ~250 line patch? The patch:

* adds functionality that would have simplified a number of use cases that I am 
leveraging SOLR for in the space and earth science data community, where 
formats are critical and metadata output is more important than the specific 
search meta-info (# hits, query time, start/end, etc.). See the 3-4 examples I 
stated above.

* does not introduce anything that is not backwards compatible

* includes javadoc on all public methods, as well as class-level javadoc

* should apply without trouble to the current SVN trunk

These have typically been the criteria for inclusion (modulo unit tests, which 
I'd be happy to include if there is concern there) -- are the criteria 
different here in SOLR? 

> DocumentList and Document QueryResponseWriter
> ---------------------------------------------
>
>                 Key: SOLR-1516
>                 URL: https://issues.apache.org/jira/browse/SOLR-1516
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.3
>         Environment: My MacBook Pro laptop.
>            Reporter: Chris A. Mattmann
>            Assignee: Noble Paul
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1516.Mattmann.101809.patch.txt
>
>
> I tried to implement a custom QueryResponseWriter the other day and was 
> amazed at the level of unmarshalling and weeding through objects that was 
> necessary just to format the output o.a.l.Document list. As a user, I wanted 
> to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate 
> calculation is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently 
> generic and reusable. The first is o.a.s.request.DocumentResponseWriter -- it 
> handles the first bullet above. The second is 
> o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and 
> require the user to implement either an #emitDoc function (in the case of 
> bullet 1), or an #emitDocList function (in the case of bullet 2). Both 
> classes provide an #emitHeader and #emitFooter function set that handles 
> formatting and output before the Document list is processed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
