[jira] Updated: (SOLR-284) Parsing Rich Document Types

Chris Harris (JIRA) Fri, 29 Aug 2008 12:36:36 -0700

     [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chris Harris updated SOLR-284:
------------------------------

    Attachment: un-hardcode-id.diff

The patch, as it currently stands, treats a field called "id" as a special
case. First, it is a required field. Second, unlike any other field, you don't
need to declare it in the fieldnames parameter. Finally, since the patch reads
its value with fieldSolrParams.getInt(), that field is required to be an int.

This special-case treatment seems a little too particular to me; not everyone 
wants to have a field called "id", and not everyone who does wants that field 
to be an int. So what I propose is to eliminate the special treatment of "id". 
See un-hardcode-id.diff for what this might mean in particular. (That file is 
not complete; to correctly make this change, I'd have to update the test cases.)
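For illustration only, here is a minimal sketch of what "no special treatment
of id" could look like on the server side. It is NOT the contents of
un-hardcode-id.diff; it assumes the request parameters are available as a plain
Map<String, String>, and the class and method names (FieldNamesSketch,
extractFields) are hypothetical. Every field, "id" included, must be listed in
fieldnames, none is required, and all values are kept as strings rather than
being parsed with getInt().

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldNamesSketch {
    // Hypothetical helper: collects the values of every field named in the
    // comma-separated "fieldnames" parameter. No field (including "id") is
    // special-cased, required, or parsed as an int; values stay strings.
    public static Map<String, String> extractFields(Map<String, String> params) {
        Map<String, String> fields = new LinkedHashMap<>();
        String names = params.get("fieldnames");
        if (names == null || names.isEmpty()) {
            return fields; // nothing declared, so nothing is extracted
        }
        for (String raw : names.split(",")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas such as "id,,subject"
            }
            String value = params.get(name);
            if (value != null) {
                fields.put(name, value); // insertion order follows fieldnames
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("fieldnames", "id,subject,author");
        params.put("id", "100");
        params.put("subject", "mysubject");
        params.put("author", "eric");
        // prints {id=100, subject=mysubject, author=eric}
        System.out.println(extractFields(params));
    }
}
```

Under this sketch, a request that sends id=100 but omits "id" from fieldnames
simply does not get an id field, which is exactly the breaking change described
next.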

This is a breaking change, because if you *are* using an id field, you'll now 
have to specifically indicate that fact in the fieldnames parameter. Thus, 
instead of

http://localhost:8983/solr/update/rich?stream.file=myfile.doc&stream.type=doc&id=100&stream.fieldname=text&fieldnames=subject,author&subject=mysubject&author=eric

you'll have to put

http://localhost:8983/solr/update/rich?stream.file=myfile.doc&stream.type=doc&id=100&stream.fieldname=text&fieldnames=id,subject,author&subject=mysubject&author=eric
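On the client side, the change amounts to always naming the id field in
fieldnames. A rough sketch of a client helper that does this, assuming the
/update/rich parameters shown above; RichUpdateUrlSketch and buildUrl are
hypothetical names, not part of the patch. Field names are encoded one at a
time and joined with a literal comma, so fieldnames reads id,subject,author
rather than id%2Csubject%2Cauthor.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class RichUpdateUrlSketch {
    // URL-encode a single query-string component (Java 10+ overload).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Hypothetical client helper: every extra field, "id" included, is named
    // in "fieldnames" and then sent as its own name=value parameter.
    public static String buildUrl(String base, String file, String type,
                                  String streamField, Map<String, String> fields) {
        StringJoiner names = new StringJoiner(",");
        StringBuilder values = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            names.add(enc(e.getKey()));
            values.append('&').append(enc(e.getKey()))
                  .append('=').append(enc(e.getValue()));
        }
        return base
            + "?stream.file=" + enc(file)
            + "&stream.type=" + enc(type)
            + "&stream.fieldname=" + enc(streamField)
            + "&fieldnames=" + names
            + values;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "100");
        fields.put("subject", "mysubject");
        fields.put("author", "eric");
        System.out.println(buildUrl("http://localhost:8983/solr/update/rich",
                "myfile.doc", "doc", "text", fields));
    }
}
```

This prints
http://localhost:8983/solr/update/rich?stream.file=myfile.doc&stream.type=doc&stream.fieldname=text&fieldnames=id,subject,author&id=100&subject=mysubject&author=eric
which carries the same parameters as the second URL above, only in a different
order.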

I think asking users of this patch to make this slight change in their client 
code is not an unreasonable burden, but I'm curious what Eric and others have 
to say.

> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, source.zip, test-files.zip, test-files.zip, test.zip, 
> un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler
> that supports streaming a PDF, Word, PowerPoint, or Excel document into
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
