[jira] Updated: (SOLR-284) Parsing Rich Document Types

Chris Harris (JIRA) Fri, 05 Sep 2008 12:23:38 -0700

     [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chris Harris updated SOLR-284:
------------------------------

    Attachment: rich.patch

THIS IS A BREAKING CHANGE TO RICH.PATCH! CLIENT URLs NEED TO BE UPDATED!

All unit tests pass.

Changes:

* As suggested earlier, the "id" parameter is no longer treated as a special 
case; it is not required, and it does not need to be an int. If you *do* use a 
field called "id", you *must* now declare it in the fieldnames parameter, as 
you would any other field.

* Do updates with UpdateRequestProcessor and SolrInputDocument, rather 
than UpdateHandler and DocumentBuilder. (The latter pair appear to be obsolete.)

* Previously, if you declared a field in the fieldnames parameter but did not 
then specify a value for that field, you would get a 
NullPointerException. Now you can specify any nonnegative number of values for 
a declared field, including zero. (I've added a unit test for this.)

* In SolrPDFParser, properly close PDDocument when PDF parsing throws an 
exception

* Log the stream type in the solr log, rather than on the console

* Some not-very-thorough conversion of tabs to spaces
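
To illustrate the PDDocument item above, here is a minimal sketch of the 
close-on-failure pattern, not the actual patch code. FakeDocument, 
extractText, and parse are hypothetical stand-ins invented for this example; 
the real SolrPDFParser works against the PDFBox document class instead.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseOnFailureSketch {

    /** Hypothetical stand-in for a PDF document handle; NOT the PDFBox class. */
    static class FakeDocument implements Closeable {
        boolean closed = false;

        /** Pretends to extract text; throws when asked to simulate a bad PDF. */
        String extractText(boolean fail) throws IOException {
            if (fail) {
                throw new IOException("simulated parse failure");
            }
            return "extracted text";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Extracts text and always closes the document, even if extraction throws. */
    static String parse(FakeDocument doc, boolean fail) throws IOException {
        try {
            return doc.extractText(fail);
        } finally {
            // Runs on both the normal and the exceptional path.
            doc.close();
        }
    }

    public static void main(String[] args) {
        FakeDocument doc = new FakeDocument();
        try {
            parse(doc, true);
        } catch (IOException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
        System.out.println("closed after failure: " + doc.closed);
    }
}
```

The finally block is the whole point: without it, an exception thrown during 
parsing would skip the close call and leave the document handle open.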

As an aside, I've noticed that I failed in my earlier efforts to incorporate 
Juri Kuehn's change to allow the id field to be non-integer. Sorry about that, 
Juri; that was not at all intentional.


> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, 
> test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
