[jira] Commented: (SOLR-284) Parsing Rich Document Types

Grant Ingersoll (JIRA) Sun, 11 Jan 2009 07:46:24 -0800

    [ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662793#action_12662793 ]


Grant Ingersoll commented on SOLR-284:
--------------------------------------

bq.  Hmmm ... that means that if i have a schema with a uniqueKey field, and i 
forget to specify a uniqueKey value when indexing my document, the handler will 
"silently succeed" in adding a document with a key i have no control over 
instead of failing in a way that will make me aware of my mistake - and i have 
no way of configuring solr to prevent that kind of "silent success"

Actually, there is a mechanism for avoiding it, and it is documented at 
http://wiki.apache.org/solr/ExtractingRequestHandler#head-6cda7b8832bb2ccaf6b0b57a6ef524b553db489e

I could, however, see adding a flag to specify whether one wants "silent 
success" or not.  I think the use case for content extraction is different from 
the normal XML message path.  These files are often quite large, and the cost 
of sending them to the system is significant.

Another thing that might be interesting to do is to actually return the 
generated id in the response.
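
To make the two ideas above concrete, here is a minimal sketch of how a 
"silent success" flag and a returned id could fit together.  It is written in 
Python for brevity and is not Solr code: the names resolve_unique_key, 
build_response, allow_generated and MissingUniqueKeyError are all hypothetical, 
and using a random UUID as the generated key is an assumption.

```python
import uuid


class MissingUniqueKeyError(ValueError):
    """Raised when no unique key is supplied and generation is disallowed."""


def resolve_unique_key(params, key_field="id", allow_generated=True):
    """Return (key, was_generated) for a document about to be indexed.

    allow_generated is the hypothetical flag: True keeps the "silent
    success" behaviour, False makes a missing key a hard failure.
    """
    supplied = params.get(key_field)
    if supplied:
        return supplied, False
    if not allow_generated:
        raise MissingUniqueKeyError(
            "no value supplied for unique key field '%s'" % key_field
        )
    # Assumption: a random UUID stands in for whatever key is generated.
    return str(uuid.uuid4()), True


def build_response(params, key_field="id", allow_generated=True):
    """Echo the key back so the client learns a generated id."""
    key, was_generated = resolve_unique_key(params, key_field, allow_generated)
    return {"status": 0, key_field: key, "generated": was_generated}
```

With allow_generated=False, a client that forgets the key gets an error 
instead of a document it cannot address; with the default, the response 
carries the generated id, so a large file does not have to be sent again just 
to learn its key.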


> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, 
> rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, 
> SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, 
> test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler 
> that supports streaming a PDF, Word, PowerPoint, or Excel document into 
> Solr.
> There is a wiki page with information here: 
> http://wiki.apache.org/solr/UpdateRichDocuments
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
