[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726123#action_12726123 ]
Chris Harris commented on SOLR-284:
-----------------------------------

{quote}
bq. My only request is that, if you're changing how field mapping works and maybe removing ext.ignore.und.fl, you make sure it stays easy to say, "Tika, I don't care about any of your parsed metadata."

Map unknown fields to an ignored fieldtype. uprefix=ignored_
{quote}

That seems fine.

Tangentially, I wonder how fast Tika's metadata extraction is compared to its main body text extraction. If the latter doesn't dwarf the former, there might be value in adding a "Solr, don't even ask Tika to compute metadata at all; just have it extract the body text" flag; this could potentially speed things up for people who don't need the metadata. It would probably make sense to benchmark things before adding such a flag, though. I also don't have a good sense of how many people will want to use the metadata feature versus how many won't.

> Parsing Rich Document Types
> ---------------------------
>
>                 Key: SOLR-284
>                 URL: https://issues.apache.org/jira/browse/SOLR-284
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>            Reporter: Eric Pugh
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff
>
>
> I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, PowerPoint, or Excel document into Solr.
> There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.