I need to index rich-text documents. This is the extract handler in my *solrconfig.xml*:

    <requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
      <lst name="defaults">
        <str name="lowernames">true</str>
        <str name="uprefix">ignored_</str>
        <str name="captureAttr">true</str>
      </lst>
    </requestHandler>

My *schema.xml* is:

    <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
    <field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
    <field name="title" type="title_text" indexed="true" stored="true"/>
    <field name="date_modified" type="date" indexed="true" stored="true" multivalued="true"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="text" indexed="true" stored="true" multiValued="true"/>

I indexed a PDF with this curl command:

    curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F"myfile=Coding.pdf"

When I query with q=id:12, the output is:

    <arr name="ignored_stream_source_info"> <str>myfile</str> </arr>
    <arr name="ignored_stream_content_type"> <str>application/octet-stream</str> </arr>
    <arr name="ignored_stream_size"> <str>3336935</str> </arr>
    <arr name="ignored_stream_name"> <str>Coding.pdf</str> </arr>
    <arr name="ignored_content_type"> <str>application/pdf</str> </arr>
    <str name="contents"></str>    ---- *contents not shown*
    <long name="_version_">1456831756526157824</long>
    <str name="doc_id">8eb229e0-5f25-4d26-bba4-6cb67aab7f81</str>
    </doc>

Why is the contents field empty? Also, why does the date_modified field not appear at all?
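
One possible cause, offered as a sketch and not as a confirmed diagnosis: ExtractingRequestHandler puts the body text that Tika extracts into a field named `content`, while the schema above declares `contents`. Nothing in the handler defaults maps one name onto the other. Assuming that mismatch is the problem, an `fmap.content` entry (the only line added here) would route the extracted text into the existing field:

```xml
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <!-- Assumption: Tika's extracted body arrives as "content";
         map it onto the schema's "contents" field. -->
    <str name="fmap.content">contents</str>
  </lst>
</requestHandler>
```

The same mapping can be tried per request by appending `fmap.content=contents` to the update URL, which avoids editing solrconfig.xml while testing. The `date_modified` field would likewise need an `fmap` entry from whatever metadata name Tika reports for the file; that name does not appear in the output above, so no mapping for it is guessed here.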