I need to index rich text documents. This is the extract handler
configuration in my *solrconfig.xml*:
<requestHandler name="/update/extract"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">

<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
</lst>
</requestHandler>

My *schema.xml* is:
<field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW"
multiValued="false"/>
<field name="id" type="long" indexed="true" stored="true" required="true"
multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true"
multiValued="false"/>
<field name="author" type="title_text" indexed="true" stored="true"
multiValued="true"/>
<field name="title" type="title_text" indexed="true" stored="true"/>
<field name="date_modified" type="date" indexed="true" stored="true"
multivalued="true"/>
<field name="_version_" type="long" indexed="true" stored="true"
multiValued="false"/>
<dynamicField name="ignored_*" type="text" indexed="true" stored="true"
multiValued="true"/>


But after *indexing with this curl command*:
curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" \
  -F "myfile=Coding.pdf"
and querying with q=id:12, the *output* is:
<arr name="ignored_stream_source_info">
<str>myfile</str>
</arr>
<arr name="ignored_stream_content_type">
<str>application/octet-stream</str>
</arr>
<arr name="ignored_stream_size">
<str>3336935</str>
</arr>
<arr name="ignored_stream_name">
<str>Coding.pdf</str>
</arr>
<arr name="ignored_content_type">
<str>application/pdf</str>
</arr>
<str name="contents"></str>     <!-- contents field is empty -->
<long name="_version_">1456831756526157824</long>
<str name="doc_id">8eb229e0-5f25-4d26-bba4-6cb67aab7f81</str>
</doc>

Why is the contents field empty?

Also, why does the date_modified field not appear in the result?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/using-extract-handler-data-not-extracted-tp4110850.html
Sent from the Solr - User mailing list archive at Nabble.com.
