My SOLR_HOME is set to /home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf
in tomcat.sh.

POI, PDFBox, Tika and related jars are under
/home/solr_1_4_0/apache-solr-1.4.0/lib

When I try to index files using the SolrJ API as follows, the content of the
file is not indexed. Only the file size (bytes) and the file type are indexed
into the "content" field. See the schema definition below as well.

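// Send the file to the extracting request handler
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
// Commit immediately (waitFlush = true, waitSearcher = true)
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);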
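
To rule out the SolrJ side, the same kind of request can also be sent over plain HTTP. The following is only a minimal sketch using the JDK alone. The base URL http://localhost:8080/solr, the class and method names, and the use of a literal.issueKey parameter to fill the required issueKey field are assumptions for illustration and are not taken from the setup described here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class; none of these names come from Solr or SolrJ.
final class ExtractRequestSketch {

    // Builds the request URL for the handler registered at /update/extract.
    // literal.issueKey (assumed here) sets the required schema field,
    // and commit=true asks Solr to commit after the update.
    static String buildExtractUrl(String solrBaseUrl, long issueKey) {
        return solrBaseUrl + "/update/extract"
                + "?literal.issueKey=" + issueKey
                + "&commit=true";
    }

    // Posts the raw file bytes as the request body and returns the HTTP status code.
    static int postFile(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Assumed base URL: Tomcat on its default port with Solr deployed as /solr.
        String url = buildExtractUrl("http://localhost:8080/solr", 1L);
        System.out.println(url);
        // Only contact the server when a file path and a content type are given.
        if (args.length == 2) {
            System.out.println("HTTP " + postFile(url, Paths.get(args[0]), args[1]));
        }
    }
}
```

Run without arguments, this only prints the URL it would use. Run with a file path and a content type (for example application/msword), it posts that file, which makes it possible to compare the result with what the SolrJ request above produces.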

schema.xml has the following:
 <field name="issueKey" type="slong" indexed="true" stored="true" required="true" />
 <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

<defaultSearchField>content</defaultSearchField>

And solrconfig.xml has:
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="map.content">content</str>
      <str name="defaultField">content</str>
    </lst>
  </requestHandler>

The Luke response is below. It shows the correct count (7) of indexed
documents, but no file content in the index. I don't see any errors in the
Tomcat logs. Unless I am overlooking something, I don't see anything missing
in the setup. Can anyone advise? Do I need to include the Tika jars in
Tomcat's deployed solr/lib, or under /example/lib in SOLR_HOME?

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">28</int>
  </lst>
  <lst name="index">
    <int name="numDocs">7</int>
    <int name="maxDoc">7</int>
    <int name="numTerms">25</int>
    <long name="version">1259164190261</long>
    <bool name="optimized">false</bool>
    <bool name="current">true</bool>
    <bool name="hasDeletions">false</bool>
    <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
    <date name="lastModified">2009-11-25T15:50:03Z</date>
  </lst>
  <lst name="fields">
    <lst name="content">
      <str name="type">text</str>
      <str name="schema">ITSM----------</str>
      <str name="index">ITS----------</str>
      <int name="docs">7</int>
      <int name="distinct">18</int>
      <lst name="topTerms">
        <int name="text">3</int>
        <int name="applic">3</int>
        <int name="msword">3</int>
        <int name="applicationmsword">3</int>
        <int name="plain">2</int>
        <int name="textplain">2</int>
        <int name="70144">1</int>
        <int name="453">1</int>
        <int name="2370">1</int>
        <int name="html">1</int>
      </lst>
      <lst name="histogram">
        <int name="1">12</int>
        <int name="2">2</int>
        <int name="4">4</int>
      </lst>
    </lst>
    <lst name="issueKey">
      <str name="type">slong</str>
      <str name="schema">I-S----O-----l</str>
      <str name="index">I-S----O-----</str>
      <int name="docs">7</int>
      <int name="distinct">7</int>
      <lst name="topTerms">
        <int name="1">1</int>
        <int name="2">1</int>
        <int name="3">1</int>
        <int name="4">1</int>
        <int name="5">1</int>
        <int name="6">1</int>
        <int name="0">1</int>
      </lst>
      <lst name="histogram">
        <int name="1">7</int>
      </lst>
    </lst>
  </lst>
  <lst name="info">
    <lst name="key">
      <str name="I">Indexed</str>
      <str name="T">Tokenized</str>
      <str name="S">Stored</str>
      <str name="M">Multivalued</str>
      <str name="V">TermVector Stored</str>
      <str name="o">Store Offset With TermVector</str>
      <str name="p">Store Position With TermVector</str>
      <str name="O">Omit Norms</str>
      <str name="L">Lazy</str>
      <str name="B">Binary</str>
      <str name="C">Compressed</str>
      <str name="f">Sort Missing First</str>
      <str name="l">Sort Missing Last</str>
    </lst>
    <str name="NOTE">Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents.</str>
  </lst>
</response>
-- 
View this message in context: 
http://old.nabble.com/Where-to-put-ExternalRequestHandler-and-Tika-jars-tp26515579p26515579.html
Sent from the Solr - User mailing list archive at Nabble.com.
