Are there any actual errors in the log?  What occurs after the last line in 
your log below?  What happens if you send in an "extract only" message?  Do 
you get content back out?
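
For example, something like this in SolrJ should hand you back the extracted 
content in the response instead of indexing it (a rough sketch; the file name 
is just an example):

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("RUNNING.txt"));
    req.setParam("extractOnly", "true"); // return the Tika output, don't index
    NamedList<Object> rsp = server.request(req);
    System.out.println(rsp); // the extracted body and metadata should appear here

If the content is empty here too, the problem is on the extraction side rather 
than in your field mappings.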

On Apr 28, 2010, at 9:10 AM, Jeroen van Schagen wrote:

> On second thought, the curl command doesn't index the file content on this
> system either. It worked on my home system (Mac) but isn't working anymore
> on my work system (Windows); could this have anything to do with the issue?
> I'm not getting any error messages, and the file identifier is properly
> indexed, but the text fields are all empty, regardless of the file type used
> (pdf, docx, txt). My Solr config looks as follows:
> 
> <config>
>    <dataDir>solr/data</dataDir>
>    <requestHandler name="/update/extract"
> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>        <lst name="defaults">
>            <str name="fmap.content">text</str>
>            <str name="fmap.Last-Modified">last_modified</str>
>            <str name="uprefix">ignored_</str>
>        </lst>
>    </requestHandler>
>    ...
> </config>
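> 
> For reference, these mappings assume the stock example schema.xml, which
> (unless modified) declares roughly:
> 
>     <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
>     <dynamicField name="ignored_*" type="ignored" multiValued="true"/>
> 
> Note that "text" is indexed but not stored there.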
> 
> And the logs:
> 
> [INFO] Started Jetty Server
> 28-04-2010 14:27:54.643 WARN  31889293-0 org.apache.solr.update.SolrIndexWriter:120 No lockType configured for solr/data\index/ assuming 'simple'
> 28-04-2010 14:27:54.674 INFO  31889293-0 org.apache.solr.core.SolrCore:114 SolrDeletionPolicy.onInit: commits:num=1
> commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
> 28-04-2010 14:27:54.674 INFO  31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431379
> 28-04-2010 14:27:54.689 INFO  31889293-0 org.apache.solr.update.UpdateHandler:399 start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false)
> 28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.core.SolrCore:122 SolrDeletionPolicy.onCommit: commits:num=2
> commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_6,version=1272456431379,generation=6,filenames=[_0.cfs, _3.cfs, _0_1.del, _1.cfs, _2.cfs, segments_6, _4.cfs]
> commit{dir=C:\Development\workspace\solr-server\solr\data\index,segFN=segments_7,version=1272456431380,generation=7,filenames=[_0.cfs, segments_7, _3.cfs, _0_1.del, _4_1.del, _5.cfs, _1.cfs, _2.cfs, _4.cfs]
> 28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.core.SolrCore:136 newest commit = 1272456431380
> 28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.search.SolrIndexSearcher:135 Opening Searcher@1da1a93 main
> 28-04-2010 14:27:54.721 INFO  31889293-0 org.apache.solr.update.UpdateHandler:423 end_commit_flush
> 28-04-2010 14:27:54.721 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:1480 autowarming Searcher@1da1a93 main from Searcher@1c44a6d main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:1482 autowarming result for Searcher@1da1a93 main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> 28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.core.SolrCore:1276 [] Registered new searcher Searcher@1da1a93 main
> 28-04-2010 14:27:54.736 INFO  1-thread-1 org.apache.solr.search.SolrIndexSearcher:225 Closing Searcher@1c44a6d main
> fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> *28-04-2010 14:27:54.736 INFO  31889293-0 org.apache.solr.update.processor.UpdateRequestProcessor:171 {add=[RUNNING.txt],commit=} 0 140
> 28-04-2010 14:27:54.736 INFO  31889293-0 org.apache.solr.core.SolrCore:1324 [] webapp=/solr-server path=/update/extract params={commit=true&literal.id=RUNNING.txt} status=0 QTime=140*
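> 
> This is how I'm checking the result, by the way (a SolrJ sketch, querying the
> document back by its literal id):
> 
>     SolrQuery query = new SolrQuery("id:RUNNING.txt");
>     QueryResponse rsp = server.query(query);
>     System.out.println(rsp.getResults()); // prints the stored fields; the text fields come back empty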
> 
> On Wed, Apr 28, 2010 at 2:24 PM, Grant Ingersoll <gsing...@apache.org>wrote:
> 
>> What error are you getting?  Does
>> http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/
>> work for you?
>> 
>> On Apr 28, 2010, at 6:44 AM, Jeroen van Schagen wrote:
>> 
>>> Dear solr-user,
>>> 
>>> Using a Quartz scheduler, I want to index all documents inside a specific
>>> folder with Solr(J). To perform the actual indexing I selected
>>> org.apache.solr.handler.extraction.ExtractingRequestHandler. The request
>>> handler functions perfectly when the request is sent via curl:
>>> 
>>> curl "http://localhost:8080/solr/update/extract?literal.id=doc2&commit=true" -F "tutori...@example.docx"
>>> 
>>> But, for some reason, the file is not indexed when using SolrJ. My
>>> indexing method looks as follows:
>>> import java.io.File;
>>> import java.io.IOException;
>>> import org.apache.solr.client.solrj.SolrServerException;
>>> import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>>> import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
>>> import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
>>> 
>>> private static final String EXTRACT_REQUEST_MAPPING = "/update/extract";
>>> 
>>> private File baseFolder;
>>> private boolean recursive = false;
>>> private CommonsHttpSolrServer server;
>>> 
>>> public void index(File folder) {
>>>     if (!folder.isDirectory()) {
>>>         throw new IllegalArgumentException(folder.getAbsolutePath() + " is not a directory.");
>>>     }
>>>     logger.info("Indexing documents inside folder [{}]", folder.getAbsolutePath());
>>>     for (File file : folder.listFiles()) {
>>>         if (file.isFile()) {
>>>             ContentStreamUpdateRequest up = new ContentStreamUpdateRequest(EXTRACT_REQUEST_MAPPING);
>>>             try {
>>>                 up.addFile(file);
>>>                 up.setParam("literal.id", file.getName());
>>>                 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>>>                 server.request(up);
>>>             } catch (SolrServerException e) {
>>>                 logger.error("Could not connect to server.", e);
>>>             } catch (IOException e) {
>>>                 logger.error("Could not upload file to server.", e);
>>>             }
>>>         } else if (recursive && file.isDirectory()) {
>>>             index(file); // index sub-directories as well
>>>         }
>>>     }
>>> }
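>>> 
>>> The Quartz job simply drives this method with the configured base folder,
>>> along these lines (a sketch; the surrounding class name and folder path
>>> are assumptions):
>>> 
>>>     // FolderIndexer is the (assumed) class holding the fields and method above.
>>>     CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
>>>     FolderIndexer indexer = new FolderIndexer(server); // assumed constructor
>>>     indexer.index(new File("C:/documents"));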
>>> 
>>> Is there something I'm doing wrong here?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search
