Here is the core of the SolrJ client that ended up accomplishing what I
wanted:

        // stream.file is a path that the Solr server reads itself;
        // the file content is not uploaded by this client.
        String fileName2 = "C:\\work\\SolrClient\\data\\worldwartwo.txt";
        // Arguments: Solr URL, queue size (20), background thread count (8).
        SolrServer server =
            new StreamingUpdateSolrServer("http://localhost:8080/solr/", 20, 8);
        UpdateRequest req = new UpdateRequest("/update/extract");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("stream.file", fileName2);
        // Use the file path as the document's unique id.
        params.set("literal.id", fileName2);
        params.set("captureAttr", "false");

        req.setParams(params);
        server.request(req);
        server.commit();

To get this to work correctly, the following server-side configuration was
needed (I started from a barebones Solr config):

1. Add apache-solr-cell-3.5.0.jar to the <solrhost>/lib directory (or
wherever Solr can load jars from), as this jar contains the
ExtractingRequestHandler class.
2. Add a handler for /update/extract to solrconfig.xml that uses the
ExtractingRequestHandler class.
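
For step 2, the handler entry might look roughly like the sketch below. It
is modelled on the example solrconfig.xml shipped with Solr 3.x rather than
copied from my own config. The field names "text" and "ignored_" are
assumptions that have to match whatever your schema.xml defines. Since the
client passes stream.file, remote streaming also has to be enabled in the
requestDispatcher section.

```xml
<!-- Sketch only: adjust the field names to match your schema.xml -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Map the extracted body text to an indexed field (assumed: "text") -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <!-- Prefix unknown metadata fields (assumed dynamic field: "ignored_*") -->
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- stream.file and stream.url are only honoured when this is true -->
<requestDispatcher>
  <requestParsers enableRemoteStreaming="true"
                  multipartUploadLimitInKB="2048000"/>
</requestDispatcher>
```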

I'll blog about this later on for the benefit of the community at large.

I'm still puzzled that there is no readily available alternative to the
Tika-based ExtractingRequestHandler for the case where the input data is
plain UTF-8 text files that Solr needs to ingest and index. I may need to
look into defining a custom request handler, if that's the right way to
go.

Thanks again

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-there-a-way-for-SOLR-SOLRJ-to-index-files-directly-bypassing-HTTP-streaming-tp3833419p3843593.html
Sent from the Solr - User mailing list archive at Nabble.com.