Thanks for all your advice and thoughts. The "client" in our case is/are the Tomcats, or more precisely the webapps running in the Tomcats. These serve HTTP requests.
I'd also like to note that it's the batch updates that, in my opinion, cause the load (CPU, and memory depending on the PDF) that I would like to take off the webapps, not the single document insertions/updates. But if I don't get a clean/stable "Solr-way-to-do-it" solution to this problem, I will do the extraction in the webapps, as is.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 13 September 2014 23:22
To: solr-user@lucene.apache.org
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

Alexandre:

Hmmm, if you're correct, that pretty much shoots SolrCell in the head too. You'd probably have to do something with a custom UpdateRequestProcessor in that case...

On Sat, Sep 13, 2014 at 2:06 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> On 13 September 2014 17:03, Erick Erickson <erickerick...@gmail.com> wrote:
>> Which probably just means I don't understand your problem space in
>> sufficient depth....
>
> I suspect this means the clients do not have access to the shared
> drive with the files, but the Solr server does. A firewall in between
> or some such.
>
> If I am right, that would make invoking DataImportHandler a bit
> complicated as well, due to the change from push to pull.
>
> Regards,
>    Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
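For anyone reading along: the "Solr-side extraction" being discussed is typically done via SolrCell's /update/extract handler, passing stream.file so that Solr itself reads the PDF from the shared drive and the webapp never loads it. Below is a minimal sketch of building such a request. The Solr base URL, core name, and file path are assumptions, and stream.file requires enableRemoteStreaming to be turned on in solrconfig.xml:

```python
from urllib.parse import urlencode


def build_extract_request(solr_base, doc_id, server_side_path):
    """Build the URL for an /update/extract (SolrCell) call that asks
    Solr itself to read the file via stream.file, so the client/webapp
    never touches the PDF bytes.

    solr_base        -- e.g. "http://localhost:8983/solr/mycore" (assumed)
    doc_id           -- value for the uniqueKey field, set via literal.id
    server_side_path -- path as seen from the Solr server, not the client
    """
    params = {
        "literal.id": doc_id,
        "stream.file": server_side_path,  # Solr reads this path locally
        "commit": "false",                # batch updates: commit separately
    }
    return solr_base + "/update/extract?" + urlencode(params)


# Example (hypothetical core and path):
url = build_extract_request(
    "http://localhost:8983/solr/mycore", "doc1", "/share/docs/file.pdf"
)
```

The resulting URL would then be POSTed to Solr (e.g. with urllib or curl); since the extraction runs inside Solr, this moves the Tika CPU/memory cost off the webapps, which was the original goal, at the price of loading the Solr node itself.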