OK, I was thinking more along the lines of this blog: http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
which uses Tika directly to process the docs on the client (wherever you run it) and only sends the results to Solr.... The SolrJ program you're referencing uses a different approach...

FWIW,
Erick

On Tue, Sep 25, 2012 at 10:04 AM, <johannes.schwendin...@blum.com> wrote:
> The difference with Solr Cell is that I'm sending every single document
> to Solr Cell and don't collect them until I have a couple of them in my
> memory.
> Using mainly the code from here:
> http://wiki.apache.org/solr/ExtractingRequestHandler#SolrJ
>
> Erick Erickson <erickerick...@gmail.com> wrote on 25.09.2012 15:47:34:
>
>> From: Erick Erickson <erickerick...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Date: 25.09.2012 15:48
>> Subject: Re: Re: Solr Cell Questions
>>
>> bq: how many documents per minute, second, what ever can i put into solr
>>
>> Too many variables to say. I've seen several thousand truly simple
>> docs/sec. But since you're doing the Tika processing, that's probably
>> going to be your limiting factor. And it'll be many fewer...
>>
>> I don't understand your OOM issue when running Tika on the client. Or,
>> rather, why you think using SolrCell makes this different. SolrCell also
>> uses Tika. So my suspicion is that your client-side process simply isn't
>> allocating much memory to the JVM. Did you try bumping the memory
>> on your client?
>>
>> Best
>> Erick
>>
>> On Tue, Sep 25, 2012 at 5:23 AM, <johannes.schwendin...@blum.com> wrote:
>> > Thank you Erick for your response,
>> >
>> > I've already tried what you've suggested and got some out of memory
>> > exceptions. Because of this I like the solution with Solr Cell, where I can
>> > send the file directly to Solr via a stream and don't collect them in my
>> > memory.
>> >
>> > And another question that came to my mind: how many documents per minute,
>> > second, whatever can I put into Solr? Say XML format and from 100 KB to
>> > 100 MB.
>> > Is there a number, or is it too dependent on hardware and settings?
>> >
>> > Best
>> > Johannes
>> >
>> > Erick Erickson <erickerick...@gmail.com> wrote on 25.09.2012 00:22:26:
>> >
>> >> From: Erick Erickson <erickerick...@gmail.com>
>> >> To: solr-user@lucene.apache.org
>> >> Date: 25.09.2012 00:23
>> >> Subject: Re: Solr Cell Questions
>> >>
>> >> If you're concerned about throughput, consider moving all the
>> >> SolrCell (Tika) processing off the server. SolrCell is way cool
>> >> for showing what can be done, but its downside is you're
>> >> moving all the processing of the structured documents to the
>> >> same machine doing the indexing. Pretty soon, especially
>> >> with significant size files, you're spending all your CPU cycles
>> >> parsing the files...
>> >>
>> >> Happens there's a blog about this:
>> >> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/
>> >>
>> >> By moving the indexing to N clients, you can increase
>> >> throughput until you make Solr work hard to do the indexing....
>> >>
>> >> Best
>> >> Erick
>> >>
>> >> On Mon, Sep 24, 2012 at 10:04 AM, <johannes.schwendin...@blum.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I'm currently experimenting with Solr Cell to index files to Solr.
>> >> > During this some questions came up.
>> >> >
>> >> > 1. Is it possible (and wise) to connect to Solr Cell with multiple
>> >> > threads at the same time to index several documents at the same time?
>> >> > This question came up because my program takes about 6 hours to index
>> >> > around 35000 docs. (No production environment, only the example Solr and
>> >> > a little desktop machine, but I think it's very slow, and I know Solr
>> >> > isn't the bottleneck (yet).)
>> >> >
>> >> > 2. If 1 is possible, how many threads should do this and how much
>> >> > memory does Solr need? I've tried it but I ran into an out of memory
>> >> > exception.
>> >> >
>> >> > Thanks in advance
>> >> >
>> >> > Best Regards
>> >> > Johannes
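
For what it's worth, the "several threads, but don't collect documents in memory" idea from this thread could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not code from the blog post or the wiki page: the class name BoundedIndexer, the method indexAll, and the sendOne callback are all hypothetical, and sendOne merely stands in for "parse one file and send it to Solr" (whether that happens via Solr Cell or via client-side Tika plus SolrJ). The work queue is bounded, so the submitting thread is throttled instead of piling up pending documents.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch: process documents from N worker threads with a bounded queue. */
class BoundedIndexer {

    /**
     * Calls sendOne once per path, using `threads` worker threads.
     * sendOne is an assumed stand-in for "parse one file and send it to Solr".
     * Returns how many calls completed without throwing.
     */
    static int indexAll(List<String> paths, int threads, Consumer<String> sendOne)
            throws InterruptedException {
        // Bounded queue plus CallerRunsPolicy: when all workers are busy and the
        // queue is full, the submitting thread runs the task itself, so the amount
        // of pending work held in memory stays limited.
        ExecutorService pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger sent = new AtomicInteger();
        for (String path : paths) {
            pool.execute(() -> {
                try {
                    sendOne.accept(path);   // one document at a time per worker
                    sent.incrementAndGet();
                } catch (RuntimeException e) {
                    // A single bad document should not stop the whole run.
                    System.err.println("Failed: " + path + " (" + e.getMessage() + ")");
                }
            });
        }
        pool.shutdown();                             // accept no new tasks
        pool.awaitTermination(1, TimeUnit.HOURS);    // wait for queued work to finish
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> paths = List.of("a.xml", "b.xml", "c.xml");
        int sent = indexAll(paths, 2, path -> System.out.println("would send " + path));
        System.out.println(sent + " documents sent");
    }
}
```

How many threads to use is left as a parameter on purpose: as noted above, there are too many variables to give one number, so it is probably best found by experiment while watching client and server memory.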