Re: Stream still in memory after tika exception? Possible memoryleak?

2011-11-06 Thread Lance Norskog
Yes, please open a JIRA for this, with as much info as possible. Lance On Thu, Nov 3, 2011 at 9:48 AM, P Williams wrote: > Hi All, > > I'm experiencing a similar problem to the others in the thread. > > I've recently upgraded from apache-solr-4.0-2011-06-14_08-33-23.war to > apache-solr-4.0-201

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-11-03 Thread P Williams
Hi All, I'm experiencing a similar problem to the others in the thread. I've recently upgraded from apache-solr-4.0-2011-06-14_08-33-23.war to apache-solr-4.0-2011-10-14_08-56-59.war and then apache-solr-4.0-2011-10-30_09-00-00.war to index ~5300 pdfs, of various sizes, using the TikaEntityProce

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-31 Thread Marc Jacobs
Hi Erick, This is one of the errors I get (on the 4GB machine), and after a while Tomcat crashes: SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! And this is part of my solrconfig.xml (I'm indexing 200k documents per run):

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Erick Erickson
See solrconfig.xml, particularly ramBufferSizeMB, and also maxBufferedDocs. There's no reason you can't index as many documents as you want, unless your documents are absolutely huge (as in hundreds of MB, possibly GB in size). Are you actually getting out-of-memory problems? Erick On Tue, Aug 30, 2011 at 4
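To illustrate why these two settings matter: as I understand Lucene's documented behavior, buffered documents are flushed to a new segment as soon as either the RAM limit (ramBufferSizeMB) or the document-count limit (maxBufferedDocs) is reached, whichever comes first. The indexing buffer is therefore bounded by those settings rather than by the total number of documents sent. The class below is a hypothetical toy simulation (`BufferSim` and its fields are made-up names), not Solr or Lucene code, and it ignores the fact that Lucene's RAM accounting is approximate.

```python
from dataclasses import dataclass, field


@dataclass
class BufferSim:
    """Toy model of a Lucene-style in-memory document buffer (hypothetical)."""
    ram_buffer_size_mb: float   # plays the role of ramBufferSizeMB
    max_buffered_docs: int      # plays the role of maxBufferedDocs
    buffered_docs: int = 0
    buffered_mb: float = 0.0
    flushes: list = field(default_factory=list)

    def add(self, doc_mb: float) -> None:
        """Buffer one document of the given (estimated) size in MB."""
        self.buffered_docs += 1
        self.buffered_mb += doc_mb
        # Flush as soon as either limit is hit, whichever comes first.
        if self.buffered_mb >= self.ram_buffer_size_mb:
            self._flush("ram")
        elif self.buffered_docs >= self.max_buffered_docs:
            self._flush("docs")

    def _flush(self, reason: str) -> None:
        """Record the flush and empty the buffer, as a segment write would."""
        self.flushes.append((reason, self.buffered_docs, round(self.buffered_mb, 3)))
        self.buffered_docs = 0
        self.buffered_mb = 0.0


if __name__ == "__main__":
    big = BufferSim(ram_buffer_size_mb=32.0, max_buffered_docs=1000)
    for _ in range(100):
        big.add(1.0)            # large documents: the RAM limit triggers flushes
    print(big.flushes)

    small = BufferSim(ram_buffer_size_mb=32.0, max_buffered_docs=1000)
    for _ in range(2500):
        small.add(0.01)         # tiny documents: the doc-count limit triggers flushes
    print([r for r, _, _ in small.flushes])
```

With 1 MB documents and a 32 MB buffer, the simulation flushes every 32 documents no matter how many are indexed in total. With tiny documents, the document-count limit is what fires.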

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Marc Jacobs
Hi Chris, Thanks for the response. Eventually I want to install Solr on a machine with a maximum memory of 4GB. I tried to index the data on that machine before, but it resulted in index locks and memory errors. Is 4GB not enough to index 100,000 documents in a row? How much should it be? Is there

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Marc Jacobs
Hi Erick, I am using Solr 3.3.0, but I had the same problems with 1.4.1. The connector is a homemade program in the C# programming language and is posting via HTTP remote streaming (e.g. http://localhost:8080/solr/update/extract?stream.file=/path/to/file.doc&literal.id=1 ) I'm using Tika to extract the
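For reference, the remote-streaming request Marc describes can be built with proper URL encoding of the file path and id. This is a minimal Python sketch (the connector in the thread is C#). `extract_url` and `send` are hypothetical helper names, the base URL is the one from the message, and remote streaming has to be enabled in solrconfig.xml (`enableRemoteStreaming` on `requestParsers`) for `stream.file` to be accepted.

```python
import urllib.request
from urllib.parse import urlencode


def extract_url(base: str, path: str, doc_id: str) -> str:
    """Build an /update/extract URL that asks Solr to read a server-side file.

    Hypothetical helper: stream.file is the path as seen by the Solr server,
    literal.id sets the document's id field.
    """
    params = {"stream.file": path, "literal.id": doc_id}
    # safe="/" keeps the path readable; spaces and other characters are encoded.
    return f"{base.rstrip('/')}/update/extract?{urlencode(params, safe='/')}"


def send(url: str, timeout: float = 60.0) -> int:
    """Issue the request and always close the HTTP response (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


if __name__ == "__main__":
    print(extract_url("http://localhost:8080/solr", "/path/to/file.doc", "1"))
```

The `with` block matters on the client side too: an unclosed response keeps its connection and buffers alive until garbage collection, which is easy to confuse with a server-side leak during long runs.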

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Chris Hostetter
: The current system I'm using has 150GB of memory and while I'm indexing the : memory consumption is growing and growing (eventually more than 50GB). : In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about : 70k office documents (pdf,doc,xls etc) and between 1 and 2 perce
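Because some documents in a run like this raise Tika exceptions, it helps if the indexing client records exactly which files failed, and why, instead of stopping or silently skipping them. That list is also what a JIRA report would need. This is a hypothetical Python sketch: `index_all` and the `index_one` callable are made-up names standing in for whatever posts a single file to Solr.

```python
from typing import Callable, Iterable, List, Tuple


def index_all(
    paths: Iterable[str], index_one: Callable[[str], None]
) -> Tuple[int, List[Tuple[str, str]]]:
    """Index each path, continuing past failures.

    Returns the number of successes and a list of (path, error) pairs.
    index_one is a hypothetical callable that posts one file and raises on error.
    """
    ok = 0
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            index_one(path)
            ok += 1
        except Exception as exc:  # record the failing file and keep going
            failures.append((path, repr(exc)))
    return ok, failures


if __name__ == "__main__":
    def fake_index_one(path: str) -> None:
        # Simulated extractor failure for every 50th file.
        if int(path[3:-4]) % 50 == 0:
            raise ValueError("parse error")

    docs = [f"doc{i}.pdf" for i in range(100)]
    ok, failed = index_all(docs, fake_index_one)
    print(ok, len(failed), f"{100 * len(failed) / len(docs):.1f}% failed")
```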

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Erick Erickson
What version of Solr are you using, and how are you indexing? DIH? SolrJ? I'm guessing you're using Tika, but how? Best Erick On Tue, Aug 30, 2011 at 4:55 AM, Marc Jacobs wrote: > Hi all, > > Currently I'm testing Solr's indexing performance, but unfortunately I'm > running into memory problems

Re: Stream still in memory after tika exception? Possible memoryleak?

2011-08-30 Thread Marc Jacobs
Hi all, Currently I'm testing Solr's indexing performance, but unfortunately I'm running into memory problems. It looks like Solr is not closing the file stream after an exception, but I'm not really sure. The current system I'm using has 150GB of memory and while I'm indexing the memoryconsumptio
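The suspected bug, a stream that is closed only on the success path, is a general pattern. The Python sketch below shows it and the usual fix. It is not Solr's or Tika's code: `parse_document`, `extract_leaky` and `extract_safe` are hypothetical names, and an in-memory `io.BytesIO` stands in for the file stream.

```python
import io


def parse_document(stream) -> str:
    """Hypothetical stand-in for a content extractor; fails on 'corrupt' input."""
    data = stream.read()
    if data.startswith(b"BAD"):
        raise ValueError("unparseable document")
    return data.decode("utf-8")


def extract_leaky(stream) -> str:
    # Bug pattern: if parse_document raises, close() is never reached and the
    # stream (and any buffers it holds) stays open until garbage collection.
    text = parse_document(stream)
    stream.close()
    return text


def extract_safe(stream) -> str:
    # 'with' calls close() on both the normal path and the exception path.
    with stream:
        return parse_document(stream)


if __name__ == "__main__":
    for extract in (extract_leaky, extract_safe):
        s = io.BytesIO(b"BAD data")
        try:
            extract(s)
        except ValueError:
            pass
        print(extract.__name__, "closed after exception:", s.closed)
```

In Java, the equivalent fix is a `finally` block (or try-with-resources on Java 7+) around the parse call.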