To solve your production problem, I highly recommend limiting the size of the documents fed to Tika, for a start. I understand that this is no guarantee, though.
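To make the size-limit idea concrete, here is a minimal standalone sketch of a size guard applied before a document is handed to an extractor. It is not ManifoldCF or Tika code. The class name SizeGuard, the methods withinLimit and shouldExtract, and the 50 MB cap are all assumptions for illustration. In a real crawl the limit would more likely be set in the job configuration than written by hand.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeGuard {
    // Hypothetical cap; choose a value that suits the heap given to the agents process.
    public static final long MAX_BYTES = 50L * 1024 * 1024;

    /** Returns true when a document of the given length is small enough to parse. */
    public static boolean withinLimit(long lengthBytes, long maxBytes) {
        return lengthBytes >= 0 && lengthBytes <= maxBytes;
    }

    /** Checks the on-disk size of a file before it would be handed to the extractor. */
    public static boolean shouldExtract(Path file, long maxBytes) throws IOException {
        return withinLimit(Files.size(file), maxBytes);
    }

    public static void main(String[] args) throws IOException {
        // Create a 2 KB temporary file to exercise the guard.
        Path tmp = Files.createTempFile("doc", ".bin");
        Files.write(tmp, new byte[2048]);
        System.out.println(shouldExtract(tmp, 1024)); // false: 2048 bytes exceeds 1024
        System.out.println(shouldExtract(tmp, 4096)); // true: 2048 bytes fits in 4096
        Files.delete(tmp);
    }
}
```

Documents rejected by such a guard are worth logging by name, since they are the first candidates to attach to a Tika ticket.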
Out-of-memory problems are very hard to get good forensics for, because they cause major disruptions to the running server. You could turn on a degree of logging so that you can see which documents are being processed at any time by all threads, but that is pretty verbose. In your properties.xml file, add:

  <property name="org.apache.manifoldcf.crawlerthreads" value="DEBUG"/>

I suspect that will generate far too much noise. Still, it's the best I can offer.

Karl

On Fri, Jul 27, 2018 at 7:52 AM msaunier <msaun...@citya.com> wrote:

> Hi Karl,
>
> Okay. For the Out of Memory:
>
> This is the last day I can spend finding out where the error comes
> from. After that, I have to go into production to meet my deadlines.
>
> I hope to find time in the future to fix this problem on this server;
> otherwise I cannot index it. Unfortunately, it is very difficult to
> find the documents that cause this error. I did not find any trace in
> the database. Even in debug mode, it is difficult to find the
> problematic document. Maybe if I limit the crawl to 1 thread I could
> find it more easily, but I'm afraid the crawl would take very long.
>
> Maybe you have an idea of the best method to adopt to find this
> document or these documents?
>
> Maxence
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Friday, July 27, 2018 12:47
> *To:* dev <d...@manifoldcf.apache.org>; user@manifoldcf.apache.org
> *Subject:* Tika/POI bugs
>
> Hi all,
>
> I've easily spent 40 hours over the last two weeks chasing down bugs in
> Apache Tika and POI. The two kinds I see are "ClassNotFound" (due to usage
> of the wrong ClassLoader), and "OutOfMemoryError" (not clear what it is due
> to yet).
>
> I don't have enough time to create tickets directly in Tika for all
> possible documents where these failures occur, so I urge our users to
> create tickets DIRECTLY in the Tika project in Jira. I guess you can let
> the Tika people create the POI tickets, if need be. For OutOfMemory
> problems, please attach the file that causes the problem to the ticket, and
> also the amount of memory you gave the agents process. For ClassNotFound
> problems, also include the stack trace.
>
> Thanks in advance,
>
> Karl