Hi Maxence,

Did you move any jars from dist/connector-common-lib to dist/lib? If you did, the Tika jars will no longer be able to find their dependencies. That would also explain why commons-compress cannot be found; it lives in connector-common-lib. It sounds like you may have put the new POI jars in the wrong place. They should *all* be in connector-common-lib too.
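A quick way to check the jar placement described above is sketched below. This is only an illustration, not part of ManifoldCF: the class name JarPlacementCheck, the file-name prefixes, and the default "dist" path are assumptions, and it only looks at file names directly inside dist/lib and dist/connector-common-lib.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports POI and commons-compress jars that sit in
// dist/lib instead of dist/connector-common-lib.
class JarPlacementCheck {
    // Assumed file-name prefixes for the jars discussed in this thread.
    static final String[] PREFIXES = {"poi", "commons-compress"};

    /** Returns true if the file name is a jar starting with a watched prefix. */
    static boolean isWatched(String fileName) {
        if (!fileName.endsWith(".jar")) {
            return false;
        }
        for (String prefix : PREFIXES) {
            if (fileName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Lists watched jars directly inside dir; empty if dir does not exist. */
    static List<String> watchedJars(Path dir) throws IOException {
        List<String> found = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return found;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                if (isWatched(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // First argument: path to the dist directory (assumed default "dist").
        Path dist = Paths.get(args.length > 0 ? args[0] : "dist");
        for (String name : watchedJars(dist.resolve("lib"))) {
            System.out.println("In dist/lib, expected in connector-common-lib: " + name);
        }
        for (String name : watchedJars(dist.resolve("connector-common-lib"))) {
            System.out.println("In connector-common-lib: " + name);
        }
    }
}
```

Run it with the path to your dist directory as the only argument. Any line reported for dist/lib names a jar that should be moved back to connector-common-lib.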
Karl

On Thu, Jul 26, 2018 at 6:23 AM Karl Wright <daddy...@gmail.com> wrote:

> Hi Maxence,
>
> The following error:
>
> >>>>>>
> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: org/apache/poi/POIXMLTextExtractor
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[?:?]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
> <<<<<<
>
> ... seems to be the result of installing new POI jars that are not fully compatible with the version of Tika that's there. Unfortunately, I can't think of any way to address this right now. Tika's dependencies are legion and they change all the time.
>
> The only thing we can really do is wait for (1) POI to ship their new release, and then (2) Tika to ship a release that depends on it.
>
> Karl
>
> On Thu, Jul 26, 2018 at 5:33 AM msaunier <msaun...@citya.com> wrote:
>
>> Hello Karl,
>>
>> For the moment, it is working.
>>
>> I get these errors, but they are not FATAL:
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*' against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.
>> >> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving >> checkInclude for >> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/' >> >> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path >> is >> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/' >> and is included. >> >> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed: >> org/apache/poi/POIXMLTextExtractor >> >> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor >> >> at >> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) >> ~[?:?] >> >> at >> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) >> ~[?:?] >> >> at >> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) >> ~[?:?] >> >> at >> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161) >> ~[?:?] 
>> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) >> ~[mcf-pull-agent.jar:?] >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) >> ~[mcf-pull-agent.jar:?] >> >> at >> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) >> ~[?:?] >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) >> [mcf-pull-agent.jar:?] >> >> Caused by: java.lang.ClassNotFoundException: >> org.apache.poi.POIXMLTextExtractor >> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:381) >> ~[?:1.8.0_171] >> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:424) >> ~[?:1.8.0_171] >> >> at >> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814) >> ~[?:1.8.0_171] >> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:357) >> ~[?:1.8.0_171] >> >> ... 18 more >> >> AND >> >> >> >> Starting crawler... >> >> juil. 26, 2018 11:29:01 AM >> org.apache.tika.config.InitializableProblemHandler$3 >> handleInitializableProblem >> >> AVERTISSEMENT: JBIG2ImageReader not loaded. 
>> jbig2 files will be ignored
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>> for optional dependencies.
>> TIFFImageWriter not loaded. tiff files will not be processed
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>> for optional dependencies.
>> J2KImageReader not loaded. JPEG2000 files will not be processed.
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>> for optional dependencies.
>>
>> juil. 26, 2018 11:29:01 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
>> AVERTISSEMENT: org.xerial's sqlite-jdbc is not loaded.
>> Please provide the jar on your classpath to parse sqlite files.
>> See tika-parsers/pom.xml for the correct version.
>>
>> Maxence,
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 19:09
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>> That's what I was afraid of. The new poi jars have dependencies we haven't accounted for yet.
>>
>> Can you download the apache-commons-compress jar (the latest version should be OK) and also put that in connector-common-lib? Thanks!!
>>
>> Karl
>>
>> On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>> I have added the snapshot and I am now being spammed with this error:
>>
>> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics
>> java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics
>>     at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]
>>     at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]
>> >> at >> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) >> ~[?:?] >> >> at >> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) >> ~[?:?] >> >> at >> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) >> ~[?:?] >> >> at >> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?] >> >> at >> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?] >> >> at >> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) >> ~[?:?] >> >> at >> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) >> ~[?:?] >> >> at >> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) >> ~[?:?] >> >> at >> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) >> ~[?:?] >> >> at >> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) >> ~[?:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) >> ~[mcf-agents.jar:?] >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) >> ~[mcf-agents.jar:?] 
>>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
>>
>> Maxence,
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 13:12
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>> Hi Maxence,
>>
>> Tomorrow (7/26) the POI project will be delivering a nightly build which should repair the Class Not Found exceptions. You will need to download it here:
>>
>> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>
>> ... and replace all poi jars with the corresponding ones from the binary distribution. I believe the poi jars are all in connector-common-lib. Be sure to delete the old ones (or move them somewhere else) first.
>>
>> I don't know whether this will fix your out of memory problem, however. Please let me know what's still not working and I can take it from there.
>>
>> Karl
>>
>> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddy...@gmail.com> wrote:
>>
>> Out of memory errors are fatal, I'm afraid, because they corrupt not only the document in question but all others being processed at the same time. So those cannot be ignored.
>>
>> Tika should ignore documents that it cannot process, however, and that is a great enhancement request for them.
>>
>> Karl
>>
>> On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>> Okay. So today, I'm going to force ManifoldCF to keep running so that only the problematic documents are left behind.
>>
>> In the future, could I ignore these errors? They make the application crash, and in production that is not great behavior.
>>
>> Thanks
>>
>> Maxence,
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>> The problem isn't with images in general; it's with certain kinds of images. There are optional dependencies in Tika for some kinds of images that we cannot include in the MCF distribution because of licensing problems. I don't know which kinds these are but apparently you are trying to index some of them.
>>
>> You will need to find and download the right jar and put it in the connector-common-lib folder for this to work.
>>
>> Karl
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaun...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and I have no problems with images. They are indexed without errors. Images are necessary for this job. I will try to recreate my job and test.
>>
>> Thanks,
>>
>> Maxence,
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>> "java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"
>>
>> This exception is occurring because you are trying to extract content from an image. In order for this to work you need a jar that isn't supplied with Tika for licensing reasons. Can you exclude images from your crawl?
>>
>> Karl
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>> With only connector debugging enabled, I get this information:
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper...
(0) -> (2) >> >> [Thread-269948] INFO >> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at >> kemp-formation-solr:2181 ready >> >> java.lang.NoSuchMethodException: >> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, >> boolean) >> >> at java.lang.Class.getConstructor0(Class.java:3082) >> >> at java.lang.Class.getDeclaredConstructor(Class.java:2178) >> >> at >> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817) >> >> at >> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961) >> >> at >> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950) >> >> at >> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051) >> >> at >> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938) >> >> at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675) >> >> at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659) >> >> at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652) >> >> at >> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995) >> >> at >> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904) >> >> at >> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162) >> >> at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169) >> >> at >> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112) >> >> at >> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60) >> >> at >> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243) >> >> at >> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105) >> >> at >> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106) >> >> at >> 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) >> >> at >> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) >> >> at >> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143) >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) >> >> at >> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) >> >> at >> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) >> >> at >> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) >> >> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN >> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard >> from server in 28024ms for sessionid 0x100000050ae004d >> >> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO >> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard >> from server in 28024ms for sessionid 0x100000050ae004d, closing socket >> 
connection and attempting reconnect >> >> [zkCallback-16-thread-2] WARN >> org.apache.solr.common.cloud.ConnectionManager - Watcher >> org.apache.solr.common.cloud.ConnectionManager@5382340 name: >> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent >> state:Disconnected type:None path:null path: null type: None >> >> [zkCallback-16-thread-2] WARN >> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected >> >> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO >> org.apache.zookeeper.ClientCnxn - Opening socket connection to server >> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to >> authenticate using SASL (unknown error) >> >> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO >> org.apache.zookeeper.ClientCnxn - Socket connection established to >> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session >> >> agents process ran out of memory - shutting down >> >> java.lang.OutOfMemoryError: GC overhead limit exceeded >> >> at >> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737) >> >> at >> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784) >> >> at >> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457) >> >> at >> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146) >> >> at >> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204) >> >> at >> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837) >> >> at >> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024) >> >> at >> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76) >> >> agents process ran out of memory - shutting down >> >> java.lang.OutOfMemoryError: GC overhead limit exceeded >> >> at 
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200) >> >> at >> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583) >> >> at >> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372) >> >> at >> org.apache.manifoldcf.core.database.Database.execute(Database.java:896) >> >> at >> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) >> >> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO >> org.apache.zookeeper.ClientCnxn - Session establishment complete on server >> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = >> 0x100000050ae004d, negotiated timeout = 40000 >> >> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped >> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345} >> >> agents process ran out of memory - shutting down >> >> java.lang.OutOfMemoryError: GC overhead limit exceeded >> >> at java.util.HashMap.resize(HashMap.java:704) >> >> at java.util.HashMap.putVal(HashMap.java:629) >> >> at java.util.HashMap.put(HashMap.java:612) >> >> at >> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154) >> >> at >> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204) >> >> at >> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837) >> >> at >> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642) >> >> at >> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581) >> >> at >> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453) >> >> at >> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570) >> >> agents process ran out of memory - shutting down >> >> java.lang.OutOfMemoryError: GC overhead limit exceeded >> >> at java.util.Arrays.copyOf(Arrays.java:3308) >> >> at 
java.util.BitSet.ensureCapacity(BitSet.java:337) >> >> at java.util.BitSet.expandTo(BitSet.java:352) >> >> at java.util.BitSet.set(BitSet.java:447) >> >> at >> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267) >> >> at >> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155) >> >> at >> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) >> >> at >> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270) >> >> at >> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) >> >> at >> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) >> >> at >> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146) >> >> at >> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46) >> >> at >> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82) >> >> at >> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140) >> >> at >> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287) >> >> at >> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279) >> >> at >> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306) >> >> at >> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431) >> >> at >> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380) >> >> at >> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520) >> >> at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown >> Source) >> >> at >> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown >> Source) >> 
>> at >> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown >> Source) >> >> at >> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown >> Source) >> >> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown >> Source) >> >> at org.apache.xerces.parsers.XML11Configuration.parse(Unknown >> Source) >> >> at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) >> >> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown >> Source) >> >> at >> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source) >> >> at >> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344) >> >> at >> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167) >> >> at >> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135) >> >> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: >> 0x100000050ae004e closed >> >> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - >> EventThread shut down for session: 0x100000050ae004e >> >> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: >> 0x100000050ae004d closed >> >> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - >> EventThread shut down for session: 0x100000050ae004d >> >> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: >> 0x2000000b80d004a closed >> >> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - >> EventThread shut down for session: 0x2000000b80d004a >> >> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: >> 0x2000000b80d004b closed >> >> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - >> EventThread shut down for session: 0x2000000b80d004b >> >> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: >> 0xff00000201970046 closed >> >> [Thread-6991-EventThread] INFO 
org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049
>>
>> I have disabled history to improve performance. So, can I find the last file with an SQL query?
>>
>> Maxence,
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>> Hi Maxence,
>>
>> You would want to turn on connector debugging INSTEAD of the debugging you've turned on, which is very noisy and not helpful.
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
>>
>> Karl
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaun...@citya.com> wrote:
>>
>> With debug:
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
>> agents process ran out of memory - shutting down
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>     at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>     at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>     at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>     at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>     at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
>> agents process ran out of memory - shutting down
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect
>> [zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
>> [zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>> [zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
>> [zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043
>> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None
>> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection
>> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049
>> [zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None
>> [zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to r
>>