The ContentLimiter truncates documents. That's not what you want. Use the Allowed Documents transformer.
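To make the distinction concrete, a truncating limiter still sends every document downstream, only shortened. A length filter instead drops oversized documents and leaves the rest untouched. The Allowed Documents transformer is assumed here to filter on MIME type, file extension and maximum length. For binary formats such as .docx or .xlsx, a cut-off file is generally no longer a valid ZIP archive, because the archive's central directory sits at the end of the file.

Below is a minimal, hypothetical sketch in plain Java. `LengthPolicySketch`, `truncate` and `accept` are illustrative names, not ManifoldCF's transformation connector API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only: contrasts truncating content with rejecting oversized documents.
public class LengthPolicySketch {

    // Truncation (ContentLimiter-style): every document passes, but anything longer
    // than maxLength loses its tail, which can leave binary formats unparseable.
    static byte[] truncate(byte[] content, int maxLength) {
        return content.length <= maxLength ? content : Arrays.copyOf(content, maxLength);
    }

    // Filtering (Allowed Documents-style, assumed): a document is either passed through
    // unchanged or skipped entirely, based only on its length.
    static boolean accept(long contentLength, long maxLength) {
        return contentLength <= maxLength;
    }

    public static void main(String[] args) {
        byte[] doc = "abcdefghij".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(truncate(doc, 4), StandardCharsets.UTF_8)); // prints abcd
        System.out.println(accept(doc.length, 4));   // prints false
        System.out.println(accept(doc.length, 100)); // prints true
    }
}
```

The point of the contrast is that filtering never hands Tika a damaged file. Whatever passes is complete, and whatever is too large is simply not indexed.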
Karl

On Thu, Jul 26, 2018 at 10:06 AM msaunier <msaun...@citya.com> wrote:

> I have added a Content Limiter transformation before the Tika extractor. It's
> very, very slow now. Is that normal?
>
> Maxence,
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Wednesday, July 25, 2018 19:15
> *To:* user@manifoldcf.apache.org
> *Subject:* ***UNCHECKED*** Re: Out of memory, one file bug i think
>
> It looks like you are still running out of memory. I would love to know
> which document was doing that. I suspect it is already very large,
> and for some reason it cannot be streamed.
>
> Karl
>
> On Wed, Jul 25, 2018 at 1:13 PM Karl Wright <daddy...@gmail.com> wrote:
>
> Hi Maxence,
>
> The second exception is occurring because processing is still occurring
> while the JVM is shutting down; it can be ignored.
>
> Karl
>
> On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaun...@citya.com> wrote:
>
> Hi Karl,
>
> I have added the snapshot and I'm being spammed with this error:
>
> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed: org/apache/commons/compress/utils/InputStreamStatistics
> java.lang.NoClassDefFoundError: org/apache/commons/compress/utils/InputStreamStatistics
>     at org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62) ~[?:?]
>     at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147) ~[?:?]
>     at org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34) ~[?:?]
>     at org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66) ~[?:?]
>     at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255) ~[?:?]
>     at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197) ~[?:?]
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127) ~[?:?]
>     at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88) ~[?:?]
>     at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84) ~[?:?]
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74) ~[?:?]
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235) ~[?:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756) ~[mcf-agents.jar:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548) ~[mcf-pull-agent.jar:?]
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939) ~[?:?]
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
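A `NoClassDefFoundError` like the one above means the JVM could not find, at runtime, a class that POI was compiled against. Here that class is `org.apache.commons.compress.utils.InputStreamStatistics`, which belongs to Apache Commons Compress. One way to confirm what the agents process can actually see is a small check run against the same jars.

The sketch below is a hypothetical helper; `ClasspathCheck` is not part of ManifoldCF, Tika or POI. It uses only `Class.forName` and the class's `CodeSource` to report whether each class loads and which jar it came from.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports whether classes can be loaded and from which jar.
public class ClasspathCheck {

    // True if the class can be found by the given loader. initialize=false skips static
    // initializers, so this is a cheap presence test rather than a full class initialization.
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    static boolean isPresent(String className) {
        return isPresent(className, ClasspathCheck.class.getClassLoader());
    }

    // Location (usually a jar) the class was loaded from, or null if it cannot be loaded
    // or has no CodeSource (e.g. JDK classes loaded by the bootstrap loader).
    static URL locationOf(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? null : src.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            // Class named in the NoClassDefFoundError (Apache Commons Compress)
            "org.apache.commons.compress.utils.InputStreamStatistics",
            // POI class that needed it, taken from the stack trace
            "org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream"
        };
        for (String name : names) {
            if (isPresent(name)) {
                System.out.println("FOUND   " + name + " from " + locationOf(name));
            } else {
                System.out.println("MISSING " + name);
            }
        }
    }
}
```

Run it with the same jars the agents process uses, for example `java -cp "connector-common-lib/*:." ClasspathCheck`. The directory path is an assumption, and Windows uses `;` instead of `:`. A `MISSING` line for the Commons Compress class would point at a jar that still needs to be added or upgraded.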
>
> Maxence,
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Wednesday, July 25, 2018 13:12
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
> Hi Maxence,
>
> Tomorrow (7/26) the POI project will be delivering a nightly build which
> should repair the Class Not Found exceptions. You will need to download it
> here:
>
> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>
> ... and replace all poi jars with the corresponding ones from the binary
> distribution. I believe the poi jars are all in connector-common-lib. Be
> sure to delete the old ones (or move them somewhere else) first.
>
> I don't know whether this will fix your out of memory problem, however.
> Please let me know what's still not working and I can take it from there.
>
> Karl
>
> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddy...@gmail.com> wrote:
>
> Out of memory errors are fatal, I'm afraid, because they corrupt not only
> the document in question but all others being processed at the same time.
> So those cannot be ignored.
>
> Tika should ignore documents that it cannot process, however, and that is
> a great enhancement request for them.
>
> Karl
>
> On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaun...@citya.com> wrote:
>
> Hi Karl,
>
> Okay. So today I'm going to force ManifoldCF to run so that only those
> documents are left behind.
>
> In the future, could I ignore these errors? They make the application
> crash, which is not great behavior in production.
>
> Thanks
> Maxence,
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:53
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
> The problem isn't with images in general; it's with certain kinds of
> images.
> There are optional dependencies in Tika for some kinds of images
> that we cannot include in the MCF distribution because of licensing
> problems. I don't know which kinds these are, but apparently you are trying
> to index some of them.
>
> You will need to find and download the right jar and put it in the
> connector-common-lib folder for this to work.
>
> Karl
>
> On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaun...@citya.com> wrote:
>
> On another crawl I extract images with the same parameters and I don't have
> problems with images; they are indexed without errors. Images are necessary
> for this job. I will try to recreate my job and test.
>
> Thanks,
> Maxence,
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Tuesday, July 24, 2018 17:32
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
> "java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)"
>
> This exception is occurring because you are trying to extract content from
> an image. In order for this to work you need a jar that isn't supplied
> with Tika for licensing reasons. Can you exclude images from your crawl?
>
> Karl
>
> On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaun...@citya.com> wrote:
>
> Hi Karl,
>
> With just connector debugging enabled, I get this information:
>
> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970049, negotiated timeout = 40000
> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated live nodes from ZooKeeper... (0) -> (2)
> [Thread-269948] INFO org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at kemp-formation-solr:2181 ready
> java.lang.NoSuchMethodException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType, boolean)
>     at java.lang.Class.getConstructor0(Class.java:3082)
>     at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>     at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>     at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>     at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>     at org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>     at org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>     at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>     at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>     at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>     at org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>     at org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>     at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>     at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>     at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>     at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>     at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>     at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28024ms for sessionid 0x100000050ae004d, closing socket connection and attempting reconnect
> [zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@5382340 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> [zkCallback-16-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>     at org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>     at org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>     at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>     at org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>     at org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>     at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>     at org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>     at org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>     at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x100000050ae004d, negotiated timeout = 40000
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.HashMap.resize(HashMap.java:704)
>     at java.util.HashMap.putVal(HashMap.java:629)
>     at java.util.HashMap.put(HashMap.java:612)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>     at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>     at org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>     at org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>     at org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.Arrays.copyOf(Arrays.java:3308)
>     at java.util.BitSet.ensureCapacity(BitSet.java:337)
>     at java.util.BitSet.expandTo(BitSet.java:352)
>     at java.util.BitSet.set(BitSet.java:447)
>     at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>     at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>     at org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>     at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
>     at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>     at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>     at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>     at org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>     at org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004e closed
> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004e
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004d closed
> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004d
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004a closed
> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004a
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004b closed
> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004b
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970046 closed
> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970046
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x100000050ae004c closed
> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae004c
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d004c closed
> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x2000000b80d004c
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970048 closed
> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970048
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0xff00000201970049 closed
> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970049
>
> I have deactivated history to improve performance. So, can I find the last
> file with an SQL query?
>
> Maxence,
>
> *From:* Karl Wright [mailto:daddy...@gmail.com]
> *Sent:* Tuesday, July 24, 2018 16:04
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Out of memory, one file bug i think
>
> Hi Maxence,
>
> You would want to turn on connector debugging INSTEAD of the debugging
> you've turned on, which is very noisy and not helpful.
>
> In global properties: org.apache.manifoldcf.connectors value DEBUG
>
> Karl
>
> On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaun...@citya.com> wrote:
>
> With debug:
>
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28034ms for sessionid 0x100000050ae0049, closing socket connection and attempting reconnect
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27737ms for sessionid 0xff00000201970043, closing socket connection and attempting reconnect
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047
> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28394ms for sessionid 0x2000000b80d0047, closing socket connection and attempting reconnect
> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27708ms for sessionid 0xff00000201970044, closing socket connection and attempting reconnect
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> agents process ran out of memory - shutting down
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046
> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 36805ms for sessionid 0x2000000b80d0046, closing socket connection and attempting reconnect
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.lang.StringBuilder.toString(StringBuilder.java:407)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>     at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>     at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>     at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>     at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>     at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>     at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>     at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> agents process ran out of memory - shutting down
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a
> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 27763ms for sessionid 0x100000050ae004a, closing socket connection and attempting reconnect
> [zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> [zkCallback-3-thread-7] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 28316ms for sessionid 0x100000050ae004b, closing socket connection and attempting reconnect
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> [zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> [zkCallback-11-thread-5] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired
> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0xff00000201970043 has expired, closing socket connection
> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0xff00000201970043
> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@53181a58 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None
> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired
> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x100000050ae0049 has expired, closing socket connection
> [zkCallback-11-thread-2] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...
> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x100000050ae0049
> [zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@7a5c701e name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Expired type:None path:null path: null type: None
> [zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper session was expired. Attempting to reconnect to recover relationship with ZooKeeper...
> [zkCallback-3-thread-4] WARN org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired - starting a new one...
> [zkCallback-3-thread-4] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000 watcher=org.apache.solr.common.cloud.ConnectionManager@7a5c701e
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to authenticate using SASL (unknown error)
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
> [zkCallback-3-thread-4-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0x2000000b80d0049, negotiated timeout = 40000
> [zkCallback-11-thread-2-SendThread(kemp-formation-solr.citya.local:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid = 0xff00000201970045, negotiated timeout = 40000
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>     at java.util.HashMap.newNode(HashMap.java:1747)
>     at java.util.HashMap.putVal(HashMap.java:631)
>     at java.util.HashMap.put(HashMap.java:612)
>     at jcifs.util.transport.Transport.sendrecv(Transport.java:66)
>     at jcifs.smb.SmbTransport.send(SmbTransport.java:661)
>     at jcifs.smb.SmbSession.send(SmbSession.java:238)
>     at jcifs.smb.SmbTree.send(SmbTree.java:119)
>     at jcifs.smb.SmbFile.send(SmbFile.java:776)
>     at jcifs.smb.SmbFileInputStream.readDirect(SmbFileInputStream.java:181)
>     at jcifs.smb.SmbFileInputStream.read(SmbFileInputStream.java:142)
>     at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:903)
>     at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.
> [zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connection with ZooKeeper reestablished.
> agents process ran out of memory - shutting down
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> [zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper
> [zkCallback-11-thread-2] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true
> [zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.DefaultConnectionStrategy - Reconnected to ZooKeeper
> [zkCallback-3-thread-4] INFO org.apache.solr.common.cloud.ConnectionManager - Connected:true
> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session: 0x2000000b80d0046 closed
> [zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@381a7557 name: ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
> [zkCallback-21-thread-2] WARN org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
> [Thread-7538-EventT