Hi Maxence,

I am wondering whether you moved any jars from dist/connector-common-lib to
dist/lib?  If you did this, you will mess up the ability of any of the Tika
jars to find their dependencies.  This also explains why commons-compress
cannot be found; it's in connector-common-lib.  It sounds like you may have
put the new poi jars in the wrong place?  They should *all* be in
connector-common-lib too.
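
If it helps to check where things ended up, here is a minimal sketch that lists which jars in each directory contain a given class. It is an illustration only: JarClassFinder, entryNameFor and findJarsContaining are made-up names and not part of ManifoldCF, and the two directory paths are simply the ones mentioned above, assumed to be relative to where you run it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not part of ManifoldCF): reports which jars in a
// directory contain a given class.
class JarClassFinder {

    // Converts a class name, dotted or slashed, to its entry name inside a jar.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the jars directly under dir that contain className.
    // A missing directory yields an empty list rather than an error.
    static List<Path> findJarsContaining(Path dir, String className) {
        List<Path> hits = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return hits;
        }
        String entry = entryNameFor(className);
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Class to look for; defaults to the one from the stack trace in this thread.
        String className = args.length > 0 ? args[0] : "org.apache.poi.POIXMLTextExtractor";
        // Directory names as mentioned in this thread (assumed relative paths).
        String[] dirs = {"dist/connector-common-lib", "dist/lib"};
        for (String dir : dirs) {
            List<Path> hits = findJarsContaining(Paths.get(dir), className);
            System.out.println(dir + ": " + (hits.isEmpty() ? "not found" : hits.toString()));
        }
    }
}
```

Running it with org.apache.commons.compress.utils.InputStreamStatistics as the argument should show whether commons-compress is sitting in the directory you expect. If a class shows up only under dist/lib, or in neither directory, that would be consistent with the errors you are seeing.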

Karl


On Thu, Jul 26, 2018 at 6:23 AM Karl Wright <daddy...@gmail.com> wrote:

> Hi Maxence,
>
> The following error:
>
> >>>>>>
>
> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
> org/apache/poi/POIXMLTextExtractor
>
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>
>         at
> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ~[?:?]
>
>         at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
> ~[?:?]
>
>         at
> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
> ~[?:?]
>
> <<<<<<
>
> .... seems to be the result of putting new POI jars down that are not
> fully compatible with the version of Tika that's there.  Unfortunately,
> this cannot be addressed right now in any way I can think of.  Tika's
> dependencies are legion and they change all the time.
>
> The only thing we can really do is wait for: (1) POI to release their new
> software, and then (2) Tika to publish a new release that depends on it.
>
> Karl
>
>
> On Thu, Jul 26, 2018 at 5:33 AM msaunier <msaun...@citya.com> wrote:
>
>> Hello Karl,
>>
>>
>>
>> For the moment, it is working.
>>
>>
>>
>> I have noted these errors, but they are not FATAL:
>>
>>
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Checking '*'
>> against '/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Match found.
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Leaving
>> checkInclude for
>> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>>
>> DEBUG 2018-07-26T11:30:32,220 (Worker thread '4') - JCIFS: Recorded path
>> is
>> 'smb://srv-fichiersqg/Social/_SOCIAL_CABINETS/69B_citya_barioz_immobilier/02894_berthollier/Formation/'
>> and is included.
>>
>> FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
>> org/apache/poi/POIXMLTextExtractor
>>
>> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$MonitoredAddActivityWrapper.sendDocument(IncrementalIngester.java:3471)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.contentlimiter.ContentLimiter.addOrReplaceDocumentWithException(ContentLimiter.java:161)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.poi.POIXMLTextExtractor
>>
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>> ~[?:1.8.0_171]
>>
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> ~[?:1.8.0_171]
>>
>>         at
>> java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
>> ~[?:1.8.0_171]
>>
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ~[?:1.8.0_171]
>>
>>         ... 18 more
>>
>> AND
>>
>>
>>
>> Starting crawler...
>>
>> Jul 26, 2018 11:29:01 AM
>> org.apache.tika.config.InitializableProblemHandler$3
>> handleInitializableProblem
>>
>> WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>> TIFFImageWriter not loaded. tiff files will not be processed
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>> J2KImageReader not loaded. JPEG2000 files will not be processed.
>>
>> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
>>
>> for optional dependencies.
>>
>>
>>
>> Jul 26, 2018 11:29:01 AM
>> org.apache.tika.config.InitializableProblemHandler$3
>> handleInitializableProblem
>>
>> WARNING: org.xerial's sqlite-jdbc is not loaded.
>>
>> Please provide the jar on your classpath to parse sqlite files.
>>
>> See tika-parsers/pom.xml for the correct version.
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 19:09
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> That's what I was afraid of.  The new poi jars have dependencies we
>> haven't accounted for yet.
>>
>>
>>
>> Can you download apache-commons-compress jar (latest version should be
>> OK) and also put that in connector-common-lib?  Thanks!!
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 1:01 PM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> I have added the snapshot, and now I am being spammed with this error:
>>
>>
>>
>> FATAL 2018-07-25T16:43:04,599 (Worker thread '0') - Error tossed:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>> java.lang.NoClassDefFoundError:
>> org/apache/commons/compress/utils/InputStreamStatistics
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipArchiveThresholdInputStream.<init>(ZipArchiveThresholdInputStream.java:62)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:147)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipSecureFile.getInputStream(ZipSecureFile.java:34)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.util.ZipFileZipEntrySource.getInputStream(ZipFileZipEntrySource.java:66)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:255)
>> ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:725) ~[?:?]
>>
>>         at
>> org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:238) ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:197)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:127)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
>> ~[?:?]
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:116)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>> ~[mcf-agents.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>> ~[mcf-pull-agent.jar:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>> ~[?:?]
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>>
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Wednesday, July 25, 2018 13:12
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> Tomorrow (7/26) the POI project will be delivering a nightly build which
>> should repair the Class Not Found exceptions.  You will need to download it
>> here:
>>
>>
>> https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
>>
>>
>>
>> ... and replace all poi jars with the corresponding ones from the binary
>> distribution.  I believe the poi jars are all in connector-common-lib.  Be
>> sure to delete the old ones (or move them somewhere else) first.
>>
>>
>>
>> I don't know whether this will fix your out of memory problem however.
>> Please let me know what's still not working and I can take it from there.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 6:03 AM Karl Wright <daddy...@gmail.com> wrote:
>>
>> Out of memory errors are fatal, I'm afraid, because they corrupt not only
>> the document in question but all others being processed at the same time.
>> So those cannot be ignored.
>>
>>
>>
>> Tika should ignore documents that it cannot process, however, and that is
>> a great enhancement request for them.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Jul 25, 2018 at 3:39 AM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> Okay. So today, I'm going to force ManifoldCF to run so that only the
>> documents are left behind.
>>
>> In the future, could I ignore these errors? They make the application
>> crash, which is not great behavior in production.
>>
>>
>>
>> Thanks
>>
>> Maxence,
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:53
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> The problem isn't with images in general; it's with certain kinds of
>> images.  There are optional dependencies in Tika for some kinds of images
>> that we cannot include in the MCF distribution because of licensing
>> problems.  I don't know which kinds these are but apparently you are trying
>> to index some of them.
>>
>> You will need to find and download the right jar and put it in the
>> connector-common-lib folder for this to work.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 11:36 AM msaunier <msaun...@citya.com> wrote:
>>
>> On another crawl I extract images with the same parameters and I have no
>> problems with them. They are indexed without errors. Images are necessary
>> for this job. I will try to recreate my job and test.
>>
>>
>>
>> Thanks,
>>
>> Maxence,
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 17:32
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> " java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)"
>>
>>
>>
>> This exception is occurring because you are trying to extract content
>> from an image.  In order for this to work you need a jar that isn't
>> supplied with Tika for licensing reasons.  Can you exclude images from your
>> crawl?
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 10:32 AM msaunier <msaun...@citya.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> With only the connectors in debug, I get this information:
>>
>>
>>
>> [Thread-269948] INFO org.apache.zookeeper.ZooKeeper - Initiating client
>> connection, connectString=kemp-formation-solr:2181 sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.SolrZkClient$3@3c351b22
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-269948-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0xff00000201970049, negotiated timeout = 40000
>>
>> [Thread-269948] INFO org.apache.solr.common.cloud.ZkStateReader - Updated
>> live nodes from ZooKeeper... (0) -> (2)
>>
>> [Thread-269948] INFO
>> org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider - Cluster at
>> kemp-formation-solr:2181 ready
>>
>> java.lang.NoSuchMethodException:
>> org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTPictureBaseImpl.<init>(org.apache.xmlbeans.SchemaType,
>> boolean)
>>
>>         at java.lang.Class.getConstructor0(Class.java:3082)
>>
>>         at java.lang.Class.getDeclaredConstructor(Class.java:2178)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.getJavaImplConstructor2(SchemaTypeImpl.java:1817)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedSubclass(SchemaTypeImpl.java:1961)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createUnattachedNode(SchemaTypeImpl.java:1950)
>>
>>         at
>> org.apache.xmlbeans.impl.schema.SchemaTypeImpl.createElementType(SchemaTypeImpl.java:1051)
>>
>>         at
>> org.apache.xmlbeans.impl.values.XmlObjectBase.create_element_user(XmlObjectBase.java:938)
>>
>>         at org.apache.xmlbeans.impl.store.Xobj.getUser(Xobj.java:1675)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getUser(Cur.java:2659)
>>
>>         at org.apache.xmlbeans.impl.store.Cur.getObject(Cur.java:2652)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor._getObject(Cursor.java:995)
>>
>>         at
>> org.apache.xmlbeans.impl.store.Cursor.getObject(Cursor.java:2904)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:162)
>>
>>         at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:169)
>>
>>         at
>> org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:112)
>>
>>         at
>> org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:60)
>>
>>         at
>> org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:243)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:105)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:106)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>>
>>         at
>> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaParser.parse(TikaParser.java:74)
>>
>>         at
>> org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:235)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3226)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3077)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2708)
>>
>>         at
>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:756)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1583)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1548)
>>
>>         at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:939)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28024ms for sessionid 0x100000050ae004d, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@5382340 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-16-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:737)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:784)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1457)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.getJobsReadyForInactivity(JobManager.java:8024)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.JobNotificationThread.run(JobNotificationThread.java:76)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1200)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:1583)
>>
>>         at
>> org.postgresql.jdbc.PgConnection.prepareStatement(PgConnection.java:372)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.execute(Database.java:896)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
>>
>> [Thread-35854-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Session establishment complete on server
>> kemp-formation-solr.citya.local/192.168.37.107:2181, sessionid =
>> 0x100000050ae004d, negotiated timeout = 40000
>>
>> [Thread-490] INFO org.eclipse.jetty.server.ServerConnector - Stopped
>> ServerConnector@2a640157{HTTP/1.1}{0.0.0.0:8345}
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.HashMap.resize(HashMap.java:704)
>>
>>         at java.util.HashMap.putVal(HashMap.java:629)
>>
>>         at java.util.HashMap.put(HashMap.java:612)
>>
>>         at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:154)
>>
>>         at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>
>>         at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:837)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.processParentHashSet(JobManager.java:5642)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.calculateAffectedRestoreCarrydownChildren(JobManager.java:5581)
>>
>>         at
>> org.apache.manifoldcf.crawler.jobs.JobManager.finishDocuments(JobManager.java:5453)
>>
>>         at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:570)
>>
>> agents process ran out of memory - shutting down
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>         at java.util.Arrays.copyOf(Arrays.java:3308)
>>
>>         at java.util.BitSet.ensureCapacity(BitSet.java:337)
>>
>>         at java.util.BitSet.expandTo(BitSet.java:352)
>>
>>         at java.util.BitSet.set(BitSet.java:447)
>>
>>         at
>> de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
>>
>>         at
>> org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
>>
>>         at
>> org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:279)
>>
>>         at
>> org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:306)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$SheetTextAsHTML.cell(XSSFExcelExtractorDecorator.java:431)
>>
>>         at
>> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler.endElement(XSSFSheetXMLHandler.java:380)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator$XSSFSheetInterestingPartsCapturer.endElement(XSSFExcelExtractorDecorator.java:520)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XML11Configuration.parse(Unknown
>> Source)
>>
>>         at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>>
>>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown
>> Source)
>>
>>         at
>> org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.processSheet(XSSFExcelExtractorDecorator.java:344)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.XSSFExcelExtractorDecorator.buildXHTML(XSSFExcelExtractorDecorator.java:167)
>>
>>         at
>> org.apache.tika.parser.microsoft.ooxml.AbstractOOXMLExtractor.getXHTML(AbstractOOXMLExtractor.java:135)
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004e closed
>>
>> [Thread-257943-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004e
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004d closed
>>
>> [Thread-35854-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004d
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004a closed
>>
>> [Thread-8765-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004a
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004b closed
>>
>> [Thread-35853-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004b
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970046 closed
>>
>> [Thread-6991-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970046
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x100000050ae004c closed
>>
>> [Thread-8699-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae004c
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@44d52de2{/mcf-api-service,file:/tmp/jetty-0.0.0.0-8345-mcf-api-service.war-_mcf-api-service-any-559052738855414857.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-api-service.war}
>>
>> [Thread-490] INFO org.eclipse.jetty.server.handler.ContextHandler -
>> Stopped
>> o.e.j.w.WebAppContext@60410cd{/mcf-authority-service,file:/tmp/jetty-0.0.0.0-8345-mcf-authority-service.war-_mcf-authority-service-any-927770358411352606.dir/webapp/,UNAVAILABLE}{/opt/manifoldcf-trunk/bin/./../web-proprietary/war/mcf-authority-service.war}
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0x2000000b80d004c closed
>>
>> [Thread-262666-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x2000000b80d004c
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970048 closed
>>
>> [Thread-244171-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970048
>>
>> [Thread-490] INFO org.apache.zookeeper.ZooKeeper - Session:
>> 0xff00000201970049 closed
>>
>> [Thread-269948-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970049
>>
>>
>>
>> I have deactivated history to gain performance. So, can I find the last
>> file with an SQL query?
>>
>>
>>
>> Maxence,
>>
>>
>>
>> *From:* Karl Wright [mailto:daddy...@gmail.com]
>> *Sent:* Tuesday, July 24, 2018 16:04
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Out of memory, one file bug i think
>>
>>
>>
>> Hi Maxence,
>>
>>
>>
>> You would want to turn on connector debugging INSTEAD of the debugging
>> you've turned on, which is very noisy and not helpful.
>>
>>
>>
>> In global properties: org.apache.manifoldcf.connectors value DEBUG
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Tue, Jul 24, 2018 at 9:12 AM msaunier <msaun...@citya.com> wrote:
>>
>> With debug:
>>
>>
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28034ms for sessionid 0x100000050ae0049, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27737ms for sessionid 0xff00000201970043, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047
>>
>> [Thread-7602-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28394ms for sessionid 0x2000000b80d0047, closing socket
>> connection and attempting reconnect
>>
>> [Thread-31532-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27708ms for sessionid 0xff00000201970044, closing socket
>> connection and attempting reconnect
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046
>>
>> [Thread-7538-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 36805ms for sessionid 0x2000000b80d0046, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>>         at org.apache.manifoldcf.core.cachemanager.CacheManager.readSharedData(CacheManager.java:849)
>>         at org.apache.manifoldcf.core.cachemanager.CacheManager.hasExpired(CacheManager.java:483)
>>         at org.apache.manifoldcf.core.cachemanager.CacheManager.lookupObject(CacheManager.java:454)
>>         at org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:131)
>>         at org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:204)
>>         at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:862)
>>         at org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:236)
>>         at org.apache.manifoldcf.crawler.jobs.Jobs.deletingJobsPresent(Jobs.java:3133)
>>         at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDeletableDocuments(JobManager.java:1862)
>>         at org.apache.manifoldcf.crawler.system.DocumentDeleteStufferThread.run(DocumentDeleteStufferThread.java:108)
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> kemp-formation-solr.citya.local/192.168.37.107:2181. Will not attempt to
>> authenticate using SASL (unknown error)
>>
>> agents process ran out of memory - shutting down
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a
>>
>> [Thread-7574-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 27763ms for sessionid 0x100000050ae004a, closing socket
>> connection and attempting reconnect
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-7] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-31551-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard
>> from server in 28316ms for sessionid 0x100000050ae004b, closing socket
>> connection and attempting reconnect
>>
>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Socket connection established to
>> kemp-formation-solr.citya.local/192.168.37.107:2181, initiating session
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Disconnected type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-5] WARN
>> org.apache.solr.common.cloud.ConnectionManager - zkClient has disconnected
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired
>>
>> [Thread-7573-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0xff00000201970043 has expired, closing socket connection
>>
>> [Thread-7573-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0xff00000201970043
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@53181a58 name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to reconnect to recover relationship with
>> ZooKeeper...
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] WARN
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired
>>
>> [Thread-5234-SendThread(kemp-formation-solr.citya.local:2181)] INFO
>> org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service,
>> session 0x100000050ae0049 has expired, closing socket connection
>>
>> [zkCallback-11-thread-2] WARN
>> org.apache.solr.common.cloud.DefaultConnectionStrategy - Connection expired
>> - starting a new one...
>>
>> [zkCallback-11-thread-2] INFO org.apache.zookeeper.ZooKeeper - Initiating
>> client connection, connectString=kemp-formation-solr:2181
>> sessionTimeout=60000
>> watcher=org.apache.solr.common.cloud.ConnectionManager@53181a58
>>
>> [Thread-5234-EventThread] INFO org.apache.zookeeper.ClientCnxn -
>> EventThread shut down for session: 0x100000050ae0049
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Watcher
>> org.apache.solr.common.cloud.ConnectionManager@7a5c701e name:
>> ZooKeeperConnection Watcher:kemp-formation-solr:2181 got event WatchedEvent
>> state:Expired type:None path:null path: null type: None
>>
>> [zkCallback-3-thread-4] WARN
>> org.apache.solr.common.cloud.ConnectionManager - Our previous ZooKeeper
>> session was expired. Attempting to r
>>
>>
