The way it works in the JCIFS connector is that files that aren't within
the specification are removed from the list of files being processed. If a
file is already being processed, however, it is just retried. So changing
this property to make an out-of-memory condition go away is not going to
help.
How are you limiting content size? Is this in the repository connection,
or in an Allowed Documents transformation connection?
Karl
On Thu, Jul 26, 2018 at 10:58 AM msaunier wrote:
> I have limited documents to 20 MB each and I still get a Java out-of-memory
> error.
>
> *From:* Karl Wright
I believe there's also a content length tab in the Windows Share connector,
if you're using that.
Karl
On Thu, Jul 26, 2018 at 10:19 AM Karl Wright wrote:
> The ContentLimiter truncates documents. That's not what you want.
>
> Use the Allowed Documents transformer.
>
> Karl
The ContentLimiter truncates documents. That's not what you want.
Use the Allowed Documents transformer.
Karl
On Thu, Jul 26, 2018 at 10:06 AM msaunier wrote:
> I have added a Content Limiter transformation before the Tika extractor.
> It's very, very slow now. Is that normal?
>
> Maxence,
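The distinction above can be sketched in plain Java. This is a hypothetical illustration, not ManifoldCF source code; the class and method names are invented, and a 20 MB limit is assumed as in the job being discussed:

```java
// Hypothetical sketch, not ManifoldCF code: contrast the two transformers.
public class LengthFiltering {
    static final long MAX_BYTES = 20L * 1024 * 1024; // assumed 20 MB limit

    // Allowed Documents behavior: skip the whole document if it is too large.
    static boolean isAllowed(long lengthBytes) {
        return lengthBytes <= MAX_BYTES;
    }

    // Content Limiter behavior: keep the document but cut it at the limit,
    // which still hands up to 20 MB to downstream stages such as Tika.
    static byte[] truncate(byte[] content) {
        if (content.length <= MAX_BYTES) {
            return content;
        }
        byte[] cut = new byte[(int) MAX_BYTES];
        System.arraycopy(content, 0, cut, 0, (int) MAX_BYTES);
        return cut;
    }
}
```

Truncation means every oversized document is still read and processed up to the limit, which is one reason a Content Limiter placed before Tika can slow a job down.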
Hi Mario,
There is no connection between the number of CPUs and the number of output
connections. You pick the maximum number of output connections based on
the number of listening threads that you can use at the same time in Solr.
Karl
On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario
wrote:
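A minimal sketch of the idea, with assumed names (OutputConnectionPool is not a ManifoldCF class): the max-connections setting simply caps how many requests can be in flight to Solr at once, independent of the local CPU count.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: "Max connections" as a cap on concurrent Solr requests.
public class OutputConnectionPool {
    private final Semaphore slots;

    public OutputConnectionPool(int maxConnections) {
        // Permits correspond to Solr-side listening threads, not local CPUs.
        this.slots = new Semaphore(maxConnections);
    }

    public void send(Runnable request) throws InterruptedException {
        slots.acquire();   // blocks once maxConnections requests are in flight
        try {
            request.run();
        } finally {
            slots.release();
        }
    }

    public int freeSlots() {
        return slots.availablePermits();
    }
}
```

With this framing, raising the setting past what Solr can serve concurrently just queues requests on the Solr side; it does not depend on how many cores the crawler host has.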
Hello, I set up the Solr connection in the "Output connections" of Manifold.
I don't understand whether there is a relation between "Max Connections" and
the number of CPUs on the host.
Could you help me to understand it?
Thanks a lot
Mario
Hi Maxence,
I am wondering whether you moved any jars from dist/connector-common-lib to
dist/lib? If you did this, you will mess up the ability of any of the Tika
jars to find their dependencies. This also explains why commons-compress
cannot be found; it's in connector-common-lib. It sounds
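One way to check where a class actually lives is to scan the jars in each dist directory for its entry. A small self-contained sketch (the directory path is whatever your MCF dist layout uses; the class name below is the one from the stack trace):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Sketch: list the jars under a directory that contain a given class entry,
// e.g. "org/apache/poi/POIXMLTextExtractor.class" under connector-common-lib.
public class FindClassJar {
    static List<Path> jarsContaining(Path dir, String classEntry) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : ds) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(classEntry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }
}
```

Running this over both dist/lib and dist/connector-common-lib would show whether the POI jars ended up somewhere the connector classloader cannot see them.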
Hi Maxence,
The following error:
FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
org/apache/poi/POIXMLTextExtractor
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
at
Here's the documentation from HttpClient on the various cookie policies.
You're probably going to need to read some of the RFCs to see which policy
you want. I will wait for you to get back to me with a recommendation
before taking any action in the MCF codebase. Thanks!
Ok, so the database for your site crawl contains both z.com and x.y.z.com
cookies? And your site pages from domain a.y.z.com receive no cookies at
all when fetched? Is that a correct description of the situation?
Please verify that the a.y.z.com pages are part of the protected part of
your
Hi,
the database may contain Z.com and X.Y.Z.com cookies if created automatically
through a JSP, but not the intermediate Y.Z.com one.
If the crawler decides to go to A.Y.Z.com and Z.com is present in the
database, it still doesn't work (it should, since A.Y.Z.com is a sub-domain
of Z.com). Only doing that
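For reference, a simplified sketch of RFC 6265 domain-matching (the host-is-an-IP-address check is omitted) does treat A.Y.Z.com as matching a cookie domain of Z.com, which is the behavior being asked for above:

```java
// Simplified RFC 6265 domain-match: a cookie scoped to z.com should be sent
// to any sub-domain such as a.y.z.com, but never the other way around.
public class CookieDomains {
    static boolean domainMatches(String host, String cookieDomain) {
        String h = host.toLowerCase();
        String d = cookieDomain.toLowerCase();
        if (h.equals(d)) {
            return true;
        }
        // Sub-domain match: host must end with "." followed by the cookie domain.
        return h.endsWith("." + d);
    }
}
```

Whether a given HttpClient cookie policy applies this rule depends on the policy chosen, which is why reading the cookie-spec documentation mentioned earlier matters here.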