The way it works in the JCIFS connector is that files that aren't within
the specification are removed from the list of files being processed. If a
file is already being processed, however, it is just retried. So changing
this property to make an out-of-memory condition go away is not going to
help.
How are you limiting content size? Is this in the repository connection,
or in an Allowed Documents transformation connection?
Karl
On Thu, Jul 26, 2018 at 10:58 AM msaunier wrote:
> I have limited documents to 20 MB each and I still get a Java out-of-memory
> error.
>
> *From:* Karl Wright
I believe there's also a content length tab in the Windows Share connector,
if you're using that.
Karl
On Thu, Jul 26, 2018 at 10:19 AM Karl Wright wrote:
> The ContentLimiter truncates documents. That's not what you want.
>
> Use the Allowed Documents transformer.
>
> Karl
The ContentLimiter truncates documents. That's not what you want.
Use the Allowed Documents transformer.
Karl
On Thu, Jul 26, 2018 at 10:06 AM msaunier wrote:
> I have added a Content Limiter transformation before the Tika extractor.
> It's very, very slow now. Is that normal?
>
> Maxence,
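The distinction above can be sketched in plain Java. This is a hypothetical illustration, not ManifoldCF source code; the class and method names are invented, and a 20 MB limit is assumed as in the job being discussed:

```java
// Hypothetical sketch, not ManifoldCF code: contrast the two transformers.
public class LengthFiltering {
    static final long MAX_BYTES = 20L * 1024 * 1024; // assumed 20 MB limit

    // Allowed Documents behavior: skip the whole document if it is too large.
    static boolean isAllowed(long lengthBytes) {
        return lengthBytes <= MAX_BYTES;
    }

    // Content Limiter behavior: keep the document but cut it at the limit,
    // which still hands up to 20 MB to downstream stages such as Tika.
    static byte[] truncate(byte[] content) {
        if (content.length <= MAX_BYTES) {
            return content;
        }
        byte[] cut = new byte[(int) MAX_BYTES];
        System.arraycopy(content, 0, cut, 0, (int) MAX_BYTES);
        return cut;
    }
}
```

Truncation means every oversized document is still read and processed up to the limit, which is one reason a Content Limiter placed before Tika can slow a job down.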
Hi Mario,
There is no connection between the number of CPUs and the number of output
connections. You pick the maximum number of output connections based on
the number of listening threads that you can use at the same time in Solr.
Karl
On Thu, Jul 26, 2018 at 9:22 AM Bisonti Mario
wrote:
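A minimal sketch of the idea, with assumed names (OutputConnectionPool is not a ManifoldCF class): the max-connections setting simply caps how many requests can be in flight to Solr at once, independent of the local CPU count.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: "Max connections" as a cap on concurrent Solr requests.
public class OutputConnectionPool {
    private final Semaphore slots;

    public OutputConnectionPool(int maxConnections) {
        // Permits correspond to Solr-side listening threads, not local CPUs.
        this.slots = new Semaphore(maxConnections);
    }

    public void send(Runnable request) throws InterruptedException {
        slots.acquire();   // blocks once maxConnections requests are in flight
        try {
            request.run();
        } finally {
            slots.release();
        }
    }

    public int freeSlots() {
        return slots.availablePermits();
    }
}
```

With this framing, raising the setting past what Solr can serve concurrently just queues requests on the Solr side; it does not depend on how many cores the crawler host has.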
Hello, I set up the Solr connection in the "Output connections" of Manifold.
I don't understand whether there is a relation between "Max Connections" and
the number of CPUs on the host.
Could you help me to understand it?
Thanks a lot
Mario
Hi Maxence,
I am wondering whether you moved any jars from dist/connector-common-lib to
dist/lib? If you did this, you will mess up the ability of any of the Tika
jars to find their dependencies. This also explains why commons-compress
cannot be found; it's in connector-common-lib. It sounds
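One way to check where a class actually lives is to scan the jars in each dist directory for its entry. A small self-contained sketch (the directory path is whatever your MCF dist layout uses; the class name below is the one from the stack trace):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Sketch: list the jars under a directory that contain a given class entry,
// e.g. "org/apache/poi/POIXMLTextExtractor.class" under connector-common-lib.
public class FindClassJar {
    static List<Path> jarsContaining(Path dir, String classEntry) throws IOException {
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : ds) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    if (jf.getEntry(classEntry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }
}
```

Running this over both dist/lib and dist/connector-common-lib would show whether the POI jars ended up somewhere the connector classloader cannot see them.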
Hi Maxence,
The following error:
FATAL 2018-07-26T11:30:32,220 (Worker thread '28') - Error tossed:
org/apache/poi/POIXMLTextExtractor
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTextExtractor
at
Here's the documentation from HttpClient on the various cookie policies.
You're probably going to need to read some of the RFCs to see which policy
you want. I will wait for you to get back to me with a recommendation
before taking any action in the MCF codebase. Thanks!
Ok, so the database for your site crawl contains both z.com and x.y.z.com
cookies? And your site pages from domain a.y.z.com receive no cookies at
all when fetched? Is that a correct description of the situation?
Please verify that the a.y.z.com pages are part of the protected part of
your
Hi,
the database may contain Z.com and X.Y.Z.com cookies if created automatically
through a JSP, but not the intermediate Y.Z.com one.
If the crawler decides to go to A.Y.Z.com and Z.com is present in the
database, it still doesn't work (it should, since A.Y.Z.com is a sub-domain
of Z.com). Only doing that
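For reference, a simplified sketch of RFC 6265 domain-matching (the host-is-an-IP-address check is omitted) does treat A.Y.Z.com as matching a cookie domain of Z.com, which is the behavior being asked for above:

```java
// Simplified RFC 6265 domain-match: a cookie scoped to z.com should be sent
// to any sub-domain such as a.y.z.com, but never the other way around.
public class CookieDomains {
    static boolean domainMatches(String host, String cookieDomain) {
        String h = host.toLowerCase();
        String d = cookieDomain.toLowerCase();
        if (h.equals(d)) {
            return true;
        }
        // Sub-domain match: host must end with "." followed by the cookie domain.
        return h.endsWith("." + d);
    }
}
```

Whether a given HttpClient cookie policy applies this rule depends on the policy chosen, which is why reading the cookie-spec documentation mentioned earlier matters here.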