The web connector, though, does not filter any cookies. It takes them all
-- whatever cookies HttpClient is storing at that point. So you should see
all the cookies in the database table, regardless of their site affinity,
unless HttpClient is refusing to accept a cookie for security reasons.
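The sketch below illustrates the subdomain behavior under discussion, following the cookie-matching rules of RFC 6265. A cookie stored with a Domain attribute (e.g. Domain=z.com) is also returned to subdomains such as x.y.z.com. A host-only cookie, set without a Domain attribute, goes back only to the exact host that set it. That is one plausible reason a login cookie obtained for "Z.com" would not reach "Y.Z.com". The class and method names are hypothetical. This is neither HttpClient's nor ManifoldCF's actual code, and it omits the IP-address and public-suffix checks that a real cookie store performs.

```java
import java.util.Locale;

// Hypothetical illustration of RFC 6265 cookie domain matching (not HttpClient/ManifoldCF code).
public final class CookieDomainMatch {

    // Decide whether a stored cookie should be sent to requestHost.
    // hostOnly == true means the cookie was set WITHOUT a Domain attribute.
    static boolean shouldSend(String requestHost, String cookieDomain, boolean hostOnly) {
        String host = requestHost.toLowerCase(Locale.ROOT);
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        if (hostOnly) {
            // Host-only cookies are returned only to the exact host that set them.
            return host.equals(domain);
        }
        return domainMatches(host, domain);
    }

    // RFC 6265 section 5.1.3 domain-match, simplified:
    // identical strings, or host ends with "." + domain.
    // (IP-address and public-suffix checks are omitted in this sketch.)
    static boolean domainMatches(String host, String domain) {
        if (host.equals(domain)) {
            return true;
        }
        return host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        // Cookie set with "Domain=z.com": sent to subdomains.
        System.out.println(shouldSend("x.y.z.com", "z.com", false)); // true
        // Same cookie stored host-only: not sent to subdomains.
        System.out.println(shouldSend("x.y.z.com", "z.com", true));  // false
        System.out.println(shouldSend("z.com", "z.com", true));      // true
        // "notz.com" must not match "z.com".
        System.out.println(shouldSend("notz.com", "z.com", false));  // false
    }
}
```

If the login sequence's cookie turns out to be host-only, that would match the observation that it had to be duplicated for every subdomain.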
I agree, but the fact is that if my "login sequence" defines a login
credential for domain "Z.com" and the crawler reaches "Y.Z.com" or
"X.Y.Z.com", none of the sub-sites receives that cookie. I need to write
the same cookie for every subdomain; that solves the situation (and
thankfully is a
The "cleaning up" phase deletes the documents in the target index (where
your output connectors point). That takes more time.
Karl
On Wed, Jul 25, 2018 at 1:43 PM msaunier wrote:
If I delete a job in ManifoldCF, the job goes into « Cleaning Up » status.
« Processed » documents are deleted very fast,
and « Active » documents too.
But for « Documents » in the interface, it’s very slow to delete every line.
ManifoldCF deletes Documents 100 by 100.
Maxence,
From: Karl
I'm sorry, I don't understand your question?
Karl
On Wed, Jul 25, 2018 at 12:53 PM msaunier wrote:
> Hi Karl,
>
> Can I configure ManifoldCF to clean up faster? I think ManifoldCF
> cleans up 100 by 100 by default.
>
> Maxence,
You should not need to fill the database by hand. Your login sequence
should include whatever redirects etc. are used to set the cookies, though.
Karl
On Wed, Jul 25, 2018 at 1:06 PM Gustavo Beneitez wrote:
It looks like you are still running out of memory. I would love to know
which document was doing that. I suspect it is already very large,
and for some reason it cannot be streamed.
Karl
On Wed, Jul 25, 2018 at 1:13 PM Karl Wright wrote:
Hi Maxence,
The second exception is occurring because processing is still occurring
while the JVM is shutting down; it can be ignored.
Karl
On Wed, Jul 25, 2018 at 1:01 PM msaunier wrote:
> Hi Karl,
>
> I have added the snapshot and I’m being spammed with this error:
>
> FATAL
That's what I was afraid of. The new poi jars have dependencies we haven't
accounted for yet.
Can you download the apache-commons-compress jar (the latest version should
be OK) and also put that in connector-common-lib? Thanks!!
Karl
On Wed, Jul 25, 2018 at 1:01 PM msaunier wrote:
> Hi Karl,
Hi again,
Thanks Karl, I was able to do that after defining a "login sequence",
but also after filling the database (cookiedata table) with certain values
due to "domain restrictions".
Before every web call, I suspect Manifold only takes cookies from the URL's
exact subdomain (i.e. x.y.z.com), so if
Out of memory errors are fatal, I'm afraid, because they corrupt not only
the document in question but all others being processed at the same time.
So those cannot be ignored.
Tika should ignore documents that it cannot process, however, and that is a
great enhancement request for them.
Karl
Hi Maxence,
Tomorrow (7/26) the POI project will be delivering a nightly build which
should repair the Class Not Found exceptions. You will need to download it
here:
https://builds.apache.org/view/P/view/POI/job/POI-DSL-1.8/lastSuccessfulBuild/artifact/build/dist/
... and replace all the POI jars.