re added to the queue.
>> Hopcounts are stored for each document in the hopcount table. So if you
>> change a hopcount limit, it is quite possible that nothing will change
>> unless documents that are at the previous hopcount limit are re-evaluated.
>> I believe there is no lo
at the previous hopcount limit are re-evaluated.
> I believe there is no logic in ManifoldCF for that at this time, but I'd
> have to review the codebase to be certain of that.
>
> What that means is that you can't increase the hopcount limit and expect
> the next crawl to p
are re-evaluated.
I believe there is no logic in ManifoldCF for that at this time, but I'd
have to review the codebase to be certain of that.
What that means is that you can't increase the hopcount limit and expect
the next crawl to pick up the documents you excluded before with the
hopcount
or (Solr
>> connector)
>> After that, the same pages are still out of scope like the limit has been
>> set to 1, and they are not indexed.
>>
>> I have tried to "Reset seeding" thinking that maybe the pages need to be
>> checked again, but still having the sam
t maybe the pages need to be
> checked again, but I still have the same problem. I don't think the problem
> is with the output, but I have also used the options "Re-index all associated
> documents" and "Remove all associated records", with the same result.
> I don't want
See this article:
https://stackoverflow.com/questions/6784463/error-trustanchors-parameter-must-be-non-empty
ManifoldCF web crawler configuration allows you to drop certs into a local
trust store for the connection. You need to either do that (adding
whatever certificate authority cert you
Hi,
Did you find a solution for that, or do you still have the history
disabled?
I'm having the same problem, and we are using postgresql as the db.
Regards
On Sun, 29 Jan 2023 at 05:48, Artem Abeleshev
wrote:
> Hi everyone!
>
> We are using ManifoldCF 2.22.1 with multiple nodes in our
But if those are set, and the connection health check passes, then I can't
tell you why Solr is unhappy with your connection. It's clearly working
sometimes. I'd look on the Solr end to figure out whether its rejection is
coming from just one of your instances.
On Wed, Jun 7, 2023 at 7:49 AM
The Solr output connection configuration contains all credentials that are
sent to Solr. If those aren't set Solr won't get them.
Karl
On Wed, Jun 7, 2023 at 7:23 AM Marisol Redondo <
marisol.redondo.gar...@gmail.com> wrote:
> Hi,
>
> We are using Solr 8 with basic authentication, and when
ares” works for near 18
> hours.
>
> My document count is roughly 1 million.
>
>
>
> If I check the document scan from ManifoldCF I see, for example:
>
>
>
> It seems that it reworks the document every day even if it hasn't been
> modified.
>
> So,
Thanks a lot for your kind and elaborate response!
I will do some further investigation on my own towards the documentum.
Best regards,
Rasta
On Fri, Mar 17, 2023 at 12:08 Karl Wright wrote:
> It was open-sourced back in 2012 at the same time ManifoldCF was
> open-sourced. It was written
It was open-sourced back in 2012 at the same time ManifoldCF was
open-sourced. It was written by a contractor paid by MetaCarta, who also
paid for the development of ManifoldCF itself (I developed that). It was
spun off as open source when MetaCarta was bought by Nokia who had no
interest in the
Hi Karl, thanks for your answer! Would you be able to point me towards the
author/git branch of the documentum connector?
Best regards, Rasta
On Thu, Mar 16, 2023 at 20:58 Karl Wright wrote:
> Hi,
>
> I didn't write the documentum connector initially, so I trust that the
> engineer who did
Hi,
I didn't write the documentum connector initially, so I trust that the
engineer who did knew how to construct the proper DQL. I've not seen any
bugs related to it so it does seem to work.
Karl
On Thu, Mar 16, 2023 at 8:23 AM Rasťa Šíša wrote:
> Hello,
> i would like to ask how does
The shutdown procedure for ManifoldCF involves sending interruptions (or
socket interruptions) to all worker threads. These then put the threads in
the "terminated" state, one by one. So you should only get this if you
shut down the agents process, or try to. The handling for this is correct,
Karl, good day!
Thank you for the hint! It was very useful! Actually, you were right and the
actual problem was with the connection. But I didn't expect it to be
so dramatic. Here is what I found using some debugging:
First I have found the actual code that was responsible for the deletion
> 0x7f051c50a000,0x7f051c60a000] [id=2537470]
>
>
>
> Stack: [0x7f051c50a000,0x7f051c60a000], sp=0x7f051c608080,
> free space=1016k
>
> Native frames: (J=compiled Java code, A=aot compiled Java code,
> j=interpreted, Vv=VM code, C=native code)
>
>
Hi,
2.22 makes no changes to the way document deletions are processed over
probably 10 previous versions of ManifoldCF.
What likely is the case is that the connection to the output for the job
you are cleaning up is down. When that happens, the documents are queued
but the delete worker threads
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Wednesday, January 18, 2023 12:10
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: JCIFS: Possibly transient exception detected on attempt 1
> while getting share security: All pipe instances are busy
>
>
>
> Hi, "
Hi, "Possibly transient issue" means that the error will be retried anyway,
according to a schedule. There should be no need to
shut down the agents process and restart.
Karl
On Wed, Jan 18, 2023 at 5:08 AM Bisonti Mario
wrote:
> Hi.
>
> Often, I obtain the error:
>
> WARN
Hi Karl,
I agree. BTW, Artem, the colleague, finally succeeded in subscribing. He
tried to subscribe some more times before opening JIRA ticket in
INFRA, and he finally got some responses from the ML system. Maybe
they restarted the system or did something else.
Thanks!
Koji
On Tue, Jan 10, 2023 at 20:17
Hmm - I haven't heard of difficulties like this before. The mail manager
is used apache-wide; if it doesn't work the best thing to do would be to
create an infra ticket in JIRA.
Karl
On Tue, Jan 10, 2023 at 3:50 AM Koji Sekiguchi
wrote:
> Hi Karl, everyone!
>
> I'm writing to the moderator
The internals of ManifoldCF will handle this fine if you are sure to set
the encoding of your database to be UTF-8. However, I don't know about the
JCIFS library, and whether there might be a restriction on characters in
that code base. I think you'd have to just try it and see, frankly.
Karl
Can anybody provide any clue on this? It would be of great help.
On Thu, Dec 22, 2022 at 5:33 PM ritika jain
wrote:
> Hi all,
>
> I am using Manifoldcf 2.21 version with Windows shares connector and
> Output as Elastic.
> I am facing this error while clicking "List all jobs", Manifoldcf, jobs
>
Hi Ronny,
Unsubscribing is self-service. Please follow here,
https://manifoldcf.apache.org/en_US/mail.html
On 22 Oct 2022 Sat at 08:55 Ronny Heylen wrote:
> Hi,
> Please unsubscribe me from these emails, I don't work anymore.
>
> Regards,
>
> Ronny
>
You will need to contact the current maintainers of the Jcifs library to
get answers to these questions.
Karl
On Mon, Aug 22, 2022 at 3:27 AM ritika jain
wrote:
> Hi All,
>
> I have a Windows shared job to crawl files from samba server, it's a huge
> job to crawl documents in millions(about
Remember, there is already a "forget" button on the output connection,
which will remove everything associated with the connection. It's meant to
be used when the output index has been reset and is empty. I'm not sure
what you'd do different functionally.
Karl
On Tue, Jun 14, 2022 at 2:04 AM
+1.
I respect the design concept of ManifoldCF, but I think force-delete options would make MCF more
useful for those who use MCF as a crawler. Adding force-delete options doesn't change the default
behavior and it doesn't break backward compatibility.
Koji
On 2022/06/14 14:46, Ricardo Ruiz wrote:
Hi
Hi Karl
We are using ManifoldCF as a crawler more than as a synchronizer. We are
thinking of contributing to ManifoldCF by adding a force job delete and a
force output connector delete, considering of course the things that need
to be deleted with them (DB, etc.). Do you think this is possible?
We
Because ManifoldCF is not just a crawler, but a synchronizer, a job
represents and includes a list of documents that have been indexed.
Deleting the job requires deleting the documents that have been indexed
also. It's part of the basic model.
So if you tear down your target output instance and
"Repeated service interruption" means that it happens again and again.
For this particular document, the problem is that the error we are seeing
is: "The process cannot access the file because it is being used by another
process."
ManifoldCF assumes that if it retries enough it should be able
We cannot do back patches of older versions of ManifoldCF. There is a new
release which shipped in January that addresses log4j issues. I suggest
updating to that.
Karl
On Tue, Mar 15, 2022 at 8:59 AM ritika jain
wrote:
> Hi,
>
> How manifoldcf uses log4j files in bin
As I've mentioned before, the best way to diagnose problems like this is to
get a thread dump of the agents process. There are many potential reasons
it could occur, ranging from stuck locks to resource starvation. What
locking model are you using?
Karl
On Mon, Jan 31, 2022 at 6:02 AM ritika
ManifoldCF framework and connectors use log4j 2.x to dump information to
the ManifoldCF log file.
Please read the following page:
https://logging.apache.org/log4j/2.x/security.html
Specifically, this part:
'Description: Apache Log4j2 <=2.14.1 JNDI features used in configuration,
log messages,
Hi Ritika,
For maven check here:
https://github.com/apache/manifoldcf/blob/trunk/pom.xml#L80
For Ant check here:
https://github.com/apache/manifoldcf/blob/trunk/build.xml#L87
Kind Regards,
Furkan KAMACI
On Tue, Dec 14, 2021 at 12:41 PM ritika jain
wrote:
> Hi All,
>
> How does manifold.cf
The degree of parallelism can be controlled in two ways.
The first way is to set the number of worker threads to something
reasonable. Usually, this is no more than about 2x the number of
processors you have.
The second way is to control the number of connections in your jcifs
connector to keep
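
As a rough illustration of the first rule of thumb above, the following Python sketch computes a ceiling for the worker thread count from the processor count. The helper name `suggested_worker_threads` and its `factor` parameter are hypothetical and not part of ManifoldCF; the resulting number would still have to be entered into the ManifoldCF configuration by hand.

```python
import os


def suggested_worker_threads(cpu_count=None, factor=2):
    """Return a rule-of-thumb ceiling of about `factor` threads per processor.

    Hypothetical helper for illustration only; it is not a ManifoldCF API.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return factor * cpu_count


if __name__ == "__main__":
    print("Suggested maximum worker threads:", suggested_worker_threads())
```

Treat the result as an upper bound to start from, not a tuned value; the memory available to the agents process may call for a lower number.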
SMB exceptions with jcifs in the trace tell us that JCIFS couldn't talk to
your windows share server. That's all we can tell though.
Karl
On Mon, Nov 15, 2021 at 7:24 AM ritika jain
wrote:
> Hi,
>
> Raising the concern above again: even to process only 60k documents (with
> the clock issue fixed
Hi,
Raising the concern above again: even to process only 60k documents (with
the clock issue fixed too), the job is not progressing; it stays stuck
for days. So we had to restart the Docker container every time for it to
proceed.
This time we are getting a Timeout Exception. What
One hour is quite a lot and will wreak havoc on the document queue.
Karl
On Tue, Nov 9, 2021 at 7:08 AM ritika jain wrote:
> I have checked: there is only a one-hour time difference between the Docker
> container and the Docker host
>
> On Tue, Nov 9, 2021 at 4:41 PM Karl Wright wrote:
>
>> If your
I have checked: there is only a one-hour time difference between the Docker
container and the Docker host
On Tue, Nov 9, 2021 at 4:41 PM Karl Wright wrote:
> If your docker image's clock is out of sync badly with the real world,
> then System.currentTimeMillis() may give bogus values, and ManifoldCF uses
If your docker image's clock is out of sync badly with the real world, then
System.currentTimeMillis() may give bogus values, and ManifoldCF uses that
to manage throttling etc. I don't know if that is the correct explanation
but it's the only thing I can think of.
Karl
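
One way to quantify how far a container clock has drifted is to compare its epoch time with a reference reading taken on the host. The Python sketch below is only an illustration: the function names and the 60-second tolerance are assumptions, not values that ManifoldCF defines or checks.

```python
import time


def clock_skew_seconds(local_epoch, reference_epoch):
    """Return how many seconds the local clock is ahead of the reference."""
    return local_epoch - reference_epoch


def is_badly_out_of_sync(local_epoch, reference_epoch, tolerance_seconds=60.0):
    """Flag a skew larger than the (assumed, arbitrary) tolerance."""
    skew = clock_skew_seconds(local_epoch, reference_epoch)
    return abs(skew) > tolerance_seconds


if __name__ == "__main__":
    # In practice the reference would be read on the Docker host (for example
    # with `date +%s`); here a one-hour offset is simulated instead.
    now = time.time()
    print("One hour apart, out of sync:", is_badly_out_of_sync(now, now - 3600))
```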
On Tue, Nov 9, 2021 at
We see errors like this only because MCF is a highly multithreaded
application, and two threads sometimes are able to collide in what they are
doing even though they are transactionally separated. That is because of
bugs in the database software. So if you restart the job it should not
encounter
Is it repeatable? My guess is it is not repeatable.
Karl
On Wed, Oct 27, 2021 at 4:43 AM ritika jain
wrote:
> So, can it be left as it is? It is preventing the job from completing,
> and the job is stopping.
>
> On Tue, Oct 26, 2021 at 8:40 PM Karl Wright wrote:
>
>> That's a database bug. All
So, can it be left as it is? It is preventing the job from completing,
and the job is stopping.
On Tue, Oct 26, 2021 at 8:40 PM Karl Wright wrote:
> That's a database bug. All of our underlying databases have some bugs of
> this kind.
>
> Karl
>
>
> On Tue, Oct 26, 2021 at 9:17 AM ritika jain
>
That's a database bug. All of our underlying databases have some bugs of
this kind.
Karl
On Tue, Oct 26, 2021 at 9:17 AM ritika jain
wrote:
> Hi All,
>
> While using Manifoldcf 2.14 with Web connector and ES connector. After a
> certain time of continuing the job (jobs ingest some documents
The only limit is that the more you add, the slower it gets.
Karl
On Mon, Oct 25, 2021 at 6:06 AM ritika jain
wrote:
> Hi ,
> Is there any limit on the number of paths we can define in job using
> Repository as Window Shares and ES as Output
>
> Thanks
>
The API should really catch this situation. Basically, you are calling a
function that requires an input but you are not providing one. In that
case the API sets the input to "null", and the detailed operation is
called. The detailed operation is not expecting a null input.
This is API piece
Hi,
You say this is a "Tika error". Is this Tika as a stand-alone service? I
do not recognize any ManifoldCF classes whatsoever in this thread dump.
If this is Tika, I suggest contacting the Tika team.
Karl
On Thu, Sep 30, 2021 at 3:02 AM Bisonti Mario
wrote:
> Additional info.
>
>
>
> I
This is something you should contact the Tika project about.
Karl
On Tue, Sep 7, 2021 at 8:46 AM ritika jain wrote:
> Hi All,
>
> I am using tika-core 1.21 and tika-parsers 1.21 jar files as tika
> dependencies in Manifoldcf 2.14 version.
> Getting some issues while parsing *PDF *files. Some
I'm having issues with ManifoldCF losing connection to ZooKeeper. This is
easily repeatable: I just need to leave ManifoldCF running for a few days.
The results are not always "No route to host" as I previously reported --
sometimes its just connect timeouts or other behavior, but the connection
I have a work day today, with limited time.
The UI is what it is; it does not have capabilities beyond what is stated
in the UI and in the manual. It's meant to allow construction of paths
piece by piece, not by full subdirectory at a time.
You can obviously use the API if you want to construct
Does anybody have a clue on this?
On Fri, Aug 20, 2021 at 12:33 PM ritika jain
wrote:
> Hi All,
>
> I have a question: is there any way to specify subdirectory
> paths in the file spec of the Windows shares connector?
>
> Like my requirement is to mention Most top hierarchical
Yes, when you delete a job, the indexed documents associated with that job
are removed from the index.
ManifoldCF is a synchronizer, not a crawler, so when you remove the
synchronization job then if it didn't delete the indexed documents they
would be left dangling.
Karl
On Thu, Aug 12, 2021
Seems to be working now!!! Thanks a lot !!!
On Wed, Aug 11, 2021 at 6:22 PM ritika jain
wrote:
> Hi ,
>
> Yes, this works; the only difference is that when a single file is ingested,
> it is ingested as C:/Users/Dell/Desktop/abc.txt/ with an UNWANTED
> slash at the end
>
> *The file spec part
Hi ,
Yes, this works; the only difference is that when a single file is ingested,
it is ingested as C:/Users/Dell/Desktop/abc.txt/ with an UNWANTED
slash at the end
"The file spec part should include the file name": I have tried it this way, and
I am getting Access denied. Also checked about all the
The "path" attribute is not meant to include terminal file names, only
directories. I'm surprised that this works at all. The file spec part
should include the file name.
Karl
On Wed, Aug 11, 2021 at 2:14 AM ritika jain
wrote:
> *Dynamic Job *
>
>
*Dynamic Job *
{"job":{"_children_":[{"_type_":"id","_value_":"1628595470228"},{"_type_":"description","_value_":"DEMo
TEMP
I am sorry, but I'm having trouble understanding how exactly you are
configuring the JCIFS connector in these two cases. Can you view the job
in each case and provide cut-and-paste of the view?
Karl
On Tue, Aug 10, 2021 at 9:09 AM ritika jain
wrote:
> Hi All,
>
> I am using Window shares
I had a quick look at Jira. I think there is already a ticket which
covers the requirement of using a sitemap.xml file which is referenced by
robots.txt
https://issues.apache.org/jira/browse/CONNECTORS-1657
I'll update this ticket with information from the sitemap protocol page
If you wish to add a feature request, please create a CONNECTORS ticket
that describes the functionality you think the connector should have.
Karl
On Wed, Jul 7, 2021 at 9:29 AM h0444xk8 wrote:
> Hi,
>
> yes, that seems to be the reason. In:
>
>
>
Hi,
yes, that seems to be the reason. In:
https://github.com/apache/manifoldcf/blob/030703a7f2bbfbb5a8dcde529b29ead830a7f60c/connectors/rss/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/rss/Robots.java
there is the following code sequence:
else if
The robots parsing does not recognize the "sitemaps" line, which was likely
not in the spec for robots when this connector was written.
Karl
On Wed, Jul 7, 2021 at 3:31 AM h0444xk8 wrote:
> Hi,
>
> I have a general question. Does the Web connector support sitemap files
> referenced by the
302 does get recognized as a redirection, yes
On Fri, May 28, 2021 at 5:07 AM ritika jain
wrote:
> Is the process the same when the fetch/process status code returned is 302,
> when running a job with the web crawler and ES output connector?
> Does anybody have a clue about this?
>
>> Is the process the same when the fetch/process status code returned is 302,
>> when running a job with the web crawler and ES output connector?
Does anybody have a clue about this?
Is the process the same when the fetch/process status code returned is 302,
when running a job with the web crawler and ES output connector?
On Wed, May 19, 2021 at 10:35 PM Karl Wright wrote:
> ManifoldCF reads all the URLs on its queue.
> If it's a 301, it detects this and pushes the new URL onto
ManifoldCF reads all the URLs on its queue.
If it's a 301, it detects this and pushes the new URL onto the document
queue.
When it gets to the new URL, it processes it like any other.
Karl
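
The queue behavior described above can be sketched as a toy model. This Python example is not ManifoldCF code: the `crawl` function, the `fetch` callback and its `(status, payload)` return shape are all hypothetical, chosen only to show a redirect target being pushed onto the queue and then processed like any other URL.

```python
from collections import deque


def crawl(seed_urls, fetch):
    """Process a URL queue, following redirects by re-queueing their targets.

    `fetch(url)` is a hypothetical callback returning (status, payload), where
    payload is the redirect target for 301/302 and the body otherwise.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    processed = []
    while queue:
        url = queue.popleft()
        status, payload = fetch(url)
        if status in (301, 302):
            # Redirect: push the new URL onto the queue instead of indexing.
            if payload not in seen:
                seen.add(payload)
                queue.append(payload)
        elif status == 200:
            # The new URL is handled exactly like any other document.
            processed.append(url)
    return processed


if __name__ == "__main__":
    pages = {
        "http://example.com/old": (301, "http://example.com/new"),
        "http://example.com/new": (200, "content"),
    }
    print(crawl(["http://example.com/old"], lambda u: pages.get(u, (404, ""))))
```

The `seen` set is a design choice of this sketch to keep redirect loops from re-queueing the same URL forever.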
On Wed, May 19, 2021 at 8:32 AM ritika jain
wrote:
> Hi
>
> I want to understand the process of "How
"crashing the manifold" is probably running out of memory, and it is
probably due to having too many worker threads and insufficient memory, not
the error you found.
If that error caused a problem, it would simply abort the job, not "crash"
Manifold.
Karl
On Fri, May 14, 2021 at 4:10 AM ritika
It retries 3 times and it usually crashes ManifoldCF.
I observed a similar ticket,
https://issues.apache.org/jira/browse/CONNECTORS-1633. Is ManifoldCF
itself capable of skipping the file that causes the issue instead of aborting
the job or crashing?
On Fri, May 14, 2021 at 1:34 PM
'JCIFS: Possibly transient exception detected on attempt 1 while getting
share security'
Yes, it is going to retry.
Karl
On Fri, May 14, 2021 at 1:45 AM ritika jain
wrote:
> Hi,
> I am using Windows shares connector in manifoldcf 2.14 and ElasticSearch
> connector as Output connector and
This used to work fine, but I suspect that when SSL was declared unsafe, it
was disabled, and now only TLS will work.
Karl
On Tue, May 11, 2021 at 12:13 PM wrote:
> Hello,
>
>
>
> I am trying to use an email notification connector but without success.
> When the connector tries to send an
Hi,
There was a book written but never published on ManifoldCF and how to write
connectors. It's meant to be extended in that way. The PDFs for the book
are available for free online, and they are linked through the manifoldcf
web site.
Karl
On Mon, Apr 12, 2021 at 8:49 AM koch wrote:
> Hi
Hi Ritika,
There is no deletion process. Deletion takes place when a job is run in a
mode where deletion is possible (there are some where it is not). The way
it takes place depends on the kind of repository connector (what model it
declares itself to use).
For the most common kinds of
Hi, Karl.
Karl Wright wrote:
>I have now updated (I think) everything that this patch actually has, save
>for one deprecated field substitution (the "types" field is now the "doc_"
I've checked the updated sources via git://git.apache.org/manifoldcf.git
and found some problems in the following
Hi, Karl.
Karl Wright wrote:
>field). I would like to know more about this. Does the "types" field no
>longer work? Should we send both, in order to be sure that the connector
>works with most versions of ElasticSearch? Please help clarify so that I
>can finish this off.
The "types" field
I have now updated (I think) everything that this patch actually has, save
for one deprecated field substitution (the "types" field is now the "doc_"
field). I would like to know more about this. Does the "types" field no
longer work? Should we send both, in order to be sure that the connector
Hi,
Please see https://issues.apache.org/jira/browse/CONNECTORS-1666 .
I did not commit the patches as given because I felt that the fix was a
relatively narrow one and it could be implemented with no user
involvement. Adding control for the user was therefore beyond the scope of
the repair.
Thanks for the information. I'll see what I can do.
Karl
On Thu, Mar 18, 2021 at 7:23 PM Shirai Takashi/ 白井隆
wrote:
> Hi, Karl.
>
> Karl Wright wrote:
> >Hi - I'm still waiting for this patch to be attached to a ticket. That is
> >the only way I believe we're allowed to accept it legally.
>
Hi, Karl.
Karl Wright wrote:
>Hi - I'm still waiting for this patch to be attached to a ticket. That is
>the only way I believe we're allowed to accept it legally.
Do you ask me to send the patch to the JIRA ticket?
I can't access the JIRA because of our firewall.
Sorry.
What can I do without
Hi - I'm still waiting for this patch to be attached to a ticket. That is
the only way I believe we're allowed to accept it legally.
Karl
On Thu, Mar 4, 2021 at 7:16 PM Shirai Takashi/ 白井隆
wrote:
> Hi, Karl.
>
> Karl Wright wrote:
> >I agree it is unlikely that the JDK will lose support
Hi, Karl.
Karl Wright wrote:
>I agree it is unlikely that the JDK will lose support for SHA-1 because it
>is used commonly, as is MD5. So please feel free to use it.
I know.
I think that SHA-1 is better on the whole.
I don't care that apache-manifoldcf-elastic-id-2.patch.gz is discarded.
I agree it is unlikely that the JDK will lose support for SHA-1 because it
is used commonly, as is MD5. So please feel free to use it.
Karl
On Wed, Mar 3, 2021 at 7:54 PM Shirai Takashi/ 白井隆
wrote:
> Hi, Horn.
>
> Jörn Franke wrote:
> >Makes sense
>
> I don't think that it's easy.
>
>
> >>>
Hi, There.
Shirai Takashi/ 白井隆 wrote:
>I can use SHA-256 with Elasticsearch connector.
I've prepared the patch to support SHA-256.
It minimizes changes, to avoid the global effects.
It seems inelegant to include the try-catch clause.
I can't decide which is better.
Nintendo, Co., Ltd.
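
The trade-off mentioned above (a selectable hash algorithm guarded by a try-catch) can be illustrated with a short sketch. This is not the attached patch, which is not shown here, and it uses Python rather than Java; `hash_id` and its parameters are hypothetical names. The `try`/`except` plays the role that catching `NoSuchAlgorithmException` around `MessageDigest.getInstance` would play in Java.

```python
import hashlib


def hash_id(text, algorithm="sha256", fallback="sha1"):
    """Hash a document identifier, falling back if the algorithm is missing.

    Hypothetical helper for illustration; not the ManifoldCF hash method.
    """
    data = text.encode("utf-8")
    try:
        digest = hashlib.new(algorithm, data)
    except ValueError:
        # hashlib raises ValueError for an unsupported algorithm name.
        digest = hashlib.new(fallback, data)
    return digest.hexdigest()


if __name__ == "__main__":
    uri = "http://example.com/some/very/long/document/uri"
    print(hash_id(uri))
    print(hash_id(uri, algorithm="sha1"))
```

Note that silently falling back changes the IDs that are produced, so an index built with one algorithm would not match IDs computed with the other.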
Hi, Horn.
Jörn Franke wrote:
>Makes sense
I don't think that it's easy.
>>> Maybe use SHA-256 or later. SHA-1 is obsolete and one never knows when it
>>> will be removed from JDK.
I also know SHA-1 is dangerous.
Someone can generate the string which is hashed into the same SHA-1 to pretend
Hi, Karl.
Karl Wright wrote:
>Backwards compatibility means that we very likely have to
>use the hash approach, and not use the decoding approach.
Do you object to the decoding?
It may be useless for the users with the alphabetical language.
But it's useful for the users with the multibyte
Hi - this is very helpful. I would like you to officially create a ticket
in Jira: https://issues.apache.org/jira , project "CONNECTORS", and attach
these patches. Backwards compatibility means that we very likely have to
use the hash approach, and not use the decoding approach.
Thanks,
Karl
current ManifoldCF use SHA-1?
> This case may have to use SHA-1 depending on the reason.
> If the reason is only the compatibility,
> I can re-design the method ManifoldCF.hash(),
> to add the argument which indicates the algorithm.
>
>
> Nintendo, Co., Ltd.
> Product Technology Dept.
> Takashi SHIRAI
> PHONE: +81-75-662-9600
> mailto:shi...@nintendo.co.jp
lass,
the default algorithm is updated entirely.
I've just followed the standard of ManifoldCF.
I also think SHA-256 or later is better.
Why does the current ManifoldCF use SHA-1?
This case may have to use SHA-1 depending on the reason.
If the reason is only the compatibility,
I can re-design the method
Maybe use SHA-256 or later. SHA-1 is obsolete and one never knows when it will
be removed from JDK.
> On 02.03.2021 at 04:10, Shirai Takashi/ 白井隆 wrote:
>
> Hi, there.
>
> I've found another problem in the Elasticsearch connector.
> The Elasticsearch output connector uses the URI string as the ID.
>
Hi, there.
Shirai Takashi wrote:
>ManifoldCF can use mapping-attachments plugin for Elasticsearch connector.
>But it is obsolete, and the ingest-attachment plugin is recommended instead.
>I try to support this plugin with the attached patch.
Sorry, I made a mistake in this patch.
Please replace it
File synchronization is still supported but is deprecated. We recommend
zookeeper synchronization unless you have a very good reason not to.
Karl
On Wed, Feb 17, 2021 at 12:26 PM Ananth Peddinti wrote:
> Hello Team ,
>
>
> I would like to know if someone has already done multi-process model
The internal Tika is not memory bounded; some transformations stream, but
others put everything into memory.
You can try using the external tika, with a tika instance you run
separately, and that would likely help. But you may need to give it lots
of memory too.
Karl
On Wed, Feb 17, 2021 at
Hi Karl,
I am using Elasticsearch as the output connector and, yes, the internal
Tika extractor; I am not using a Solr output connection.
Also, the Elasticsearch server is hosted on a different server with a huge
memory allocation.
On Tue, Feb 16, 2021 at 7:29 PM Karl Wright wrote:
> Hi, do you mean
Hi, do you mean content limiter length of 100?
I assume you are using the internal Tika transformer? Are you combining
this with a Solr output connection that is not using the extract handler?
By "manifold crashes" I assume you actually mean it runs out of memory.
The "long running query"
This parameter is in bytes.
Karl
On Mon, Feb 15, 2021 at 9:03 AM ritika jain
wrote:
> Hi Users,
>
> Can anybody tell me whether this should be filled in as bytes or kilobytes here?
>
> The "Content Length" tab looks like this:
>
>
> [image: Windows Share Job, Content Length tab]
>
> Values are to be
Hi,
Usually the reason a job doesn't complete is because a document is retrying
indefinitely. You can see what's going on by looking at the Simple History
job report, or, if you prefer, tailing the manifoldcf log.
Other times a job won't complete because somebody shut down the agents
process.
I have a job that is stuck in terminating for 12 hrs. It is a small test job
and I am wondering if there is a way to fix this? The job ran once and
completed 175k documents. I modified the query to the job and reseeded. The job
was modified to process a smaller document set. I assume reseeding
>
>
>
>
> --
>
> Michael Cizmar
>
>
>
> *From:* ritika jain
> *Sent:* Thursday, December 31, 2020 7:33 AM
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Indexation Not OK
>
>
>
> Elastic search output connector with some custom changes for s
if that traffic is actually going
to Elastic search.
Karl – I believe Ritika said Elastic.
--
Michael Cizmar
From: ritika jain
Sent: Thursday, December 31, 2020 7:33 AM
To: user@manifoldcf.apache.org
Subject: Re: Indexation Not OK
Elastic search output connector with some custom changes for some fields
Sorry, I couldn't quite understand everything in your email, but it sounds
like the problem is in the ES connection. It is possible that ES expires
your connection and the indexing fails after that happens. If that is
happening, however, I would expect to see a much more detailed error
message
Elastic search output connector with some custom changes for some fields
On Thursday, December 31, 2020, Karl Wright wrote:
> Hi,
> Can you let us know what you are using for the output connector?
> Thanks,
> Karl
>
>
> On Thu, Dec 31, 2020 at 8:24 AM ritika jain
> wrote:
>
>> Hi,
>>
>> I am