.4,10+ or 11+?
>
> Thanks,
> Cihad Guzel
>
>
> On Mon, 11 Feb 2019 at 04:01, Karl Wright
> wrote:
>
>> No, it is not normal. I expect that the MySQL transaction issues are
>> causing lots of problems.
>>
>> Karl
>>
>>
>> On Sun, F
.xml) with the line :
> value="500" />
>
> Is there a setting that allows disabling the reindexing requested by
> ManifoldCF?
>
> thanks
>
> Daniel
>
>
> On 08/02/2019 at 16:00, > Karl Wright (via Internet, submitted by
> user-return-5674-daniel.li
What database is this?
Basically, the "unexpected job status" means that the framework found
something that should not have been possible, if the database had been
properly enforcing ACID transactional constraints. Is this MySQL? Because
if so it's known to have this problem.
It also looks like
ter as the sharepoint
> servers. Currently they are in different DCs with dedicated MPLS
> connectivity.
>
> Thanks,
> Gaurav
>
> On Sat, Feb 9, 2019 at 3:03 AM Karl Wright wrote:
>
>> The problem is not the speed of Manifold, but rather the work it has to
>> do an
vacuum once daily.
>
> Would switching to a multi-process configuration, with ManifoldCF running
> on two servers, give a boost?
>
> Thanks,
> Gaurav
>
> On Saturday, February 9, 2019, Karl Wright wrote:
>
>> It does the minimum necessary. That means it can't do it in le
er of docs that actually change in a 30 min period won't be more than
> 200.
>
> Being able to capture adds and updates in 30 minutes is a key business
> requirement.
>
> Thanks,
> Gaurav
>
> On Friday, February 8, 2019, Karl Wright wrote:
>
>> Hi Gaurav,
>>
>
Hi Gaurav,
The right way to do this is to schedule "minimal" crawls every 15 minutes
(which will process only the minimum needed to deal with adds and updates),
and periodically perform "full" crawls (which will also include deletions).
Thanks,
Karl
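Here is a minimal sketch of the scheduling decision Karl describes. It assumes a hypothetical external scheduler that fires every 15 minutes and should start one full crawl per day. `CrawlPlanner` and its names are illustrative only; they are not part of ManifoldCF. How the schedule is actually configured in a ManifoldCF job is not shown here.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper (not part of ManifoldCF): decides which kind of crawl an
 * external scheduler should start on each 15-minute tick.
 */
final class CrawlPlanner {
    enum CrawlType { MINIMAL, FULL }

    private final Duration fullCrawlInterval;

    CrawlPlanner(Duration fullCrawlInterval) {
        this.fullCrawlInterval = fullCrawlInterval;
    }

    /**
     * Returns FULL (adds, updates and deletions) when no full crawl has run yet or the
     * last one is at least fullCrawlInterval old; otherwise MINIMAL (adds and updates only).
     */
    CrawlType next(Instant lastFullCrawl, Instant now) {
        if (lastFullCrawl == null) {
            return CrawlType.FULL;
        }
        Duration sinceLastFull = Duration.between(lastFullCrawl, now);
        return sinceLastFull.compareTo(fullCrawlInterval) >= 0 ? CrawlType.FULL : CrawlType.MINIMAL;
    }
}
```

Suppose the planner is built with `new CrawlPlanner(Duration.ofHours(24))`. A tick 15 minutes after the last full crawl yields `MINIMAL`, and the first tick 24 hours later yields `FULL`.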
On Fri, Feb 8, 2019 at 10:11 AM Gaurav G
Hello,
(1) What database are you using for this? Some databases require
maintenance periodically or have other heavy usage constraints.
(2) Every time a query takes more than a minute to execute, it is logged,
along with the query plan. You need to look at the manifoldcf log to see
which
The only "old data" kept by MCF is the history information. By default
it's expunged after 30 days. You can shorten the amount of time it's kept
around though by setting a properties.xml parameter (need to refer to the
"how-to-build-and-deploy" page for details).
Karl
On Fri, Feb 8, 2019 at
Did you try 'vacuum full'?
Karl
On Fri, Jan 25, 2019 at 3:47 AM Bisonti Mario
wrote:
> Hello.
>
> I use MCF 2.12 and postgresql 9.3.25 Solr 7.6 Tika 1.19 on Ubuntu Server
> 18.04
>
>
>
> Weekly, I scheduled the following via crontab for the postgres user:
>
> 15 8 * * Sun vacuumdb --all --analyze
>
> 20 10
The latest (2.12) version of MCF fixes this problem by working around it.
Karl
On Mon, Jan 21, 2019 at 5:12 AM Erlend Garåsen
wrote:
>
> I have encountered the same problem Karl reported in the following ticket:
> https://issues.apache.org/jira/browse/SOLR-12798
>
> Since the ticket is
Hi,
HSQLDB is actually reasonably fast, but it has other problems, namely that
it stores whole DB tables in memory so if your crawl is large enough it
will run out.
The reason for Documentum connector slowness is almost always poor
Documentum performance, and has nothing to do with MCF itself.
The output format did change, and the reason was because the "syntactic
sugar" format would not preserve ordering, so that if you output and
re-input, you'd lose information.
The more complex form is being used only where there is a possibility of
ordering confusion. It was always accepted as
This is a serious fatal error of some kind and we need a complete stack
trace to address it. The JVM stops giving complete stack traces after they
repeat a certain number of times, so you will need to go back far enough
in the log to find where a complete trace was dumped.
Thanks,
Karl
On
Please do create a ticket with a patch. I'm extremely curious.
Depending on what you're proposing, I think a valid approach might need to
be to propose appropriate changes to the HttpComponents/HttpClient library.
Karl
On Thu, Jan 3, 2019 at 7:52 AM Erlend Garåsen
wrote:
>
> It works now
"are
> authorized to access the document[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "requested. Either you supplied the wrong[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "credential
On December 20th, we released Apache ManifoldCF 2.12. It is available for
download from the Apache ManifoldCF 2.12 site here:
http://manifoldcf.apache.org . Enjoy!!
Karl
.6]
> at
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
> ~[httpclient-4.5.6.jar:4.5.6]
> at
>
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
> ~[httpclient-4.5.6.jar:4.5.6]
>
Did you import any data directly into new tables?
The schema has changed significantly from 1.7 until now. I doubt very much
you could get away with an import of the old table data, and that could
well cause the effect you're seeing.
Karl
On Wed, Dec 12, 2018 at 11:12 AM Erlend Garåsen
wrote:
This value is transmitted in the "User-Agent" header.
Karl
On Wed, Dec 12, 2018 at 8:53 AM Singh,Jasvinder
wrote:
> Karl - Can you please help with details about the User-Agent for a connector
> instance: is there some configuration to set this value to a custom
> value, or what is the default
As I've stated before, we see this error a lot from MySQL. It's due to
bugs in MySQL. MySQL does not maintain transaction integrity in some
situations; it may be a JDBC driver issue, not sure.
Karl
On Wed, Dec 12, 2018 at 8:33 AM Sivakoti, Nikhilesh <
nikhilesh.sivak...@capgemini.com> wrote:
at
> org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:67)
> [mcf-ui-core.jar:?]
> at org.apache.jsp.index_jsp._jspService(index_jsp.java:212) [jsp/:?]
>
>
> Can this be resolved by adding some resource files, or does some other
> solution have to be adopted?
>
>
Hi James,
ManifoldCF does not currently support Oracle as a back-end database. It
does support crawling Oracle databases, however, via the JDBC Connector.
Is that what you are trying to do?
If it is, then please note the following instructions for the JDBC
connector. Because the driver you are
available.
>
> I see only “Host name” “Port” …
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Thursday, 6 December 2018 13:49
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: How to notify mail by SMTP
>
>
>
> Hi Mario, there is an email notification connecto
Hi Mario, there is an email notification connector. Have you tried to
configure that?
On Thu, Dec 6, 2018, 3:50 AM Bisonti Mario wrote:
> Hello.
>
> I would like to be notified by mail when a job ends.
>
> I use an SMTP server, but I cannot figure out how to configure this.
>
>
>
>
>
> I read
>
at 9:14 AM Karl Wright wrote:
> The dates/times for this page are formatted as follows:
>
> org.apache.manifoldcf.ui.util.Formatter.formatTime(clientTimezone,
> pageContext.getRequest().getLocale(), js.getStartTime());
>
> But the code for formatTime pays no attention to th
I'm sorry, you'll need to provide more details about what exactly you are
running into trouble with.
Specifically, this: " But the current crawler using the SQL queries which
is hard to query under a path. "
Karl
On Fri, Nov 30, 2018 at 4:42 AM Sivakoti, Nikhilesh <
The dates/times for this page are formatted as follows:
org.apache.manifoldcf.ui.util.Formatter.formatTime(clientTimezone,
pageContext.getRequest().getLocale(), js.getStartTime());
But the code for formatTime pays no attention to the preferred format for
the locale:
public static String
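For illustration only, a locale-aware variant could let `java.text.DateFormat` choose the pattern that the request's locale prefers. `LocaleTimeFormatter` is a hypothetical class, not ManifoldCF's `Formatter`. It also assumes the job start time is available as epoch milliseconds.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical locale-aware formatter; not ManifoldCF's actual Formatter class. */
final class LocaleTimeFormatter {
    static String formatTime(TimeZone clientTimezone, Locale locale, long epochMillis) {
        // Ask the JDK for the date/time pattern the locale prefers
        // (for example month-first for en_US, day-first for de_DE).
        DateFormat fmt = DateFormat.getDateTimeInstance(DateFormat.MEDIUM, DateFormat.MEDIUM, locale);
        // Render the instant in the client's time zone, not the server's default.
        fmt.setTimeZone(clientTimezone);
        return fmt.format(new Date(epochMillis));
    }
}
```

`DateFormat` instances are not thread-safe, which is why this sketch creates a new one per call instead of caching a shared instance.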
start my
> big job.
>
> My job is running from yesterday at 4 p.m. without interruption
>
> It has indexed 261000 docs now.
>
> I suppose that it will finish in two days.
>
> I will update you.
>
> Thanks a lot!
>
> Mario
>
>
>
>
>
>
>
ecute:
>
> pstree 1369
>
> java───686*[{java}]
>
>
>
> so, 686 process child of the agent.
>
>
>
> Is there any relation between these values, 686 and 184?
>
>
>
> Thanks.
>
> Mario
>
>
>
>
>
> *From:* Karl Wright
> *Sent:*
)
>
> at
> jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:185)
>
> at
> jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
>
> at
> jdk.hotspot.agent/sun.jvm.hotspot.tools.JInfo.runWithArgs(JInfo.java:139)
>
>
p/jstack_start_agent.log
>
>
>
> but I obtain:
>
> 1233: Unable to open socket file /proc/1233/cwd/.attach_pid1233: target
> process 1233 doesn't respond within 10500ms or HotSpot VM not loaded
>
>
>
> Is this perhaps not the right way to obtain a thread dump?
>
>
Another thing you could do is get a thread dump of the agents process.
Karl
On Wed, Nov 28, 2018 at 10:35 AM Karl Wright wrote:
> Can you look into the database jobqueue table and provide a row that
> corresponds to one of these documents?
>
> Thanks,
> Karl
>
>
> On
are': Tika down, retrying:
> Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/
> 172.16.1.135] failed: Connection refused (Connection refused)
>
> WARN 2018-11-26T13:18:26,862 (Worker thread '12') - Service interruption
> reported for job 1533797717712 connection '
e"/>
>value="custom_hostname"/>
>
> So, I've added those properties to make it work. Shouldn't hostname,
> dbsuperusername and dbsuperuserpassword be enough?
>
> Kind Regards,
> Furkan KAMACI
>
>
> On Sat, Nov 24, 2018 at 5:40 PM Karl Wr
--
> --
> Database: amarok
> jdbcDriver: com.mysql.jdbc.Driver
> jdbcUrl:
> jdbc:mysql://localhost/amarok?useUnicode=true=utf8
> userName: manifoldcf
> password: local_pg_passwd
> --
>
> So, it doesn't try to connect to a host other than localhost without
>
Hi Nikhilesh,
Where are you seeing these errors? They sound like ElasticSearch errors to
me; it is complaining that a null or empty-string pipeline name is being
specified somehow. Can you tell me what version of ElasticSearch you are
using?
We have outstanding tickets for updating the
Hi Nikita,
Can you be more specific when you say "OpenNLP is not working"? All that
this connector does is integrate OpenNLP as a ManifoldCF transformer. It
uses a specific directory to deliver the models that OpenNLP uses to match
and extract content from documents. Thus, you can provide any
i.e. the website itself needs to escape the special
> characters, otherwise the extraction will not work in MCF, am I right?
>
> Best regards,
>
> Olivier
>
>
>
> On 15 Nov 2018 at 12:57, Karl Wright wrote:
>
> Hi Olivier,
>
> You can create a ticket but I don't h
nk extraction starting
> DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: no content
> exclusion rule supplied... returning
> DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: Decided to
> ingest 'http://localhost:/testjs/test.html'
> —
> So special characters like th
apply the same concept of yours (wait 10 sec and retry
> three times) to the 503 error, too?
>
>
>
> So, I would like to try whether, with the modification, the job ends
> correctly instead of failing.
>
>
>
>
>
> Thanks a lot
>
> Mario
>
>
>
t
> (in my case manifoldcf)
>
>
> https://issues.apache.org/jira/browse/TIKA-2776?focusedCommentId=16686620=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16686620
>
>
>
> I am not able to do this…
>
> Is it possible to implement this in MCF so
> QueuedDocument qd = previousDocuments.get(documentIdentifierHash);
> // returns null. The problem is here.
> if (qd == null)
> throw new IllegalArgumentException("Unrecognized document
> identifier: '"+documentIdentifier+"'");
> r
Hi,
Have you been modifying the framework code? If so, I really cannot help
you.
If you haven't -- it looks like you've got code that is injecting document
identifiers that are incorrect. But I will need to see a full stack trace
to be sure of that.
Thanks,
Karl
On Mon, Nov 12, 2018 at 4:06
Hi Mario,
The Tika external connector retries for a while before it gives up and
aborts the job. If you can get the Tika server back up within a reasonable
period of time all should be well. But if one specific document *always*
brings down the Tika server, it will be hard to recover from that.
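The "retry for a while, then give up" behaviour can be sketched generically as below. The same shape covers the "wait 10 sec and retry three times" idea raised elsewhere in this thread. `RetryThenAbort` is hypothetical. It treats any `IOException` as transient, which is an assumption, and it does not reproduce how the Tika external connector actually schedules its retries.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retry a call a fixed number of times, then give up. */
final class RetryThenAbort {
    static <T> T call(Callable<T> action, int maxAttempts, long delayMillis) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                // Assumption: I/O failures (e.g. "Connection refused" while Tika restarts)
                // are transient; any other exception propagates immediately.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis); // e.g. 10_000 ms for "wait 10 sec"
                }
            }
        }
        // Retries exhausted: surface the failure so the caller can abort the job.
        throw new Exception("Giving up after " + maxAttempts + " attempts", last);
    }
}
```

A call such as `RetryThenAbort.call(() -> sendToTika(doc), 3, 10_000L)` would express "three tries, 10 seconds apart". Here `sendToTika` is a hypothetical method.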
> the simple history for one of these documents; I need to see what happened
> to it last.
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario
> wrote:
>
> My version is 2.11
>
>
>
>
>
>
>
>
>
ok, can you create a ticket? Also, I'd appreciate it if you can look at
the simple history for one of these documents; I need to see what happened
to it last.
Thanks,
Karl
On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario
wrote:
> My version is 2.11
>
>
>
>
>
>
>
>
>
> Solr server is ok
>
> Tika server is ok
>
> Agent is ok
>
> Tomcat with ManifoldCF is ok
>
>
>
> I could check whether I can put, for example, the Tika server or Solr
> into info log mode.
>
>
>
> Thanks..
>
>
>
>
>
> *From:* Ka
Yes, I see many docs in the docs queue but they are inactive.
>
>
>
> In fact, I see that no more docs are being indexed in Solr, and the job
> still shows the same number of active docs (35012).
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Tuesday, 30 Oct
The reason the job is "stuck" is because:
' JCIFS: Possibly transient exception detected on attempt 1 while getting
share security: All pipe instances are busy.'
This means that ManifoldCF will retry this document for a while before it
gives up on it. It appears to be stuck but it is not. You
Hi Olivier,
Javascript inclusion in the Web Connector is not evaluated. In fact, no
Javascript is executed at all. Therefore it should not matter what is
included via javascript.
Thanks,
Karl
On Mon, Oct 29, 2018 at 1:39 PM Olivier Tavard <
olivier.tav...@francelabs.com> wrote:
> Hi,
>
>
Never mind, I was able to get it fixed.
Karl
On Wed, Oct 24, 2018 at 10:19 AM Karl Wright wrote:
> I've created CONNECTORS-1551, and attached the patch.
>
> Unfortunately there seems to be some encoding issues with
> common_ja_JP.properties; can you send that one fi
I've created CONNECTORS-1551, and attached the patch.
Unfortunately there seems to be some encoding issues with
common_ja_JP.properties; can you send that one file via email as an
attachment? Thanks!
Karl
On Tue, Oct 23, 2018 at 8:54 PM 白井 隆/ Shirai Takashi
wrote:
> Hi, there.
>
> I've just
need, or maybe we also have to increase the general log level.
>
> Thanks in advance.
>
>
> On Tue, 23 Oct 2018 at 14:28, Gustavo Beneitez (<
> gustavo.benei...@gmail.com>) wrote:
>
>> Thanks Karl, we are going to make new crawls with that property enabled
>>
On Tue, Oct 23, 2018 at 2:34 AM Gustavo Beneitez
wrote:
> Hi Karl,
>
> MySQL. As per config variables:
> version 5.7.23-log
> version comment MySQL Community Server (GPL)
>
> In which file should I enable logging/debugging?
>
> Thanks!
>
> On Mon, 22 Oct 2018 at
Hi Gustavo,
I have seen this error before; it is apparently due to the database failing
to properly gate transactions and behave according to the concurrency model
selected for a transaction. We have a debugging setting you can configure
which logs the needed information so that forensics get
>
>
>
>
> I read other discussion (
> https://lists.apache.org/thread.html/66a3f9780bbcc98e404e25f5a0e56a8a6c007448642c3bc15a366ed2@%3Cuser.manifoldcf.apache.org%3E)
> but I don’t understand whether they solved the issue
>
>
>
> ☹
>
>
>
> Thanks a lot.
>
Hi Olivier,
The Repository connector has no knowledge of what the pipeline looks like.
It simply asks the framework whether the mime type, length, etc. is
acceptable to the downstream pipeline. It's the connector's responsibility
to note the reason for the rejection in the simple history, but it
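Here is a rough sketch of the check sequence Karl describes, built on a hypothetical `DownstreamChecks` interface. The method names mirror the `checkMimeTypeIndexable`/`checkLengthIndexable` calls quoted elsewhere in this thread. The interface itself is a stand-in, not ManifoldCF's real activity interface, and the reason strings are made up. The test double models a Solr output without the extracting update handler, which accepts only text/plain.

```java
import java.util.Optional;

/** Hypothetical stand-in for the framework's downstream-acceptance checks. */
interface DownstreamChecks {
    boolean checkMimeTypeIndexable(String mimeType);
    boolean checkLengthIndexable(long length);
}

/** Hypothetical repository-connector step: ask first, then record why a document was rejected. */
final class IngestGate {
    /** Returns a reason to note in the simple history, or empty if the document may be ingested. */
    static Optional<String> rejectionReason(DownstreamChecks checks, String mimeType, long length) {
        if (!checks.checkMimeTypeIndexable(mimeType)) {
            return Optional.of("mime type not accepted downstream: " + mimeType);
        }
        if (!checks.checkLengthIndexable(length)) {
            return Optional.of("length not accepted downstream: " + length);
        }
        return Optional.empty();
    }
}
```

The connector only learns *that* the pipeline refused the document, not which stage refused it. The reason it records can therefore only name the failed check.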
Hi Olivier, it sounds like you are using Zookeeper. Certain properties are
global and are imported into Zookeeper. Other properties are local and
found in each local properties.xml file. The debug properties for logging
are, I believe, global.
Karl
On Thu, Oct 11, 2018 at 8:39 AM Olivier
>
>
> Could it be that, with the flag unchecked, ManifoldCF doesn’t use the
> specified mime types?
>
>
>
> I am using a snapshot version of ManifoldCF from three months ago.
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Thursday, 11 October 2018
ly and the
"use extracting update handler" box is UNCHECKED.
Thanks,
Karl
On Thu, Oct 11, 2018 at 8:16 AM Karl Wright wrote:
> When you uncheck the "use extracting update handler" checkbox, the Solr
> connection only accepts text/plain, and no binary formats. The
see them?
>
>
>
> Perhaps the problem is the “Ignore Tika exception” option, which I don’t
> know where to set in ManifoldCF?
>
>
>
>
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Thursday, 11 October 2018 12:24
> *To:* user@manifoldcf.apache.org
> *Subject:* Re
r1843343 adds this condition to the list of caught conditions.
In the future it would be better to create a ticket.
Karl
On Tue, Oct 9, 2018 at 3:06 PM Karl Wright wrote:
> I can make it retry then skip if it doesn't succeed in a while.
>
> Karl
>
>
> On Tue, Oct 9, 2018 a
e file several times in a row, gives up
> after several tries and stops the jobs with a message reporting the smb
> Exception encountered.
>
> Thanks for your answer,
> Romaric
>
> So it is indeed a temporary lock, but we can't tell how long it will last.
>
> Le 09/10/
Hi Romaric,
If the error is transient, then the right thing to do is *not* to skip the
file, but to retry later. What currently happens?
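One way to express "retry transient errors, and only skip the file once retrying has gone on for too long" is the hypothetical policy below. The names and the time budget are assumptions for illustration, not ManifoldCF's actual behaviour. So is the choice to abort the job on non-transient errors, which matches what is reported in this thread as the current outcome.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical policy: retry transient errors, skip the document once a time budget is used up. */
final class TransientErrorPolicy {
    enum Action { RETRY_LATER, SKIP_DOCUMENT, ABORT_JOB }

    static Action decide(boolean transientError, Instant firstFailure, Instant now, Duration retryBudget) {
        if (!transientError) {
            // Assumption: unrecognized hard errors still stop the job.
            return Action.ABORT_JOB;
        }
        Duration elapsed = Duration.between(firstFailure, now);
        // Keep retrying (e.g. a temporary SMB lock) until the budget runs out, then move on.
        return elapsed.compareTo(retryBudget) < 0 ? Action.RETRY_LATER : Action.SKIP_DOCUMENT;
    }
}
```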
Karl
On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti <
romaric.pighe...@francelabs.com> wrote:
> Hi Karl,
> Along the lines of this ticket
>
Excellent news!
Thanks for the update.
Karl
On Mon, Oct 8, 2018 at 1:54 PM Susheel Kumar wrote:
> Thank you so much Karl. I was able to crawl the site and index them.
>
> On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote:
>
>> Please read the user documentation for the sh
If you want the count of all the documents for a specific job, the query is:
select count(*) from jobqueue where jobid=
Karl
On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti <
romaric.pighe...@francelabs.com> wrote:
> Hi Karl,
>
> I currently need to get the number of documents
>
>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>> "Host: dit.apps.com[\r][\n]"
>> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
>> "Connection: Keep-Alive[\r][\n]"
>> DEBUG 2018-10-03T13:27:17,603
com[\r][\n]"
> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
> "Connection: Keep-Alive[\r][\n]"
> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >>
> "Accept-Encoding: gzip,deflate[\r][\n]"
> DEBUG 2
> %5p %d{ISO8601} (%t) - %m%n
>
>
>
>
>
>
>
>
>
>
> Logger to enable connector debug messages
> ===
> http://log4j.logger.org/>
> org.apache.manifoldcf.connectors" level="DEBUG" additivity=&q
ve
> indefinitely
> DEBUG 2018-09-28T08:37:06,435 (Thread-153948) - Connection released: [id:
> 29][route: {}->http://server1:8080][state: class
> org.apache.solr.client.solrj.impl.HttpSolrClient][total kept alive: 1;
> route allocated: 1 of 1; total allocated: 1 of 1]
>
>
>
> I hope this is the information you require.
>
>
>
> Regards
>
>
>
>
>
> *Damien Collis *
> Team Leader – Systems Integration
> Technology & Innovation Division, Link Group
>
> Level 3, 1A Homebush Bay Drive, Rhodes NSW 2138
> *T*+
Hi Damien,
Can you describe your database setup?
Karl
On Thu, Sep 27, 2018 at 1:50 AM Damien Collis
wrote:
> All,
>
>
>
> I am currently having trouble loading the “Status and Job Management” web
> page. I have set up a new Job but am unable to start it.
>
>
>
> Sometimes the “Status and Job
:34:53.000Z_resolution=300+dots=32_conditions=view+(0x76696577):+36+bytes_description=sRGB+IEC61966-2.1_image_width=3840+pixels=OCR_conditions_description=Reference+Viewing+Condition+in+IEC61966-2.1_height=2160+pixels}{add=[file:/localhost/OCR/HOT%20Balloon%20Trip_Ultra%20HD.jpg
>
> (161268
Hi ManifoldCF Community,
I need one or two concrete examples of solr [INFO] log messages that
include very long metadata (>8192). This is apparently critical for
getting the SolrJ team to be able to understand ManifoldCF's usage of
solr. If you have such examples around, please be sure that the
e any suggestion, would be really grateful
>
> ronny.hey...@qbere.com
>
>
> aan ik
>
> On Tue, 31 Jul 2018 at 12:12, Karl Wright wrote:
>
>> Hi Vinay,
>>
>> Dynamic rescan is meant for web-crawling and revisits already crawled
>> documents based on ho
I have only ever tried this with a personal account. I have no idea why a
business account would differ.
Karl
On Fri, Sep 21, 2018 at 8:16 AM douglas...@gmail.com
wrote:
> I forgot to mention that I am using the version 2.10.
>
> On 2018/09/21 12:15:21, douglas...@gmail.com
> wrote:
> >
; DEBUG 2018-09-19T11:29:13,851 (qtp1165791284-402) - Loaded System Directive:
> org.apache.velocity.runtime.directive.Include
> DEBUG 2018-09-19T11:29:13,851 (qtp1165791284-402) - Loaded System Directive:
> org.apache.velocity.runtime.directive.Foreach
> DEBUG 2018-09-19T11:29:13,852 (qtp1165791284-402) - Created '20' parsers.
>
Hi Susheel,
The problem is likely your site path. The actual path looks like it should
be just "/ES", not "/ES/_layouts/15".
Karl
On Wed, Sep 19, 2018 at 9:18 AM Susheel Kumar wrote:
> Hello,
>
> I am new to this mailing list and just started using ManifoldCF to be able to
> index data from our
> <http://www.remcam.net/> Skype: svanschalkwyk
> <https://mail.google.com/mail/u/0/#>
> <http://linkedin.com/in/vanschalkwyk>
>
> On Wed, Sep 5, 2018 at 11:05 AM, Karl Wright wrote:
>
>> I'm already working on the Web Connector. The UI has problems that
>> p
ch Engines
> +1.314.452. <+1+314+452+2896>2896st...@remcam.net http://remcam.net
> <http://www.remcam.net/> Skype: svanschalkwyk
> <https://mail.google.com/mail/u/0/#>
> <http://linkedin.com/in/vanschalkwyk>
>
> On Wed, Sep 5, 2018 at 6:33 AM, Kar
The patch I uploaded doesn't work because the entire tab is broken; looks
like the UI refactoring broke it and it was never reported. Fixing now.
Karl
On Wed, Sep 5, 2018 at 3:57 AM Karl Wright wrote:
> I coded up the web connector feature I think we need. See
> CONNECTORS-1528
2018 at 4:17 PM Karl Wright wrote:
> Hi Steph,
>
> Right, you wouldn't want to touch the framework.
>
> The effect of lower-casing the documentURI parameter in the
> addOrReplaceDocumentWithException method in an output connector would be to
> map multiple, independently-fetc
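The hypothetical helper below shows why lower-casing the document URI in an output connector is risky. It groups URIs by their lower-cased form; any group with more than one member is a set of distinct documents that would collapse onto a single ID. It assumes that URIs differing only in case name different documents, which holds on case-sensitive servers. `CaseFoldCollisions` is not part of ManifoldCF.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic: which document URIs would collide if IDs were lower-cased? */
final class CaseFoldCollisions {
    static Map<String, List<String>> collisions(Collection<String> uris) {
        Map<String, List<String>> byLowerCased = new TreeMap<>();
        // De-duplicate first so only genuinely distinct URIs are compared.
        for (String uri : new LinkedHashSet<>(uris)) {
            // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
            byLowerCased.computeIfAbsent(uri.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(uri);
        }
        // Keep only keys shared by two or more distinct URIs.
        byLowerCased.values().removeIf(group -> group.size() < 2);
        return byLowerCased;
    }
}
```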
ttp://remcam.net
>> <http://www.remcam.net/> Skype: svanschalkwyk
>> <https://mail.google.com/mail/u/0/#>
>> <http://linkedin.com/in/vanschalkwyk>
>>
>> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright wrote:
>>
>>> Let's make sure we
Skype: svanschalkwyk
> <https://mail.google.com/mail/u/0/#>
> <http://linkedin.com/in/vanschalkwyk>
>
> On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright wrote:
>
>> Thanks for the update.
>> Lower-casing the ID would be fine except there are some connectors that
>&
Hi Steph, I suspect that Jetty is leaking some resource, and we may need to
upgrade it.
Karl
On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk
wrote:
> Olivier
> By all means.
> The only issue I have seen (totally unrelated) is with Jetty, which has to
> be restarted about once a week.
I'm afraid this is not something we can fix here, since we do not have a
Sharepoint 2013 server setup, and this seems particular to yours
specifically.
The error you are getting looks intermittent too:
>>
This server farm is not available.
<<
Karl
On Mon, Sep 3, 2018 at 6:19 AM Cheng
gesting the
> document.
>
> Please have a look at the attachment for the methods which might be the
> problem area.
>
> On Wed, Aug 29, 2018 at 1:44 PM, Karl Wright wrote:
>
>> So the Allowed Document transformer is now working, and your connector is
>> now skipping document
for both the Length and the checkLengthIndexable() method is the same.
> And the Allowed Document is also working. But the main problem is that the
> service crashes, displaying a memory leak error every time after
> crawling a few sets of documents.
>
>
>
> On Tue, Aug 28, 2018
ue is also used in the code and it is
> returning the exact value for document length.
>
> Also, garbage collection and disposal of the threads are used.
>
>
>
> On Tue, Aug 28, 2018 at 5:44 PM, Karl Wright wrote:
>
>> I don't see checkLengthIndexable() in this list. You n
LIndexable(fileUrl))
> (!activities.checkMimeTypeIndexable(contentType))
> (!activities.checkDateIndexable(modifiedDate))
>
>
> But this service crashes after crawling approx 2000 documents.
>
> I think there is some other thing hitting it and creating problem.
>
>
>
>
>
;45" is also being checked and as per
> the documentation it is checked for different values.
>
>
> Please suggest the possible problem area and steps to be taken.
>
> On Mon, Aug 20, 2018 at 7:30 PM, Karl Wright wrote:
>
>> Obviously your Allowed Documents filter is som
ifferent.
>
> Consulting the "Simple history" menu option shows that Elastic output
> connector is called
> "08-23-2018 06:27:19.274 Indexation (Elasticsearch 2.4.6)"
> So I guess there is a misconfiguration somewhere...
>
>
>
> On Thu, 23 Aug 20
Hi Gustavo,
I take it from your question that you are using the Web Connector?
All connectors create a version string that is used to determine whether
content needs to be reindexed or not. The Web Connector's version string
uses a checksum of the page contents; we found the "last modified"
ly.
>
> I am using the same sequence. The allowed document is added first and
> then the Tika Transformation.
>
>
>
>
> But nothing runs in that scenario. The job simply ends without returning
> anything in the output.
>
>
>
>
>
>
> On Mon, Aug 20
Hi,
You are running out of memory.
Tika's memory consumption is not well defined so you will need to limit the
size of documents that reach it. This is not the same as limiting the size
of documents *after* Tika extracts them.
The Allowed Documents transformer therefore should be placed in the
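The point about limiting size *before* Tika can be sketched as a gate that checks the raw byte count and never hands oversized documents to the extractor. `PreExtractionGate` and the `extractor` function are hypothetical stand-ins for the Allowed Documents transformer and Tika. For simplicity the sketch assumes the whole document is available as a byte array; a real pipeline streams it.

```java
import java.util.function.Function;

/** Hypothetical gate placed in front of a text extractor such as Tika. */
final class PreExtractionGate {
    static <R> R extractIfSmallEnough(byte[] raw, long maxRawBytes, Function<byte[], R> extractor, R skipped) {
        // Check the size of the raw document before the extractor sees it; a limit
        // applied to the extracted text would come too late to protect memory.
        if (raw.length > maxRawBytes) {
            return skipped;
        }
        return extractor.apply(raw);
    }
}
```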
Hi Sven,
When MCF is built, two entirely distinct versions of the examples are
created -- a standard version, and a "proprietary" version. The
proprietary version does not in general include any proprietary jars and
leaves connectors that depend on them disabled in the connectors.xml file.
The
Hi Sharnel,
(1) I cannot create a patch unless you create a ticket I can attach it to.
(2) I can easily recognize this kind of corruption and allow MCF to skip
the document, and I've committed that change (r1838171). However,
partially indexing a document that is partially corrupted like this is
ltiprocess-file-example-proprietary/
>
> sudo cp
> /opt/manifoldcf_ok/multiprocess-file-example-proprietary/properties.xml
> /opt/manifoldcf/multiprocess-file-example-proprietary/
>
>
>
> sudo service tomcat start
>
>
>
>
>
> I obtained some warnings in th
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Tuesday, 14 August 2018 15:25
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Different time in Simple History Report
>
>
>
> There were a number of files committed.
>
>
>
>
>
> On Tue, Aug