Re: Job hang in aborting state for along time

2019-02-11 Thread Karl Wright
.4,10+ or 11+? > > Thanks, > Cihad Guzel > > > Karl Wright , 11 Şub 2019 Pzt, 04:01 tarihinde şunu > yazdı: > >> No, it is not normal. I expect that the MySQL transaction issues are >> causing lots of problems. >> >> Karl >> >> >> On Sun, F

Re: ManifoldCF + Postgresql - long freeze on job

2019-02-11 Thread Karl Wright
.xml) with the line : > value="500" /> > > Is there an instruction that allows to disable the reindex requested by > manifoldcf > > thanks > > Daniel > > > Le 08/02/2019 à 16:00, > Karl Wright (par Internet, dépôt > user-return-5674-daniel.li

Re: Job hang in aborting state for along time

2019-02-10 Thread Karl Wright
What database is this? Basically, the "unexpected job status" means that the framework found something that should not have been possible, if the database had been properly enforcing ACID transactional constraints. Is this MySQL? Because if so it's known to have this problem. It also looks like

Re: Sharepoint Job - Incremental Crawling

2019-02-09 Thread Karl Wright
ter as the sharepoint > servers. Currently they are in different DCs with dedicated MPLS > connectivity. > > Thanks, > Gaurav > > On Sat, Feb 9, 2019 at 3:03 AM Karl Wright wrote: > >> The problem is not the speed of Manifold, but rather the work it has to >> do an

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
vacuum once daily. > > Would switching to a multi process configuration with manifoldcf running > on two servers give a boost. > > Thanks, > Gaurav > > On Saturday, February 9, 2019, Karl Wright wrote: > >> It does the minimum necessary. That means it can't do it in le

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
er of docs that actually change in a 30 min period won't be more than > 200. > > Being able to capture adds and updates in 30 minutes is a key business > requirement. > > Thanks, > Gaurav > > On Friday, February 8, 2019, Karl Wright wrote: > >> Hi Guarav, >> &g

Re: Sharepoint Job - Incremental Crawling

2019-02-08 Thread Karl Wright
Hi Gaurav, The right way to do this is to schedule "minimal" crawls every 15 minutes (which will process only the minimum needed to deal with adds and updates), and periodically perform "full" crawls (which will also include deletions). Thanks, Karl On Fri, Feb 8, 2019 at 10:11 AM Gaurav G
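The minimal-plus-periodic-full schedule can be modeled as a simple policy. This is a hypothetical sketch: the class, the enum and the "one full crawl per day at a chosen hour" rule are assumptions for illustration. In ManifoldCF itself the schedule is set up in the job definition, not in code.

```java
import java.time.LocalDateTime;

/**
 * Hypothetical policy sketch (not a ManifoldCF API): runs every 15 minutes are
 * "minimal" crawls, except the run in the first 15-minute slot of one chosen
 * hour each day, which becomes a "full" crawl that also picks up deletions.
 */
class CrawlPlanner {
    enum CrawlType { MINIMAL, FULL }

    static CrawlType typeFor(LocalDateTime runStart, int fullCrawlHour) {
        // Only the run starting in the first 15 minutes of fullCrawlHour is promoted to FULL.
        boolean fullSlot = runStart.getHour() == fullCrawlHour && runStart.getMinute() < 15;
        return fullSlot ? CrawlType.FULL : CrawlType.MINIMAL;
    }
}
```

With runs every 15 minutes and `fullCrawlHour = 2`, the 02:00 run does the full pass and every other run stays minimal, so adds and updates are still captured within 15 minutes.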

Re: ManifoldCF + Postgresql - long freeze on job

2019-02-08 Thread Karl Wright
Hello, (1) What database are you using for this? Some databases require maintenance periodically or have other heavy usage constraints. (2) Every time a query takes more than a minute to execute, it is logged, along with the query plan. You need to look at the manifoldcf log to see which

Re: Postgres db maintenance

2019-02-08 Thread Karl Wright
The only "old data" kept by MCF is the history information. By default it's expunged after 30 days. You can shorten the amount of time it's kept around though by setting a properties.xml parameter (need to refer to the "how-to-build-and-deploy" page for details). Karl On Fri, Feb 8, 2019 at
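The retention rule comes down to cutoff arithmetic: history rows older than "now minus the retention window" are the ones expunged. A hypothetical sketch of that arithmetic follows; the class and method names are illustrative, and the actual properties.xml parameter name should be taken from the "how-to-build-and-deploy" page rather than from this code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of a time-based history retention window. */
class HistoryRetention {
    // The default described above: history is kept for 30 days.
    static final Duration DEFAULT_RETENTION = Duration.ofDays(30);

    /** Returns the event timestamps that would survive an expunge run at {@code now}. */
    static List<Instant> retained(List<Instant> events, Instant now, Duration retention) {
        Instant cutoff = now.minus(retention);
        return events.stream()
                .filter(t -> !t.isBefore(cutoff)) // keep events at or after the cutoff
                .collect(Collectors.toList());
    }
}
```

Shortening the window (say, to 7 days) shrinks the history table proportionally, which is the lever mentioned for reducing "old data".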

Re: Job slower

2019-01-25 Thread Karl Wright
Did you try 'vacuum full'? Karl On Fri, Jan 25, 2019 at 3:47 AM Bisonti Mario wrote: > Hallo. > > I use MCF 2.12 and postgresql 9.3.25 Solr 7.6 Tika 1.19 on Ubuntu Server > 18.04 > > > > Weekly I scheduled by crontab for the user postgres : > > 15 8 * * Sun vacuumdb --all --analyze > > 20 10

Re: Is SOLR-12798 still a blocker for deleting documents?

2019-01-21 Thread Karl Wright
The latest (2.12) version of MCF fixes this problem by working around it. Karl On Mon, Jan 21, 2019 at 5:12 AM Erlend Garåsen wrote: > > I have encountered the same problem Karl reported in the following ticket: > https://issues.apache.org/jira/browse/SOLR-12798 > > Since the ticket is

Re: FW: ManifoldCF Documentum connector slowness

2019-01-17 Thread Karl Wright
Hi, HSQLDB is actually reasonably fast, but it has other problems, namely that it stores whole DB tables in memory so if your crawl is large enough it will run out. The reason for Documentum connector slowness is almost always poor Documentum performance, and has nothing to do with MCF itself.

Re: DidMCF 2.12 change the JSON API response format?

2019-01-17 Thread Karl Wright
The output format did change, and the reason was that the "syntactic sugar" format would not preserve ordering, so that if you output and re-input, you'd lose information. The more complex form is being used only where there is a possibility of ordering confusion. It was always accepted as
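The ordering problem can be shown in a few lines. This is a hypothetical illustration, not ManifoldCF's JSON code: the element names `include` and `exclude` are invented, and the name-keyed map stands in for the "syntactic sugar" shape. Once children are grouped by name, interleaving between different names cannot be recovered on re-input; an explicit ordered list of children (the more complex form) keeps it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical demo: grouping children by element name loses their relative order. */
class OrderingDemo {
    /** A child element: its name plus a value. */
    static final class Child {
        final String name;
        final String value;
        Child(String name, String value) { this.name = name; this.value = value; }
    }

    /** "Sugar" shape: {"include": [...], "exclude": [...]} keyed by element name. */
    static Map<String, List<String>> toGrouped(List<Child> children) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Child c : children) {
            grouped.computeIfAbsent(c.name, k -> new ArrayList<>()).add(c.value);
        }
        return grouped;
    }

    /** Re-reading the grouped shape: all children with one name come out together. */
    static List<Child> fromGrouped(Map<String, List<String>> grouped) {
        List<Child> children = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            for (String v : e.getValue()) {
                children.add(new Child(e.getKey(), v));
            }
        }
        return children;
    }

    /** Comma-separated element names, for comparing orderings. */
    static String names(List<Child> children) {
        return children.stream().map(c -> c.name).collect(Collectors.joining(","));
    }
}
```

An input ordered `include, exclude, include` comes back as `include, include, exclude`, which matters whenever rules are evaluated in order.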

Re: Facing Error while executing the job After sometime

2019-01-09 Thread Karl Wright
This is a serious fatal error of some kind and we need a complete stack trace to address it. The JVM stops giving complete stack traces after they repeat a certain number of times, so you will need to go back far enough in the log to find where a complete trace was dumped. Thanks, Karl On
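Finding the first complete trace in a long log can be done mechanically. A minimal sketch, assuming stack frames are logged on the lines immediately after the exception line and start with `at ` (typical for Java traces, but an assumption about this log's layout); the class name is hypothetical. The truncation itself is most likely HotSpot's `OmitStackTraceInFastThrow` optimization, which can be turned off with `-XX:-OmitStackTraceInFastThrow`.

```java
import java.util.List;

/** Hypothetical helper: find the earliest logged occurrence of an exception that still has frames. */
class FullTraceFinder {
    /**
     * Returns the index of the first line mentioning exceptionClass that is
     * immediately followed by an "at ..." frame line, or -1 if every occurrence
     * was logged without frames.
     */
    static int firstFullTrace(List<String> lines, String exceptionClass) {
        for (int i = 0; i + 1 < lines.size(); i++) {
            boolean mentions = lines.get(i).contains(exceptionClass);
            boolean nextIsFrame = lines.get(i + 1).trim().startsWith("at ");
            if (mentions && nextIsFrame) {
                return i;
            }
        }
        return -1;
    }
}
```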

Re: Unexpected job status encountered

2019-01-03 Thread Karl Wright
Please do create a ticket with a patch. I'm extremely curious. Depending on what you're proposing, I think a valid approach might need to be to propose appropriate changes to the HttpComponents/HttpClient library. Karl On Thu, Jan 3, 2019 at 7:52 AM Erlend Garåsen wrote: > > It works now

Re: Unexpected job status encountered

2018-12-27 Thread Karl Wright
"are > authorized to access the document[\n]" > DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << > "requested. Either you supplied the wrong[\n]" > DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << > "credential

[ANNOUNCE] Apache ManifoldCF 2.12 has been released

2018-12-23 Thread Karl Wright
On December 20th, we released Apache ManifoldCF 2.12. It is available for download from the Apache ManifoldCF 2.12 site here: http://manifoldcf.apache.org . Enjoy!! Karl

Re: Unexpected job status encountered

2018-12-13 Thread Karl Wright
.6] > at > org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) > ~[httpclient-4.5.6.jar:4.5.6] > at > > org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > ~[httpclient-4.5.6.jar:4.5.6] >

Re: Unexpected job status encountered

2018-12-12 Thread Karl Wright
Did you import any data directly into new tables? The schema has changed significantly from 1.7 until now. I doubt very much you could get away with an import of the old table data, and that could well cause the effect you're seeing. Karl On Wed, Dec 12, 2018 at 11:12 AM Erlend Garåsen wrote:

Re: userAgent for connector instance info

2018-12-12 Thread Karl Wright
This value is transmitted in the "User-Agent" header. Karl On Wed, Dec 12, 2018 at 8:53 AM Singh,Jasvinder wrote: > Karl - Can you please help with details around user Agent for connector > instance – is there some configuration to set this value to some custom > value or what is default

Re: Unexpected job status : 33 | Alfresco | Manifold

2018-12-12 Thread Karl Wright
As I've stated before, we see this error a lot from MySQL. It's due to bugs in MySQL. MySQL does not maintain transaction integrity in some situations; it may be a JDBC driver issue, not sure. Karl On Wed, Dec 12, 2018 at 8:33 AM Sivakoti, Nikhilesh < nikhilesh.sivak...@capgemini.com> wrote:

Re: Language Detection for the data

2018-12-12 Thread Karl Wright
at > org.apache.manifoldcf.ui.i18n.Messages.getBodyJavascriptString(Messages.java:67) > [mcf-ui-core.jar:?] > at org.apache.jsp.index_jsp._jspService(index_jsp.java:212) [jsp/:?] > > > Is this can be resolved after adding any resource files or any other > solution has to be opted? > >

Re: Problems connecting ManifoldCF 2.11 to Oracle Express

2018-12-07 Thread Karl Wright
Hi James, ManifoldCF does not currently support Oracle as a back-end database. It does support crawling Oracle databases, however, via the JDBC Connector. Is that what you are trying to do? If it is, then please note the following instructions for the JDBC connector. Because the driver you are

Re: How to notify mail by SMTP

2018-12-06 Thread Karl Wright
available. > > I see only “Host name” “Port” … > > > > > > *Da:* Karl Wright > *Inviato:* giovedì 6 dicembre 2018 13:49 > *A:* user@manifoldcf.apache.org > *Oggetto:* Re: How to notify mail by SMTP > > > > Hi Mario, there is an email notification connecto

Re: How to notify mail by SMTP

2018-12-06 Thread Karl Wright
Hi Mario, there is an email notification connector. Have you tried to configure that? On Thu, Dec 6, 2018, 3:50 AM Bisonti Mario Hallo. > > I would like to notify by mail the end of a job. > > I use an smtp server but I am not able how to configure this. > > > > > > I read >

Re: ManifoldCF 2.11, DatTime format in Status and Job Management

2018-11-30 Thread Karl Wright
at 9:14 AM Karl Wright wrote: > The dates/times for this page are formatted as follows: > > org.apache.manifoldcf.ui.util.Formatter.formatTime(clientTimezone, > pageContext.getRequest().getLocale(), js.getStartTime()); > > But the code for formatTime pays no attention to th

Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

2018-11-30 Thread Karl Wright
I'm sorry, you'll need to provide more details about what exactly you are running into trouble with. Specifically, this: " But the current crawler using the SQL queries which is hard to query under a path. " Karl On Fri, Nov 30, 2018 at 4:42 AM Sivakoti, Nikhilesh <

Re: ManifoldCF 2.11, DatTime format in Status and Job Management

2018-11-30 Thread Karl Wright
The dates/times for this page are formatted as follows: org.apache.manifoldcf.ui.util.Formatter.formatTime(clientTimezone, pageContext.getRequest().getLocale(), js.getStartTime()); But the code for formatTime pays no attention to the preferred format for the locale: public static String
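For comparison, a locale-aware version can be written with the standard `java.time` API. This is a hypothetical sketch, not the `Formatter.formatTime` code: it takes the client time zone and request locale the page already has and lets `DateTimeFormatter.ofLocalizedDateTime` pick the locale's preferred layout. `FormatStyle.MEDIUM` is an arbitrary choice.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

/** Hypothetical locale-aware alternative to a fixed date/time pattern. */
class LocaleTimeFormat {
    static String format(Instant time, ZoneId clientZone, Locale locale) {
        // The locale decides field order and separators; the zone converts the instant to local time.
        DateTimeFormatter formatter = DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM)
                .withLocale(locale)
                .withZone(clientZone);
        return formatter.format(time);
    }
}
```

If the job start time is held as epoch milliseconds (an assumption here), `Instant.ofEpochMilli(...)` converts it before formatting. A US locale then yields a date like "Nov 30, 2018" and a German one "30.11.2018".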

Re: Job stuck without message

2018-11-30 Thread Karl Wright
start my > big job. > > My job is running from yesterday at 4 p.m. without interruption  > > It has indexed 261000 docs now. > > I suppose that i twill finish in two days. > > I will update you. > > Thanks a lot! > > Mario > > > > > > >

Re: Job stuck without message

2018-11-29 Thread Karl Wright
ecute: > > pstree 1369 > > java───686*[{java}] > > > > so, 686 process child of the agent. > > > > Is there any relation about these values 686 and 184 ? > > > > Thanks. > > Mario > > > > > > *Da:* Karl Wright > *In

Re: Job stuck without message

2018-11-29 Thread Karl Wright
) > > at > jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:185) > > at > jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) > > at > jdk.hotspot.agent/sun.jvm.hotspot.tools.JInfo.runWithArgs(JInfo.java:139) > >

Re: Job stuck without message

2018-11-28 Thread Karl Wright
p/jstack_start_agent.log > > > > but I obtain: > > 1233: Unable to open socket file /proc/1233/cwd/.attach_pid1233: target > process 1233 doesn't respond within 10500ms or HotSpot VM not loaded > > > > Perhaps isn’t it the right way to obtain a thread dump? > >

Re: Job stuck without message

2018-11-28 Thread Karl Wright
Another thing you could do is get a thread dump of the agents process. Karl On Wed, Nov 28, 2018 at 10:35 AM Karl Wright wrote: > Can you look into the database jobqueue table and provide a row that > corresponds to one of these documents? > > Thanks, > Karl > > > On

Re: Error Job stop after repeatidly interruption

2018-11-26 Thread Karl Wright
are': Tika down, retrying: > Connect to sengvivv01.local.domain:9998 [sengvivv01.local.domain/ > 172.16.1.135] failed: Connection refused (Connection refused) > > WARN 2018-11-26T13:18:26,862 (Worker thread '12') - Service interruption > reported for job 1533797717712 connection '

Re: ManifoldCF Docker MySQL Connection Error

2018-11-24 Thread Karl Wright
e"/> >value="custom_hostname"/> > > So, I've added that properties to make it work. Shouldn't hostname, > dbsuperusername and dbsuperuserpassword be enough? > > Kind Regards, > Furkan KAMACI > > > On Sat, Nov 24, 2018 at 5:40 PM Karl Wr

Re: ManifoldCF Docker MySQL Connection Error

2018-11-24 Thread Karl Wright
-- > -- > Database: amarok > jdbcDriver: com.mysql.jdbc.Driver > jdbcUrl: > jdbc:mysql://localhost/amarok?useUnicode=true&characterEncoding=utf8 > userName: manifoldcf > password: local_pg_passwd > -- > > So, it doesn't try to connect a host rather than localhost without >

Re: Manifold fails with alfresco | pipeline exception

2018-11-24 Thread Karl Wright
Hi Nikhilesh, Where are you seeing these errors? They sound like ElasticSearch errors to me; it is complaining that a null or empty-string pipeline name is being specified somehow. Can you tell me what version of ElasticSearch you are using? We have outstanding tickets for updating the

Re: Language Detection for the data

2018-11-21 Thread Karl Wright
Hi Nikita, Can you be more specific when you say "OpenNLP is not working"? All that this connector does is integrate OpenNLP as a ManifoldCF transformer. It uses a specific directory to deliver the models that OpenNLP uses to match and extract content from documents. Thus, you can provide any

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
ie the website needs to escape itself the special > characters otherwise the extraction will not work in MCF, am I right ? > > Best regards, > > Olivier > > > > Le 15 nov. 2018 à 12:57, Karl Wright a écrit : > > Hi Olivier, > > You can create a ticket but I don't h

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
nk extraction starting > DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: no content > exclusion rule supplied... returning > DEBUG 2018-10-30T11:48:13,553 (Worker thread '36') - WEB: Decided to > ingest 'http://localhost:/testjs/test.html' > — > So special characters like th

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
apply the same your concept , wait 10 sec and retry > three times , to the 503 error , too? > > > > So, I would like to try, if, with the modification, I obtain that job end > correctly instead of failure. > > > > > > Thanks a lot > > Mario > > >

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
t > (in my case manifoldcf) > > > https://issues.apache.org/jira/browse/TIKA-2776?focusedCommentId=16686620&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16686620 > > > > I am not able to do this… > > Is it possible to implement on the MCF so

Re: Job stuck - WorkerThread functions return null

2018-11-14 Thread Karl Wright
; QueuedDocument qd = previousDocuments.get(documentIdentifierHash); > // return null. The problem is here. > if (qd == null) > throw new IllegalArgumentException("Unrecognized document > identifier: '"+documentIdentifier+"'"); > r

Re: Job stuck - WorkerThread functions return null

2018-11-12 Thread Karl Wright
Hi, Have you been modifying the framework code? If so, I really cannot help you. If you haven't -- it looks like you've got code that is injecting document identifiers that are incorrect. But I will need to see a full stack trace to be sure of that. Thanks, Karl On Mon, Nov 12, 2018 at 4:06

Re: Error Job stop after repeatidly interruption

2018-11-08 Thread Karl Wright
Hi Mario, The Tika external connector retries for a while before it gives up and aborts the job. If you can get the Tika server back up within a reasonable period of time all should be well. But if one specific document *always* brings down the Tika server, it will be hard to recover from that.
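The "retry for a while, then give up" behavior, and the "wait 10 sec and retry three times" idea raised in this thread, can be sketched as a bounded retry loop. This is hypothetical: the class, the `Attempt` interface and the choice of `IOException` (e.g. connection refused) as the transient signal are assumptions, not the Tika external connector's code or limits.

```java
import java.io.IOException;

/**
 * Hypothetical sketch: a fixed number of attempts with a fixed pause, after
 * which a hard failure is raised (analogous to aborting the job).
 */
class RetryThenAbort {
    interface Attempt<T> {
        T run() throws IOException;
    }

    static <T> T callWithRetries(Attempt<T> attempt, int maxAttempts, long pauseMillis) {
        IOException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run();
            } catch (IOException e) {
                last = e; // treat as transient; pause before the next attempt
                if (i < maxAttempts) {
                    sleepQuietly(pauseMillis);
                }
            }
        }
        // Out of attempts: surface a hard failure.
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

`callWithRetries(op, 3, 10_000L)` matches the three-tries, ten-second-pause proposal. If the Tika server is back within that window the document goes through; if one document always kills the server, every attempt fails and the loop still gives up.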

Re: Job stuck without message

2018-11-06 Thread Karl Wright
> the simple history for one of these documents; I need to see what happened > to it last. > > > > Thanks, > > Karl > > > > > > On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario > wrote: > > My version is 2.11 > > > > > > > > &g

Re: Job stuck without message

2018-11-06 Thread Karl Wright
ok, can you create a ticket? Also, I'd appreciate it if you can look at the simple history for one of these documents; I need to see what happened to it last. Thanks, Karl On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario wrote: > My version is 2.11 > > > > > > >

Re: Job stuck without message

2018-10-30 Thread Karl Wright
; > > Solr server is ok > > Tika server is ok > > Agent is ok > > Tomcat with ManifoldCF is ok > > > > I could search if I could to put in info log mode for example Tika servrer > or Solr. > > > > Thanks.. > > > > > > *Da:* Ka

Re: Job stuck without message

2018-10-30 Thread Karl Wright
Yes, I see many docs in the docs queue but they are inactive. > > > > Infact i see that no more docs are indexed in Solr and I see that job is > with the same number of docs Active (35012) > > > > > > > > > > *Da:* Karl Wright > *Inviato:* martedì 30 otto

Re: Job stuck without message

2018-10-30 Thread Karl Wright
The reason the job is "stuck" is because: ' JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.' This means that ManifoldCF will retry this document for a while before it gives up on it. It appears to be stuck but it is not. You

Re: web connector : links extraction issues

2018-10-29 Thread Karl Wright
Hi Olivier, Javascript inclusion in the Web Connector is not evaluated. In fact, no Javascript is executed at all. Therefore it should not matter what is included via javascript. Thanks, Karl On Mon, Oct 29, 2018 at 1:39 PM Olivier Tavard < olivier.tav...@francelabs.com> wrote: > Hi, > >

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
Never mind, I was able to get it fixed. Karl On Wed, Oct 24, 2018 at 10:19 AM Karl Wright wrote: > I've created CONNECTORS-1551, and attached the patch. > > Unfortunately there seems to be some encoding issues with > common_ja_JP.properties; can you send that one fi

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
I've created CONNECTORS-1551, and attached the patch. Unfortunately there seems to be some encoding issues with common_ja_JP.properties; can you send that one file via email as an attachment? Thanks! Karl On Tue, Oct 23, 2018 at 8:54 PM 白井 隆/ Shirai Takashi wrote: > Hi, there. > > I've just

Re: error when running jobs

2018-10-24 Thread Karl Wright
need or maybe we have also to increase general log level. > > Thanks in advance. > > > El mar., 23 oct. 2018 a las 14:28, Gustavo Beneitez (< > gustavo.benei...@gmail.com>) escribió: > >> Thanks Karl, we are going to make new crawls with that property enable >>

Re: error when running jobs

2018-10-23 Thread Karl Wright
On Tue, Oct 23, 2018 at 2:34 AM Gustavo Beneitez wrote: > I Karl, > > MySQL. As per config variables: > version 5.7.23-log > version comment MySQL Community Server (GPL) > > which file should I enable logging/debugging? > > Thanks! > > El lun., 22 oct. 2018 a las

Re: error when running jobs

2018-10-22 Thread Karl Wright
Hi Gustavo, I have seen this error before; it is apparently due to the database failing to properly gate transactions and behave according to the concurrency model selected for a transaction. We have a debugging setting you can configure which logs the needed information so that forensics get

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
; > > > > I read other discussion ( > https://lists.apache.org/thread.html/66a3f9780bbcc98e404e25f5a0e56a8a6c007448642c3bc15a366ed2@%3Cuser.manifoldcf.apache.org%3E) > but I don’t understand if they solved the issue > > > > ☹ > > > > Thanks a lot. >

Re: Logging and Document filter transformation connector

2018-10-11 Thread Karl Wright
Hi Olivier, The Repository connector has no knowledge of what the pipeline looks like. It simply asks the framework whether the mime type, length, etc. is acceptable to the downstream pipeline. It's the connector's responsibility to note the reason for the rejection in the simple history, but it
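The division of labor described above, where the connector asks and the framework answers, can be sketched as follows. This is hypothetical: the predicates stand in for framework checks such as `checkMimeTypeIndexable` and `checkLengthIndexable` (names that appear elsewhere in these threads), and the plain list stands in for the simple history. Neither is the real ManifoldCF interface.

```java
import java.util.List;
import java.util.function.LongPredicate;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: a repository connector asks whether the downstream
 * pipeline would accept a document and, if not, records why.
 */
class PipelineCheckSketch {
    static boolean shouldFetch(String docId, String mimeType, long length,
                               Predicate<String> mimeTypeOk, LongPredicate lengthOk,
                               List<String> history) {
        if (!mimeTypeOk.test(mimeType)) {
            // The connector, not the framework, notes the reason for the rejection.
            history.add(docId + ": excluded, mime type " + mimeType + " not accepted downstream");
            return false;
        }
        if (!lengthOk.test(length)) {
            history.add(docId + ": excluded, length " + length + " not accepted downstream");
            return false;
        }
        return true;
    }
}
```

The connector never sees which transformer or output said no; it only learns the yes/no answer, which is why the recorded reason names the property that failed rather than a pipeline stage.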

Re: Debug logging properties location

2018-10-11 Thread Karl Wright
Hi Olivier, it sounds like you are using Zookeeper. Certain properties are global and are imported into Zookeeper. Other properties are local and found in each local properties.xml file. The debug properties for logging are, I believe, global. Karl On Thu, Oct 11, 2018 at 8:39 AM Olivier

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
> > > Could be that, unchecking the flag, ManifoldCF doesn’t use the mime types > specified? > > > > I am using a snapshot version of ManifoldCF of three monts ago. > > > > > > > > > > *Da:* Karl Wright > *Inviato:* giovedì 11 ottobre 2018

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
ly and the "use extracting update handler" box is UNCHECKED. Thanks, Karl On Thu, Oct 11, 2018 at 8:16 AM Karl Wright wrote: > When you uncheck the "use extracting update handler" checkbox, the Solr > connection only accepts text/plain, and no binary formats. The
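The rule stated above explains why the configured mime types seem to be ignored once the checkbox is unchecked. A hypothetical model of it follows: the class name and the `configuredTypes` set (standing in for whatever list applies when the extracting handler is on) are assumptions, not the Solr connector's code.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical model: with the extracting update handler off, only text/plain
 * is accepted; with it on, a configured set of mime types applies.
 */
class SolrMimeAcceptance {
    static boolean accepts(boolean useExtractingUpdateHandler, String mimeType, Set<String> configuredTypes) {
        // Drop parameters such as "; charset=UTF-8" and normalize case before comparing.
        String base = mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!useExtractingUpdateHandler) {
            return base.equals("text/plain"); // binary formats are never accepted in this mode
        }
        return configuredTypes.contains(base);
    }
}
```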

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
see them? > > > > Perhaps is the “Ignore Tika exception that I don’t know where to set in > ManifoldCF the problem? > > > > > > > > > > > > *Da:* Karl Wright > *Inviato:* giovedì 11 ottobre 2018 12:24 > *A:* user@manifoldcf.apache.org > *Oggetto:* Re

Re: Option to skip documents

2018-10-09 Thread Karl Wright
r1843343 adds this condition to the list of caught conditions. In the future it would be better to create a ticket. Karl On Tue, Oct 9, 2018 at 3:06 PM Karl Wright wrote: > I can make it retry then skip if it doesn't succeed in a while. > > Karl > > > On Tue, Oct 9, 2018 a
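"Retry, then skip if it doesn't succeed in a while" differs from retry-then-abort in its final outcome: the document is dropped and the job carries on. A hypothetical sketch using a time budget rather than an attempt count; the class, the `Fetch` interface and the injected clock are assumptions made so the behavior is easy to test, not ManifoldCF's implementation.

```java
import java.io.IOException;
import java.util.Optional;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: keep retrying a transient failure (such as a temporarily
 * locked file) until a time budget is spent, then report the document as skipped.
 */
class RetryThenSkip {
    interface Fetch {
        String fetch() throws IOException;
    }

    /** Returns the content, or Optional.empty() meaning "skip this document". */
    static Optional<String> fetchOrSkip(Fetch fetch, LongSupplier clockMillis,
                                        long budgetMillis, long pauseMillis) {
        long deadline = clockMillis.getAsLong() + budgetMillis;
        while (true) {
            try {
                return Optional.of(fetch.fetch());
            } catch (IOException transientFailure) {
                if (clockMillis.getAsLong() >= deadline) {
                    return Optional.empty(); // budget spent: skip, do not abort the job
                }
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", e);
                }
            }
        }
    }
}
```

In production the clock would simply be `System::currentTimeMillis`; the supplier parameter exists only so the budget can be exercised without real waiting.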

Re: Option to skip documents

2018-10-09 Thread Karl Wright
e file several times in a row, gives up > after several tries and stops the jobs with a message reporting the smb > Exception encountered. > > Thanks for your answer, > Romaric > > So it is indeed a temporary lock, but we can't tell how long it will last. > > Le 09/10/

Re: Option to skip documents

2018-10-09 Thread Karl Wright
Hi Romaric, If the error is transient, then the right thing to do is *not* to skip the file, but to retry later. What currently happens? Karl On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > Along the lines of this ticket >

Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Karl Wright
Excellent news! Thanks for the update. Karl On Mon, Oct 8, 2018 at 1:54 PM Susheel Kumar wrote: > Thank you so much Karl. I was able to crawl the site and index them. > > On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote: > >> Please read the user documentation for the sh

Re: Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Karl Wright
If you want all the documents for a specific job, the query is: select count(*) from jobqueue where jobid= Karl On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > > I am currently facing the need of getting the number of documents >
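The same count can be run from Java through JDBC with the job id bound as a parameter rather than pasted into the SQL. A minimal sketch: the table and column come from the query above, while the open `Connection` (URL and credentials for the ManifoldCF database) is assumed to be supplied by the caller, and binding the id as a `long` assumes numeric job ids such as the 1533797717712 seen in a log elsewhere in this listing.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: count the jobqueue rows for one job via a parameterized query. */
class JobQueueCount {
    static final String SQL = "select count(*) from jobqueue where jobid=?";

    static long countDocuments(Connection conn, long jobId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setLong(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next(); // count(*) always returns exactly one row
                return rs.getLong(1);
            }
        }
    }
}
```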

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Karl Wright
;> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> >> "Host: dit.apps.com[\r][\n]" >> DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> >> "Connection: Keep-Alive[\r][\n]" >> DEBUG 2018-10-03T13:27:17,603

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Karl Wright
com[\r][\n]" > DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> > "Connection: Keep-Alive[\r][\n]" > DEBUG 2018-10-03T13:27:17,603 (Thread-50830) - http-outgoing-102 >> > "Accept-Encoding: gzip,deflate[\r][\n]" > DEBUG 2

Re: Sharepoint connector help : site didn't exist or external

2018-10-01 Thread Karl Wright
> %5p %d{ISO8601} (%t) - %m%n > > > > > > > > > > > Logger to enable connector debug messages > === > http://log4j.logger.org/> > org.apache.manifoldcf.connectors" level="DEBUG" additivity=&q

Re: Sharepoint connector help : site didn't exist or external

2018-09-28 Thread Karl Wright
ve > indefinitely > DEBUG 2018-09-28T08:37:06,435 (Thread-153948) - Connection released: [id: > 29][route: {}->http://server1:8080][state: class > org.apache.solr.client.solrj.impl.HttpSolrClient][total kept alive: 1; > route allocated: 1 of 1; total allocated: 1 of 1] >

Re: Status and Job Management

2018-09-27 Thread Karl Wright
; > > I hope this is the information you require. > > > > Regards > > > > > > *Damien Collis * > Team Leader – Systems Integration > Technology & Innovation Division, Link Group > > Level 3, 1A Homebush Bay Drive, Rhodes NSW 2138 > *T*+

Re: Status and Job Management

2018-09-27 Thread Karl Wright
Hi Damien, Can you describe your database setup? Karl On Thu, Sep 27, 2018 at 1:50 AM Damien Collis wrote: > All, > > > > I am currently having trouble loading the “Status and Job Management” web > page. I have set up a new Job but am unable to start it. > > > > Sometimes the “Status and Job

Re: Solr examples with long metadata needed

2018-09-26 Thread Karl Wright
:34:53.000Z_resolution=300+dots=32_conditions=view+(0x76696577):+36+bytes_description=sRGB+IEC61966-2.1_image_width=3840+pixels=OCR_conditions_description=Reference+Viewing+Condition+in+IEC61966-2.1_height=2160+pixels}{add=[file:/localhost/OCR/HOT%20Balloon%20Trip_Ultra%20HD.jpg > > (161268

Solr examples with long metadata needed

2018-09-26 Thread Karl Wright
Hi ManifoldCF Community, I need one or two concrete examples of solr [INFO] log messages that include very long metadata (>8192). This is apparently critical for getting the SolrJ team to be able to understand ManifoldCF's usage of solr. If you have such examples around, please be sure that the

Re: Scheduler not working as we expected

2018-09-25 Thread Karl Wright
e anu suggestion, would be really gratful > > ronny.hey...@qbere.com > > > aan ik > > Op di 31 jul. 2018 om 12:12 schreef Karl Wright : > >> Hi Vinay, >> >> Dynamic rescan is meant for web-crawling and revisits already crawled >> documents based on ho

Re: Google Drive connector help

2018-09-21 Thread Karl Wright
I have only ever tried this with a personal account. I have no idea why a business account would differ. Karl On Fri, Sep 21, 2018 at 8:16 AM douglas...@gmail.com wrote: > I forgot to mention that I am using the version 2.10. > > On 2018/09/21 12:15:21, douglas...@gmail.com > wrote: > >

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Karl Wright
; DEBUG 2018-09-19T11:29:13,851 (qtp1165791284-402) - Loaded System Directive: > org.apache.velocity.runtime.directive.Include > DEBUG 2018-09-19T11:29:13,851 (qtp1165791284-402) - Loaded System Directive: > org.apache.velocity.runtime.directive.Foreach > DEBUG 2018-09-19T11:29:13,852 (qtp1165791284-402) - Created '20' parsers. &

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Karl Wright
Hi Susheel, The problem is likely your site path. The actual path looks like it should be just "/ES", not "/ES/_layouts/15". Karl On Wed, Sep 19, 2018 at 9:18 AM Susheel Kumar wrote: > Hello, > > I am new this mailing list and just started using manifold to able to > index data from our
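A path copied out of a browser address bar often carries a `/_layouts/...` suffix, which (as far as SharePoint URLs go) points at the application-page folder rather than at the site itself. A hypothetical helper that trims it; the name is illustrative, and it only handles the `/_layouts/` form shown in this thread.

```java
import java.util.Locale;

/** Hypothetical helper: keep only the site part of a SharePoint path. */
class SitePath {
    static String siteOnly(String path) {
        // Case-insensitive search for the "/_layouts/" segment; everything from it onward is dropped.
        int i = path.toLowerCase(Locale.ROOT).indexOf("/_layouts/");
        return i >= 0 ? path.substring(0, i) : path;
    }
}
```

`siteOnly("/ES/_layouts/15")` gives `/ES`, the site path recommended above.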

Re: PostgreSQL version to support MCF v2.10

2018-09-05 Thread Karl Wright
> <http://www.remcam.net/> Skype: svanschalkwyk > <https://mail.google.com/mail/u/0/#> > <http://linkedin.com/in/vanschalkwyk> > > On Wed, Sep 5, 2018 at 11:05 AM, Karl Wright wrote: > >> I'm already working on the Web Connector. The UI has problems that >> p

Re: PostgreSQL version to support MCF v2.10

2018-09-05 Thread Karl Wright
ch Engines > +1.314.452. <+1+314+452+2896>2896st...@remcam.net http://remcam.net > <http://www.remcam.net/> Skype: svanschalkwyk > <https://mail.google.com/mail/u/0/#> > <http://linkedin.com/in/vanschalkwyk> > > On Wed, Sep 5, 2018 at 6:33 AM, Kar

Re: PostgreSQL version to support MCF v2.10

2018-09-05 Thread Karl Wright
The patch I uploaded doesn't work because the entire tab is broken; looks like the UI refactoring broke it and it was never reported. Fixing now. Karl On Wed, Sep 5, 2018 at 3:57 AM Karl Wright wrote: > I coded up the web connector feature I think we need. See > CONNECTORS-1528

Re: PostgreSQL version to support MCF v2.10

2018-09-05 Thread Karl Wright
2018 at 4:17 PM Karl Wright wrote: > Hi Steph, > > Right, you wouldn't want to touch the framework. > > The effect of lower-casing the documentURI parameter in the > addOrReplaceDocumentWithException method in an output connector would be to > map multiple, independently-fetc
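The risk of lower-casing document URIs is easy to check against a real URI list: any two URIs that differ only by case collapse onto one ID in the output. A hypothetical sketch that reports such collisions; the class name and the sample URIs are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical check: which URIs would share an ID if IDs were lower-cased? */
class CaseFoldCollision {
    static Map<String, List<String>> collisions(List<String> uris) {
        Map<String, List<String>> byKey = new HashMap<>();
        for (String uri : uris) {
            byKey.computeIfAbsent(uri.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(uri);
        }
        byKey.values().removeIf(group -> group.size() < 2); // keep only keys shared by 2+ URIs
        return byKey;
    }
}
```

On case-sensitive repositories such as most web servers, each colliding group represents separately fetched documents that would overwrite one another in the index.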

Re: PostgreSQL version to support MCF v2.10

2018-09-04 Thread Karl Wright
ttp://remcam.net >> <http://www.remcam.net/> Skype: svanschalkwyk >> <https://mail.google.com/mail/u/0/#> >> <http://linkedin.com/in/vanschalkwyk> >> >> On Tue, Sep 4, 2018 at 1:33 PM, Karl Wright wrote: >> >>> Let's make sure we

Re: PostgreSQL version to support MCF v2.10

2018-09-04 Thread Karl Wright
Skype: svanschalkwyk > <https://mail.google.com/mail/u/0/#> > <http://linkedin.com/in/vanschalkwyk> > > On Tue, Sep 4, 2018 at 12:22 PM, Karl Wright wrote: > >> THanks for the update. >> Lower-casing the ID would be fine except there are some connectors that >&

Re: PostgreSQL version to support MCF v2.10

2018-09-04 Thread Karl Wright
Hi Steph, I suspect that Jetty is leaking some resource, and we may need to upgrade it. Karl On Tue, Sep 4, 2018 at 11:26 AM Steph van Schalkwyk wrote: > Olivier > By all means. > The only issue I have seen (totally unrelated) is with Jetty, which has to > be restarted about once a week.

Re: Install Sharepoint 2013 plugin Failure

2018-09-03 Thread Karl Wright
I'm afraid this is not something we can fix here, since we do not have a Sharepoint 2013 server setup, and this seems particular to yours specifically. The error you are getting looks intermittent too: >> This server farm is not available. << Karl On Mon, Sep 3, 2018 at 6:19 AM Cheng

Re: Exception in the running Custom Job

2018-08-29 Thread Karl Wright
gesting the > document. > > Please have look on the attachment for the methods which might are the > problem area. > > On Wed, Aug 29, 2018 at 1:44 PM, Karl Wright wrote: > >> So the Allowed Document transformer is now working, and your connector is >> now skipping document

Re: Exception in the running Custom Job

2018-08-29 Thread Karl Wright
for both the Length and checkLengthIndexable() method is same. > And the Allowed Document is also working. But main problem is crashing > down of the service and it displays memory Leakage error every time after > crawling few set of documents.. > > > > On Tue, Aug 28, 2018

Re: Exception in the running Custom Job

2018-08-28 Thread Karl Wright
ue is also used in the code and it is > returning the exact value for document length. > > Also garbage collector and disposing for the threads is used. > > > > On Tue, Aug 28, 2018 at 5:44 PM, Karl Wright wrote: > >> I don't see checkLengthIndexable() in this list. You n

Re: Exception in the running Custom Job

2018-08-28 Thread Karl Wright
LIndexable(fileUrl)) > (!activities.checkMimeTypeIndexable(contentType)) > (!activities.checkDateIndexable(modifiedDate)) > > > But this service crashes after crawling approx 2000 documents. > > I think there is some other thing hitting it and creating problem. > > > > >

Re: Exception in the running Custom Job

2018-08-24 Thread Karl Wright
;45" is also being checked and as per > the documentation it is checked for different values. > > > Please suggest the possible problem area and steps to be taken. > > On Mon, Aug 20, 2018 at 7:30 PM, Karl Wright wrote: > >> Obviously your Allowed Documents filter is som

Re: Documents that didn't change are reindexed

2018-08-23 Thread Karl Wright
ifferent. > > Consulting the "Simple history" menu option shows that Elastic output > connector is called > "08-23-2018 06:27:19.274 Indexation (Elasticsearch 2.4.6)" > So I guess there is a miss-configuration somewhere... > > > > El jue., 23 ago. 20

Re: Documents that didn't change are reindexed

2018-08-22 Thread Karl Wright
Hi Gustavo, I take it from your question that you are using the Web Connector? All connectors create a version string that is used to determine whether content needs to be reindexed or not. The Web Connector's version string uses a checksum of the page contents; we found the "last modified"
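The checksum-based version string can be sketched as follows. SHA-256 is an assumption (the excerpt does not say which checksum the Web Connector uses), and the class name is hypothetical. The point is that the version changes only when the fetched bytes change, so an unchanged page is not reindexed.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: derive a version string from page content. */
class ContentVersion {
    static String versionOf(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Reindex when there is no previous version or the content checksum differs. */
    static boolean needsReindex(String previousVersion, byte[] content) {
        return previousVersion == null || !previousVersion.equals(versionOf(content));
    }
}
```

A consequence worth noting: a page whose bytes differ on every fetch (a timestamp, a session token) gets a new version each time and is reindexed even if nothing meaningful changed.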

Re: Exception in the running Custom Job

2018-08-20 Thread Karl Wright
ly. > > I am using in the same sequence. The allowed document is added first and > then the Tika Transformation. > > > > > But nothing runs in that scenario. The job simply ends without returning > anything in the output. > > > > > > > On Mon, Aug 20

Re: Exception in the running Custom Job

2018-08-20 Thread Karl Wright
Hi, You are running out of memory. Tika's memory consumption is not well defined so you will need to limit the size of documents that reach it. This is not the same as limiting the size of documents *after* Tika extracts them. The Allowed Documents transformer therefore should be placed in the
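The ordering point can be made concrete: the size limit has to apply to the raw document before extraction, so oversized input never reaches Tika. A hypothetical sketch in which the extractor function stands in for Tika and the size check stands in for the Allowed Documents transformer; names and limits are illustrative.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: filter on raw size first, extract second. */
class PipelineOrder {
    static Optional<String> filterThenExtract(byte[] raw, long maxRawBytes,
                                              Function<byte[], String> extractor) {
        if (raw.length > maxRawBytes) {
            return Optional.empty(); // rejected up front; the extractor is never called
        }
        return Optional.of(extractor.apply(raw));
    }
}
```

Reversing the order (extract, then check the length of the extracted text) would still hand every large original to the extractor, which is exactly where the memory goes.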

Re: Driver class not found: net.sourceforge.jtds.jdbc.Driver

2018-08-16 Thread Karl Wright
Hi Sven, When MCF is built, two entirely distinct versions of the examples are created -- a standard version, and a "proprietary" version. The proprietary version does not in general include any proprietary jars and leaves connectors that depend on them disabled in the connectors.xml file. The

Re: Documentum indexing issue

2018-08-16 Thread Karl Wright
Hi Sharnel, (1) I cannot create a patch unless you create a ticket I can attach it to. (2) I can easily recognize this kind of corruption and allow MCF to skip the document, and I've committed that change (r1838171). However, partially indexing a document that is partially corrupted like this is

Re: Different time in Simple History Report

2018-08-14 Thread Karl Wright
ltiprocess-file-example-proprietary/ > > sudo cp > /opt/manifoldcf_ok/multiprocess-file-example-proprietary/properties.xml > /opt/manifoldcf/multiprocess-file-example-proprietary/ > > > > sudo service tomcat start > > > > > > I obtained some warnings in th

Re: Different time in Simple History Report

2018-08-14 Thread Karl Wright
gt; > > > > > > > *Da:* Karl Wright > *Inviato:* martedì 14 agosto 2018 15:25 > *A:* user@manifoldcf.apache.org > *Oggetto:* Re: Different time in Simple History Report > > > > There were a number of files committed. > > > > > > On Tue, Aug
