Re: Documentum connection not working

2019-07-16 Thread Karl Wright
Are you running the documentum connector sidecar processes? You need to be running those, and the documentum_server process must include a valid DFC distribution with a valid configuration file. This is where the documentum server name comes from. The documentation for "how to build and deploy"

Documentum connection not working

2019-07-16 Thread Bisonti Mario
Hello. I am using MCF 2.12. I would like to create a repository connection to a Documentum Docbase. I always get the error: Connection temporarily failed: Connection refused to host: 127.0.0.1; nested exception is: java.net.ConnectException: Connection refused (Connection refused) I don’t

Re: Reg. unstable Manifold instance

2019-07-15 Thread Michael Cizmar
Are there any errors or anything of interest in the log? -- Michael Cizmar From: Karl Wright Reply-To: "user@manifoldcf.apache.org" Date: Monday, July 15, 2019 at 7:56 AM To: "user@manifoldcf.apache.org" Subject: Re: Reg. unstable Manifold instance I have heard of this issue before. The app

Re: Reg. unstable Manifold instance

2019-07-15 Thread Karl Wright
I have heard of this issue before. The app server is what is giving back the 404 errors. I wonder if the version of jetty we ship has a resource leak of some kind. Karl On Sun, Jul 14, 2019 at 11:34 PM Praveen Bejji wrote: > Hi, > > We have been running manifoldcf for almost 6 months now.

Reg. unstable Manifold instance

2019-07-14 Thread Praveen Bejji
Hi, We have been running manifoldcf for almost 6 months now. We see that the Web app goes down intermittently and gives a 404 error. However, we are able to access the authority URL and can see that jobs are running as scheduled. Restarting Manifold fixes the issue but it appears again after

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Karl Wright
If that *fails*, let's try running the standard load tests on your machine. To do that, download the sources, and do the following: ant make-core-deps ant load-hs That should take many hours, but if it fails you'll know there's something fundamentally wrong with your environment. Karl On

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Cihad Guzel
In addition, all my jobs are now waiting in "End notification" status. Cihad Güzel On Sat, 13 Jul 2019 at 22:27, Cihad Guzel wrote: > Hi Karl, > > No, this is a different, new setup. But I use the same versions of MCF and the > database. I am trying a new setup for each test. I didn't see any

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl, No, this is a different, new setup. But I use the same versions of MCF and the database. I am trying a new setup for each test. I didn't see any repeated or non-repeated error logs like before. Then I built the JDBC connector from the trunk branch and replaced the jdbc-connector.jar with the new

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Karl Wright
You previously reported errors of the kind that ManifoldCF throws when it finds that the database seemingly lost transactional integrity. My question is whether you are still using the same database setup where you previously got those errors? The ArrayIndexOutOfBounds exception applied to JDBC

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl, If you are talking about https://issues.apache.org/jira/browse/CONNECTORS-1613, my setup doesn't include this change because I use MCF 2.12. Are you suggesting I use a trunk version? Cihad Güzel On Sat, 13 Jul 2019 at 17:27, Karl Wright wrote: > Is this the same setup

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Karl Wright
Is this the same setup where you were getting errors because of inconsistent database states? That could lead to this problem, you know. Karl On Sat, Jul 13, 2019 at 10:14 AM Cihad Guzel wrote: > Hi Karl, > > I also have a job waiting for 12 days as an "Aborting" status. > > Status: Aborting

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl, I also have a job that has been waiting in "Aborting" status for 12 days. Status: Aborting Start time: 7/1/19 4:01:46 PM Documents: 10003 Active: 10003 Processed: 10002 Cihad Guzel On Sat, 13 Jul 2019 at 16:53, Cihad Guzel wrote: > Hi Karl, > > I tried quick-start single process

Re: Some jobs is waiting as "stopping" status

2019-07-13 Thread Cihad Guzel
Hi Karl, I tried the quick-start single-process model. Following your suggestion, I have also tried multiprocess-zk-example for zookeeper-based locking. But I have the same problem. Status: Aborting Start time: 7/13/19 3:42:10 PM Documents: 10003 Active: 10003 Processed: 1021 I have been waiting for over an hour

Re: Some jobs is waiting as "stopping" status

2019-07-08 Thread Karl Wright
Are you using file-based locking? If so, I would suggest strongly migrating to zookeeper-based locking. But if you are using the file-based locking, please execute the "lock clean procedure" as follows: - shut down all manifoldcf processes, including the web UI - run the lock-clean script - start

Re: Some jobs is waiting as "stopping" status

2019-07-08 Thread Cihad Guzel
Hi Karl, Nothing. I don't have any error log. On Mon, 8 Jul 2019 at 03:18, Karl Wright wrote: > Hi Cihad, > > What does your manifoldcf log have in it? Any errors? > > Karl > > > On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel wrote: > >> Hi Karl, >> >> I mistakenly wrote "Stopping"

Re: Some jobs is waiting as "stopping" status

2019-07-07 Thread Karl Wright
Hi Cihad, What does your manifoldcf log have in it? Any errors? Karl On Sun, Jul 7, 2019 at 3:52 PM Cihad Guzel wrote: > Hi Karl, > > I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting as > "Aborting" status. I have also the same problem while restarting. I am > waiting

Re: Some jobs is waiting as "stopping" status

2019-07-07 Thread Cihad Guzel
Hi Karl, I mistakenly wrote "Stopping" instead of "Aborting". My job is waiting in "Aborting" status. I also have the same problem when restarting. I have been waiting for 2 days for one job. Regards, Cihad Guzel On Sun, 7 Jul 2019 at 22:42, Cihad Guzel wrote: > Hi Karl, > > I have a few

Some jobs is waiting as "stopping" status

2019-07-07 Thread Cihad Guzel
Hi Karl, I have a few jobs. I stopped all of them, but one job is still waiting in "stopping" status. I know that some large jobs can wait a long time. But I have only 1000 rows in the database, so all of the jobs crawled a small number of documents. But it doesn't make much sense to stay status of

Re: manifoldCF and sitemap

2019-07-04 Thread Karl Wright
Maybe? The web connector might be able to do this for you. Karl On Thu, Jul 4, 2019 at 6:18 AM LIROT Daniel (Chef de projet web et collaboratif) - SG/SNUM/UNI/DETN/GPBCW/PPCW < daniel.li...@developpement-durable.gouv.fr> wrote: > Hello, > > I'd like to know if manifoldCF is able to used

manifoldCF and sitemap

2019-07-04 Thread Chef de projet web et collaboratif
Hello, I'd like to know if manifoldCF is able to use a sitemap.xml file to crawl a website, or to optimize the crawl. Best regards --

Re: 'real-time'/frequent ingestion using ManifoldCF

2019-07-02 Thread Karl Wright
About the only thing I can suggest that would work within the ManifoldCF framework would be to structure your jobs so that most runs are "Minimal" runs with "Complete" runs being done every 24 hours. This should pick up documents that have been changed or added but will not go through the process

'real-time'/frequent ingestion using ManifoldCF

2019-07-02 Thread R .
Hello, we are ingesting a Documentum system using ManifoldCF and indexing those documents into Elasticsearch. The Documentum system is structured as a few hundred cabinets that together contain a few million documents. We have defined about 80 ManifoldCF jobs and each job processes some portion

JDBC Connector Max Connection Size is set as hardcoded

2019-06-28 Thread Cihad Guzel
Hi Karl, I created a new JDBC repository connection and set "Max connections" and "Max avg fetches/min" on the Throttling tab in the MCF UI. Then I reviewed JDBCConnectionFactory.java and found some hardcoded parameters, as follows: cp = _pool.addAlias(poolKey, driverClassName, dburl,

Re: ssh connector

2019-06-24 Thread Cihad Guzel
Hi Saunier, ManifoldCF does not have an SFTP connector yet. You can write a new connector for this purpose. Please follow this guide: https://manifoldcf.apache.org/release/release-2.12/en_US/writing-repository-connectors.html Regards, Cihad Guzel SAUNIER Maxence, Mon, 24 Jun 2019, 14:57

Re: Unexpected job status encountered

2019-06-24 Thread Karl Wright
Created and resolved CONNECTORS-1613. Karl On Mon, Jun 24, 2019 at 8:28 AM Karl Wright wrote: > Hi Cihad, > > The unexpected job status error I cannot help you with; somehow your > database has gotten corrupted. But I'm looking into the AIOOBE issue now. > > Karl > > > On Mon, Jun 24, 2019

Re: Unexpected job status encountered

2019-06-24 Thread Karl Wright
Hi Cihad, The unexpected job status error I cannot help you with; somehow your database has gotten corrupted. But I'm looking into the AIOOBE issue now. Karl On Mon, Jun 24, 2019 at 6:01 AM Cihad Guzel wrote: > Hi Karl, > > I have 2 types error as follow: > > FATAL 2019-06-24T09:37:27,226

RE: ssh connector

2019-06-24 Thread SAUNIER Maxence
Hello Cihad, And for SFTP, do you know what I could use as a connector? Thanks From: Cihad Guzel Sent: Sunday, June 23, 2019 20:51 To: user@manifoldcf.apache.org Subject: Re: ssh connector Hi Saunier, SSH is a network protocol for securely communicating between computers. It is not a

Re: Unexpected job status encountered

2019-06-24 Thread Cihad Guzel
Hi Karl, I have two types of errors, as follows: FATAL 2019-06-24T09:37:27,226 (Worker thread '3') - Error tossed: 7 java.lang.ArrayIndexOutOfBoundsException: 7 at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.applyMultiAttributeValues(JDBCConnector.java:2188) ~[?:?] at

Re: Unexpected job status encountered

2019-06-23 Thread Karl Wright
Hi Cihad, Do you have a stack trace of the ArrayIndexOutOfBounds exception? It would have to be taken from early when it started happening. What the "Error: Unexpected job status encountered: 1" error means is that the character that is stored in the job status field is not one that ManifoldCF

Re: ssh connector

2019-06-23 Thread Cihad Guzel
Hi Saunier, SSH is a network protocol for securely communicating between computers. It is not a repository. You can use the jcifs connector for remote shared files. Jcifs supports Samba connections to remote shared files. Please follow:

Unexpected job status encountered

2019-06-23 Thread Cihad Guzel
Hi, My crawler received an error and has been waiting in aborting status for a long time. My manifoldcf.log has some repetitive error logs, as follows: java.lang.ArrayIndexOutOfBoundsException FATAL 2019-06-19T21:58:12,180 (Worker thread '5') - Error tossed: null So I restarted the manifoldcf

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
If you are already on postgresql, then the memory usage is likely due to the Tika Extractor. It's not very well determined how much Tika uses for any given document; we try never to load documents into memory, but in some situations Tika uses a ton of memory nonetheless. The more worker threads

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
I would highly recommend moving to Postgresql if you have any really sizable crawl. Yes, we are already using Postgresql 9.6.10 for it. Below are the settings in the postgresql.conf file of our postgres server. max_connections = 100 shared_buffers = 128MB #temp_buffers = 8MB #max_prepared_transactions

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
If you are running single-process on top of HSQLDB, all database tables are kept in memory so you need a lot of memory. I would highly recommend moving to Postgresql if you have any really sizable crawl. Alternatively you could just hand the manifoldCF process more memory. Your choice.

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
Hi Karl, 1) It's a single-process deployment. 2) Not able to access through bash (while the crash is happening) 3) Server Configuration:- For Crawler server - 16 GB RAM and 8-Core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz and For Elasticsearch server - 48GB and 1-Core Intel(R) Xeon(R) CPU E5-2660

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
Hi Priya, Being unable to reach the web interface sounds like either a network issue or a problem with the app server. Can you describe the configuration you are running in? Is this a multiprocess deployment or a single-process deployment? When your docker container dies, can you still reach

Re: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
Hi Karl, Crash here means that a "the site could not be reached" kind of HTML page appears when accessing http://localhost:3000/mcf-crawler-ui/index.jsp. Explanation:- When running a certain job on the ManifoldCF server (2.13), after some time (in a successful running state), the browser suddenly gives me "the site

Re: Manifold Crawler Crashes

2019-06-20 Thread Karl Wright
Please describe what you mean by "crash". What actually happens? Karl On Thu, Jun 20, 2019, 2:04 AM Priya Arora wrote: > > > Hi, > > I am running multiple jobs(2,3) simultaneously on Manifold server and the > configuration is > > 1) For Crawler server - 16 GB RAM and 8-Core Intel(R) Xeon(R)

Fwd: Manifold Crawler Crashes

2019-06-20 Thread Priya Arora
Hi, I am running multiple jobs(2,3) simultaneously on Manifold server and the configuration is 1) For Crawler server - 16 GB RAM and 8-Core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz and 2) For Elasticsearch server - 48GB and 1-Core Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz Job working is to

RE: ssh connector

2019-06-17 Thread SAUNIER Maxence
I want to use an SSH or SFTP connection type for my repository connection. But I do not know which connector to use. Do you have any idea? Thanks From: Cihad Guzel Sent: Monday, June 17, 2019 12:24 To: user@manifoldcf.apache.org Subject: Re: ssh connector Hi Saunier, What do you want to do

Re: ssh connector

2019-06-17 Thread Cihad Guzel
Hi Saunier, What exactly do you want to do via ssh? Cihad Güzel On Mon, 17 Jun 2019 at 12:37, SAUNIER Maxence wrote: > Hello Karl, > > > > Do you have any news on this question? > > > > Thank you, > > > > *From:* SAUNIER Maxence > *Sent:* Wednesday, June 12, 2019 12:16 > *To:*

Re: ssh connector

2019-06-17 Thread Karl Wright
Ssh is a connection technology, not a repository, so I really cannot answer this question. Karl On Mon, Jun 17, 2019 at 5:37 AM SAUNIER Maxence wrote: > Hello Karl, > > > > Do you have any news on this question? > > > > Thank you, > > > > *From:* SAUNIER Maxence > *Sent:* Wednesday, June 12

TR: ssh connector

2019-06-17 Thread SAUNIER Maxence
Hello Karl, Do you have any news on this question? Thank you, From: SAUNIER Maxence Sent: Wednesday, June 12, 2019 12:16 To: user@manifoldcf.apache.org Subject: ssh connector Do you have any SSH connector for the repository?

ApacheCon North America 2019 Schedule Now Live!

2019-06-12 Thread Rich Bowen
Dear Apache Enthusiast, (You’re receiving this message because you’re subscribed to one or more Apache Software Foundation project user mailing lists.) We’re thrilled to announce the schedule for our upcoming conference, ApacheCon North America 2019, in Las Vegas, Nevada. See it now at

ssh connector

2019-06-12 Thread SAUNIER Maxence
Do you have any SSH connector for the repository ?

Re: Crawling SharePoint Data

2019-06-11 Thread Furkan KAMACI
Hi Karl, Thanks for the explanation! Kind Regards, Furkan KAMACI On Wed, Jun 12, 2019 at 3:17 AM Karl Wright wrote: > Hi Furkan, > > No, the Lists service is critical for enumerating documents within > libraries, and libraries within sites. > > Karl > > > On Tue, Jun 11, 2019 at 6:13 PM

Re: Crawling SharePoint Data

2019-06-11 Thread Karl Wright
Hi Furkan, No, the Lists service is critical for enumerating documents within libraries, and libraries within sites. Karl On Tue, Jun 11, 2019 at 6:13 PM Furkan KAMACI wrote: > Hi, > > So, a SharePoint configuration without a List may not need that plugin? > > Kind Regards, > Furkan KAMACI >

Re: Crawling SharePoint Data

2019-06-11 Thread Furkan KAMACI
Hi, So, a SharePoint configuration without a List may not need that plugin? Kind Regards, Furkan KAMACI On Wed, Jun 12, 2019 at 12:43 AM Karl Wright wrote: > Hi Furkan, > > The plugin has been necessary for crawling, period, since SharePoint 2010, > because the native SharePoint Lists service

Re: Crawling SharePoint Data

2019-06-11 Thread Karl Wright
Hi Furkan, The plugin has been necessary for crawling, period, since SharePoint 2010, because the native SharePoint Lists service does not fully function. Thanks, Karl On Tue, Jun 11, 2019 at 5:30 PM Furkan KAMACI wrote: > Hi, > > One should install a plugin to crawl data from SharePoint.

Crawling SharePoint Data

2019-06-11 Thread Furkan KAMACI
Hi, One should install a plugin to crawl data from SharePoint. However is it possible to crawl SharePoint data without installing that plugin i.e. in the case of when security is disabled? Kind Regards, Furkan KAMACI

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

2019-06-07 Thread Karl Wright
And yes, we'd also need to hand the mysql folks a similar test case. Karl On Sat, Jun 8, 2019 at 1:53 AM Karl Wright wrote: > Here's an explanation for Postgresql about what is supposed to happen in > this case. See slide 7. > > https://www.postgresql.org/files/developer/concurrency.pdf > >

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

2019-06-06 Thread Karl Wright
I can look at the log output but not until this weekend. Karl On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch wrote: > Hi Olivier, > > > > we were not able to fix this yet. > > > > But now we have new diagnostics log data, after the error occurred again > yesterday: > > > > Unexpected jobqueue

AW: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

2019-06-06 Thread Markus Schuch
Hi Olivier, we were not able to fix this yet. But now we have new diagnostics log data, after the error occurred again yesterday: Unexpected jobqueue status - record id 1522147023170, expecting active status, saw 4

RE: Web connector empty session cookie cache

2019-06-04 Thread Julien
Hi Karl, I understand, I’ll check that. Thanks Julien From: Karl Wright Sent: Monday, June 3, 2019 20:37 To: user@manifoldcf.apache.org Subject: Re: Web connector empty session cookie cache Hi Julien, When the session-based web crawl detects entry into a login sequence, the session cookies

Re: Web connector empty session cookie cache

2019-06-03 Thread Karl Wright
Hi Julien, When the session-based web crawl detects entry into a login sequence, the session cookies are cleared at that point. Essentially your symptom means that you haven't been complete about setting up your login sequence. If you make it detect the case when the session cookie is wrong,

Web connector empty session cookie cache

2019-06-03 Thread Julien Massiera
Hi all, I was doing some tests with the Web connector, and after several tries with different configurations of my job to crawl a session based website, I noticed that one configuration was not working. So I debugged the job and noticed that the connector was using a wrong session cookie. In

Re: mxt file with TikaExtractor

2019-05-29 Thread Karl Wright
Hi Maxence, This should be something that you report to the Tika team. It's not something ManifoldCF can do anything about. Thanks, Karl On Wed, May 29, 2019 at 6:23 PM SAUNIER Maxence wrote: > > Hello Karl, > > > > We just realized that the TikaExtractor does not keep the line breaks for >

mxt file with TikaExtractor

2019-05-29 Thread SAUNIER Maxence
Hello Karl, We just realized that the TikaExtractor does not keep the line breaks in the content field for .mxt files. Can you check whether you have the same problem? I attached an mxt file. Thank you. 20101103113807EMAIL.mxt Description: 20101103113807EMAIL.mxt

Re: Long running queries on jobqueue

2019-05-28 Thread Karl Wright
Oh, and you might want to check the JDBC driver to be sure it's rated as compatible with the version of the PostgreSQL database you are using. I imagine that can matter too. Karl On Tue, May 28, 2019 at 9:35 AM Karl Wright wrote: > When it fails again, I expect that the diagnostics will

Re: Long running queries on jobqueue

2019-05-28 Thread Karl Wright
When it fails again, I expect that the diagnostics will demonstrate conclusively that we had a database transactional integrity failure, as I have always seen when I look at the generated logs from this debug mode. In the past this has happened most often with mysql, but once in a great while with

RE: Long running queries on jobqueue

2019-05-28 Thread Julien
For MCF I am using a Postgres 10.1 database which is located on the same machine that performs the crawl, so no reason there is a latency problem. The MCF threads are configured like this : For info, following the recommended settings from the MCF documentation, I applied the

Re: Long running queries on jobqueue

2019-05-28 Thread Karl Wright
Hi Julien, "Error : Unexpected jobqueue status - record id 15588697113928, expecting active status, saw 0" As you said, the only thing that can be done here is to turn on diagnostic logging. Essentially, the status returned is not possible if the database is truly honoring transactional

Long running queries on jobqueue

2019-05-28 Thread Julien
Hi, I need some help to better understand what I am experiencing with a job using a JDBC connector. The job crawls a table containing approximately 9M documents. I am currently unable to complete this job as it randomly fails with the following error: Error : Unexpected jobqueue status -

Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)

2019-05-21 Thread Olivier Tavard
Hi Markus, We have the same error (with a PostgreSQL database). Has the error occurred again since your last mail? Did you change something in your MCF configuration to fix it? Thanks, Best regards, Olivier > On 13 Feb 2019 at 13:58, Markus Schuch > wrote: > > Hi Karl, > > we set

About the authorities to get ACLs

2019-05-13 Thread Cihad Guzel
Hi, We need to get ACLs for the file crawler with the Shared Drive connector, so we need a user that is authorized to read ACLs. I suggest that the crawler user be in the "domain admin" group in Active Directory. But that is a privilege that people do not want to grant easily. So, do you have any guide or

Re: Report export functionality

2019-05-08 Thread Karl Wright
The API supports access to report data. Karl On Wed, May 8, 2019, 5:49 AM Remko Mantel wrote: > Hi Manifold people, > > Is there any chance of export functionality for Reports being developed > within Manifold? For example, from the Simple History reports you can create. > -- > Met vriendelijke

Report export functionality

2019-05-08 Thread Remko Mantel
Hi Manifold people, Is there any chance of export functionality for Reports being developed within Manifold? For example, from the Simple History reports you can create. -- Kind regards, Remko Mantel *Smartshore* Email: re...@smartshore.nl Mobile: +31 (0)6 15371601

Re: schedule job

2019-04-24 Thread Karl Wright
Read the manual, perhaps? http://manifoldcf.apache.org/release/release-2.12/en_US/end-user-documentation.html#jobs On Wed, Apr 24, 2019 at 6:04 PM Cihad Guzel wrote: > Hi, > > I try schedule job. I have created a job and set schedule values on > schedule tab. I have waited but my job hasn't

schedule job

2019-04-24 Thread Cihad Guzel
Hi, I am trying to schedule a job. I have created a job and set schedule values on the Schedule tab. I have waited, but my job hasn't run. How do I run my job at a specified time? Regards, Cihad Guzel

Re: about schedule job

2019-04-16 Thread Karl Wright
Hi Cihad, For the schedule to fire, *all* of the fields must match. So for example, if you want crawls to start on Tuesdays in October, you select "Tuesday" and "October" from the pulldowns. If you select both "Monday" and "13" then the crawl only starts on Monday the 13th. Karl On Tue, Apr
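
The matching rule described above can be sketched in a few lines. This is only an illustration of "all selected fields must match", not ManifoldCF's actual scheduler code; the `schedule_fires` function, the field names, and the dictionary layout are hypothetical.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def schedule_fires(schedule, when):
    """Return True only if every field selected in `schedule` matches `when`.

    A field that is missing from `schedule` is treated as "any value",
    like a pulldown that was left unselected (hypothetical model).
    """
    actual = {
        "day_of_week": DAYS[when.weekday()],
        "day_of_month": when.day,
        "month": MONTHS[when.month - 1],
    }
    # Every selected field must contain the actual value.
    return all(actual[field] in allowed for field, allowed in schedule.items())

# "Monday" and "13" selected together: fires only on a Monday the 13th.
monday_13 = {"day_of_week": {"Monday"}, "day_of_month": {13}}
print(schedule_fires(monday_13, datetime(2019, 5, 13)))  # Monday the 13th
print(schedule_fires(monday_13, datetime(2019, 5, 20)))  # Monday the 20th
```

Selecting only "Tuesday" and "October" in this model would match every Tuesday in October, because the unselected day-of-month field places no constraint.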

about schedule job

2019-04-16 Thread Cihad Guzel
Hi, I am trying to schedule a job. We can select "day of week" and "day of month" on the Scheduling tab. I think we should choose only one of the two. Is that right? Is there a case where we use both at the same time? Regards, Cihad Guzel

Re: to handle multiple tables in a db crawler job

2019-04-07 Thread Karl Wright
Not in a single job. Each table is presumed to potentially have a different schema so the queries would be different. But you can construct views that represent your data if you wish to concatenate multiple tables. Karl On Sun, Apr 7, 2019 at 10:58 AM Cihad Guzel wrote: > Hi, > > I try
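
The view-based approach mentioned above might look like the following sketch. It uses Python's built-in `sqlite3` module purely for illustration; the table names, column names, and the `crawl_docs` view are all hypothetical, and a real deployment would define the view in whatever database the JDBC connector points at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical tables with different schemas.
    CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE notes    (id INTEGER PRIMARY KEY, subject TEXT, note_text TEXT);
    INSERT INTO articles VALUES (1, 'First article', 'Article body');
    INSERT INTO notes    VALUES (1, 'First note', 'Note text');

    -- One view with a common shape; the table-name prefix keeps
    -- document ids unique across the concatenated tables.
    CREATE VIEW crawl_docs AS
        SELECT 'articles:' || id AS doc_id, title AS title, body AS content
        FROM articles
        UNION ALL
        SELECT 'notes:' || id AS doc_id, subject AS title, note_text AS content
        FROM notes;
""")

# A single job could then run its queries against the view.
rows = conn.execute(
    "SELECT doc_id, title, content FROM crawl_docs ORDER BY doc_id"
).fetchall()
for row in rows:
    print(row)
```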

to handle multiple tables in a db crawler job

2019-04-07 Thread Cihad Guzel
Hi, I am trying the database crawler. I see that one job handles only one table. Is there any way to handle multiple tables (or all tables) of a database in one job? Regards, Cihad Guzel

RE: setup Windows share repository connector on local machine

2019-03-28 Thread Craig Eby
Hi Karl, As an update, I have the Windows share repository connector working connecting to another windows machine. It is not what I wanted, but it is working. So if you have any direction on getting the Windows share connectors connecting to a share on the same machine, I would appreciate

RE: setup Windows share repository connector on local machine

2019-03-27 Thread Craig Eby
Hi Karl, Thanks for your response. My apologies, I meant to refer to the ‘File system’ repository connector type as the one that is working, but is not doing everything I would like it to. I do have a share setup on my computer, it is just that I can’t get the Windows share repository

Re: setup Windows share repository connector on local machine

2019-03-27 Thread Karl Wright
Hi Craig, 'FYI, I can get the Fileshare repository connector to work, but it is buggy and doesn’t do all that I need.' We don't actually have a "Fileshare" repository connector, so I am not sure what you are talking about here. And if you have bugs to report, please do so. The paths that you

setup Windows share repository connector on local machine

2019-03-27 Thread Craig Eby
Hello, I am trying to get a Windows share repository connector to crawl files on my local Windows 10 machine (to index into solr for some text analytics). Currently after I setup the Windows share repository connector and save it, the connection status states "Error" and one of a few

Re: Where and how is ManifoldCF used in production?

2019-03-21 Thread Markus Schuch
Hi, we use MCF as part of a homegrown Solr based enterprise search solution (1.1M docs). We use some of the shipped connectors (sharepoint, livelink, web) but also implemented a lot ourselves for connecting proprietary repositories. We deploy on AWS on EC2 and RDS Aurora MySQL. Cheers,

Where and how is ManifoldCF used in production?

2019-03-21 Thread James Thomas
Hi all, We've been experimenting with and learning about ManifoldCF for a few months now and were wondering who is doing what with it in production. Would any of you be able to share a little about the applications you're working on, or know about, please? Any replies would be much

Re: Sharepoint Crawl - Missing documents

2019-03-06 Thread Karl Wright
The SharePoint connector requests documents in chunks of size 10,000. The request you point at gets the documents from row 50,000 through 60,000. The error text (if that is related to this request) shows that the request is timing out because SharePoint is not responding in a timely manner. I
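
The chunking arithmetic above can be sketched as follows. This is not the connector's code; `chunk_ranges` is a hypothetical helper, and it assumes zero-based row offsets with an exclusive end row.

```python
def chunk_ranges(total_rows, chunk_size=10_000):
    """Yield (start_row, end_row) pairs covering total_rows in fixed-size chunks."""
    for start in range(0, total_rows, chunk_size):
        yield start, min(start + chunk_size, total_rows)

# With 125,000 rows, the sixth request covers rows 50,000 through 60,000.
ranges = list(chunk_ranges(125_000))
print(ranges[5])
```

Under these assumptions a timeout on one request affects only that chunk's row range, which is why the failing request can be tied to a specific starting row.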

Re: Sharepoint Crawl - Missing documents

2019-03-06 Thread Gaurav G
Hi Karl, On further digging in the Manifold log, I found the following lines. Do they point to any possible reason? We are working on getting the web service specific logs enabled in SharePoint. I also wanted to check whether the ManifoldCF SharePoint plugin prints any logs. DEBUG

4 Apache Events in 2019: DC Roadshow soon; next up Chicago, Las Vegas, and Berlin!

2019-03-06 Thread Rich Bowen
Dear Apache Enthusiast, (You’re receiving this because you are subscribed to one or more user mailing lists for an Apache Software Foundation project.) TL;DR: * Apache Roadshow DC is in 3 weeks. Register now at https://apachecon.com/usroadshowdc19/ * Registration for Apache Roadshow Chicago is

Re: Sharepoint Crawl - Missing documents

2019-03-06 Thread Karl Wright
Hi Gaurav, Then I don't understand what is wrong. I've never seen this before, and that was the only thing I could think of. The only thing I can add is that the problem is taking place on the SharePoint side, so maybe (as the error suggests) it might be worth looking at the SharePoint server

Re: Sharepoint Crawl - Missing documents

2019-03-06 Thread Gaurav G
Hi Karl, The SharePoint version is 2013. I double checked. The version of the plugin that is installed on the server and the one in the connection configuration are both 2013. Thanks, Gaurav On Wed, Mar 6, 2019 at 12:33 PM Karl Wright wrote: > Hi Gaurav, > Which version of SharePoint is this?

Re: Sharepoint Crawl - Missing documents

2019-03-05 Thread Karl Wright
Hi Gaurav, Which version of SharePoint is this? And, did you install the SharePoint plugin for ManifoldCF, and select the correct versions of SharePoint in the connection configuration? Versions of SharePoint after 2010 limited the number of documents that could be returned from the Lists

Re: Sharepoint Crawl - Missing documents

2019-03-05 Thread Gaurav G
Hi Karl, There are no subsites as such. It is one big library with all documents in it in a flat structure. The same goes for the list. We enabled the logging for the connector and ran the list job. Below is the exception that it throws after it has crawled the list partially. It looks like after

Re: Sharepoint Crawl - Missing documents

2019-03-04 Thread Karl Wright
Hi Gaurav, There is no document count threshold value. If you can identify libraries or subsites that aren't being crawled, you can turn on connector debugging to see why the connector is skipping them. There could be many reasons for a library or site to be skipped, e.g. bad specification rules,

Sharepoint Crawl - Missing documents

2019-03-04 Thread Gaurav G
Hi, We are trying to crawl a Sharepoint list with about 150,000 items and a library with about 125,000 documents. We have separate jobs for both. The list job only crawls about 5 items and completes cleanly while the library job crawls about 4 documents and completes cleanly. We are

Re: Difference between Maximum document length and Max file size

2019-02-27 Thread Karl Wright
Hi Cihad, For "Maximum document length", you are talking about the Solr connector, correct? In that case it is the maximum size of extracted content that will be sent to Solr. (The connector assumes that when you aren't using the /update/extract handler you are extracting the content upstream

Difference between Maximum document length and Max file size

2019-02-27 Thread Cihad Guzel
Hi, What is the difference between "Maximum document length" on the "Content Length" tab and "Max file size (bytes)" on the "Allowed Content" tab? I think that "Maximum document length" is the extracted data size and "Max file size" is the file size. Is that right? Regards, Cihad Guzel

R: Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

2019-02-26 Thread Bisonti Mario
Great! The problem wasn’t the driver! Your suggestion enlightened me! It isn’t necessary to add the lib-proprietary folder to properties.xml; my problem was that I deployed in Tomcat: /opt/manifoldcf/web/war/mcf-api-service.war /opt/manifoldcf/web/war/mcf-authority-service.war

Re: custom jcifs properties

2019-02-25 Thread Cihad Guzel
Hi Karl, In some cases, "jcifs" runs slowly. To solve this problem, we need to set some custom properties. For example, this was my problem in my test environment: I have a Windows server and an Ubuntu server in the same network in the AWS EC2 service. The Windows server has Active Directory

Re: Threw exception: 'Driver class not found: net.sourceforge.jtds.jdbc.Driver'

2019-02-25 Thread Karl Wright
Hi, any news here? Karl On Wed, Feb 20, 2019 at 1:35 PM Karl Wright wrote: > No, I stand corrected: the right class is in that jar: > > >> > C:\wip\mcf\trunk\dist\lib-proprietary>"c:\Program > Files\Java\jdk1.8.0_181\bin\jar" -tf jtds-1.2.4.jar | grep Driver >

Re: custom jcifs properties

2019-02-24 Thread Karl Wright
These settings were provided by the developer of jcifs, Michael Allen. You have to really understand the protocol well before you should consider changing them in any way. Thanks, Karl On Sun, Feb 24, 2019 at 9:53 AM Cihad Guzel wrote: > Hi, > > SharedDriveConnector have some hardcoded

custom jcifs properties

2019-02-24 Thread Cihad Guzel
Hi, SharedDriveConnector has some hardcoded system properties, as follows: static { System.setProperty("jcifs.smb.client.soTimeout","15"); System.setProperty("jcifs.smb.client.responseTimeout","12"); System.setProperty("jcifs.resolveOrder","LMHOSTS,DNS,WINS");

Re: Sharepoint incremental crawl - last version

2019-02-22 Thread Karl Wright
Hi Gaurav, Yes, we can add fields to how the lastmodified column is computed, provided the information is available via web services. Please propose a patch. Thanks, Karl On Fri, Feb 22, 2019 at 7:16 AM Gaurav G wrote: > Hi All, > > We're facing a problem in getting updated sharepoint

Sharepoint incremental crawl - last version

2019-02-22 Thread Gaurav G
Hi All, We're facing a problem in getting updated sharepoint documents during the incremental crawl. The documents have an approval workflow in sharepoint. Upon creation the modified date gets assigned the current timestamp. We then crawl it and it gets crawled successfully. However then an

Re: Active documents

2019-02-21 Thread Remko Mantel
Hi Karl, We have set the Recrawl interval (I guess that is what you were asking for) at 8640 - that is (according to us) 6 days. Or do you want to know more (specific) settings? With the smaller jobs we actually see that the amount of active documents is diminishing towards the end of the job,
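
The "8640 is 6 days" figure above holds if the recrawl interval is expressed in minutes, which is an assumption here rather than something stated in the message. A quick check:

```python
from datetime import timedelta

# Assumption: the recrawl interval value is given in minutes.
recrawl_interval = timedelta(minutes=8640)
print(recrawl_interval.days)            # whole days in the interval
print(recrawl_interval.total_seconds() / 3600)  # the same interval in hours
```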

Re: Active documents

2019-02-21 Thread Karl Wright
Please tell us how you have configured your job. Is it running in continuous mode? Because if so, that is exactly how it's supposed to look -- all documents remain active forever, until you stop the job. Karl On Thu, Feb 21, 2019 at 8:18 AM Remko Mantel wrote: > Good afternoon all, > > I

Active documents

2019-02-21 Thread Remko Mantel
Good afternoon all, I have actually 2 questions concerning 'Active documents' in a running job. We are currently using ManifoldCF 2.10 and crawl and process more than 500k documents. In Manifold we see a huge amount of documents in status 'active'. The processed amount is only going up slowly.
