Manifold fails with alfresco | pipeline exception

2018-11-24 Thread Sivakoti, Nikhilesh
Hi Team, We have been trying to use ManifoldCF with our Alfresco. We have customized the ManifoldCF Alfresco connector and the ManifoldCF Elasticsearch connector as per our needs. We have added an authentication mechanism to the Elasticsearch connector to connect to the QA servers. But when we try to

ManifoldCF Docker MySQL Connection Error

2018-11-24 Thread Furkan KAMACI
Hi All, I try to test ManifoldCF via docker. I've run mysql as follows: docker run --name custom-mysql -v /home/ubuntu/mysql-conf:/etc/mysql/conf.d -e MYSQL_ROOT_PASSWORD=mypass -d mysql:5.7.16 I've run my docker container of ManifoldCF as follows: docker run --name manifoldcf --link

Re: Language Detection for the data

2018-11-21 Thread Nikita Ahuja
Hi All, Thanks for the timely replies. My main concern is language detection for .doc, .pdf, or any other data present in the repository. As I understand it, the Tika transformation provides this functionality, but there is no output for the language of the documents.

Re: Language Detection for the data

2018-11-21 Thread Furkan KAMACI
Hi Nikita, First of all, OpenNLP is a transformation connector in ManifoldCF and should be enabled by default. It extracts named entities (people, locations and organizations) from documents. You need to download trained models to run the OpenNLP connector. You can check here for that purpose:

Re: Language Detection for the data

2018-11-21 Thread Karl Wright
Hi Nikita, Can you be more specific when you say "OpenNLP is not working"? All that this connector does is integrate OpenNLP as a ManifoldCF transformer. It uses a specific directory to deliver the models that OpenNLP uses to match and extract content from documents. Thus, you can provide any

Language Detection for the data

2018-11-20 Thread Nikita Ahuja
Hi, I have a query about detecting the language of the records/data that will be ingested by the Output Connector. I am using the OpenNLP connector for the detection as per the user documentation, but it is not working appropriately. Please suggest whether NLP has to be used and, if yes, then how

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
Hi Olivier, The HTML parser built into MCF is quite resilient against badly formed HTML, but there are limits. Characters like "<" and ">" are used to denote tags and thus they confuse the parser when they are present in unescaped form. It may be possible, with a fair bit of work, to handle

Re: web connector : links extraction issues

2018-11-15 Thread Olivier Tavard
Hi Karl, Thanks for your answer. Could you detail your answer please? Just to better understand: you mean that there is no chance that special characters could be escaped in the MCF code in this case, i.e. the website itself needs to escape the special characters, otherwise the extraction will

Re: web connector : links extraction issues

2018-11-15 Thread Karl Wright
Hi Olivier, You can create a ticket but I don't have a good solution for you in any case. Karl On Thu, Nov 15, 2018 at 6:53 AM Olivier Tavard < olivier.tav...@francelabs.com> wrote: > Hi Karl, > > Do you think that I need to create a Jira issue relative to this bug ie > that the links

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
(1) I increased the retries to go at least 10 minutes. (2) I handled the 503 response explicitly, with the same logic. See: https://issues.apache.org/jira/browse/CONNECTORS-1556 Karl On Thu, Nov 15, 2018 at 3:35 AM Bisonti Mario wrote: > Yes, Karl. > > > > Is it possible to apply the same
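The two changes described in this reply (retry for a bounded period, and treat an HTTP 503 the same way as an I/O failure) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code committed for CONNECTORS-1556: `RetrySketch`, `Attempt`, `Sleeper`, and the timing parameters are hypothetical names invented for the example, and the clock and sleeper are injected so the sketch can run without real waiting.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

/** Illustrative sketch only; not the Tika external connector's actual code. */
final class RetrySketch {
    /** One attempt at the remote call; returns the HTTP status code. (Hypothetical type.) */
    interface Attempt {
        int run() throws IOException;
    }

    /** Pause between attempts; injectable so callers can avoid real sleeping. (Hypothetical type.) */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Repeats the attempt until it neither throws IOException nor returns 503,
     * or until maxWaitMillis has elapsed according to the supplied clock.
     */
    static int callWithRetry(Attempt attempt, long maxWaitMillis, long pauseMillis,
                             LongSupplier clock, Sleeper sleeper)
            throws IOException, InterruptedException {
        long deadline = clock.getAsLong() + maxWaitMillis;
        while (true) {
            IOException failure = null;
            int status = -1;
            try {
                status = attempt.run();
            } catch (IOException e) {
                failure = e; // assumed transient, e.g. the server is restarting
            }
            // A 503 response is handled with the same logic as an I/O failure.
            boolean transientFailure = failure != null || status == 503;
            if (!transientFailure) {
                return status;
            }
            if (clock.getAsLong() >= deadline) {
                if (failure != null) {
                    throw failure;
                }
                throw new IOException("Still receiving 503 after " + maxWaitMillis + " ms");
            }
            sleeper.sleep(pauseMillis);
        }
    }
}
```

With a real clock this might be called as `RetrySketch.callWithRetry(call, 10 * 60 * 1000L, 5000L, System::currentTimeMillis, Thread::sleep)`, where the ten-minute window mirrors the figure mentioned in the message and the five-second pause is an arbitrary choice.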

Re: Error Job stop after repeatidly interruption

2018-11-15 Thread Karl Wright
Hi Mario, Here's the code:
>>
try {
  //System.out.println("About to do a content PUT");
  response = this.httpClient.execute(tikaHost, httpPut);
  //System.out.println("... content PUT succeeded");
} catch (IOException e) {

Re: Job stuck - WorkerThread functions return null

2018-11-14 Thread Karl Wright
Hi Cheng, Unless you are using carrydown information (that is, information that is recorded for a parent document that the child document needs access to), this is the method you want to use: activities.addDocumentReference(documentIdentifier); If you DO need to pull data recorded for a parent

Re: Job stuck - WorkerThread functions return null

2018-11-13 Thread Cheng Zeng
Hi Karl, Thanks a lot for your reply. I didn't change any code in the framework except my own repository connector. I found that there are five methods available to inject document identifiers. Could you please tell me how I should choose the right way to inject the document

Re: Job stuck - WorkerThread functions return null

2018-11-12 Thread Karl Wright
Hi, Have you been modifying the framework code? If so, I really cannot help you. If you haven't -- it looks like you've got code that is injecting document identifiers that are incorrect. But I will need to see a full stack trace to be sure of that. Thanks, Karl On Mon, Nov 12, 2018 at 4:06

Job stuck - WorkerThread functions return null

2018-11-12 Thread Cheng Zeng
Hi Karl, I am developing my own repository connector, for which I borrowed some code from the file repository connector. I use my repository connector to crawl documents from an IBM Domino system. I managed to retrieve all the files in Domino; however, when I restart my job to recrawl the database in the

Re: Error Job stop after repeatidly interruption

2018-11-08 Thread Karl Wright
Hi Mario, The Tika external connector retries for a while before it gives up and aborts the job. If you can get the Tika server back up within a reasonable period of time all should be well. But if one specific document *always* brings down the Tika server, it will be hard to recover from that.

Error Job stop after repeatidly interruption

2018-11-08 Thread Bisonti Mario
Hello. I am trying to index more than 500 documents in a Windows Share. The job gets interrupted due to repeated interruptions. This is the manifold.log: . . WARN 2018-11-07T21:53:25,296 (Worker thread '59') - Service interruption reported for job 1533797717712 connection

Re: Job stuck without message

2018-11-06 Thread Karl Wright
I added a couple of questions to the ticket. Please reply. Thanks, Karl On Tue, Nov 6, 2018 at 8:56 AM Bisonti Mario wrote: > Thanks a lot, Karl. > > I created a ticket. > > https://issues.apache.org/jira/browse/CONNECTORS-1554 > > > > > > Thanks > > > > Mario > > > > > > > > *From:* Karl

R: Job stuck without message

2018-11-06 Thread Bisonti Mario
Thanks a lot, Karl. I created a ticket. https://issues.apache.org/jira/browse/CONNECTORS-1554 Thanks Mario From: Karl Wright Sent: Tuesday, 6 November 2018 14:28 To: user@manifoldcf.apache.org Subject: Re: Job stuck without message ok, can you create a ticket? Also, I'd appreciate it if

Re: Job stuck without message

2018-11-06 Thread Karl Wright
ok, can you create a ticket? Also, I'd appreciate it if you can look at the simple history for one of these documents; I need to see what happened to it last. Thanks, Karl On Tue, Nov 6, 2018 at 7:32 AM Bisonti Mario wrote: > My version is 2.11 > > > > > > > > > > *From:* Karl Wright >

R: Job stuck without message

2018-11-06 Thread Bisonti Mario
My version is 2.11 From: Karl Wright Sent: Tuesday, 6 November 2018 13:07 To: user@manifoldcf.apache.org Subject: Re: Job stuck without message Thanks. What version of ManifoldCF are you using? We fixed a problem a while back having to do with documents that (because of error processing)

Re: Job stuck without message

2018-10-30 Thread Karl Wright
What I am interested in now is the Document Status report for any one of the documents that is 'stuck'. The next crawl time value is the critical field. Can you include an example? Karl On Tue, Oct 30, 2018, 12:36 PM Bisonti Mario wrote: > Thanks a lot, Karl. > > > > It happens that the job

R: Job stuck without message

2018-10-30 Thread Bisonti Mario
Thanks a lot, Karl. The job starts, works and indexes for an hour, and then it freezes. I have no error or waiting status in the Document Queue or Simple History; I have only “OK” statuses, so I have no failures. I am not able to see log errors other than those from the manifoldcf.log

Re: Job stuck without message

2018-10-30 Thread Karl Wright
Hi Mario, Please look at the Queue Status report to determine what is waiting and why it is waiting. You can also look at the Simple History to see what has been happening. If you are getting 100% failures in fetching documents then you may need to address this because your infrastructure is

R: Job stuck without message

2018-10-30 Thread Bisonti Mario
Thanks a lot Karl. Yes, I see many docs in the docs queue but they are inactive. In fact I see that no more docs are indexed in Solr and that the job stays at the same number of Active docs (35012) From: Karl Wright Sent: Tuesday, 30 October 2018 13:59

Re: Job stuck without message

2018-10-30 Thread Karl Wright
The reason the job is "stuck" is because: ' JCIFS: Possibly transient exception detected on attempt 1 while getting share security: All pipe instances are busy.' This means that ManifoldCF will retry this document for a while before it gives up on it. It appears to be stuck but it is not. You

Re: web connector : links extraction issues

2018-10-30 Thread Olivier Tavard
Hi Karl, Thanks for your answer. I kept looking into this and I found what the problem was. The Javascript code inside the tags contained the character '<'. When that is the case, link extraction does not work with the web connector. To reproduce it, I created this page hosted in local Apache

Job stuck without message

2018-10-30 Thread Bisonti Mario
Hello. I started a job that works for some minutes and then gets stuck. In the manifoldcf.log I see: at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:627) [mcf-jcifs-connector.jar:?] at

Re: web connector : links extraction issues

2018-10-29 Thread Karl Wright
Hi Olivier, Javascript inclusion in the Web Connector is not evaluated. In fact, no Javascript is executed at all. Therefore it should not matter what is included via javascript. Thanks, Karl On Mon, Oct 29, 2018 at 1:39 PM Olivier Tavard < olivier.tav...@francelabs.com> wrote: > Hi, > >

web connector : links extraction issues

2018-10-29 Thread Olivier Tavard
Hi, Regarding the web connector, I noticed that for specific websites, some Javascript code can prevent the web connector from correctly fetching all the links present on the page. Specifically, for websites that contain a deprecated version of New Relic web agent as

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread 白井 隆/ Shirai Takashi
Hi, Karl. Karl Wright wrote: >I've created CONNECTORS-1551, and attached the patch. I checked the SVN and Git repositories and confirmed that it was attached. Thank you. >Unfortunately there seem to be some encoding issues with >common_ja_JP.properties; can you send that one file via email as an

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
Never mind, I was able to get it fixed. Karl On Wed, Oct 24, 2018 at 10:19 AM Karl Wright wrote: > I've created CONNECTORS-1551, and attached the patch. > > Unfortunately there seem to be some encoding issues with > common_ja_JP.properties; can you send that one file via email as an >

Re: Contribution help for the Confluence connector patch

2018-10-24 Thread Karl Wright
I've created CONNECTORS-1551, and attached the patch. Unfortunately there seem to be some encoding issues with common_ja_JP.properties; can you send that one file via email as an attachment? Thanks! Karl On Tue, Oct 23, 2018 at 8:54 PM 白井 隆/ Shirai Takashi wrote: > Hi, there. > > I've just

Re: error when running jobs

2018-10-24 Thread Karl Wright
Hi Gustavo, There's a great deal of noise in this log that ManifoldCF has nothing to do with. Did you turn on logging for the JDBC driver? If so, can you turn it off? I *do* see signs that the forensics ran: October 23rd 2018, 18:22:41.662 message:DEBUG 2018-10-23T18:22:57,492 (Worker thread

Contribution help for the Confluence connector patch

2018-10-23 Thread 白井 隆/ Shirai Takashi
Hi, there. I've just made the patch to extend mcf-confluence-connector. The official site says that I can create a JIRA ticket for improvements. But I cannot access the JIRA via the firewall in our office. Can someone create a ticket instead of me? The patch is attached to this mail. [Extension]

Re: error when running jobs

2018-10-23 Thread Gustavo Beneitez
Thanks Karl, we are going to make new crawls with that property enabled and will get back to you. On Tue, 23 Oct 2018 at 10:09, Karl Wright () wrote: > Add this to your properties.xml: > > > > This keeps stuff in memory and dumps a lot to the log as well. > > I'm afraid that groveling

Re: error when running jobs

2018-10-23 Thread Karl Wright
Add this to your properties.xml: This keeps stuff in memory and dumps a lot to the log as well. I'm afraid that groveling through the logs after a failure to confirm it's the same kind of thing we've seen before takes many hours. I can only promise to do this when I have the time. Karl On

Re: error when running jobs

2018-10-23 Thread Gustavo Beneitez
Hi Karl, MySQL. As per config variables: version 5.7.23-log version comment MySQL Community Server (GPL) In which file should I enable logging/debugging? Thanks! On Mon, 22 Oct 2018 at 21:36, Karl Wright () wrote: > Hi Gustavo, > > I have seen this error before; it is apparently due to

Re: error when running jobs

2018-10-22 Thread Karl Wright
Hi Gustavo, I have seen this error before; it is apparently due to the database failing to properly gate transactions and behave according to the concurrency model selected for a transaction. We have a debugging setting you can configure which logs the needed information so that forensics get

error when running jobs

2018-10-22 Thread Gustavo Beneitez
Hi Karl, lately we are facing job status problems. After a few minutes the job ends suddenly, always the same way: Error: Unexpected jobqueue status - record id 1539339908660, expecting active status, saw 2 Error: Unexpected jobqueue status - record id 1539291541171, expecting active status, saw

Re: Logging and Document filter transformation connector

2018-10-17 Thread Olivier Tavard
Hi Karl, I opened a ticket on JIRA; it will be simpler to discuss it there: https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1547 Thanks, Olivier > On 11 Oct 2018 at 19:25, Karl Wright wrote: > > The fact that the history is different for the two suggests that the >

R: Add field to Output Solr

2018-10-16 Thread Bisonti Mario
I set up the following connections in the job:
1. Repository: WinShare
2. Transformation: Allowed Documents
3. Transformation: TikaExternal
4. Transformation: MetadataExtractor
5. Output: SolrShare
So, in allowed contents I put the allowed mimetypes and extensions; in the field mapping I added

Add field to Output Solr

2018-10-16 Thread Bisonti Mario
Hello. I am using Tika server as the processor for pdf, doc, etc. files. I configured it in my Solr output connection so that, when I index the documents, I see the fields: id last_modified resourcename content_type allow_token_document deny_token_document allow_token_share

R: How to set Tika with ManifoldCF and Solr

2018-10-12 Thread Bisonti Mario
Hello. I downloaded and compiled ManifoldCF 2.11 from scratch and used the internal Tika, but I get the same problem. From: Karl Wright Sent: Thursday, 11 October 2018 19:29 To: user@manifoldcf.apache.org Subject: Re: How to set Tika with ManifoldCF and Solr I

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
I cannot reproduce your problem. Perhaps you can download a new instance and configure it from scratch using the embedded tika? If that works it should be possible to figure out what the difference is. Karl On Thu, Oct 11, 2018, 12:23 PM Bisonti Mario wrote: > I tried to update Solr, Tika

Re: Logging and Document filter transformation connector

2018-10-11 Thread Karl Wright
Hi Olivier, The Repository connector has no knowledge of what the pipeline looks like. It simply asks the framework whether the mime type, length, etc. is acceptable to the downstream pipeline. It's the connector's responsibility to note the reason for the rejection in the simple history, but it

Logging and Document filter transformation connector

2018-10-11 Thread Olivier Tavard
Hello, I have a question regarding the Document filter transformation connector and the log about it. I would like to have a look of all the documents excluded by the rules configured in the Document filter transformation connector by looking at the Simple history or by the MCF log but it is

Re: Debug logging properties location

2018-10-11 Thread Olivier Tavard
Hi Karl, OK, thanks for the answer. So that is its normal location; I just wanted to be sure. As a suggested improvement, the section about the properties in the "how to build and deploy" page could be completed with an additional column in the table to distinguish whether the

Re: Debug logging properties location

2018-10-11 Thread Karl Wright
Hi Olivier, it sounds like you are using Zookeeper. Certain properties are global and are imported into Zookeeper. Other properties are local and found in each local properties.xml file. The debug properties for logging are, I believe, global. Karl On Thu, Oct 11, 2018 at 8:39 AM Olivier

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
When the "use extracting update handler" field is UNCHECKED, the mime types you list are IGNORED. Only "text" mime types are accepted by the Solr connection in that case. But that is exactly what the Tika extractor sends along, and many other people do this, and I can make it

Debug logging properties location

2018-10-11 Thread Olivier Tavard
Hello, I have a question regarding the debug logging properties and their location in the multi process model. If I put the properties in the properties.xml file (as org.apache.manifoldcf.connectors for example), it seems that the properties are not taken into account. On the other hand, if I

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
In my Solr output connection, I tried to put content_type as “Mime type field name:” but the result is always the same. Could it be that, with the flag unchecked, ManifoldCF doesn’t use the mime types specified? I am using a snapshot version of ManifoldCF of

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
I confirmed that both the Tika Service transformer and the Tika transformer check the same exact mime type: >> @Override public boolean checkMimeTypeIndexable(VersionContext pipelineDescription, String mimeType, IOutputCheckActivity checkActivity) throws ManifoldCFException,

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Now the document isn’t ingested by Solr because I get: Solr connector rejected document due to mime type restrictions: (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) But the mime type is listed on the tab, and the settings worked well

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
My mistake… As you wrote, I had to uncheck “use extracting update handler”. Now I have to understand the field mentioned in schema etc. From: Bisonti Mario Sent: Thursday, 11 October 2018 13:45 To: user@manifoldcf.apache.org Subject: R: How to set Tika with ManifoldCF and Solr I see the job

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
I see the job processed but without the document inside. 10-11-2018 13:32:25.649 job end 1539153700219(G_IT_Area_condivisa_Mario_XLSM) 0 1 10-11-2018 13:32:14.211 job start 1539153700219(G_IT_Area_condivisa_Mario_XLSM) 0 1 Do I have to uncheck, on my Solr output connection, the “Use the

Re: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Karl Wright
Please have a look at your "Simple History" report to see why the documents aren't getting indexed. Thanks, Karl On Thu, Oct 11, 2018 at 7:10 AM Bisonti Mario wrote: > Thanks Karl. > > I tried, but it doesn’t index documents. > > It seems that it doesn’t see them? > > > > Perhaps is the

R: How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Thanks Karl. I tried, but it doesn’t index documents. It seems that it doesn’t see them? Perhaps the problem is the “Ignore Tika exception” setting, which I don’t know where to set in ManifoldCF? From: Karl Wright Sent: Thursday, 11 October 2018 12:24 To: user@manifoldcf.apache.org Subject: Re: How

How to set Tika with ManifoldCF and Solr

2018-10-11 Thread Bisonti Mario
Hello. I would like to use a Tika server started from the command line with ManifoldCF, so that ManifoldCF, through a Transformation connector, processes documents with Tika and indexes them to the Solr output connector. I started Tika server: java -jar /opt/tika/tika-server-1.19.1.jar After that, I created a transformation connection

Re: Option to skip documents

2018-10-10 Thread Romaric Pighetti
Hi Karl, Thanks a lot for the integration. I will create a ticket next time to discuss such subjects. Regards, Romaric On 09/10/2018 at 23:04, Karl Wright wrote: r1843343 adds this condition to the list of caught conditions. In the future it would be better to create a ticket. Karl On

Re: Option to skip documents

2018-10-09 Thread Karl Wright
r1843343 adds this condition to the list of caught conditions. In the future it would be better to create a ticket. Karl On Tue, Oct 9, 2018 at 3:06 PM Karl Wright wrote: > I can make it retry then skip if it doesn't succeed in a while. > > Karl > > > On Tue, Oct 9, 2018 at 11:38 AM Romaric

Re: Option to skip documents

2018-10-09 Thread Karl Wright
I can make it retry then skip if it doesn't succeed in a while. Karl On Tue, Oct 9, 2018 at 11:38 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > > You're right it might be better to reschedule the file for later in this > case. > > In my case, I was able to crawl

Re: Option to skip documents

2018-10-09 Thread Romaric Pighetti
Hi Karl, You're right it might be better to reschedule the file for later in this case. In my case, I was able to crawl the files the first time I tried. When launching another crawl a few days later, the same files were locked. I tried to crawl them several times during the day but never

Re: Option to skip documents

2018-10-09 Thread Karl Wright
Hi Romaric, If the error is transient, then the right thing to do is *not* to skip the file, but to retry later. What currently happens? Karl On Tue, Oct 9, 2018 at 10:05 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > Along the lines of this ticket >

Option to skip documents

2018-10-09 Thread Romaric Pighetti
Hi Karl, Along the lines of this ticket https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1455?filter=allissues submitted by Julien, I recently stumbled across another smb exception thrown when dealing with some kind of locked files. The error was SmbException tossed

Re: Query to get the number of documents processed from PostgreSQL

2018-10-09 Thread Romaric Pighetti
Thanks Karl for the quick answer. I guess to get only the documents completed while the job is running I will have to fiddle around with the status, for which potential values are expressed in the JobQueue class. I noticed that sometimes (mainly when pausing and restarting a job), selecting

Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Karl Wright
Excellent news! Thanks for the update. Karl On Mon, Oct 8, 2018 at 1:54 PM Susheel Kumar wrote: > Thank you so much Karl. I was able to crawl the site and index them. > > On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote: > >> Please read the user documentation for the sharepoint connector

Re: Sharepoint connector help : site didn't exist or external

2018-10-08 Thread Susheel Kumar
Thank you so much Karl. I was able to crawl the site and index them. On Wed, Oct 3, 2018 at 3:31 PM Karl Wright wrote: > Please read the user documentation for the sharepoint connector very > carefully. You will need a site rule AND a path rule. > > Thanks, > Karl > > > On Wed, Oct 3, 2018 at

Re: Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Karl Wright
If you want all the documents for a specific job, the query is: select count(*) from jobqueue where jobid= Karl On Mon, Oct 8, 2018 at 4:23 AM Romaric Pighetti < romaric.pighe...@francelabs.com> wrote: > Hi Karl, > > I am currently facing the need of getting the number of documents >

Query to get the number of documents processed from PostgreSQL

2018-10-08 Thread Romaric Pighetti
Hi Karl, I currently need to get the number of documents processed by MCF in a specific job. This number is getting bigger than the limit set for the web interface, and I don't want to increase this limit because of the stress it will put on the database (opening the tab in

How to exclude invalid characters in the name of documents being imported?

2018-10-04 Thread douglascrp
Hello. I am executing a migration process using Google Drive as the source and Alfresco, with the CMIS Output Connector, as the destination. At the source, there are documents and folders containing some characters like "/", ":", ";" and even the " itself. Besides that, there are files and

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Karl Wright
Please read the user documentation for the sharepoint connector very carefully. You will need a site rule AND a path rule. Thanks, Karl On Wed, Oct 3, 2018 at 3:29 PM Susheel Kumar wrote: > Hi Karl, > > Please ignore my previous message. I was just able to crawl it but for > the files which

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Susheel Kumar
Hi Karl, Please ignore my previous message. I was just able to crawl it but for the files which i wanted to get extracted it is showing me below message. I already had Path configured to have /content included file and library type included but it is still not including them. How to correctly

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Karl Wright
You may not have installed all the services you need on the sharepoint side. You need them all to exist in the _vti_bin directory. If they don't the connector won't work. The Sharepoint connector documentation describes what needs to be there. If what you need isn't present, you will need to

Re: Sharepoint connector help : site didn't exist or external

2018-10-03 Thread Susheel Kumar
Thank you so much, Karl, for getting me this far. I am able to see the connector logging now. The next thing I am struggling with: when I run a job that uses the SharePoint repository and outputs to the local file system, I sometimes see 401 Unauthorized for usergroup.asmx OR 404 for lists.asmx or 404 for

RE: Status and Job Management

2018-10-01 Thread Damien Collis
Karl, Thanks for the information. That was indeed the issue. I have setup daily vacuuming and table analysis. Regards Damien Collis Team Leader – Systems Integration Technology & Innovation Division, Link Group Level 3, 1A Homebush Bay Drive, Rhodes NSW 2138 T+61 2 9375 7909

Re: Google Drive connector help

2018-10-01 Thread douglascrp
Hello Karl. Team Drives are indeed different things, and there are some dependencies and code level changes required to make it work. I was able to "fix" this, and I will be working with Piergiorgio Lucidi on some (I hope) useful changes for the project. On 2018/09/21 23:51:09,

Re: Sharepoint connector help : site didn't exist or external

2018-10-01 Thread Karl Wright
You're putting this in the wrong place. Leave logging.xml alone. Instead, add this into your properties.xml file: Karl On Mon, Oct 1, 2018 at 2:31 PM Susheel Kumar wrote: > Hi Karl, > > Trying the first put the logging in place. Below I have in logging.xml > (manifold 2.10) which is

Re: Sharepoint connector help : site didn't exist or external

2018-10-01 Thread Susheel Kumar
Hi Karl, I am first trying to put the logging in place. Below is what I have in logging.xml (manifold 2.10), which is outputting the wire debugging messages, but I am not able to see connector debug messages. I tried to add the Logger below under Loggers but it's not writing any connector debug messages. What

Re: Sharepoint connector help : site didn't exist or external

2018-09-28 Thread Karl Wright
Please review the documentation on how to set up your paths for indexing; your specification is likely incorrect. The paths use matching rules to determine what to traverse. By the way, you've turned on wire debugging for httpcomponents/httpclient, but you probably meant to turn on connector

Re: Sharepoint connector help : site didn't exist or external

2018-09-28 Thread Susheel Kumar
Hello Karl, Hoping you can help me out moving further to index documents into Solr from Sharepoint. I am able to have Sharepoint repository connection working and then created Solr output connection which is working as well. Now when I am running the job, i see it run for 10-30 seconds shows

Re: Status and Job Management

2018-09-27 Thread Karl Wright
Hi Damien, I basically wanted to see if you were using Postgresql. Postgresql is not very efficient in counting records; it's one of Postgresql's flaws as a database. So in ManifoldCF, we limit the number of records counted on the status page, displaying ">" where would be the limit.
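The idea in this reply (cap how many records are counted, and show a ">" marker once the cap is reached) could look something like the sketch below. Everything here is an assumption made for illustration and not ManifoldCF's actual implementation: the class name, the SQL shape, and the parameter order are invented, and only the `jobqueue` table and `jobid` column names come from a query quoted elsewhere in this archive.

```java
/** Illustrative sketch only; names and SQL shape are assumptions, not ManifoldCF code. */
final class CappedCountSketch {
    /**
     * Hypothetical bounded count: the inner query stops after (cap + 1) rows,
     * so the database does not have to visit every row of a very large job.
     * Parameter 1 is the job id, parameter 2 is cap + 1.
     */
    static final String BOUNDED_COUNT_SQL =
        "SELECT COUNT(*) FROM (SELECT 1 FROM jobqueue WHERE jobid = ? LIMIT ?) AS t";

    /** Renders the bounded count: the exact number when within the cap, otherwise ">cap". */
    static String render(long boundedCount, long cap) {
        return boundedCount > cap ? ">" + cap : Long.toString(boundedCount);
    }
}
```

Because the inner query returns at most cap + 1 rows, a result larger than the cap only says "more than the cap", which is why the rendered value switches to the ">" form instead of an exact figure.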

RE: Status and Job Management

2018-09-27 Thread Damien Collis
Karl, I have a single Windows 2012 standalone environment hosting ManifoldCF, Solr and the PostgreSQL DB: 4 x Xeon E5-2660 cores 16GB Physical Memory (2GB Tomcat/ManifoldCF, 12GB Solr) I have attached my D:\ProgramFiles\PostgreSQL\9.3\data\postgresql.conf file. For convenience here are the

Re: Status and Job Management

2018-09-27 Thread Karl Wright
Hi Damien, Can you describe your database setup? Karl On Thu, Sep 27, 2018 at 1:50 AM Damien Collis wrote: > All, > > > > I am currently having trouble loading the “Status and Job Management” web > page. I have set up a new Job but am unable to start it. > > > > Sometimes the “Status and Job

Status and Job Management

2018-09-26 Thread Damien Collis
All, I am currently having trouble loading the "Status and Job Management" web page. I have set up a new Job but am unable to start it. Sometimes the "Status and Job Management" will finally respond (> 30 mins), but by then I have moved on, and when I finally check back and see if it has

Re: Solr examples with long metadata needed

2018-09-26 Thread Karl Wright
Awesome, thanks! Karl On Wed, Sep 26, 2018 at 12:58 PM Julien Massiera < julien.massi...@francelabs.com> wrote: > Hi Karl, > > sorry for the delay, you will find below the solr log that you ask for. > You did not ask for it but I will also make a reply on your Solr ticket > with this log and I

Re: Solr examples with long metadata needed

2018-09-26 Thread Julien Massiera
Hi Karl, sorry for the delay, you will find below the Solr log that you asked for. You did not ask for it, but I will also reply on your Solr ticket with this log and attach the original file as well! INFO 2018-09-26T16:44:40,795 (qtp952486988-14) -

Solr examples with long metadata needed

2018-09-26 Thread Karl Wright
Hi ManifoldCF Community, I need one or two concrete examples of solr [INFO] log messages that include very long metadata (>8192). This is apparently critical for getting the SolrJ team to be able to understand ManifoldCF's usage of solr. If you have such examples around, please be sure that the

Re: Scheduler not working as we expected

2018-09-25 Thread Karl Wright
It's obviously a configuration problem. Are you using the extract update handler? If not, do you have Tika in the pipeline? Karl On Tue, Sep 25, 2018 at 4:24 AM Ronny Heylen wrote: > Hi, > We have been using SOLR for a few years and now the server has been > transferred to the VMs in our

Re: Scheduler not working as we expected

2018-09-25 Thread Ronny Heylen
Hi, We have been using SOLR for a few years and now the server has been transferred to the VMs in our HQ (and reinstalled). We are having the following issue now: forcing SOLR indexation by curl works, as we can see from: *curl "*

Re: Google Drive connector help

2018-09-21 Thread douglascrp
I tried creating a new folder to be used as the source for the migration, using the same user account, but outside of the Team Drive (which is no more than a shared folder for the company's employees), and then the migration worked normally. On 2018/09/21 12:54:02, Karl Wright wrote: > I

Re: Google Drive connector help

2018-09-21 Thread Karl Wright
I have only ever tried this with a personal account. I have no idea why a business account would differ. Karl On Fri, Sep 21, 2018 at 8:16 AM douglas...@gmail.com wrote: > I forgot to mention that I am using the version 2.10. > > On 2018/09/21 12:15:21, douglas...@gmail.com > wrote: > >

Re: Google Drive connector help

2018-09-21 Thread douglascrp
I forgot to mention that I am using the version 2.10. On 2018/09/21 12:15:21, douglas...@gmail.com wrote: > Hello guys. > > Is anyone using the Google Drive connector to retrieve content to be ingested? > > When I configure the Google Drive connector to use my personal Google account > as

Google Drive connector help

2018-09-21 Thread douglascrp
Hello guys. Is anyone using the Google Drive connector to retrieve content to be ingested? When I configure the Google Drive connector to use my personal Google account as the source, it works well when I use a query like '' in parents. But the same scenario does not work for a Google Business

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Susheel Kumar
No, Karl. I didn't even realize we would need it on the Sharepoint server. Thanks so much again for helping me move forward. I have asked the site admin to get that installed and will update here with the progress. On Wed, Sep 19, 2018 at 12:44 PM Karl Wright wrote: > Have you installed the

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Karl Wright
Have you installed the ManifoldCF Sharepoint 2013 plugin on the Sharepoint server? Karl On Wed, Sep 19, 2018 at 12:32 PM Susheel Kumar wrote: > Thank you so much, Karl. It fixed that but the error message in the ManifoldCF > UI remains the same and from the logs I can see it got authenticated when >

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Susheel Kumar
Thank you so much, Karl. It fixed that but the error message in the ManifoldCF UI remains the same, and from the logs I can see it got authenticated when accessing lists.asmx, but later when accessing MCPermission.asmx and calling GetPermissionCollection it gets a 404 not found error. Any suggestion?

Re: Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Karl Wright
Hi Susheel, The problem is likely your site path. The actual path looks like it should be just "/ES", not "/ES/_layouts/15". Karl On Wed, Sep 19, 2018 at 9:18 AM Susheel Kumar wrote: > Hello, > > I am new to this mailing list and just started using ManifoldCF to be able to > index data from our

Sharepoint connector help : site didn't exist or external

2018-09-19 Thread Susheel Kumar
Hello, I am new to this mailing list and just started using ManifoldCF to be able to index data from our Sharepoint 2013 into Solr. I have a site which I am able to access through a browser at http://teamsites.corp.com/sites/ES/_layouts/15/viewlsts.aspx and when I set up a Repository connection using

Speakers needed for Apache DC Roadshow

2018-09-11 Thread Rich Bowen
We need your help to make the Apache Washington DC Roadshow on Dec 4th a success. What do we need most? Speakers! We're bringing a unique DC flavor to this event by mixing Open Source Software with talks about Apache projects as well as OSS CyberSecurity, OSS in Government and OSS Career
