Re: ManifoldCF two server setup

2018-03-23 Thread Shashank Raj
Hi Karl, We followed your documentation and made a multi-node setup, both with file-based synchronisation and with ZooKeeper. With the ZooKeeper-based setup, we found that if we run two jobs in two separate Tomcat processes, only one job picks up and posts records. The other job will

Re: Modify job to add excludes files and directory

2018-03-13 Thread Karl Wright
> Hi Maxence, If you EXPORT a job that works in JSON, and then IMPORT the exported JSON into a new job, is that job broken? Karl On T

Re: Modify job to add excludes files and directory

2018-03-13 Thread Karl Wright
> 1. Create the job manually (1_job_manually.json | 1_job_manually.png) 2. Create the job with a script and modify the order manually (2_job_mixte.json | 2_job_mixte.png)

Re: Modify job to add excludes files and directory

2018-03-13 Thread Karl Wright
> I do not see the difference. So: 1 and 2 work well, with the correct order, but 3 has the included files and directories first. Thanks, Maxence
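Maxence's three cases differ only in the order of the include and exclude rules. The sketch below shows why order can matter. It assumes first-match semantics, where the first rule that matches a path decides; check the connector's documentation for its actual rule semantics. `Rule`, `Action` and `isIncluded` are hypothetical names, and the suffix test is a stand-in for real wildcard matching. This is not the connector's code.

```java
import java.util.List;

// Hypothetical model of an ordered include/exclude rule list (Java 16+).
// Assumes first-match semantics: the first rule that matches a path decides.
// Rule, Action and isIncluded are illustrative names, not ManifoldCF classes.
final class RuleOrderDemo {
    enum Action { INCLUDE, EXCLUDE }

    record Rule(Action action, String suffix) {
        // "*" matches every path; otherwise a plain suffix test stands in
        // for the connector's real wildcard matching.
        boolean matches(String path) {
            return suffix.equals("*") || path.endsWith(suffix);
        }
    }

    static boolean isIncluded(List<Rule> rules, String path) {
        for (Rule rule : rules) {
            if (rule.matches(path)) {
                return rule.action() == Action.INCLUDE;
            }
        }
        return false; // assumed default: a path that no rule matches is skipped
    }

    public static void main(String[] args) {
        List<Rule> excludesFirst = List.of(
                new Rule(Action.EXCLUDE, ".tmp"),
                new Rule(Action.INCLUDE, "*"));
        List<Rule> includesFirst = List.of(
                new Rule(Action.INCLUDE, "*"),
                new Rule(Action.EXCLUDE, ".tmp"));
        System.out.println(isIncluded(excludesFirst, "/data/a.tmp")); // false
        System.out.println(isIncluded(includesFirst, "/data/a.tmp")); // true
    }
}
```

With the exclude first, `a.tmp` is dropped. With a catch-all include first, the exclude rule is never reached.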

Re: Modify job to add excludes files and directory

2018-03-13 Thread Karl Wright
> Thanks, Maxence *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Monday, 12 March 2018 21:29 *To:* user@manifoldcf.apache.org *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Domi

Re: Modify job to add excludes files and directory

2018-03-13 Thread Karl Wright
> *To:* user@manifoldcf.apache.org *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Dominique <dreuil...@citya.com> *Subject:* Re: Modify job to add excludes files and directory > Here is an idea. Define your job in the UI and use the API to fetch t

Re: ManifoldCF two server setup

2018-03-13 Thread Karl Wright
Hi Raj, First, I'd start by running the multiprocess example on ONE machine with multiple processes. That's what the multiprocess-file-example demonstrates, although it can be easily generalized to multiple machines, PROVIDED there is a shared file system available, like NFS. If not, you must

Re: ManifoldCF two server setup

2018-03-13 Thread Shashank Raj
Hi Karl, In the documentation for "Simplified multiprocess model using file based synchronisation", it is indicated that the war files should be taken from the "web" folder of multiprocess-file-example, but there is no such folder or file. Could you give us some pointers on where we need to take

Re: Modify job to add excludes files and directory

2018-03-12 Thread Karl Wright
> For my problem, the JSON format is not the issue. It works. I have attached the JSON, generated with my Python script and my database *(srvics33.json)*. If I go to the interface after the PUT of the configuration, the included files are first an

Re: Modify job to add excludes files and directory

2018-03-12 Thread Karl Wright
*Sent:* Friday, 9 March 2018 12:53 *To:* user@manifoldcf.apache.org *Cc:* Fabien Harrang <fharr...@citya.com>; REUILLON Dominique <dreuil...@citya.com> *Subject:* Re: Modify job to add excludes files and directory > Hi Maxence, > I

RE: Modify job to add excludes files and directory

2018-03-12 Thread msaunier
citya.com> Subject: Re: Modify job to add excludes files and directory Hi Maxence, In the middle of a job run, if you change the specification of what documents are included and excluded, the implementation of the connector determines how it will behave. There is no guarantee th

Re: Modify job to add excludes files and directory

2018-03-09 Thread Karl Wright
Hi Maxence, In the middle of a job run, if you change the specification of what documents are included and excluded, the implementation of the connector determines how it will behave. There is no guarantee that documents that are excluded will be removed, for example if the connector filters

RE: 3 document blocked on the jobqueue

2018-03-08 Thread msaunier
Subject: Re: 3 document blocked on the jobqueue The only place that might give a clue is the Simple History, as I said, and/or the logs. When was the document processed? Was it ever processed? Did somebody hard-kill the agents process at some point? Those questions are the only clues.

Re: Manifold RSS connector gets "stuck" after a few docs are processed

2018-03-08 Thread Mike Hugo
Thanks for the ideas and the sanity check! Based on your feedback we've been able to narrow down the problem to something in the custom output connector. Seems we need to join the thread at the end. On Thu, Mar 8, 2018 at 9:37 AM, Karl Wright wrote: > As a sanity check, I

Re: Manifold RSS connector gets "stuck" after a few docs are processed

2018-03-08 Thread Karl Wright
As a sanity check, I ran the postgresql RSS connector IT test on trunk and it passed: >> run-IT-postgresql: [junit] Testsuite: org.apache.manifoldcf.crawler.connectors.rss.tests.RSSSimpleCrawlPostgresqlIT [junit] Configuration file successfully read [junit] [main] INFO

Re: Manifold RSS connector gets "stuck" after a few docs are processed

2018-03-08 Thread Karl Wright
I've reviewed all changes to the RSS connector and to the framework over the last year, and none of them could reasonably have been expected to have any kind of effect like this. The only things changed were the redirect strategy and updating to the latest Postgresql JDBC driver. If the problem

Re: Manifold RSS connector gets "stuck" after a few docs are processed

2018-03-08 Thread Karl Wright
Hi Mike, You are the third person this morning that has reported this in conjunction with Postgresql. It is possible that some behavior we count on broke in the latest postgresql release. Can you tell me what version you are using? Do you see the same behavior when you run with the built-in

Re: 3 document blocked on the jobqueue

2018-03-08 Thread Karl Wright
*From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Thursday, 8 March 2018 15:06 *To:* user@manifoldcf.apache.org *Cc:* Fabien Harrang <fharr...@citya.com>; dreuil...@citya.com *Subject:* Re: 3 document blocked on the jobqueue > They have a "null" do

RE: 3 document blocked on the jobqueue

2018-03-08 Thread msaunier
Cc: Fabien Harrang <fharr...@citya.com>; dreuil...@citya.com Subject: Re: 3 document blocked on the jobqueue They have a "null" document priority: public static final Double nullDocPriority = new Double(noDocPriorityValue + 1.0); That is why they are not being queued. The

Re: 3 document blocked on the jobqueue

2018-03-08 Thread Karl Wright
They have a "null" document priority: public static final Double nullDocPriority = new Double(noDocPriorityValue + 1.0); That is why they are not being queued. The question is how they wound up in that state. Whenever documents are queued, they are given a document priority. The Simple

Re: Apache EU Roadshow CFP Closing Soon (23 February)

2018-03-05 Thread Piergiorgio Lucidi
Hi Furkan, thank you for your availability; let's see whether we can find more people. In the meantime, I can ask whether a space for 3 or 4 people is available. I'll let you know soon. Cheers, PJ 2018-02-16 15:20 GMT+01:00 Furkan KAMACI : > Hi Piergiorgio, >

Re: Job Queue Status on the Database

2018-02-26 Thread Karl Wright
From JobQueue.java: >> static { statusMap.put("P",new Integer(STATUS_PENDING)); statusMap.put("A",new Integer(STATUS_ACTIVE)); statusMap.put("C",new Integer(STATUS_COMPLETE)); statusMap.put("U",new Integer(STATUS_UNCHANGED)); statusMap.put("G",new
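If you are reading the raw single-letter codes out of the job queue table, a lookup like the following may help. It is a sketch that maps only the four codes visible in the excerpt above. The excerpt is cut off at "G", so that code and any others are deliberately left unmapped rather than guessed. `JobQueueStatus` and `describe` are hypothetical names.

```java
import java.util.Map;

// Sketch: decode the single-letter job queue status codes quoted above.
// Only the four codes visible in the excerpt are mapped; the excerpt is cut
// off at "G", so that code and any others are reported as unknown.
final class JobQueueStatus {
    private static final Map<String, String> NAMES = Map.of(
            "P", "STATUS_PENDING",
            "A", "STATUS_ACTIVE",
            "C", "STATUS_COMPLETE",
            "U", "STATUS_UNCHANGED");

    static String describe(String code) {
        return NAMES.getOrDefault(code, "unknown (see JobQueue.java)");
    }

    public static void main(String[] args) {
        for (String code : new String[] {"P", "A", "C", "U", "G"}) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```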

Re: Generic Output Connection

2018-02-22 Thread Nikita Ahuja
Hi Karl, Whenever I run the Generic API repository connector, the job does not start. [image: Inline image 2] And in the log file I am getting this parsing exception: [image: Inline image 1] Please suggest a solution for this. Thanks and Regards, Nikita On Mon, Feb 19, 2018 at

Re: ManifoldCF two server setup

2018-02-20 Thread Karl Wright
Hi Shashank, You can have multiple servers running against the same database, BUT if you do so, they must be individually configured to have their own IDs, and they must share locks and by extension, must use the same zookeeper. See multiprocess-zk-example in the binary distribution. Thanks,
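A quick pre-flight check for that requirement might look like the following sketch. It assumes two things, both of which you should verify against your distribution:

- each server's ID is passed as the `org.apache.manifoldcf.processid` system property;
- the ZooKeeper address is set as `org.apache.manifoldcf.zookeeper.connectstring`.

These are believed to match the multiprocess-zk-example scripts. `ClusterCheck` and `Server` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check for several ManifoldCF servers sharing one
// database (Java 16+). The property names are assumptions taken from the
// multiprocess-zk-example scripts; verify them for your release.
final class ClusterCheck {
    static final String PROCESS_ID_PROP = "org.apache.manifoldcf.processid";
    static final String ZK_PROP = "org.apache.manifoldcf.zookeeper.connectstring";

    record Server(String host, String processId, String zkConnectString) {}

    static List<String> problems(List<Server> servers) {
        List<String> found = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        Set<String> zkStrings = new HashSet<>();
        for (Server s : servers) {
            // Each process must have its own ID.
            if (!seenIds.add(s.processId())) {
                found.add("duplicate " + PROCESS_ID_PROP + "=" + s.processId()
                        + " on " + s.host());
            }
            zkStrings.add(s.zkConnectString());
        }
        // All processes must share locks, i.e. point at the same ZooKeeper.
        if (zkStrings.size() > 1) {
            found.add("servers disagree on " + ZK_PROP + ": " + zkStrings);
        }
        return found;
    }

    public static void main(String[] args) {
        List<Server> servers = List.of(
                new Server("server1", "A", "zkhost:2181"),
                new Server("server2", "B", "zkhost:2181"));
        System.out.println(problems(servers)); // []
    }
}
```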

Re: Generic Output Connection

2018-02-19 Thread Karl Wright
Hi Nikita, If you want to develop connectors, I recommend reading the book "ManifoldCF In Action". It's online, free: https://github.com/DaddyWri/manifoldcfinaction/tree/master/pdfs Karl On Mon, Feb 19, 2018 at 3:39 AM, Nikita Ahuja wrote: > Thanks Karl, > > > But what

Re: Generic Output Connection

2018-02-19 Thread Nikita Ahuja
Thanks Karl, But what steps should then be followed to make API calls and fetch the data using ManifoldCF? Is there any need to create a new custom connector? If so, please share the steps or the flow that is followed when creating connectors. Thanks and Regards, Nikita

Re: Apache EU Roadshow CFP Closing Soon (23 February)

2018-02-16 Thread Furkan KAMACI
Hi Piergiorgio, I would like to be there to arrange a ManifoldCF meeting! I want to join you for that proposal. Kind Regards, Furkan KAMACI On Thu, Feb 15, 2018 at 10:32 AM, Piergiorgio Lucidi wrote: > Hi, > > we have a great opportunity to arrange a ManifoldCF

Re: Generic Output Connection

2018-02-15 Thread Karl Wright
Hi Nikita, I do not understand your question. The Generic Connector was written by a committer who has since become unavailable, and nobody here knows how it is supposed to work. All that we have is the code and the documentation. Karl On Thu, Feb 15, 2018 at 5:58 AM, Nikita Ahuja

RE: API ManifoldCF modify transformation/output/job

2018-02-12 Thread msaunier
I found the answer a few minutes later. The isnew attribute had to be used to specify the nature of the request. Thanks. Regards, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Monday, 12 February 2018 18:17 To: user@manifoldcf.apache.org Subject: Re: API ManifoldCF

Re: API ManifoldCF modify transformation/output/job

2018-02-12 Thread Karl Wright
I am going to need more information than that. Specifically, you are going to need to give me a detailed sequence of API calls. Karl On Mon, Feb 12, 2018 at 11:46 AM, msaunier wrote: > Hello Karl, > I am creating scripts with the API to create transformation/output connectors

Re: Getting output in ElasticSearch

2018-02-12 Thread Karl Wright
Hi Nikita, If I recall correctly, you get base64-encoded output from the ES connector when you configure it to use the mapper attachment. You obviously will want the mapper attachment installed if you are going to run the connector in this mode. If you are using the Tika extractor, though, I

Re: ManifoldCF adding our own list of files.

2018-02-08 Thread Shashank Raj
Hi Karl, I had missed the file system format given at the end of the document. I hope to contribute to the ManifoldCF project towards the end of my POC. Thanks for your continued support. On 09-Feb-2018 12:40 PM, "Karl Wright" wrote: > Here is

Re: ManifoldCF adding our own list of files.

2018-02-08 Thread Karl Wright
Here is the reference page. http://manifoldcf.apache.org/release/release-2.9.1/en_US/programmatic-operation.html Karl On Fri, Feb 9, 2018 at 12:47 AM, Shashank Raj wrote: > Hi Karl, > I have checked the API for ManifoldCF regarding the same and > could

Re: ManifoldCF adding our own list of files.

2018-02-08 Thread Shashank Raj
Hi Karl, I have checked the ManifoldCF API regarding this and could not find a way to modify a job. Could you give me an example URL with which we can edit the job to add another repository path in addition to the existing one? Thanks and regards. On 08-Feb-2018 7:57

Re: ManifoldCF adding our own list of files.

2018-02-08 Thread Karl Wright
Hi, (1) You can use the REST API to programmatically create or modify your job as you see fit. (2) Another user (maybe in your group) reported that bizarre characters are not picked up by Java under Linux, although it works under Windows. This would be an Oracle JDK issue; you will need to log
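For the REST API route, a fetch-edit-save cycle might look like the following Java 11+ sketch. It rests on two assumptions; confirm both for your release against the programmatic-operation reference page linked in this thread:

- the API service is reachable at `http://localhost:8345/mcf-api-service/json/`, the single-process example's default;
- `GET` and `PUT` on `jobs/<job_id>` read and save a job definition.

The class and method names are hypothetical. The JSON edit itself, for example adding a second path to the repository connector's document specification, is left to a JSON library of your choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch of fetching and saving a job through the ManifoldCF JSON API
// (Java 11+). The base URL and the jobs/<job_id> GET/PUT behaviour are
// assumptions to check against the documentation for your release.
final class JobApi {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String base;

    JobApi(String base) {
        this.base = base.endsWith("/") ? base : base + "/";
    }

    URI jobUri(String jobId) {
        return URI.create(base + "jobs/" + jobId);
    }

    // Returns the job definition as JSON text.
    String fetchJob(String jobId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(jobUri(jobId)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Sends the edited JSON back and returns the HTTP status code.
    int saveJob(String jobId, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(jobUri(jobId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) {
        JobApi api = new JobApi("http://localhost:8345/mcf-api-service/json");
        System.out.println(api.jobUri("1234")); // hypothetical job id
    }
}
```

Typical use is `String json = api.fetchJob(id);`, then editing the document specification inside `json`, then `api.saveJob(id, json)`.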

Re: ManifoldCF heap error.

2018-02-08 Thread Shashank Raj
Hi Karl, Sorry for the late reply, but setting the worker threads and the throttling to 2-3 did help, and I am no longer getting this issue. On 18-Jan-2018 9:44 PM, "Steph van Schalkwyk" wrote: > Also check if all the files are successfully parsed by Tika. >

Re: How to extract JIRA authorities

2018-02-05 Thread Karl Wright
barney.rub...@aas.com.au","avatarUrls":{"48x48":"http://jira.apac.linkgroup.corp/secure/useravatar?ownerId=BarneyRubble=14201","24x24":"http://jira.apac.linkgroup.corp/secure/useravatar?size=smal

Re: How to extract JIRA authorities

2018-02-05 Thread Karl Wright
ownerId=BarneyRubble=14201","16x16":"http://jira.apac.linkgroup.corp/secure/useravatar?size=xsmall=BarneyRubble=14201","32x32":"http://jira.apac.linkgroup.corp/secure/useravatar?size=medium=BarneyRubble=14201"

Re: How to extract JIRA authorities

2018-02-05 Thread Karl Wright
ownerId=BarneyRubble=14201"},"displayName":"Barney Rubble","active":true,"timeZone":"Australia/Sydney","locale":"en_AU"},{"self":"http://jira.apac.linkgroup.corp/rest/api/2/user?userna

Re: How to extract JIRA authorities

2018-02-04 Thread Karl Wright
=ClassificationOfSummary _fields_priority_id=4 _id=4 _thumbnail=http://jira/secure/thumbnail/79798/_thumb_79798.png _thumbnail=http://jira/secure/thumbnail/78645/_thumb_78645.png

Re: How to extract JIRA authorities

2018-02-04 Thread Karl Wright
dd+or+update+Testing+details. _statusCategory_name=Done _author_active=true _author_active=true _avatarUrls_32x32=http://jira/secure/useravatar?size%3Dmedium%26avatarId%3D10200 _avatarUrls_24x24=http://jira/secure/useravatar?si

RE: How to extract JIRA authorities

2018-02-04 Thread Damien Collis
ubject: Re: How to extract JIRA authorities All looks good; the token qualification should always take place in the output connection in any case. So it looks like all the code is there and seems to be doing reasonable stuff. The only question is whether you've got forced acls conf

Re: How to extract JIRA authorities

2018-02-01 Thread Karl Wright
All looks good; the token qualification should always take place in the output connection in any case. So it looks like all the code is there and seems to be doing reasonable stuff. The only question is whether you've got forced acls configured or not. Karl On Fri, Feb 2, 2018 at 1:18 AM,

Re: How to extract JIRA authorities

2018-02-01 Thread Karl Wright
Hi Damien, The JIRA connector fetches users from JIRA and converts them to acls: >> if (acls == null) { // Get acls from issue List users = getUsers(issueID); aclsToUse = (String[])users.toArray(new String[0]); java.util.Arrays.sort(aclsToUse);
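The quoted fallback can be exercised outside the connector with a standalone sketch like this one. `aclsFor` is a hypothetical name. The issue's users are passed in rather than fetched from JIRA. Returning explicit ACLs unchanged when they are present is an assumption about the branch the excerpt does not show.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the quoted fallback: with no explicit ACLs, the
// issue's users become the document's ACL, sorted. Users are passed in
// instead of being fetched from JIRA; names are hypothetical.
final class JiraAclSketch {
    static String[] aclsFor(String[] acls, List<String> issueUsers) {
        if (acls != null) {
            return acls; // assumption: explicit ACLs are used as-is
        }
        String[] aclsToUse = issueUsers.toArray(new String[0]);
        Arrays.sort(aclsToUse);
        return aclsToUse;
    }

    public static void main(String[] args) {
        String[] acls = aclsFor(null, List.of("wilma", "barney", "fred"));
        System.out.println(Arrays.toString(acls)); // [barney, fred, wilma]
    }
}
```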

Re: How to extract JIRA authorities

2018-02-01 Thread Karl Wright
Hi Damien, First, let me understand the problem. You say you are seeing no authorization tokens being indexed at all, correct? It sounds like you have the authority side configured properly. You have confirmed that you are getting authority tokens back that you expect, it sounds like. So the

Re: Problem in fetching the access tokens from Active Directory in Elastic Search index for FileSystem Connector

2018-01-25 Thread Nikita Ahuja
Thanks Karl, It is working fine now, and I am also able to get the access tokens in the Elastic Search index. Thanks a lot for your valuable suggestions. With Regards, Nikita On Thu, Jan 25, 2018 at 4:42 PM, Karl Wright wrote: > The __nosecurity__ token is generated by the

Re: Problem in fetching the access tokens from Active Directory in Elastic Search index for FileSystem Connector

2018-01-25 Thread Karl Wright
The __nosecurity__ token is generated by the Elastic Search Connector. It is produced (correctly) when no acls reach the connector. This is why I asked you about the repository connection, because that connection is not producing any ACLs. If you are crawling files in a file system, you must
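The behaviour Karl describes can be summarised as a one-line fallback. This is a hypothetical sketch of the rule as stated in the thread (no ACLs in, `__nosecurity__` out). It is not the Elastic Search connector's actual code, and the class and method names are invented for illustration.

```java
import java.util.List;

// Hypothetical sketch of the fallback described above: a document that
// reaches the output connector with no ACL tokens is indexed with the single
// "__nosecurity__" token. Not the Elastic Search connector's actual code.
final class NoSecuritySketch {
    static final String NO_SECURITY = "__nosecurity__";

    static List<String> tokensToIndex(List<String> acls) {
        return (acls == null || acls.isEmpty()) ? List.of(NO_SECURITY) : acls;
    }

    public static void main(String[] args) {
        System.out.println(tokensToIndex(List.of()));         // [__nosecurity__]
        System.out.println(tokensToIndex(List.of("token1"))); // [token1]
    }
}
```

Seeing only `__nosecurity__` in the index therefore points upstream: the repository connection is not attaching ACLs.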

Re: Problem in fetching the access tokens from Active Directory in Elastic Search index for FileSystem Connector

2018-01-25 Thread Nikita Ahuja
Hi Karl, I have tried Elastic Search versions 1.3.1 and 5.6.1 but am still getting this "__nosecurity__" token. The repository connection uses the File System connector, and the transformations are the "Tika Content Extractor" and the Metadata Adjuster connector. [image: Inline image 1] Is there any

Re: Problem in fetching the access tokens from Active Directory in Elastic Search index for FileSystem Connector

2018-01-25 Thread Karl Wright
Hi Nikita, You are getting the __nosecurity__ token value transmitted and stored, which means your Elastic Search setup is probably reasonable. Can you give us details about your pipeline? What repository connector is this? Karl On Thu, Jan 25, 2018 at 1:49 AM, Nikita Ahuja

Re: ManifoldCF heap error.

2018-01-18 Thread Steph van Schalkwyk
Also check if all the files are successfully parsed by Tika. *Steph van Schalkwyk* Principal, Remcam Search Engines +1.314.452.2896 st...@remcam.net http://remcam.net Skype: svanschalkwyk

Re: ManifoldCF heap error.

2018-01-18 Thread Karl Wright
Oh, also the maximum number of Tika connections should be limited to the number of threads to be sure you're not wasting memory on extra Tika instances (which might be expensive). Karl On Thu, Jan 18, 2018 at 10:52 AM, Karl Wright wrote: > Hmm, it might be worth asking

Re: ManifoldCF heap error.

2018-01-18 Thread Shashank Raj
Hi Karl, I changed the number of worker threads to 6, but the problem still persists when I use ManifoldCF's Tika. When using "null" as the output connection, there seems to be no problem. I also tried Solr without the Tika transformation connection, and that also works fine. But as soon as I switch to

Re: ManifoldCF heap error.

2018-01-18 Thread Karl Wright
Hi Shashank, ManifoldCF's memory consumption is bounded but scales with the number of worker threads you allow. If you have 100 worker threads and each doc can consume 50 MB, then you need at least 5 GB right there for Solr output. Tika is also quite expensive memory-wise so I'd allocate at
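The arithmetic behind that rule of thumb, as a sketch. In the worst case every worker thread holds one large document at the same time, so the Solr-output floor is threads × per-document size. `HeapEstimate` is a hypothetical helper. The Tika overhead Karl mentions is not included, because the thread does not quantify it.

```java
// Back-of-envelope heap floor for Solr output: in the worst case every
// worker thread holds one large document at the same time.
// Tika's additional memory use is not modelled here.
final class HeapEstimate {
    static long minHeapMb(int workerThreads, long perDocMb) {
        return workerThreads * perDocMb;
    }

    public static void main(String[] args) {
        System.out.println(minHeapMb(100, 50) + " MB"); // 5000 MB, about 5 GB
        System.out.println(minHeapMb(6, 50) + " MB");   // 300 MB
    }
}
```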

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-12 Thread Karl Wright
> Solr. Is that normal? On 2.9.0, the commit works well. Regards, *From:* msaunier [mailto:msaun...@citya.com] *Sent:* Friday, 12 January 2018 11:52 *To:* user@manifoldcf.apache.org *Subject:* RE: Document connector e

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread msaunier
OK. I'll confirm that tomorrow. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Thursday, 11 January 2018 18:09 To: user@manifoldcf.apache.org Subject: Re: Document connector excluding mime type and size - Tika Parser error No Tika error is good, but have a look at Simple

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread Karl Wright
onfiguration but Tika 1.17: · No Tika error · But no documents are sent to Solr. I don’t understand why. I am investigating. *From:* msaunier [mailto:msaun...@citya.com] *Sent:* Thursday, 11 January 2018 15

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread msaunier
I am crawling at the moment. I think it will be finished in 30 minutes. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Thursday, 11 January 2018 15:05 To: user@manifoldcf.apache.org Subject: Re: Document connector excluding mime type and size - Tika Parser error Did this work

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread Karl Wright
ctor-lib-proprietary. But the option does not appear in the ManifoldCF interface. Any idea? Thanks. *From:* msaunier [mailto:msaun...@citya.com]

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread Karl Wright
.xml file in the dist folder · I have added jcifs.jar to connector-lib-proprietary. But the option does not appear in the ManifoldCF interface. Any idea? Thanks. *From:* msaunier [mailto:msaun...@citya.com]

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-11 Thread msaunier
To: user@manifoldcf.apache.org Subject: RE: Document connector excluding mime type and size - Tika Parser error Good! I will configure and test that, and report back as soon as the run is finished (400k documents). If it works, I will test on a few million documents. Thanks. From

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-10 Thread msaunier
@manifoldcf.apache.org Subject: Re: Document connector excluding mime type and size - Tika Parser error The build you should be using is the ant build. Do not use the maven build for this purpose. - Check out trunk: svn co https://svn.apache.org/repos/asf/manifoldcf/trunk - Download dependencies

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-10 Thread Karl Wright
ier <msaun...@citya.com> wrote: > Test checking out and building with POI 3.17 and Tika 1.17? It’s possible. I'll finish a project and then test that. *From:* Karl Wright [mailto:daddy...

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
ble. I'll finish a project and then test that. *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 16:57 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size - Tika Parser err

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread msaunier
Test checking out and building with POI 3.17 and Tika 1.17? It’s possible. I'll finish a project and then test that. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, 9 January 2018 16:57 To: user@manifoldcf.apache.org Subject: Re: Document connector excluding mime type and size

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
> *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:54 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size -

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
> *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:54 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size - Tika Parser error > As for the Tika issue, we explicitly tested

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread msaunier
the crawlers at the same time with the API to add the "Allowed documents" condition. Thanks. From: msaunier [mailto:msaun...@citya.com] Sent: Tuesday, 9 January 2018 16:09 To: user@manifoldcf.apache.org Subject: RE: Document connector excluding mime type and size - Tika Parser error File info

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread msaunier
The two versions (2.8.1 and 2.9) of ManifoldCF are on two different servers. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, 9 January 2018 15:54 To: user@manifoldcf.apache.org Subject: Re: Document connector excluding mime type and size - Tika Parser error

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
it again to be sure. *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:26 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size - Tika

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread msaunier
OK. The main aim of putting it in the connector was to avoid repeating the operation for the 300 jobs in production. Regards, From: Karl Wright [mailto:daddy...@gmail.com] Sent: Tuesday, 9 January 2018 15:44 To: user@manifoldcf.apache.org Subject: Re: Document

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
eem to have emptied the index before the last indexing run (ManifoldCF and Solr). I am doing it again to be sure. *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:26 *To:* user@manifoldcf.apache.org *Subject:* Re: Do

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
> *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:12 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size - Tika

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
transfer a document with this error; they are private. Sorry. If I encounter the error again on a non-private document, I'll come back to you. *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 15:12

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
ve a 2.8.1 on another server with the same job and the same documents. I will test on this other server and report back. Thanks for your help. *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, 9 Jan

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
daddy...@gmail.com] *Sent:* Tuesday, 9 January 2018 13:15 *To:* user@manifoldcf.apache.org *Subject:* Re: Document connector excluding mime type and size - Tika Parser error > I looked at the history of this. We had to release a patch (2.8.1) that put various

RE: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread msaunier
: Re: Document connector excluding mime type and size - Tika Parser error I looked at the history of this. We had to release a patch (2.8.1) that put various poi jars at root level in order to work around a Tika problem. That patch may not have been entirely correct in that it looks like

Re: Document connector excluding mime type and size - Tika Parser error

2018-01-09 Thread Karl Wright
I looked at the history of this. We had to release a patch (2.8.1) that put various poi jars at root level in order to work around a Tika problem. That patch may not have been entirely correct in that it looks like it may have blocked access by one of the deeper jars to a higher level. Release

RE: OCR Tika to read PDF, txt and doc docx

2018-01-08 Thread msaunier
Hello, I have unchecked the Extract Update Handler and it works. Thank you. From: Karl Wright [mailto:daddy...@gmail.com] Sent: Friday, 5 January 2018 18:52 To: user@manifoldcf.apache.org Subject: Re: OCR Tika to read PDF, txt and doc docx The Tika transformer replaces

Re: OCR Tika to read PDF, txt and doc docx

2018-01-05 Thread Karl Wright
> "xmptpg_npages":"1", "access_permission_can_print_degraded":"true", "filecreatedon":"2017-12-22T09:37:04.000Z", "access_permission_can_modify":"true

RE: OCR Tika to read PDF, txt and doc docx

2018-01-05 Thread msaunier
70Z", "creation_date":"2017-12-22T09:37:03Z", "xmptpg_npages":"1", "access_permission_can_print_degraded":"true", "filecreatedon":"2017-12-22T09:37:04.000Z", "access_permission_can_modify"

Re: OCR Tika to read PDF, txt and doc docx

2018-01-05 Thread Karl Wright
Hi, It's pretty straightforward. EITHER you configure your Solr output connection to use the extracting update handler and Solr Cell (the default), so that Tika is used on the Solr side, OR you configure to use the standard update handler and insert the Tika Extractor as a document transformer
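The either/or rule above can be written as a small check. This is a hypothetical sketch. The two boolean flags stand for "the Solr output connection uses the extracting update handler" and "the pipeline contains the Tika Extractor transformer". They are not ManifoldCF configuration keys.

```java
// Hypothetical check of the two arrangements described above. The flags are
// illustrative, not ManifoldCF configuration keys:
//   extractingHandler: the Solr output connection uses the extracting update handler
//   tikaTransformer:   the job pipeline includes the Tika Extractor transformer
final class SolrTikaPipelineCheck {
    static String check(boolean extractingHandler, boolean tikaTransformer) {
        if (extractingHandler && !tikaTransformer) {
            return "OK: Tika runs on the Solr side (Solr Cell)";
        }
        if (!extractingHandler && tikaTransformer) {
            return "OK: Tika runs in ManifoldCF; Solr uses the standard update handler";
        }
        if (extractingHandler) {
            return "Not OK: both are enabled; pick one";
        }
        return "Not OK: neither is enabled; pick one";
    }

    public static void main(String[] args) {
        System.out.println(check(false, true));
    }
}
```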

RE: OCR Tika to read PDF, txt and doc docx

2018-01-05 Thread msaunier
Sorry, that was a mistake. I need the text content of PDF, txt, doc and docx files to index in Solr. Thanks for your help. From: msaunier [mailto:msaun...@citya.com] Sent: Friday, 5 January 2018 18:05 To: user@manifoldcf.apache.org Subject: OCR Tika to read PDF, txt and doc docx Hello,

Re: How to initialize an external PostgreSQL DB

2018-01-04 Thread Karl Wright
Well, you are correct in that there is no magic table-creating script. Karl On Thu, Jan 4, 2018 at 11:14 AM, Beelz Ryuzaki wrote: > I specified the user name parameter in my global properties. The problem is: I don’t have the rights to create a new database on that

Re: How to initialize an external PostgreSQL DB

2018-01-04 Thread Karl Wright
Did you specify a postgresql user name parameter in your global properties? 'When you execute the initialize script, does it create a new database with a new user?' Yes, it does, but you will need to supply the database superuser name and superuser password in the global properties in order for this

Re: How to initialize an external PostgreSQL DB

2018-01-04 Thread Beelz Ryuzaki
Hi Karl, I used the setglobalproperties script to deploy them; however, I get the following error: FATAL: no PostgreSQL user name specified in startup packet. I have checked that my user exists. When you execute the initialize script, does it create a new database with a new user? If so, is there

Re: How to initialize an external PostgreSQL DB

2018-01-04 Thread Karl Wright
Hi Othman, Once you added the global properties, you need to deploy them into zookeeper using the setglobalproperties script. Thanks, Karl On Thu, Jan 4, 2018 at 10:48 AM, Beelz Ryuzaki wrote: > Hi Karl, > > I read the “how to build and deploy” page thoroughly, but my

Re: How to initialize an external PostgreSQL DB

2018-01-04 Thread Karl Wright
Hi Othman, Have you read the "how to build and deploy" page? Here's a reference: https://manifoldcf.apache.org/release/release-2.9/en_US/how-to-build-and-deploy.html Karl On Thu, Jan 4, 2018 at 9:55 AM, Beelz Ryuzaki wrote: > Hello, > > I have in my possession two

How to re-index specific document?

2018-01-03 Thread Najman, Radko
Hello, There is an action “Re-index all associated documents” on the Output connection. Is there a way to re-index only a specific document? I believe it should be possible by updating or deleting the document's row in the database, but I wasn’t successful. I tried to do it by updating

Re: Alfresco webscript connection problem

2018-01-02 Thread aurelien . mazoyer
e and created a webscript repository connection with the following configuration: PROTOCOL: http HOST NAME: host PORT: 8080 CONTEXT: /alfresco/service STORE PROTOCOL: workspac

Re: Alfresco webscript connection problem

2018-01-02 Thread Luis Cabaceira
ection with the following configuration: *Protocol:* http *Host name:* host *Port:* 8080 *Contex

Re: Alfresco webscript connection problem

2018-01-02 Thread aurelien . mazoyer
Store USER NAME: user PASSWORD: * Saved the config and got the exception... Is there any way to be sure that my configuration is correct? Thank you, Aurélien FROM: Luis Cabaceira [mailto:cabace...@gm

Re: MCF not indexing documents due to mime-type

2017-12-23 Thread Phillip Rhodes
As far as I know, the wonkiness with the data I'm seeing is actually a reflection of an underlying problem with digital images. Apparently some or all of the various date typed fields mandated by EXIF and XMP don't require time-zone information. So apparently you can have an image that
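To see why a missing time zone is a problem for indexing, consider converting such a timestamp to the UTC form that Solr date fields expect. This Java sketch assumes an ISO-style local timestamp; raw EXIF uses a different `yyyy:MM:dd HH:mm:ss` layout. The sketch shows that the resulting instant depends entirely on the zone the indexer chooses to assume. `ZonelessDate` is a hypothetical name.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Sketch: a timestamp with no zone only becomes an absolute instant once a
// zone is assumed. The assumed zone is the indexer's choice, not data.
final class ZonelessDate {
    // Parses an ISO local date-time and renders it in Solr's UTC "Z" form.
    static String toSolrUtc(String localTimestamp, ZoneId assumedZone) {
        LocalDateTime local = LocalDateTime.parse(localTimestamp);
        return local.atZone(assumedZone).toInstant().toString();
    }

    public static void main(String[] args) {
        String raw = "2017-12-22T09:37:04"; // no offset or zone
        System.out.println(toSolrUtc(raw, ZoneId.of("UTC")));          // 2017-12-22T09:37:04Z
        System.out.println(toSolrUtc(raw, ZoneId.of("Europe/Paris"))); // 2017-12-22T08:37:04Z
    }
}
```

The same local reading yields two different UTC values an hour apart, which is the kind of inconsistency described above.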

Re: User rights for Sharepoint connector

2017-12-23 Thread Julien Massiera
Hi Karl, No problem, it is what I would have proposed anyway! Julien On 23/12/2017 at 16:27, Karl Wright wrote: Do you mind if I include this in the SharePoint connector documentation? Thanks, Karl On Sat, Dec 23, 2017 at 10:13 AM, Julien Massiera

Re: User rights for Sharepoint connector

2017-12-23 Thread Karl Wright
Do you mind if I include this in the SharePoint connector documentation? Thanks, Karl On Sat, Dec 23, 2017 at 10:13 AM, Julien Massiera < julien.massi...@francelabs.com> wrote: > Hi folks, first of all, thank you for your answers. I want to let you know that I found a solution for my

Re: MCF not indexing documents due to mime-type

2017-12-22 Thread Karl Wright
Hi Phil, Are these fields extracted by Tika from your document? Just curious, because if it's in MCF itself we could do something about it. Anyhow, what you want is the metadata adjuster: https://manifoldcf.apache.org/release/release-1.10/en_US/end-user-documentation.html#metadataadjuster

Re: MCF not indexing documents due to mime-type

2017-12-21 Thread Phillip Rhodes
On Thu, Dec 21, 2017 at 8:35 PM, Karl Wright wrote: > Well, there are some differences; "Solr Cell" (as they used to call it) > generates a couple of fields that the standard Tika extractor in MCF won't. > But other than that it should work. By and large I don't think I care

Re: MCF not indexing documents due to mime-type

2017-12-21 Thread Karl Wright
escription: Excluding document because of mime type (application/pdf) (and so on for many other mime types) So... this is *not* what I would expect to happen, as I have nothing at all listed in the "excluded mime types"

RE: Issue Extracting Authorities.

2017-12-21 Thread Damien Collis
il.com] Sent: Thursday, 21 December 2017 9:13 PM To: user@manifoldcf.apache.org Subject: Re: Issue Extracting Authorities. Right, we cannot distribute jcifs.jar for licensing reasons. You can also build ManifoldCF yourself from the distribution sources and libs and then run "ant make-deps

Re: User rights for Sharepoint connector

2017-12-21 Thread Karl Wright
Hi Julien, I wish one of us was a SharePoint or Microsoft expert, but we're not, and we get asked this question all the time. All I can say is that it works when you use a user with full rights, and it doesn't when you don't. Karl On Thu, Dec 21, 2017 at 9:57 AM, Julien Massiera <

RE: User rights for Sharepoint connector

2017-12-21 Thread Konrad Holl
Hi Julien, for crawling SharePoint you should use the *Default Content Access Account* configured in the SharePoint *Search Service Application* (Central Administration > Manage Service Applications > Search Service Application). This user should have sufficient permissions to access all
