Re: Regarding skip limit

2018-05-30 Thread Karl Wright
Hi Vinay, I don't have complete information, but offhand it looks to me like the tar is being extracted more than once because the ingestion fails and is being retried. The retries are happening every 7-8 minutes, which is exactly what one expects for error retries. Please note that the number

Fwd: Regarding skip limit

2018-05-30 Thread VINAY Bengaluru
Hi, While running a job with a file-system repository connection and Solr as the output connection, with a Tika transformation in between, we see that a tar.gz file is being extracted again and again without reaching the Solr ingestion phase. We are seeing the following in the history screen: 05-30-2018

Regarding skip limit

2018-05-30 Thread VINAY Bengaluru
Hi, While running a job with a file-system repository connection and Solr as the output connection, with a Tika transformation in between, we see that a tar.gz file is being extracted again and again without reaching the Solr ingestion phase. We are seeing the following in the history screen:

Re: ManifoldCF API file system exclusion list

2018-05-30 Thread Shashank Raj
Hi Karl, We checked the JSON generated by version 2.10 but could not find any change. The order is still not preserved. What other workaround can you suggest for this? We also tried the method "msaunier" had sent, and that did not work either. Thanks. On Wed 30 May, 2018,

Re: org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) error SPAM 10Go/hour

2018-05-30 Thread Karl Wright
Thanks for the update. Karl On Wed, May 30, 2018 at 5:34 AM msaunier wrote: > Hello Karl, I checked the files, and our provider made a mistake in generating the PDFs for this server. We have a null joined scan parameter. We have similar errors with other servers, with no error log. I will

RE: org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:838) error SPAM 10Go/hour

2018-05-30 Thread msaunier
Hello Karl, I checked the files, and our provider made a mistake in generating the PDFs for this server. We have a null joined scan parameter. We have similar errors with other servers, with no error log. I will look into those as well. So this PDF error is OK; it's just an error. For the other

Re: ManifoldCF API file system exclusion list

2018-05-30 Thread Karl Wright
Hi Shashank, This question has come up recently from another source as well, and there was a ticket and a fix committed. I believe it went out in 2.10. The problem is that the *exported* JSON is not in the proper form and so order does not get preserved when it is re-imported. The fix changes

ManifoldCF API file system exclusion list

2018-05-30 Thread Shashank Raj
Hi Karl, We have been trying to modify an existing job through the API. For this we first call API/JSON/jobs/jobid to get the JSON format and then append an excluded file in the repository column of the job. But the included files take priority and show up first, causing the exclusion not to
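Why rule order matters here can be illustrated with a small sketch. It assumes (this is an assumption, not confirmed by the thread) that include/exclude rules are checked in order and the first matching rule decides; the rule tuples, the file names, and the helpers `is_included` and `group_by_kind` are hypothetical and do not reflect the actual ManifoldCF JSON format. It shows how an export that groups all includes ahead of all excludes can make an exclusion stop working after re-import.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Hypothetical rule model: ("include" | "exclude", glob pattern).
Rule = Tuple[str, str]


def is_included(name: str, rules: List[Rule]) -> bool:
    """Assumed semantics: walk rules in order; the first matching rule wins."""
    for kind, pattern in rules:
        if fnmatchcase(name, pattern):
            return kind == "include"
    return False  # no rule matched: the file is skipped


def group_by_kind(rules: List[Rule]) -> List[Rule]:
    """Mimic an export that groups same-named rules together,
    placing all includes before all excludes and losing interleaving."""
    includes = [r for r in rules if r[0] == "include"]
    excludes = [r for r in rules if r[0] == "exclude"]
    return includes + excludes


if __name__ == "__main__":
    rules = [("exclude", "secret.txt"), ("include", "*")]
    print(is_included("secret.txt", rules))                 # original order
    print(is_included("secret.txt", group_by_kind(rules)))  # after regrouping
```

With the original order the exclusion matches first and `secret.txt` is skipped; after regrouping, the `*` include matches first and the file is picked up, which is the symptom described above.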