Diagnosing REJECTED documents in job history

2013-01-21 Thread Andrew Clegg
Hi, I'm trying to set up a fairly simple crawl where I pull documents from Documentum and push them into ElasticSearch, using the 1.0.1 binary release with all appropriate extras for Documentum added. The repository connection looks fine -- in the job config I can see the paths, document types,

Re: Diagnosing REJECTED documents in job history

2013-01-21 Thread Andrew Clegg
Close, it's ElasticSearch. Okay, I'll play around with these, thanks. On 21 January 2013 11:26, Karl Wright daddy...@gmail.com wrote: Hi Andrew, The reason for rejection has to do with the criteria you provide for the job. Specifically: if

Re: XML parsing error quits file crawling using Windows share connection

2013-01-21 Thread Karl Wright
This means that the Solr you are talking to has returned an unintelligible (non-XML) response. When this happens I believe the actual return text is included in the Simple History, so I'd look there first to see what the problem might be. You may also eventually want to update to the current

Re: Diagnosing REJECTED documents in job history

2013-01-21 Thread Andrew Clegg
So, the only content types in Documentum are pdf and pdftext. application/pdf is enabled in the ES tab in the job config. (I assume they both map to application/pdf -- how would I check for sure?) And my max file size is 16777216000 which is way bigger than any of the rejected documents.

Re: Diagnosing REJECTED documents in job history

2013-01-21 Thread Andrew Clegg
Just to clarify that last post, I haven't disabled any of the allowed mime types for ES, so as long as they're not something really weird it should be fine. Unless it's a file extension problem (ES also has allowed file extensions) but is there a way to get that level of information about each
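
One way to narrow down a rejection is to test a sample document against the three criteria mentioned in this thread (allowed mime types, allowed file extensions, max file size) outside the crawler. The sketch below is only an illustration under assumptions: the function name, its parameters, and the reason labels are hypothetical, and it does not reproduce the actual logic of ManifoldCF or the ElasticSearch output connector.

```python
import os


def rejection_reasons(mime_type, filename, size_bytes,
                      allowed_mime_types, allowed_extensions, max_size_bytes):
    """Return the list of criteria a document fails (empty list = would pass).

    Hypothetical helper: mirrors the three job-level criteria discussed in
    the thread, not the connector's real implementation.
    """
    reasons = []

    # Criterion 1: the mime type must be in the allowed set.
    if mime_type.lower() not in allowed_mime_types:
        reasons.append("mime_type")

    # Criterion 2: the file extension (without the dot) must be allowed.
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in allowed_extensions:
        reasons.append("extension")

    # Criterion 3: the document must not exceed the max file size.
    if size_bytes > max_size_bytes:
        reasons.append("size")

    return reasons
```

For example, a document whose content type is reported literally as "pdftext" rather than "application/pdf" would show up here with a "mime_type" reason, which is the kind of mismatch the question above is asking about.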

Job hanging on Starting up with never ending external query.

2013-01-21 Thread Anthony Leonard
Hi there, We have recently started running a nightly job at 2AM in ManifoldCF to extract data from an Oracle repository and populate a Solr index. Most nights this works fine, but occasionally the job has been hanging at the Starting up phase. We have observed this on our test setup also

Re: Job hanging on Starting up with never ending external query.

2013-01-21 Thread Karl Wright
Hi Anthony, What happens between the framework recognizing that the job should be started (which it does fine in both cases), and actually achieving a correct job start, is the seeding phase, which is going to try to execute the seeding query against your Oracle database. If something happens at

Re: Job hanging on Starting up with never ending external query.

2013-01-21 Thread Anthony Leonard
Dear Karl, Many thanks for your insights. I'll do a kill -QUIT next time we have this issue which should hopefully give me the thread dump. However we've noticed that killing processes means we have to run the locks-clean script so it's not our favourite way of doing it. Also I definitely think

Re: Crawling new/updated files using Windows share connection

2013-01-21 Thread Karl Wright
CONNECTORS-618 Karl On Mon, Jan 21, 2013 at 9:08 AM, Karl Wright daddy...@gmail.com wrote: Bad news, I am afraid. MySQL seems to always put null values at the front of the index, and that cannot be changed through any means I can find. This is different from all other databases I know of.

Re: Job hanging on Starting up with never ending external query.

2013-01-21 Thread Karl Wright
kill -QUIT should not abort the agents process, just cause a thread dump. kill -9 is a different story. You can also do the same thing by using jstack, in the jvm bin directory. Karl On Mon, Jan 21, 2013 at 9:04 AM, Anthony Leonard anthony.leon...@york.ac.uk wrote: Dear Karl, Many thanks
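
The two approaches described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names and the `jvm_bin_dir` parameter are hypothetical, the process ID of the agents process has to be found separately, and `signal.SIGQUIT` is only available on Unix-like systems.

```python
import os
import signal
import subprocess


def request_thread_dump(pid):
    """Equivalent of `kill -QUIT <pid>`: the JVM writes a thread dump to its
    own standard output and keeps running."""
    os.kill(pid, signal.SIGQUIT)


def jstack_command(pid, jvm_bin_dir=""):
    """Build the jstack command line; jvm_bin_dir is the JVM's bin directory
    (assumed to be on PATH when left empty)."""
    executable = os.path.join(jvm_bin_dir, "jstack") if jvm_bin_dir else "jstack"
    return [executable, str(pid)]


def capture_thread_dump(pid, jvm_bin_dir=""):
    """Run jstack against the process and return the dump as text."""
    result = subprocess.run(
        jstack_command(pid, jvm_bin_dir),
        capture_output=True,
        text=True,
        check=True,  # raise if jstack exits with an error
    )
    return result.stdout
```

The jstack route returns the dump to the caller, whereas the SIGQUIT route leaves it wherever the agents process writes its standard output, so the former is usually easier to save to a file for later inspection.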

Re: Crawling new/updated files using Windows share connection

2013-01-21 Thread Karl Wright
I checked a fix for this into trunk. Please sync up with trunk and see if this fixes your problem. If it does, I will gladly include the fix in MCF 1.1. Karl On Mon, Jan 21, 2013 at 9:14 AM, Karl Wright daddy...@gmail.com wrote: CONNECTORS-618 Karl On Mon, Jan 21, 2013 at 9:08 AM, Karl