Sadly, I did a completely fresh build, with a new database, and I
still get REJECTED for all the documents found, with no log messages.
I also tried upgrading my DFC jars to those from Documentum 6.7 as one
of my colleagues pointed out that we use 6.6 which doesn't officially
support
The problem is that there are some documents you are indexing that
have no mime type set at all. The ElasticSearch connector is not
handling that case properly. I've opened ticket CONNECTORS-637, and
will fix it shortly.
Karl
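To illustrate the case described above, here is a minimal sketch of a mime type check that tolerates a document with no mime type. It is not the actual CONNECTORS-637 fix. The helper name `mime_type_allowed` is hypothetical, and the choice to reject (rather than accept) a document with no mime type is an assumption.

```python
from typing import Optional, Set


def mime_type_allowed(mime_type: Optional[str], allowed: Set[str]) -> bool:
    """Hypothetical helper: decide whether a document's mime type is allowed.

    Assumption: a missing or blank mime type counts as "not allowed",
    so the caller never has to dereference a missing value.
    """
    if mime_type is None or not mime_type.strip():
        return False
    # Compare case-insensitively and ignore surrounding whitespace.
    return mime_type.strip().lower() in allowed


allowed_types = {"application/pdf"}
print(mime_type_allowed(None, allowed_types))
print(mime_type_allowed("application/pdf", allowed_types))
```

Whichever policy is chosen, handling the missing value explicitly keeps a single document without a mime type from causing a silent failure.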
On Fri, Feb 1, 2013 at 9:36 AM, Andrew Clegg andrew.cl...@gmail.com wrote:
OK, I've checked in a fix to trunk.
Please synch up and try again.
Karl
On Fri, Feb 1, 2013 at 10:10 AM, Karl Wright daddy...@gmail.com wrote:
The problem is that there are some documents you are indexing that
have no mime type set at all. The ElasticSearch connector is not
handling that
Great, thanks, I'll give it a try.
On 30 January 2013 18:52, Karl Wright daddy...@gmail.com wrote:
I just checked in a refactoring to trunk that should improve Elastic
Search error reporting significantly.
Karl
On Wed, Jan 30, 2013 at 9:39 AM, Karl Wright daddy...@gmail.com wrote:
I agree
Hi Karl,
I finally had a chance to go back to this and here's what I found.
Documentum was returning pdf and pdftext for the content type, not
a full mime type, so as an experiment I added these to the list of
allowed mime types in the ElasticSearch configuration for the job.
This time, it got
Ok, so let's back up a bit.
First, which version of ManifoldCF is this? I need to know that
before I can interpret the stack trace.
Second, what do you see when you view the connection in the crawler
UI? Does it say Connection working, or something else, and if so,
what?
I've created a ticket
Thanks for all your help Karl!
It's 1.0.1 from the binary distro.
And yes, it says Connection working when I view it.
On 30 January 2013 14:03, Karl Wright daddy...@gmail.com wrote:
Ok, so let's back up a bit.
First, which version of ManifoldCF is this? I need to know that
before I can
That information isn't being recorded in manifoldcf.log unfortunately
-- I included all that was there. And there are no exceptions in
elasticsearch.log either...
I'll try running wireshark to see if I can follow the TCP stream.
On 30 January 2013 14:16, Karl Wright daddy...@gmail.com wrote:
Nailed it with the help of wireshark! Turns out it was my fault -- I
had set it up to use (i.e. create) an index called DocumentumRoW but
it turns out ES index names must be all lowercase.
Never knew that before.
Slightly annoyed that ES didn't log that...
Thanks again for your help Karl :-)
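Since nothing logged the uppercase index name, a check before the job runs can catch it early. The following is a rough sketch only: the function name `check_index_name` is hypothetical, and it covers just the lowercase rule mentioned above, not any other naming restrictions Elasticsearch may enforce.

```python
def check_index_name(name: str) -> str:
    """Return the name unchanged if acceptable, else raise ValueError.

    Assumption: only the "must be all lowercase" rule is checked here,
    plus a basic non-empty check.
    """
    if not name:
        raise ValueError("index name must not be empty")
    if name != name.lower():
        raise ValueError(
            f"index name {name!r} contains uppercase characters; "
            f"try {name.lower()!r} instead"
        )
    return name


try:
    check_index_name("DocumentumRoW")
except ValueError as err:
    print(err)
```

Raising an error with a suggested replacement gives the message that was missing from the logs in this case.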
I agree that the Elastic Search connector needs far better logging and
error handling. CONNECTORS-629.
Karl
On Wed, Jan 30, 2013 at 9:27 AM, Andrew Clegg andrew.cl...@gmail.com wrote:
Nailed it with the help of wireshark! Turns out it was my fault -- I
had set it up to use (i.e. create) an
I just checked in a refactoring to trunk that should improve Elastic
Search error reporting significantly.
Karl
On Wed, Jan 30, 2013 at 9:39 AM, Karl Wright daddy...@gmail.com wrote:
I agree that the Elastic Search connector needs far better logging and
error handling. CONNECTORS-629.
Karl
Hi,
I'm trying to set up a fairly simple crawl where I pull documents from
Documentum and push them into ElasticSearch, using the 1.0.1 binary
release with all appropriate extras for Documentum added.
The repository connection looks fine -- in the job config I can see
the paths, document types,
Close, it's ElasticSearch. Okay, I'll play around with these, thanks.
On 21 January 2013 11:26, Karl Wright daddy...@gmail.com wrote:
Hi Andrew,
The reason for rejection has to do with the criteria you provide for
the job. Specifically:
if
So, the only content types in Documentum are pdf and pdftext.
application/pdf is enabled in the ES tab in the job config. (I
assume they both map to application/pdf -- how would I check for
sure?)
And my max file size is 16777216000 which is way bigger than any of
the rejected documents.
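One way to narrow down which criterion is rejecting a document is to evaluate each one separately and report the reason. The sketch below is an illustration under assumptions: `rejection_reason` is a hypothetical helper, not ManifoldCF code, and it assumes the mime type is compared literally against the allowed list, which may not match what the connector does.

```python
from typing import Optional, Set


def rejection_reason(
    mime_type: str,
    length: int,
    allowed_mime_types: Set[str],
    max_length: int,
) -> Optional[str]:
    """Hypothetical helper: return why a document would be rejected, or None."""
    if mime_type not in allowed_mime_types:
        return f"mime type {mime_type!r} is not in the allowed list"
    if length > max_length:
        return f"length {length} exceeds the maximum of {max_length}"
    return None


MAX_LENGTH = 16777216000  # bytes, the job setting quoted above
print(MAX_LENGTH / 2**30)  # 15.625, i.e. about 15.6 GiB

allowed = {"application/pdf"}
# A literal comparison rejects a bare "pdf" content type.
print(rejection_reason("pdf", 1024, allowed, MAX_LENGTH))
print(rejection_reason("application/pdf", 1024, allowed, MAX_LENGTH))
```

With a limit that large, the size check is unlikely to be the cause, which leaves the content type as the criterion to examine first.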
Just to clarify that last post, I haven't disabled any of the allowed
mime types for ES, so as long as they're not something really weird it
should be fine.
Unless it's a file extension problem (ES also has allowed file
extensions) but is there a way to get that level of information about
each