Re: Repository document stream empty after Tika Transformation

2015-07-17 Thread chalitha udara Perera
Hi Karl, I mainly work with images. Actually, Tika extracts EXIF metadata from images. I have attached the ManifoldCF log containing image metadata extracted by Tika. I would like to use a separate connector after that to extract low-level features such as SIFT to provide image search. Currently cannot do th

[jira] [Commented] (CONNECTORS-1219) Lucene Output Connector

2015-07-17 Thread Shinichiro Abe (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632230#comment-14632230 ] Shinichiro Abe commented on CONNECTORS-1219: it will work if we just crea

Re: Repository document stream empty after Tika Transformation

2015-07-17 Thread Karl Wright
Hi Chalitha, The only documents I see here are documents that Tika cannot extract content from, namely JPGs, etc. Karl On Fri, Jul 17, 2015 at 12:09 PM, chalitha udara Perera < chalithaud...@gmail.com> wrote: > Hi Karl, > > Here I have attached the result from File System -> Tika Transform ->

Re: Repository document stream empty after Tika Transformation

2015-07-17 Thread chalitha udara Perera
Hi Karl, Here I have attached the result from File System -> Tika Transform -> Null Output. Please find the attachment. Thank you, Chalitha On Fri, Jul 17, 2015 at 6:41 PM, Karl Wright wrote: > I don't see this here. > > I set up the following: > - file system repository connection > - null ou

RE: Repository document stream empty after Tika Transformation

2015-07-17 Thread Karl Wright
I don't see this here. I set up the following: - file system repository connection - null output connection - tika extractor - a job using all three Running the job and looking at the simple history, I see null output connection ingestion records that have proper document sizes. Can you repeat t

Re: Repository document stream empty after Tika Transformation

2015-07-17 Thread chalitha udara Perera
Hi Karl, I'm using the 2.1 release and only the Solr output connector. If you look at the input stream size (document.getBinaryLength()) after the Tika connector, it is zero. Thanks, Chalitha On Fri, Jul 17, 2015 at 6:08 PM, Karl Wright wrote: > The document stream contains what tika ext

RE: Repository document stream empty after Tika Transformation

2015-07-17 Thread Karl Wright
The document stream contains what tika extracts. If it can't extract anything then you will have an empty stream. It is also possible that if the stream is split, you are tripping over a bug that was fixed some time ago. What mcf version is this, and do you have more than one output? Karl Sent

RE: Recovering ManifoldCF from a job stuck in terminating state

2015-07-17 Thread Karl Wright
I bet you are using file-based synchronization, correct? If so, read up on the lock clean procedure. Karl From: Dileepa Jayakody Sent: 7/17/2015 7:16 AM To: dev@manifoldcf.apache.org Subject: Recovering ManifoldCF from a job stuck in terminating state Hi All, I'm tryi

Repository document stream empty after Tika Transformation

2015-07-17 Thread chalitha udara Perera
Hi All, I'm writing a transformation connector to extract low-level features from images. First I used that connector without the Tika extractor and it worked fine. But when I used it with the Tika connector (after Tika) it fails to extract features. After debugging I found out that the stream is empty aft

Recovering ManifoldCF from a job stuck in terminating state

2015-07-17 Thread Dileepa Jayakody
Hi All, I'm trying out ManifoldCF 2.1 by creating a job with a file system repository connection, Tika transformation and Solr output connection. During a job I abruptly shut down the server. After restarting I can see that the Start up idle cleanup thread is in a loop without shutting down. See the e