[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-08-13 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906672#comment-16906672
 ] 

Karl Wright commented on CONNECTORS-1591:
-

Ok, I'll look into this update shortly.


> RTF comment parsing problem
> ---
>
> Key: CONNECTORS-1591
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1591
> Project: ManifoldCF
>  Issue Type: Bug
>Reporter: Zoltan Farago
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.14
>
> Attachments: comment.rtf, result.txt
>
>
> We have a problem with Manifold/Tika. When a comment is parsed from an RTF 
> file, the result has no separator. See attachments.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-08-13 Thread Markus Schuch (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906639#comment-16906639
 ] 

Markus Schuch commented on CONNECTORS-1591:
---

Tika 1.21 should fix this



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-12 Thread Zoltan Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790347#comment-16790347
 ] 

Zoltan Farago commented on CONNECTORS-1591:
---

[~kwri...@metacarta.com], thank you! Please link the new ticket here, or add me 
to the watchers list. 



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-12 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790343#comment-16790343
 ] 

Karl Wright commented on CONNECTORS-1591:
-

Hi [~zfarago], I think the right approach here is to leave this ticket open and 
link to a TIKA ticket describing your problem.  The issue is not really related 
to ManifoldCF itself, and we cannot solve it for you until the Tika team 
corrects the issue.

I'll go ahead and create the linked ticket.




[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-12 Thread Zoltan Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790321#comment-16790321
 ] 

Zoltan Farago commented on CONNECTORS-1591:
---

[~kwri...@metacarta.com] The Manifold version is 2.10 and we do not use the 
mapper attachment. We tried Tika 1.17 and 1.19; both have the same problem. 



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-12 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790294#comment-16790294
 ] 

Karl Wright commented on CONNECTORS-1591:
-

[~zfarago]  Ok, we're getting closer.

What version of ManifoldCF is this?




[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-12 Thread Zoltan Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790287#comment-16790287
 ] 

Zoltan Farago commented on CONNECTORS-1591:
---

The output is an Elastic index. Comments in all other file types (.doc, .xls, 
.pdf, .dcx, .odt, etc.) are separated from the content text by a space. 
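
To make the reported symptom concrete, here is a minimal Python sketch. It is not Tika's or ManifoldCF's code; the function `join_segments` and the sample strings are hypothetical and are not taken from comment.rtf or result.txt. It only shows how body text and comment text fuse when they are concatenated without a separator.

```python
def join_segments(segments, separator=""):
    """Join extracted text segments (hypothetical helper).

    An empty separator reproduces the reported symptom: the comment text
    runs straight into the body text.
    """
    return separator.join(s.strip() for s in segments if s.strip())


# Made-up sample data standing in for a document body and one comment.
segments = ["Main document text.", "Reviewer comment"]

fused = join_segments(segments)                  # no separator
spaced = join_segments(segments, separator=" ")  # behaviour seen for other file types

print(fused)
print(spaced)
```

The second call corresponds to what is described above for the other file types, where a space separates the comment from the content.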



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789712#comment-16789712
 ] 

Karl Wright commented on CONNECTORS-1591:
-

[~zfarago] When you run a ManifoldCF job that fetches an RTF document and runs 
it through the Tika extractor, what comes out is a stream of characters (the 
content stream) plus various metadata fields.  All of these are sent to the 
output connector, which then does whatever it wants with these.

You *cannot* see either the content stream or the metadata directly.  So I need to 
know where you are getting result.txt from.  There is a missing step that you 
aren't telling me about and it's a critical one.
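
The flow described above can be pictured with a small Python sketch. Every name in it (`ExtractedDocument`, `RecordingOutput`, `fake_extract`) is hypothetical and does not correspond to any ManifoldCF or Tika class; the extraction step is a placeholder that does not parse RTF. It only illustrates that the extractor produces a content stream plus metadata, and that what can be inspected afterwards is whatever the output connector stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractedDocument:
    """Hypothetical stand-in for what an extractor hands downstream."""
    content: str                                   # the content stream
    metadata: Dict[str, str] = field(default_factory=dict)


class RecordingOutput:
    """Hypothetical output connector that remembers what it was sent."""

    def __init__(self) -> None:
        self.indexed: List[dict] = []

    def send(self, doc_id: str, doc: ExtractedDocument) -> None:
        # A real output connector decides how content and metadata are
        # stored; here both are copied into one record for inspection.
        self.indexed.append({"id": doc_id, "content": doc.content, **doc.metadata})


def fake_extract(raw_text: str) -> ExtractedDocument:
    # Placeholder for the extraction step; it does not parse RTF.
    return ExtractedDocument(content=raw_text,
                             metadata={"Content-Type": "application/rtf"})


output = RecordingOutput()
output.send("comment.rtf", fake_extract("body text"))
print(output.indexed[0])
```

In this picture, a file such as result.txt has to come from the record held by the output side, which is why the origin of that file matters for diagnosing the problem.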




[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-11 Thread Zoltan Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789704#comment-16789704
 ] 

Zoltan Farago commented on CONNECTORS-1591:
---

Basically, we processed comment.rtf with Manifold using the Tika content 
connector, and the result is result.txt. This is the content of the RTF file.

 



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789698#comment-16789698
 ] 

Karl Wright commented on CONNECTORS-1591:
-

I will repeat the question. *Where* is result.txt coming from?  Where are you 
finding it?  Is it content or metadata?  If metadata, what metadata field?





[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-11 Thread Zoltan Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789671#comment-16789671
 ] 

Zoltan Farago commented on CONNECTORS-1591:
---

[~kwri...@metacarta.com] You are right, it was a slack issue between a 
developer and me. I have now attached it as a .txt file. Thank you.



[jira] [Commented] (CONNECTORS-1591) RTF comment parsing problem

2019-03-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789564#comment-16789564
 ] 

Karl Wright commented on CONNECTORS-1591:
-

Hi [~zfarago], the result.xml file you attached is certainly not xml.  Was this 
intended?  In its current form I have no idea what this is and what it's 
supposed to represent and where you got it from exactly.  Please clarify that, 
and also clarify what you *expect* to see.  Bear in mind that if you are 
looking at the actual content or metadata output of the Tika Extractor, it's no 
help to create a ticket against ManifoldCF for that.  We do not develop Tika 
and there is nothing we could do other than open a Tika ticket.  So I suggest that 
you do that instead.

