[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Attachment: Document_with_Comments_Text_extarction_Tika_APP.docx.txt
> Removing comments from *.docx, *.pdf files
> --
[ https://issues.apache.org/jira/browse/TIKA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2901:
-
Summary: Tika extracting points data from Chart (was: Tika extracting
points from Chart )
> Tika extracting points data
[ https://issues.apache.org/jira/browse/TIKA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2901:
-
Attachment: Chart_data_sample_text_possible_issue.docx.txt
Chart_data_sample_text_possible_issue.docx
> Ti
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Attachment: (was: Chart_data_sample_text_possible_issue.docx.txt)
> Removing comments from *.docx, *.pdf files
> -
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Attachment: (was: Chart_data_sample_text_possible_issue.docx)
> Removing comments from *.docx, *.pdf files
> -
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Attachment: Chart_data_sample_text_possible_issue.docx.txt
Chart_data_sample_text_possible_issue.docx
> Re
Md created TIKA-2901:
Summary: Tika extracting points from Chart
Key: TIKA-2901
URL: https://issues.apache.org/jira/browse/TIKA-2901
Project: Tika
Issue Type: Bug
Components: app
Affects V
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Attachment: Document_with_Comments_Text_extarction_Tika_APP.docx
> Removing comments from *.docx, *.pdf files
> --
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2900:
-
Description:
Hello,
I use Apache Tika to extract text, mostly from *.doc, *.docx and *.pdf files.
Sometimes there are co
Md created TIKA-2900:
Summary: Removing comments from *.docx, *.pdf files
Key: TIKA-2900
URL: https://issues.apache.org/jira/browse/TIKA-2900
Project: Tika
Issue Type: Wish
Components: app, exa
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ]
Md edited comment on TIKA-2593 at 3/1/18 4:51 PM:
--
I would like to do a few
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ]
Md edited comment on TIKA-2593 at 3/1/18 4:11 PM:
--
I would like to do a few
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ]
Md commented on TIKA-2593:
--
I would like to do a few things:
* exclude comments
* possibly exclude
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382193#comment-16382193 ]
Md commented on TIKA-2593:
--
I am talking about this ticket; for example, you can see the attache
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382167#comment-16382167 ]
Md edited comment on TIKA-2593 at 3/1/18 3:33 PM:
--
No, deleted content is n
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382167#comment-16382167 ]
Md commented on TIKA-2593:
--
No, deleted content is not showing if we do
officeParserConfig.setUs
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382014#comment-16382014 ]
Md commented on TIKA-2593:
--
I think I figured it out. I need to set
officeParserConfig.setUse
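The setter name in the comment above is cut off in this archive and is not reconstructed here. For orientation only, here is a minimal sketch, assuming a Tika 1.x release (such as the 1.17 mentioned in this thread) whose `OfficeParserConfig` exposes `setUseSAXDocxExtractor`, `setIncludeDeletedContent` and `setIncludeHeadersAndFooters`. It shows how such a config is attached to the `ParseContext` handed to `AutoDetectParser`. The class `DocxTextSketch` and its helper methods are hypothetical, and whether each flag takes effect may depend on the Tika version and on which .docx extractor is active.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.OfficeParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical helper class, not the commenter's code.
public class DocxTextSketch {

    // Builds a ParseContext carrying an OfficeParserConfig; Tika's Office/OOXML
    // parsers look this object up in the context when they run.
    static ParseContext buildContext() {
        OfficeParserConfig officeParserConfig = new OfficeParserConfig();
        // Assumed Tika 1.x setters; the effect can vary by version and extractor.
        officeParserConfig.setUseSAXDocxExtractor(true);        // streaming .docx extractor
        officeParserConfig.setIncludeDeletedContent(false);     // skip tracked deletions
        officeParserConfig.setIncludeHeadersAndFooters(false);  // skip headers and footers

        ParseContext context = new ParseContext();
        context.set(OfficeParserConfig.class, officeParserConfig);
        return context;
    }

    // Extracts the body text of one file with AutoDetectParser.
    static String extractText(Path file) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        BodyContentHandler handler = new BodyContentHandler(-1); // -1: no write limit
        Metadata metadata = new Metadata();
        try (InputStream stream = Files.newInputStream(file)) {
            parser.parse(stream, handler, metadata, buildContext());
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxTextSketch path/to/file.docx
        System.out.println(extractText(Paths.get(args[0])));
    }
}
```

Putting the options on a `ParseContext` rather than on the parser itself lets one shared `AutoDetectParser` be reused with different settings per call.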
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382005#comment-16382005 ]
Md commented on TIKA-2593:
--
I notice it works nicely when I ask it to exclude header and footer
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2593:
-
Description:
I am using the following code to extract text from a docx file
{code:java}
AutoDetectParser parser = new AutoDetect
Md created TIKA-2593:
Summary: docx with track change producing incorrect output
Key: TIKA-2593
URL: https://issues.apache.org/jira/browse/TIKA-2593
Project: Tika
Issue Type: Bug
Components: co
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380773#comment-16380773 ]
Md commented on TIKA-207:
-
By the way, I am using AutoDetectParser()
> MS word doc containing tracke
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380752#comment-16380752 ]
Md commented on TIKA-207:
-
I am using Tika 1.17, but it is still getting deleted text from track revis
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967812#comment-15967812 ]
Md commented on TIKA-2326:
--
Thanks once again. I was going through the above-mentioned discussion
unfor
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967786#comment-15967786 ]
Md commented on TIKA-2326:
--
We have many archived files; for them, RecursiveParserWrappe
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md closed TIKA-2326.
Resolution: Fixed
Fixed in version 1.13 or later.
> java.lang.OutOfMemoryError: Java heap space
> --
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2326:
-
Fix Version/s: 1.13
> java.lang.OutOfMemoryError: Java heap space
> ---
>
>
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967764#comment-15967764 ]
Md commented on TIKA-2326:
--
Yes, you are right. It was fixed in a recent version (1.14). Thanks so much
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2326:
-
Description:
I am using RecursiveParserWrapper with AutoDetectParser() and here is the part
of my code which is doing par
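The reporter's code is truncated above, so the following is not a reconstruction of it. It is a minimal sketch of the RecursiveParserWrapper-plus-AutoDetectParser pattern, assuming the Tika 1.x API of that era: the `RecursiveParserWrapper(Parser, ContentHandlerFactory)` constructor, `getMetadata()`, and the `RecursiveParserWrapper.TIKA_CONTENT` key (later releases reorganized this API). The class `RecursiveSketch` and its method names are hypothetical.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.RecursiveParserWrapper;
import org.apache.tika.sax.BasicContentHandlerFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class illustrating the Tika 1.x RecursiveParserWrapper API.
public class RecursiveSketch {

    // Parses a file and returns one Metadata object per document: the outer
    // file plus every embedded document found while recursing (e.g. archive members).
    static List<Metadata> parseAll(InputStream stream) throws Exception {
        RecursiveParserWrapper wrapper = new RecursiveParserWrapper(
                new AutoDetectParser(),
                // One plain-text handler per document; -1 means no write limit,
                // while a positive value caps the characters kept per document.
                new BasicContentHandlerFactory(
                        BasicContentHandlerFactory.HANDLER_TYPE.TEXT, -1));
        // The wrapper builds its own per-document handlers from the factory,
        // so a no-op DefaultHandler is passed here.
        wrapper.parse(stream, new DefaultHandler(), new Metadata(), new ParseContext());
        return wrapper.getMetadata();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RecursiveSketch path/to/file
        try (InputStream stream = Files.newInputStream(Paths.get(args[0]))) {
            List<Metadata> docs = parseAll(stream);
            for (int i = 0; i < docs.size(); i++) {
                // Extracted text is stored under the TIKA_CONTENT key.
                String text = docs.get(i).get(RecursiveParserWrapper.TIKA_CONTENT);
                int length = (text == null) ? 0 : text.length();
                System.out.println("document " + i + ": " + length + " characters");
            }
        }
    }
}
```

Because the text of every document is held in the returned list, a positive write limit in the factory is one way to keep memory use bounded on large archives.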
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Md updated TIKA-2326:
-
Attachment: 5d3e815263c73061d8804e15db3ammn0789_CLEAN_REVISED.docx
Here is the file I am having an issue with
> java.lan
Md created TIKA-2326:
Summary: java.lang.OutOfMemoryError: Java heap space
Key: TIKA-2326
URL: https://issues.apache.org/jira/browse/TIKA-2326
Project: Tika
Issue Type: Bug
Affects Versions: 1.8