[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Attachment: Document_with_Comments_Text_extarction_Tika_APP.docx.txt > Removing comments from *.docx, *.pdf files > --

[jira] [Updated] (TIKA-2901) Tika extracting points data from Chart

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2901: - Summary: Tika extracting points data from Chart (was: Tika extracting points from Chart ) > Tika extracting points data

[jira] [Updated] (TIKA-2901) Tika extracting points from Chart

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2901: - Attachment: Chart_data_sample_text_possible_issue.docx.txt Chart_data_sample_text_possible_issue.docx > Ti

[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Attachment: (was: Chart_data_sample_text_possible_issue.docx.txt) > Removing comments from *.docx, *.pdf files > -

[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Attachment: (was: Chart_data_sample_text_possible_issue.docx) > Removing comments from *.docx, *.pdf files > -

[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Attachment: Chart_data_sample_text_possible_issue.docx.txt Chart_data_sample_text_possible_issue.docx > Re

[jira] [Created] (TIKA-2901) Tika extracting points from Chart

2019-07-08 Thread Md (JIRA)
Md created TIKA-2901: Summary: Tika extracting points from Chart Key: TIKA-2901 URL: https://issues.apache.org/jira/browse/TIKA-2901 Project: Tika Issue Type: Bug Components: app Affects V

[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Attachment: Document_with_Comments_Text_extarction_Tika_APP.docx > Removing comments from *.docx, *.pdf files > --

[jira] [Updated] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2900: - Description: Hello, I use Apache Tika to extract text, mostly from *.doc, *.docx and *.pdf files. Sometimes there are co

[jira] [Created] (TIKA-2900) Removing comments from *.docx, *.pdf files

2019-07-08 Thread Md (JIRA)
Md created TIKA-2900: Summary: Removing comments from *.docx, *.pdf files Key: TIKA-2900 URL: https://issues.apache.org/jira/browse/TIKA-2900 Project: Tika Issue Type: Wish Components: app, exa

[jira] [Comment Edited] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ] Md edited comment on TIKA-2593 at 3/1/18 4:51 PM: -- I would like to do a few

[jira] [Comment Edited] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ] Md edited comment on TIKA-2593 at 3/1/18 4:11 PM: -- I would like to do a few

[jira] [Commented] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382220#comment-16382220 ] Md commented on TIKA-2593: -- I would like to do a few things * exclude comments * possibly exclude

[jira] [Commented] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382193#comment-16382193 ] Md commented on TIKA-2593: -- I am talking about this ticket and for example you can see the attache

[jira] [Comment Edited] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382167#comment-16382167 ] Md edited comment on TIKA-2593 at 3/1/18 3:33 PM: -- No, deleted content is n

[jira] [Commented] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382167#comment-16382167 ] Md commented on TIKA-2593: -- No, deleted content is not showing if we do officeParserConfig.setUs

[jira] [Commented] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382014#comment-16382014 ] Md commented on TIKA-2593: -- I think I figured it out. I need to set officeParserConfig.setUse

[jira] [Commented] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382005#comment-16382005 ] Md commented on TIKA-2593: -- I notice it works nicely when I ask it to exclude the header and footer

[jira] [Updated] (TIKA-2593) docx with track change producing incorrect output

2018-03-01 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2593: - Description: I am using the following code to extract text from a docx file {code:java} AutoDetectParser parser = new AutoDetect

[jira] [Created] (TIKA-2593) docx with track change producing incorrect output

2018-02-28 Thread Md (JIRA)
Md created TIKA-2593: Summary: docx with track change producing incorrect output Key: TIKA-2593 URL: https://issues.apache.org/jira/browse/TIKA-2593 Project: Tika Issue Type: Bug Components: co

[jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2018-02-28 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380773#comment-16380773 ] Md commented on TIKA-207: - By the way, I am using AutoDetectParser() > MS word doc containing tracke

[jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2018-02-28 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380752#comment-16380752 ] Md commented on TIKA-207: - I am using Tika 1.17, but it is still extracting deleted text from track revis

[jira] [Commented] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967812#comment-15967812 ] Md commented on TIKA-2326: -- Thanks once again. I was going through the above-mentioned discussion unfor

[jira] [Commented] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967786#comment-15967786 ] Md commented on TIKA-2326: -- We have many archived files; for them, RecursiveParserWrappe

[jira] [Closed] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md closed TIKA-2326. Resolution: Fixed. Fixed in 1.13 or a later version. > java.lang.OutOfMemoryError: Java heap space > --

[jira] [Updated] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2326: - Fix Version/s: 1.13 > java.lang.OutOfMemoryError: Java heap space > --- > >

[jira] [Commented] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967764#comment-15967764 ] Md commented on TIKA-2326: -- Yes, you are right. It was fixed in a recent version (1.14). Thanks so much

[jira] [Updated] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2326: - Description: I am using RecursiveParserWrapper with AutoDetectParser() and here is the part of my code which is doing par

[jira] [Updated] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Md updated TIKA-2326: - Attachment: 5d3e815263c73061d8804e15db3ammn0789_CLEAN_REVISED.docx Here is the file I am having an issue with > java.lan

[jira] [Created] (TIKA-2326) java.lang.OutOfMemoryError: Java heap space

2017-04-13 Thread Md (JIRA)
Md created TIKA-2326: Summary: java.lang.OutOfMemoryError: Java heap space Key: TIKA-2326 URL: https://issues.apache.org/jira/browse/TIKA-2326 Project: Tika Issue Type: Bug Affects Versions: 1.8