[jira] [Created] (TIKA-3159) Macros not extracted from OpenDocument format Office files (flatXML format)

2020-08-11 Thread Robert Kaulbach (Jira)
Robert Kaulbach created TIKA-3159: Summary: Macros not extracted from OpenDocument format Office files (flatXML format) Key: TIKA-3159 URL: https://issues.apache.org/jira/browse/TIKA-3159 Project:

[jira] [Created] (TIKA-3158) Macros not extracted from OpenDocument format Office files (zip format)

2020-08-11 Thread Robert Kaulbach (Jira)
Robert Kaulbach created TIKA-3158: Summary: Macros not extracted from OpenDocument format Office files (zip format) Key: TIKA-3158 URL: https://issues.apache.org/jira/browse/TIKA-3158 Project: Tika

[jira] [Updated] (TIKA-3157) Missing content from .docx file with hyperlinked shape

2020-08-11 Thread Robert Kaulbach (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kaulbach updated TIKA-3157: Description: The attached .docx file was created in MS Office, simply drew a rectangle and

[jira] [Created] (TIKA-3157) Missing content from .docx file with hyperlinked shape

2020-08-11 Thread Robert Kaulbach (Jira)
Robert Kaulbach created TIKA-3157: Summary: Missing content from .docx file with hyperlinked shape Key: TIKA-3157 URL: https://issues.apache.org/jira/browse/TIKA-3157 Project: Tika Issue

[jira] [Commented] (TIKA-3156) Missing content from .odt file with hyperlinked image

2020-08-11 Thread Robert Kaulbach (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175903#comment-17175903 ] Robert Kaulbach commented on TIKA-3156: The issue seems to be happening in

[jira] [Created] (TIKA-3156) Missing content from .odt file with hyperlinked image

2020-08-11 Thread Robert Kaulbach (Jira)
Robert Kaulbach created TIKA-3156: Summary: Missing content from .odt file with hyperlinked image Key: TIKA-3156 URL: https://issues.apache.org/jira/browse/TIKA-3156 Project: Tika Issue

Study On Rejected Refactorings

2020-08-11 Thread Jevgenija Pantiuchina
Dear contributors, As part of a research team from Università della Svizzera italiana (Switzerland) and University of Sannio (Italy), we have analyzed refactoring pull requests in the apache/tika repository and are looking for developers for a short 5-10 min survey

Re: Access to corpora server to run regression tests

2020-08-11 Thread Andreas Lehmkuehler
On 10.08.20 at 19:27, Tim Allison wrote: I've updated the process here: https://cwiki.apache.org/confluence/display/TIKA/TikaEvalOnVM One of the key missing pieces was the batch-scripts.tgz file. Apparently, that attached file never made it during the Confluence migration. I was able to

[GitHub] [tika] kkrugler commented on pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
kkrugler commented on pull request #337: URL: https://github.com/apache/tika/pull/337#issuecomment-671978803 Thanks João! This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [tika] kkrugler merged pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
kkrugler merged pull request #337: URL: https://github.com/apache/tika/pull/337

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175608#comment-17175608 ] Kenneth William Krugler commented on TIKA-3155: For the common cases, it would probably

[jira] [Commented] (TIKA-3129) Tika server - track a "last parsed on" timestamp and provide an endpoint to get it

2020-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175529#comment-17175529 ] Nicholas DiPiazza commented on TIKA-3129: [~tallison] I will be testing out this feature today.

[jira] [Commented] (TIKA-3153) Text File identified as message/rfc822

2020-08-11 Thread Luís Filipe Nassif (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175523#comment-17175523 ] Luís Filipe Nassif commented on TIKA-3153: +1

[jira] [Comment Edited] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175415#comment-17175415 ] Nick Burch edited comment on TIKA-3155 at 8/11/20, 9:50 AM: If we can use

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175415#comment-17175415 ] Nick Burch commented on TIKA-3155: If we can use quote mode we should, it will make the output from Tika

[GitHub] [tika] JoaoGFarias opened a new pull request #337: Fixing typo on SAX Parser exception

2020-08-11 Thread GitBox
JoaoGFarias opened a new pull request #337: URL: https://github.com/apache/tika/pull/337 prooblem => problem

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175290#comment-17175290 ] Peter Lee commented on TIKA-3155: We can do it in _TextAndCSVParser_ like this {code:java} CSVFormat

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Akash (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175289#comment-17175289 ] Akash commented on TIKA-3155:

[jira] [Commented] (TIKA-3155) Parse Error while extracting CSV files

2020-08-11 Thread Peter Lee (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175286#comment-17175286 ] Peter Lee commented on TIKA-3155: Hey. I think it's caused by the Quote Mode of Apache Commons CSV. We