[jira] [Issue Comment Deleted] (TIKA-3528) WMV file detected as WMA (audio/x-ms-wma)

2021-08-17 Thread Nitish Gupta (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nitish Gupta updated TIKA-3528: --- Comment: was deleted (was: [^WMVFile.wmv]) > WMV file detected as WMA (audio/x-ms-wma) >

[jira] [Issue Comment Deleted] (TIKA-3441) tika server stuck in loop trying to bind

2021-06-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3441: -- Comment: was deleted (was: Steps to reproduce: run tika-server in spawnchild mode and at some point, it

[jira] [Issue Comment Deleted] (TIKA-3424) tika-app in 2.x should log to stderr

2021-06-01 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3424: -- Comment: was deleted (was: saw...) > tika-app in 2.x should log to stderr > ---

[jira] [Issue Comment Deleted] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread Sal (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sal updated TIKA-1570: -- Comment: was deleted (was: [~epugh] Will NSSM work ok even though there is no stop method?) > Seeking a stop method for

[jira] [Issue Comment Deleted] (TIKA-3244) General upgrades for 1.26

2021-03-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3244: -- Comment: was deleted (was: If we go with jackson for xml parsing, we avoid this problem: {noformat}

[jira] [Issue Comment Deleted] (TIKA-3255) Parsing MP3 file with record > 100000

2020-12-22 Thread Kenneth William Krugler (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth William Krugler updated TIKA-3255: -- Comment: was deleted (was: A description of what happens, or a title that indica

[jira] [Issue Comment Deleted] (TIKA-3172) PDF Parser configuration enable auto space using tika config file

2020-08-18 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3172: -- Comment: was deleted (was: Please try if setting and changing "sortByPosition" has any effect. T

[jira] [Issue Comment Deleted] (TIKA-3077) OneNote parser - very inefficient when parsing OneNote <= 2007 files

2020-03-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3077: Comment: was deleted (was: addressing this in https://github.com/apache/tika/pull/314) > On

[jira] [Issue Comment Deleted] (TIKA-3075) Add an HTTP parser

2020-03-22 Thread GGYF (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GGYF updated TIKA-3075: --- Comment: was deleted (was: Thank you David Eric Pugh for your reply I'm sorry for the late reply There are many scenar

[jira] [Issue Comment Deleted] (TIKA-3065) Not able to parse the document with inline image

2020-03-11 Thread suchendra (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] suchendra updated TIKA-3065: Comment: was deleted (was: [~hudson], when I debug, it hits another override of the detect method and gets stuck at line

[jira] [Issue Comment Deleted] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-24 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-3035: -- Comment: was deleted (was: I have no position on this. [~sorend] did not bring any further argum

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: !image-2019-10-17-18-01-40-322.png!) > When Tika detects a txt document that begins with "caff", it wrongly

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: I logged in with my Apache user/psw. Yes please open a ticket. Here ar

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: [~Mr_Jiang] you closed it as "fixed", but it should be closed as "du

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: Oh, I can't add files.) > When Tika detects a txt document that begins with "caff", it wrongly identifies it as audio/x-

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: [~tilman], are you logged in with your Apache credentials via LDAP?

[jira] [Issue Comment Deleted] (TIKA-2962) When Tika detects a txt document that begins with "caff", it wrongly identifies it as an audio/x-caf audio file

2019-10-17 Thread Tilman Hausherr (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-2962: -- Comment: was deleted (was: "Yes, the network was disconnected once I sent the question, and I se

[jira] [Issue Comment Deleted] (TIKA-2909) Contributing HWP v5 Parser

2019-07-20 Thread SooMyung Lee (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SooMyung Lee updated TIKA-2909: --- Comment: was deleted (was: [~talli...@apache.org], I downloaded your attached hwp files and took test

[jira] [Issue Comment Deleted] (TIKA-2889) Tika Server keeps crashing

2019-06-04 Thread Thomas van Hesteren (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas van Hesteren updated TIKA-2889: -- Comment: was deleted (was: Hmm, I already thought so... however, there are no log files in

[jira] [Issue Comment Deleted] (TIKA-2789) Apache tika - java.lang.NoClassDefFoundError

2018-12-12 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2789: -- Comment: was deleted (was: poi-ooxml-schemas-4.0.1.jar) > Apache tika - java.lang.NoClassDefFoundError

[jira] [Issue Comment Deleted] (TIKA-2767) Problem with import xlsx with null cells

2018-11-05 Thread ionut hodor (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ionut hodor updated TIKA-2767: -- Comment: was deleted (was: Hi [~davemeikle] I have 2 example for you) > Problem with import xlsx with

[jira] [Issue Comment Deleted] (TIKA-1358) Add support for newer iWork file formats

2018-10-07 Thread king.wyx (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] king.wyx updated TIKA-1358: --- Comment: was deleted (was: In IWork13ParserTest, I changed the path of the pages file to a path on my Mac, but testParsePages13 did not return any content) > Add s

[jira] [Issue Comment Deleted] (TIKA-1358) Add support for newer iWork file formats

2018-10-07 Thread king.wyx (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] king.wyx updated TIKA-1358: --- Comment: was deleted (was: [^666.pages] ) > Add support for newer iWork file formats > -

[jira] [Issue Comment Deleted] (TIKA-2658) Add magic numbers of Olympus ORF Files

2018-06-01 Thread Selim Dincer (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Selim Dincer updated TIKA-2658: --- Comment: was deleted (was: I created a pull request with the changes mentioned above:  https://github.

[jira] [Issue Comment Deleted] (TIKA-2632) Analyze unknown govdocs files

2018-04-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2632: -- Comment: was deleted (was: bq. Turned out that someone else already investigated this case a month ago..

[jira] [Issue Comment Deleted] (TIKA-2608) tika matlab parser incorrectly identifies content type of minified javascript file

2018-03-16 Thread pdwalker (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] pdwalker updated TIKA-2608: --- Comment: was deleted (was: I tried downloading the tiki-app.jar file from [http://tika.apache.org/download.htm

[jira] [Issue Comment Deleted] (TIKA-2163) POIXMLException from ClassCastException on a valid Word template

2018-02-19 Thread Richard A (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard A updated TIKA-2163: Comment: was deleted (was: Is there any update on this issue? Do we know if it has/will be fixed? If so, in

[jira] [Issue Comment Deleted] (TIKA-2571) Swallows security exception and returns null

2018-02-09 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2571: -- Comment: was deleted (was: Sorry...I think I found it in POI's OPCPackageContainer. If this isn't what

[jira] [Issue Comment Deleted] (TIKA-2542) Support in tika-server for getting plain text and metadata at the same time

2018-01-05 Thread Manolo Caracuel (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manolo Caracuel updated TIKA-2542: -- Comment: was deleted (was: Pull request: https://github.com/apache/tika/pull/216) > Support in

[jira] [Issue Comment Deleted] (TIKA-2479) Handle empty cells in tables uniformly

2017-12-29 Thread Geoff Baskwill (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geoff Baskwill updated TIKA-2479: - Comment: was deleted (was: See also: https://github.com/apache/tika/pull/214) > Handle empty cells

[jira] [Issue Comment Deleted] (TIKA-2496) TIKA crashes / runs out of memory on simple PDF

2017-12-20 Thread chelambarasan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chelambarasan updated TIKA-2496: Comment: was deleted (was: Yes. I am planning to do a pre-processing by extracting the zip file in m

[jira] [Issue Comment Deleted] (TIKA-2519) Issue parsing multiple CHM files concurrently

2017-12-07 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2519: -- Comment: was deleted (was: I'm seeing this when I run the code against chm multithreaded: {noformat} Cau

[jira] [Issue Comment Deleted] (TIKA-2456) Emails extracted from MBOX not detected as rfc822

2017-08-31 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luis Filipe Nassif updated TIKA-2456: - Comment: was deleted (was: Fixed in r560e91a) > Emails extracted from MBOX not detected as

[jira] [Issue Comment Deleted] (TIKA-2447) PSDParser creates unnecessary large byte array and discards it

2017-08-25 Thread Jan Burkhardt (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Burkhardt updated TIKA-2447: Comment: was deleted (was: yep its https://bz.apache.org/bugzilla/show_bug.cgi?id=61294) > PSDParser

[jira] [Issue Comment Deleted] (TIKA-2439) Avoid NullPointerException in org.apache.tika.langdetect.OptimaizeLangDetector if models haven't been loaded

2017-08-02 Thread Karl Richter (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Richter updated TIKA-2439: --- Comment: was deleted (was: There's even a comment `// TODO throw exception if models haven't been load

[jira] [Issue Comment Deleted] (TIKA-860) Make ZIP bomb detection configureable

2017-07-27 Thread Nicholas DiPiazza (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-860: --- Comment: was deleted (was: Ignore me. It's totally configurable. ) > Make ZIP bomb detection co

[jira] [Issue Comment Deleted] (TIKA-860) Make ZIP bomb detection configureable

2017-07-27 Thread Nicholas DiPiazza (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-860: --- Comment: was deleted (was: We should reopen this. Issue comes up quite often where some people

[jira] [Issue Comment Deleted] (TIKA-2408) ZipException in text extraction from DOCX file

2017-06-30 Thread Jorge Spinsanti (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Spinsanti updated TIKA-2408: -- Comment: was deleted (was: Thank you for your reply. Yes, I need help with tika-config.xml confi

[jira] [Issue Comment Deleted] (TIKA-2405) SAXParseException in text extraction from DOCX file

2017-06-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2405: -- Comment: was deleted (was: The footer does have a hdr element in it. See: https://issues.apache.org/ji

[jira] [Issue Comment Deleted] (TIKA-2347) Underlined text is not decorated as such when extracting from word documents

2017-04-28 Thread Stuart Hendren (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stuart Hendren updated TIKA-2347: - Comment: was deleted (was: See https://github.com/apache/tika/pull/173 ) > Underlined text is not

[jira] [Issue Comment Deleted] (TIKA-2338) Change Scope of Jai-ImageIO-Core dependency

2017-04-24 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov updated TIKA-2338: Comment: was deleted (was: [~talli...@apache.org], thanks Tim. IMHO, it should be resolved b

[jira] [Issue Comment Deleted] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2017-03-17 Thread Sharath Kumar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharath Kumar updated TIKA-2146: Comment: was deleted (was: Hi Tim, Can you please remove the document Test.doc. Seems it contains se

[jira] [Issue Comment Deleted] (TIKA-2281) Let's extract the MAPI subtype (NOTE, STICKY, etc.) for msg files

2017-02-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2281: -- Comment: was deleted (was: FROM and TO, CC, BCC don't apply to all .msg files. ) > Let's extract the MA

[jira] [Issue Comment Deleted] (TIKA-2265) Problem with footnotes/endnotes in Tika.parseToString with MS Word (.docx) files

2017-02-14 Thread Mike Rodent (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Rodent updated TIKA-2265: -- Comment: was deleted (was: I'm not that surprised to find that... So I ran my tests again... same proble

[jira] [Issue Comment Deleted] (TIKA-2170) Tika 1.13 ForkParser fails intermittently with very large MS Word docx

2016-11-07 Thread Tim Kingsbury (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Kingsbury updated TIKA-2170: Comment: was deleted (was: Awesome :) Do you have any sort of a wild guess for an ETA? ) > Tika 1

[jira] [Issue Comment Deleted] (TIKA-2146) Unable to extract contents from protected MS word-doc-java.lang.ArrayIndexOutOfBoundsException

2016-10-28 Thread Sharath Kumar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharath Kumar updated TIKA-2146: Comment: was deleted (was: Does tika support extracting the contents of a protected MS-word documen

[jira] [Issue Comment Deleted] (TIKA-2105) Unable to process documents with french accents in filenames

2016-09-30 Thread susserj (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] susserj updated TIKA-2105: -- Comment: was deleted (was: Hi Tim, I tried adding chcp 65001, which didn't work, and then I tried chcp 1252, which d

[jira] [Issue Comment Deleted] (TIKA-2105) Unable to process documents with french accents in filenames

2016-09-30 Thread susserj (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] susserj updated TIKA-2105: -- Comment: was deleted (was: Hi Tim, when I added the -I -o to my command line, I got a bunch of zero-byte files

[jira] [Issue Comment Deleted] (TIKA-2038) A more accurate facility for detecting Charset Encoding of HTML documents

2016-08-04 Thread Shabanali Faghani (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shabanali Faghani updated TIKA-2038: Comment: was deleted (was: As I've said above URLs are available in the [./test-data/languag

[jira] [Issue Comment Deleted] (TIKA-2041) Charset detection doesn't appear to be thread-safe

2016-07-26 Thread Christian Aistleitner (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Aistleitner updated TIKA-2041: Comment: was deleted (was: Hello Florian, Thanks \o/ Best regards, Christian ) > Ch

[jira] [Issue Comment Deleted] (TIKA-1967) Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@10b8c32

2016-05-09 Thread kostali (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kostali updated TIKA-1967: -- Comment: was deleted (was: I already tried tika 1.12.jar; it is not working. ) > Unexpected RuntimeException from > org

Re: JIRA issue?

2016-04-22 Thread Nick Burch
On Thu, 21 Apr 2016, Ben McCann wrote: I'd like to create an issue on the JIRA. When I visit https://issues.apache.org/jira/browse/TIKA/ and hit Create I don't see Tika as an option. I can only create issues for Zookeeper and other projects. If you let us know your JIRA username, someone can g

Re: JIRA issue?

2016-04-21 Thread Tyler Palsulich
Hi Ben, Sorry for the inconvenience. The infrastructure team had to disable the create and comment features of JIRA for many projects to mitigate spam. Hopefully everything will be back up and running again soon. Thanks for emailing. Tyler Hi, I'd like to create an issue on the JIRA. When I vis

JIRA issue?

2016-04-21 Thread Ben McCann
Hi, I'd like to create an issue on the JIRA. When I visit https://issues.apache.org/jira/browse/TIKA/ and hit Create I don't see Tika as an option. I can only create issues for Zookeeper and other projects. Thanks, Ben -- about.me/benmccann

[jira] [Issue Comment Deleted] (TIKA-1513) Add mime detection and parsing for dbf files

2016-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1513: -- Comment: was deleted (was: I won't commit this until we get our corpus results back...perhaps I'll redo

[jira] [Issue Comment Deleted] (TIKA-1953) tika-server NullPointerException while processing rtfs

2016-04-18 Thread Ravi (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi updated TIKA-1953: --- Comment: was deleted (was: I had this issue while using the tika-python API and not the Java APIs. I'm not sure if usi

[jira] [Issue Comment Deleted] (TIKA-1928) Filename detection misses when a # is in a filename

2016-04-11 Thread Jean Coudon (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean Coudon updated TIKA-1928: -- Comment: was deleted (was: I am using Linux Mint 17.1. I couldn't manage to reproduce it with the CLI Ap

[jira] [Issue Comment Deleted] (TIKA-1883) Identification of Mime Type for Empty Files

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai updated TIKA-1883: --- Comment: was deleted (was: The updated code is available at https://github.co

[jira] [Issue Comment Deleted] (TIKA-1884) Updating Tika Mime Repository

2016-03-02 Thread Aditya Ramachandra Desai (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Ramachandra Desai updated TIKA-1884: --- Comment: was deleted (was: The updated code is available at https://github.co

[jira] [Issue Comment Deleted] (TIKA-1865) Save sender email address in Outlook MSG metadata

2016-02-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1865: -- Comment: was deleted (was: Y, that's my guess exactly. If anyone has actual knowledge or has found some

[jira] [Issue Comment Deleted] (TIKA-741) "Zip bomb" (XML nesting) detection is too strict

2016-02-09 Thread Pascal Essiembre (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pascal Essiembre updated TIKA-741: -- Comment: was deleted (was: Got it. Thanks!) > "Zip bomb" (XML nesting) detection is too strict >

[jira] [Issue Comment Deleted] (TIKA-1850) Tika erroneously detects some versions of jQuery as "text/html"

2016-02-03 Thread Boris Slobodin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Slobodin updated TIKA-1850: - Comment: was deleted (was: Nick, also, according to your comment (https://issues.apache.org/jira/b

[jira] [Issue Comment Deleted] (TIKA-1838) Just a quick question regarding compatibility

2016-01-20 Thread Raymond Cabrera (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Cabrera updated TIKA-1838: -- Comment: was deleted (was: Error occurred at deployment: Caused by: java.lang.ClassCastException

[jira] [Issue Comment Deleted] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-12-20 Thread Joey Hong (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joey Hong updated TIKA-1329: Comment: was deleted (was: I have added updates to the Tika site to include the RecursiveParserWrapper examp

[jira] [Issue Comment Deleted] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-10 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1799: -- Comment: was deleted (was: Might have found multithreading issue that I can't reproduce within JUnit. Wh

[jira] [Issue Comment Deleted] (TIKA-1739) cTAKESParser doesn't work in 1.11

2015-09-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1739: - Comment: was deleted (was: We explicitly don't let you set an {{AutoDetectParser}} in the config, it's som

[jira] [Issue Comment Deleted] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2015-09-18 Thread Luca Perico (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Perico updated TIKA-1735: -- Comment: was deleted (was: Thanks for the reply. What do you mean with "along the lines of the various e

[jira] [Issue Comment Deleted] (TIKA-1731) Try to integrate java-hwp into Tika

2015-09-09 Thread mungeol heo (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mungeol heo updated TIKA-1731: -- Comment: was deleted (was: I have contacted the developer of java-hwp. He said he doesn't know much about

[jira] [Issue Comment Deleted] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-03 Thread mungeol heo (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mungeol heo updated TIKA-1728: -- Comment: was deleted (was: I have tested r1700986. It is working. Thank you. ) > Detection is not worki

[jira] [Issue Comment Deleted] (TIKA-1729) OCR in PDF files

2015-09-03 Thread Loris Bachert (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Loris Bachert updated TIKA-1729: Comment: was deleted (was: Unfortunately, I already tried this solution and it's not working either.

[jira] [Issue Comment Deleted] (TIKA-330) Better HWP (Hangul Word Processor) detection pattern

2015-09-01 Thread mungeol heo (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mungeol heo updated TIKA-330: - Comment: was deleted (was: HWP file has two file formats now which are HWP 3.0 and HWP 5.0. The signature st

[jira] [Issue Comment Deleted] (TIKA-1713) RTF parser misses text content

2015-08-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1713: -- Comment: was deleted (was: Y. Figured as much. Got it. Oh, Symantec EV...That helps. I might be able

[jira] [Issue Comment Deleted] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-20 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann updated TIKA-1607: Comment: was deleted (was: Hi Ray. I'm not sure what shoehorn the index into the group - do y

[jira] [Issue Comment Deleted] (TIKA-1706) Bring back commons-io to tika-core

2015-08-15 Thread Yaniv Kunda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaniv Kunda updated TIKA-1706: -- Comment: was deleted (was: A patch to bring back commons-io to tika-core and replace all formerly inline

[jira] [Issue Comment Deleted] (TIKA-1524) Can install Tika-Bundle, missing JUnit dependency

2015-07-27 Thread Bob Paulin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Paulin updated TIKA-1524: - Comment: was deleted (was: [~talli...@mitre.org] Any chance of getting this in for 1.10? I still can't sp

[jira] [Issue Comment Deleted] (TIKA-1238) Update OutlookExtractor to handle codepage identification more rigorously

2015-07-20 Thread Magesh Tarala (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Magesh Tarala updated TIKA-1238: Comment: was deleted (was: Tim - I've attached a file. Could you download it and we can delete from

[jira] [Issue Comment Deleted] (TIKA-1400) Extract Excel (xls, xlsx) headers and footers

2015-06-30 Thread Aeham Abushwashi (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aeham Abushwashi updated TIKA-1400: --- Comment: was deleted (was: Patch + test files) > Extract Excel (xls, xlsx) headers and footers

[jira] [Issue Comment Deleted] (TIKA-1315) Basic list support in WordExtractor

2015-05-28 Thread Moritz Dorka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Moritz Dorka updated TIKA-1315: --- Comment: was deleted (was: The case of "none" for the numberText of the current level (i.e. its nfc eq

[jira] [Issue Comment Deleted] (TIKA-1022) DWG Custom properties not extracted

2015-04-07 Thread Paolo Nacci (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paolo Nacci updated TIKA-1022: -- Comment: was deleted (was: Regression in DWGProperties.java (from Rev. 1407682). 96c96,101 < private

[jira] [Issue Comment Deleted] (TIKA-1589) Mp3 parser does not add duration to metadata if there are no ID3 tags

2015-03-31 Thread Max Daniline (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Daniline updated TIKA-1589: --- Comment: was deleted (was: I've raised a PR to fix this: https://github.com/apache/tika/pull/38) > Mp3

[jira] [Issue Comment Deleted] (TIKA-1440) Auto-Paragraph numbers not extracted from Word Document

2015-03-25 Thread Steve Gullion (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Gullion updated TIKA-1440: Comment: was deleted (was: I guess the comments don't support indentation either, ha.) > Auto-Paragr

[jira] [Issue Comment Deleted] (TIKA-1575) Upgrade to PDFBox 1.8.9 when available

2015-03-19 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tilman Hausherr updated TIKA-1575: -- Comment: was deleted (was: With the pure ExtractText, all is identical. Could you attach the file

[jira] [Issue Comment Deleted] (TIKA-1532) DIF Parser

2015-03-08 Thread Aakarsh Medleri Hire Math (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aakarsh Medleri Hire Math updated TIKA-1532: Comment: was deleted (was: mime-type detection for DIF Parser) > DIF Parser

[jira] [Issue Comment Deleted] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Comment: was deleted (was: sample dif file) > GCMD Directory Interchange Format (.dif) identification >

[jira] [Issue Comment Deleted] (TIKA-1552) Pdf document parser

2015-02-18 Thread Konstantin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin updated TIKA-1552: - Comment: was deleted (was: on the left side is the parsed text in text editor) > Pdf document parser > ---

[jira] [Issue Comment Deleted] (TIKA-1511) Create a parser for SQLite3

2015-02-13 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Gribov updated TIKA-1511: Comment: was deleted (was: [~talli...@mitre.org], r1659547 work fine. Tests for sqlite3 pass.

[jira] [Issue Comment Deleted] (TIKA-1541) StringsParser: a simple strings-based parser for Tika

2015-02-05 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giuseppe Totaro updated TIKA-1541: -- Comment: was deleted (was: tika-1541.patch (preliminary work)) > StringsParser: a simple strings

[jira] [Issue Comment Deleted] (TIKA-1538) Wrong mimetype detection

2015-02-03 Thread Miguel (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel updated TIKA-1538: - Comment: was deleted (was: Troublesome image file) > Wrong mimetype detection > > >

[jira] [Issue Comment Deleted] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamara updated TIKA-1533: - Comment: was deleted (was: Thank you for the help Tim, next time I will post directly to them. Here is the issue o

[jira] [Issue Comment Deleted] (TIKA-1530) MP4Parser parses duration but does not set it

2015-01-26 Thread Oskar Wickström (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oskar Wickström updated TIKA-1530: -- Comment: was deleted (was: https://github.com/apache/tika/pull/25) > MP4Parser parses duration b

[jira] [Issue Comment Deleted] (TIKA-1517) MIME type selection with probability

2015-01-14 Thread Shuai Liu (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuai Liu updated TIKA-1517: Comment: was deleted (was: Proposed design: The idea of selection is to incorporate probability as weights on

[jira] [Issue Comment Deleted] (TIKA-1504) TikaCoreProperties.DATE not populated for XML files

2015-01-02 Thread Badger (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Badger updated TIKA-1504: - Comment: was deleted (was: I've checked out the Tika source tree and ran the unit tests for DcXmlParser, everythin

[jira] [Issue Comment Deleted] (TIKA-1485) Wrong mimetype detection

2014-11-20 Thread Konstantin (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin updated TIKA-1485: - Comment: was deleted (was: {quote}If Tika doesn't know what something is from the mime magic (or from cont

[jira] [Issue Comment Deleted] (TIKA-1464) Too many open files in system when parsing thousands of files

2014-11-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1464: -- Comment: was deleted (was: On Windows 7 with Tika 1.7-SNAPSHOT, on a batch of 3k msg files that have man

[jira] [Issue Comment Deleted] (TIKA-1435) Update rome dependency to 1.5

2014-10-06 Thread Johannes Mockenhaupt (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Mockenhaupt updated TIKA-1435: --- Comment: was deleted (was: Changed dependencies netcdf 4.2.20 -> 4.3.22) > Update rome

[jira] [Issue Comment Deleted] (TIKA-1435) Update rome dependency to 1.5

2014-10-03 Thread Johannes Mockenhaupt (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Mockenhaupt updated TIKA-1435: --- Comment: was deleted (was: PR: https://github.com/apache/tika/pull/16) > Update rome d

[jira] [Issue Comment Deleted] (TIKA-1303) Parsing Html page (not well formed) containing two title tags results in metadata (title) to be overwritten

2014-06-09 Thread Ashish Sood (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Sood updated TIKA-1303: -- Comment: was deleted (was: I am currently out of the office, returning on Monday 9 June 2014. If your e

[jira] [Issue Comment Deleted] (TIKA-1274) ENVI header parser

2014-04-28 Thread Ann Burgess (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1274: -- Comment: was deleted (was: Hey Chris, How is your week looking? Want to set a time to do a chat? I'm ac
