[jira] [Created] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Vineet Ghatge (JIRA)
Vineet Ghatge created TIKA-1423: --- Summary: Build a parser to extract data from GRIB formats Key: TIKA-1423 URL: https://issues.apache.org/jira/browse/TIKA-1423 Project: Tika Issue Type: New

[jira] [Comment Edited] (TIKA-1315) Basic list support in WordExtractor

2014-09-22 Thread Moritz Dorka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142495#comment-14142495 ] Moritz Dorka edited comment on TIKA-1315 at 9/22/14 8:14 AM: -

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143041#comment-14143041 ] Hong-Thai Nguyen commented on TIKA-1421: Not only CentOS, this test failed also on

[jira] [Updated] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong-Thai Nguyen updated TIKA-1421: --- Priority: Blocker (was: Major) Tika-Parsers tests fail on CentOS6 if tesseract isn't

[jira] [Commented] (TIKA-1412) NPE in OpenDocumentParser

2014-09-22 Thread Hong-Thai Nguyen (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143043#comment-14143043 ] Hong-Thai Nguyen commented on TIKA-1412: Add a test at r1626706 NPE in

[jira] [Commented] (TIKA-1412) NPE in OpenDocumentParser

2014-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143061#comment-14143061 ] Hudson commented on TIKA-1412: -- UNSTABLE: Integrated in tika-trunk-jdk1.6 #202 (See

[jira] [Commented] (TIKA-1412) NPE in OpenDocumentParser

2014-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143068#comment-14143068 ] Hudson commented on TIKA-1412: -- UNSTABLE: Integrated in tika-trunk-jdk1.7 #224 (See

RE: NPE on all *.odt, odp, .ods documents

2014-09-22 Thread Hong-Thai Nguyen
Hi, I've added a test for this case at r1626706. We are having TIKA-1421 which blocks the release. Hong-Thai -Original Message- From: Ken Krugler [mailto:kkrugler_li...@transpac.com] Sent: Thursday, 11 September 2014 23:07 To: dev@tika.apache.org Subject: RE: NPE on all *.odt, odp, .ods

RE: NPE on all *.odt, odp, .ods documents

2014-09-22 Thread Tyler Palsulich
Hi, TIKA-1422 is related and also a blocker. Both issues are caused by the Tesseract Parser. Once I added the TesseractOCRParser to the META-INF.services list of Parsers in r1626341, the TesseractParser took precedence over the previous ImageParser. I've talked about this with Chris somewhat at

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143167#comment-14143167 ] Lewis John McGibbney commented on TIKA-1423: [~vinegh] do you have any

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143273#comment-14143273 ] Tyler Palsulich commented on TIKA-1421: --- I commented on list, but here is a proposed

[jira] [Updated] (TIKA-1419) Upgrade to PDFBox 1.8.7

2014-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1419: -- Attachment: compare_Tika-trunk-1.7_w_PDFBox1.8.6Vs.1.8.7.csv Upgrade to PDFBox 1.8.7

[jira] [Commented] (TIKA-1419) Upgrade to PDFBox 1.8.7

2014-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143588#comment-14143588 ] Tim Allison commented on TIKA-1419: --- I just finished the run on 50,000 random pdfs from

[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Ann Burgess (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1423: -- Attachment: GribParser.java Build a parser to extract data from GRIB formats

[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Ann Burgess (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ann Burgess updated TIKA-1423: -- Attachment: gdas1.forecmwf.2014062612.grib2 Build a parser to extract data from GRIB formats

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143605#comment-14143605 ] Chris A. Mattmann commented on TIKA-1423: - [~annieburgess] FTW Build a parser to

[jira] [Created] (TIKA-1424) Clear PDFont's resources after each file to prevent memory leak

2014-09-22 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1424: - Summary: Clear PDFont's resources after each file to prevent memory leak Key: TIKA-1424 URL: https://issues.apache.org/jira/browse/TIKA-1424 Project: Tika Issue

[jira] [Created] (TIKA-1425) Automatic batching of Microsoft service calls

2014-09-22 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created TIKA-1425: -- Summary: Automatic batching of Microsoft service calls Key: TIKA-1425 URL: https://issues.apache.org/jira/browse/TIKA-1425 Project: Tika Issue

[jira] [Updated] (TIKA-1425) Automatic batching of Microsoft service calls

2014-09-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1425: --- Description: Right now when I use the following code I get the stack trace at the

Tika at ApacheCon Europe - 2 months time!

2014-09-22 Thread Nick Burch
Hi All It's only 2 months to go until ApacheCon Europe in Budapest. I'm simultaneously excited by all the great Tika stuff going on, and worried by how many talks I need to finish writing... As usual for an ApacheCon, we've a number of talks about Tika going on, and almost certainly a

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Vineet Ghatge (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144013#comment-14144013 ] Vineet Ghatge commented on TIKA-1423: - [~lewismc] I have to take a look at parser

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2014-09-22 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144024#comment-14144024 ] Lewis John McGibbney commented on TIKA-1423: Excellent -- *Lewis* Build

[jira] [Comment Edited] (TIKA-1419) Upgrade to PDFBox 1.8.7

2014-09-22 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143588#comment-14143588 ] Tim Allison edited comment on TIKA-1419 at 9/23/14 1:23 AM: I

[jira] [Created] (TIKA-1426) Let's allow users to specify a tika config file on the commandline for tika-app and tika-server

2014-09-22 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1426: - Summary: Let's allow users to specify a tika config file on the commandline for tika-app and tika-server Key: TIKA-1426 URL: https://issues.apache.org/jira/browse/TIKA-1426

[jira] [Commented] (TIKA-1420) Add Metadata Extraction to Arbitrary Parsers

2014-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144190#comment-14144190 ] Chris A. Mattmann commented on TIKA-1420: - Nick that's a great idea. Let's work

TikaEntityProcessor stripping all xml tags

2014-09-22 Thread keeblerh
I had posted this on the solr-user forum but have received no replies, so I thought I would try here next. Thanks. I'm processing a zip file with an xml file. The TikaEntityProcessor opens the zip, reads the file but is stripping the xml tags even though I have supplied the htmlMapper=identity

[jira] [Resolved] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-1421. - Resolution: Fixed - committed in r1626932 thanks to [~tpalsulich] apparently mail tests

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144299#comment-14144299 ] Hudson commented on TIKA-1421: -- SUCCESS: Integrated in tika-trunk-jdk1.6 #203 (See

[jira] [Commented] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2014-09-22 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144330#comment-14144330 ] Hudson commented on TIKA-1421: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #225 (See