[jira] [Reopened] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-15 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reopened TIKA-1329: -- I'm re-opening this, as while we have the RecursiveParserWrapper, we don't yet have anything i

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Nick Burch
On Wed, 14 Jan 2015, Tyler Palsulich wrote: Nick, thanks for building the site! We still need to rebuild the index, right? You'll need to build the 1.7 index page (based on the changelog), then update the download page + homepage + menu, and finally rebuild the site (All I did was finish off

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277091#comment-14277091 ] Nick Burch commented on TIKA-241: - In r1651709, I've added the unrar license

Re: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Nick Burch
On Fri, 9 Jan 2015, Tyler Palsulich wrote: A candidate for the Tika 1.7 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.7-rc3/ All looks good to me (signatures, has

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276936#comment-14276936 ] Nick Burch commented on TIKA-1509: -- Passing a strategy to CompositeParser, then ha

[jira] [Resolved] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-241. - Resolution: Fixed Fix Version/s: 1.8 > Rar archive supp

[jira] [Commented] (TIKA-241) Rar archive support

2015-01-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276849#comment-14276849 ] Nick Burch commented on TIKA-241: - Thanks, applied with a few tweaks (mostly for the

[jira] [Commented] (TIKA-1509) Create configurable strategies for composite parsers

2015-01-13 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276105#comment-14276105 ] Nick Burch commented on TIKA-1509: -- First up is probably some sort of compo

[jira] [Commented] (TIKA-1515) Old XLS 3 parsing is not working on some documents

2015-01-13 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276053#comment-14276053 ] Nick Burch commented on TIKA-1515: -- Hopefully fixed in Apache POI in r1651517 - it s

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-13 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275508#comment-14275508 ] Nick Burch commented on TIKA-1511: -- If we're going to do a general jdbc opti

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274159#comment-14274159 ] Nick Burch commented on TIKA-1511: -- Just to be sure, since SQLite doesn't show

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-01-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273681#comment-14273681 ] Nick Burch commented on TIKA-1512: -- What about subsequent runs - I'm wondering

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-01-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273586#comment-14273586 ] Nick Burch commented on TIKA-1512: -- I worry that might be solving the symptom not

[jira] [Commented] (TIKA-1512) WordParser fails on many Word files

2015-01-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273486#comment-14273486 ] Nick Burch commented on TIKA-1512: -- Do you have a very small sample file that trig

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271599#comment-14271599 ] Nick Burch commented on TIKA-1445: -- Please open a ticket for the excel 3 issue, an

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-08 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269765#comment-14269765 ] Nick Burch commented on TIKA-1445: -- If we're going to close this for 1.7, then w

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267792#comment-14267792 ] Nick Burch commented on TIKA-1445: -- [~lfcnassif] Longer term we'll have differe

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267773#comment-14267773 ] Nick Burch commented on TIKA-1445: -- The only other parser that uses ExternalParse

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267756#comment-14267756 ] Nick Burch commented on TIKA-1445: -- I've no idea why the fork parser is failing

[jira] [Commented] (TIKA-1507) Under OSGi, ForkParser failes to send core parser classes like ExternalParser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267754#comment-14267754 ] Nick Burch commented on TIKA-1507: -- To reproduce this, remove the try/c

[jira] [Created] (TIKA-1507) Under OSGi, ForkParser failes to send core parser classes like ExternalParser

2015-01-07 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1507: Summary: Under OSGi, ForkParser failes to send core parser classes like ExternalParser Key: TIKA-1507 URL: https://issues.apache.org/jira/browse/TIKA-1507 Project: Tika

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267643#comment-14267643 ] Nick Burch commented on TIKA-1445: -- Ah, true, I hadn't thought so much about t

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267584#comment-14267584 ] Nick Burch commented on TIKA-1445: -- As of r1650051, I think we're correctly han

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2015-01-07 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267553#comment-14267553 ] Nick Burch commented on TIKA-1445: -- I wonder if it wouldn't be better to d

Re: [VOTE] Apache Tika 1.7 Release

2015-01-06 Thread Nick Burch
On Tue, 6 Jan 2015, Tyler Palsulich wrote: A candidate for the Tika 1.7 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.7-rc2/ The SHA1 checksum of the archive is

[jira] [Commented] (TIKA-1504) TikaCoreProperties.DATE not populated for XML files

2015-01-05 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264909#comment-14264909 ] Nick Burch commented on TIKA-1504: -- The file system attributes typically refer to

Re: 1.7 release? | potential blocker?

2015-01-05 Thread Nick Burch
On Mon, 5 Jan 2015, Tyler Palsulich wrote: Works for me. I got stalled midway through the process of getting RC#1 out (authentication issues). But, going to try to finish it right now (best way to upload to dist.apache.org? That's a svn checkout For the RC, assuming it's the same process as fo

[jira] [Commented] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2014-12-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257888#comment-14257888 ] Nick Burch commented on TIKA-879: - I've done something a little different in r164

[jira] [Assigned] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2014-12-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reassigned TIKA-879: --- Assignee: (was: Nick Burch) > Detection problem: message/rfc822 file is detected as text/pl

[jira] [Assigned] (TIKA-879) Detection problem: message/rfc822 file is detected as text/plain.

2014-12-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reassigned TIKA-879: --- Assignee: Nick Burch > Detection problem: message/rfc822 file is detected as text/pl

[jira] [Resolved] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1502. -- Resolution: Fixed Fix Version/s: 1.7 In r1647489 I've re-ordered the MediaTypeRegistry logi

[jira] [Commented] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256655#comment-14256655 ] Nick Burch commented on TIKA-1502: -- As of r1647486, we now have mime types for SQL

[jira] [Created] (TIKA-1502) Mime magic for database file formats

2014-12-22 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1502: Summary: Mime magic for database file formats Key: TIKA-1502 URL: https://issues.apache.org/jira/browse/TIKA-1502 Project: Tika Issue Type: Improvement

[jira] [Resolved] (TIKA-1490) Basic parser for old Excel files (eg Excel 4)

2014-12-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1490. -- Resolution: Fixed Fix Version/s: 1.7 Fixed in r1647243. There are still a few bits not supported

[jira] [Commented] (TIKA-976) Inaccurate XLS detection trough POIFSContainerDetector

2014-12-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255450#comment-14255450 ] Nick Burch commented on TIKA-976: - This is now being handled fully through TIKA-

[jira] [Resolved] (TIKA-1469) Upgrade to POI 3.11-beta3 when available

2014-12-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1469. -- Resolution: Fixed Fix Version/s: 1.7 Patch applied in r1647234, thanks everyone! We have TIKA

[jira] [Commented] (TIKA-1469) Upgrade to POI 3.11-beta3 when available

2014-12-21 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255329#comment-14255329 ] Nick Burch commented on TIKA-1469: -- I've raised TIKA-1501 for the bundle disa

[jira] [Created] (TIKA-1501) Fix the disabled Tika Bundle OSGi related unit tests

2014-12-21 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1501: Summary: Fix the disabled Tika Bundle OSGi related unit tests Key: TIKA-1501 URL: https://issues.apache.org/jira/browse/TIKA-1501 Project: Tika Issue Type

[jira] [Commented] (TIKA-1390) Create tika-example module

2014-12-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254568#comment-14254568 ] Nick Burch commented on TIKA-1390: -- As of r1646923, I've added the main exampl

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-12-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252973#comment-14252973 ] Nick Burch commented on TIKA-1445: -- In r1646624 I've added what I think shou

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-12-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252952#comment-14252952 ] Nick Burch commented on TIKA-1445: -- For 1.7, how about we just have the Tesseract Pa

Re: Subsets of tika parsers redux

2014-12-15 Thread Nick Burch
On Mon, 15 Dec 2014, Sergey Beryozkin wrote: I'm not proposing to split tika-parsers in a way that would affect the users, tika-parsers would still be there, except that it would strongly depend on tika-pdf and perhaps, when it is being built, it can have its dependencies like tika-pdf shaded i

Re: Subsets of tika parsers redux

2014-12-15 Thread Nick Burch
On Mon, 15 Dec 2014, Sergey Beryozkin wrote: OSGi users would pick tika + tika-parsers, or tika + tika-parsers-pdf, or tika + tika-parsers-pdf + tika-parsers-mp3 if they want OSGi is nicely contained, and fairly easy to unit test, so let's use that to test out the idea! That also solves the CXF

Re: Subsets of tika parsers redux

2014-12-15 Thread Nick Burch
On Mon, 15 Dec 2014, Sergey Beryozkin wrote: As a first step, I thought we'd still keep the same tika-parser jar, the only difference would be what dependencies ended up in the bundle. If the tika-bundle-pdf has no POI jars included in it, then the Microsoft Office related parsers shouldn't regis

[jira] [Commented] (TIKA-1495) Parser for BPG (Better Portable Graphics) format

2014-12-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246404#comment-14246404 ] Nick Burch commented on TIKA-1495: -- As of r1645588, we have support for the very ba

[jira] [Resolved] (TIKA-1494) JAXRS server: allow passing PDF password in the request

2014-12-14 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1494. -- Resolution: Fixed Fix Version/s: 1.7 Support added, along with a unit test, in r1645575. The

One for our XMP experts - Property with indexed closed choice?

2014-12-14 Thread Nick Burch
Hi All I'm trying to add photoshop:ColorMode as a new Metadata Property. It's on page 32 of the XMP spec part 2: http://wwwimages.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/cc-201306/XMPSpecificationPart2.pdf photoshop:ColorMode * Closed Choice of Integer * The colour mode. One of: 0 = B

[jira] [Resolved] (TIKA-1491) Identification of BPG (Better Portable Graphics) format

2014-12-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1491. -- Resolution: Fixed Test files added, along with a unit test, in r1644868. Thanks for the files! I

[jira] [Created] (TIKA-1495) Parser for BPG (Better Portable Graphics) format

2014-12-12 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1495: Summary: Parser for BPG (Better Portable Graphics) format Key: TIKA-1495 URL: https://issues.apache.org/jira/browse/TIKA-1495 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1494) JAXRS server: allow passing PDF password in the request

2014-12-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243921#comment-14243921 ] Nick Burch commented on TIKA-1494: -- My hunch is that the place to add this support w

[jira] [Commented] (TIKA-1493) Update for JAXRS page with details on passing password

2014-12-12 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243916#comment-14243916 ] Nick Burch commented on TIKA-1493: -- Karma granted, enjoy! > Update for JAXRS pa

[jira] [Commented] (TIKA-1493) Update for JAXRS page with details on passing password

2014-12-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243183#comment-14243183 ] Nick Burch commented on TIKA-1493: -- As explained on the front page of the Tika wiki,

[jira] [Commented] (TIKA-1491) Identification of BPG (Better Portable Graphics) format

2014-12-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242394#comment-14242394 ] Nick Burch commented on TIKA-1491: -- Thanks, mime magic added in r1644596. Any chance

[jira] [Commented] (TIKA-1491) Identification of BPG (Better Portable Graphics) format

2014-12-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240534#comment-14240534 ] Nick Burch commented on TIKA-1491: -- Normally the x- prefix is used for unofficial t

Re: Using Tika to compile glossaries in commercial software

2014-12-06 Thread Nick Burch
On Sat, 6 Dec 2014, Emmanuel Ichbiah wrote: What do weed need to distribute besides the jar file to be compliant with your licence agreement ? The Apache Software License is fairly easy to read as a non-lawyer, so your best answer is likely to come from reading that! The main section of inte

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-12-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233064#comment-14233064 ] Nick Burch commented on TIKA-1489: -- Can someone find an existing, externally well def

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-12-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229867#comment-14229867 ] Nick Burch commented on TIKA-1489: -- Can someone pull together a list of co

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-12-01 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229768#comment-14229768 ] Nick Burch commented on TIKA-1489: -- If we make the change, then all sorts of things

[jira] [Commented] (TIKA-1490) Basic parser for old Excel files (eg Excel 4)

2014-11-30 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229169#comment-14229169 ] Nick Burch commented on TIKA-1490: -- Codepage support is in there now, as is somet

[jira] [Commented] (TIKA-1490) Basic parser for old Excel files (eg Excel 4)

2014-11-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228990#comment-14228990 ] Nick Burch commented on TIKA-1490: -- As of r1642497, there is now a basic parser pre

[jira] [Assigned] (TIKA-1490) Basic parser for old Excel files (eg Excel 4)

2014-11-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch reassigned TIKA-1490: Assignee: Nick Burch > Basic parser for old Excel files (eg Exce

[jira] [Created] (TIKA-1490) Basic parser for old Excel files (eg Excel 4)

2014-11-28 Thread Nick Burch (JIRA)
Nick Burch created TIKA-1490: Summary: Basic parser for old Excel files (eg Excel 4) Key: TIKA-1490 URL: https://issues.apache.org/jira/browse/TIKA-1490 Project: Tika Issue Type: Improvement

[jira] [Resolved] (TIKA-1487) Add mime for pre-OLE2 xls file

2014-11-27 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1487. -- Resolution: Fixed Fix Version/s: 1.7 > Add mime for pre-OLE2 xls f

[jira] [Commented] (TIKA-1487) Add mime for pre-OLE2 xls file

2014-11-27 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227664#comment-14227664 ] Nick Burch commented on TIKA-1487: -- Mime magic added in r1642152. I think, based on

[jira] [Commented] (TIKA-1486) Minor issues with the Tika MIME type magic file

2014-11-27 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227634#comment-14227634 ] Nick Burch commented on TIKA-1486: -- If you'd care to send in a patch for the

[jira] [Commented] (TIKA-1489) PDF Text extraction without permission

2014-11-26 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225988#comment-14225988 ] Nick Burch commented on TIKA-1489: -- I would consider this a feature rather than a

[jira] [Commented] (TIKA-1469) Upgrade to POI 3.11-beta3 when available

2014-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225373#comment-14225373 ] Nick Burch commented on TIKA-1469: -- We have a slightly worrying number of {{// TODO

Re: Subsets of tika parsers redux

2014-11-25 Thread Nick Burch
On Mon, 24 Nov 2014, Sergey Beryozkin wrote: It is an interesting idea, one that can lead to introducing finer-grained bundles but also providing a mechanism for the (auto-)generation of the import metadata required by each of the parser modules. Besides, introducing several smaller bundles tha

[jira] [Commented] (TIKA-1486) Minor issues with the Tika MIME type magic file

2014-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224724#comment-14224724 ] Nick Burch commented on TIKA-1486: -- I don't know the history behind the regexp g

[jira] [Commented] (TIKA-1486) Minor issues with the Tika MIME type magic file

2014-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224661#comment-14224661 ] Nick Burch commented on TIKA-1486: -- I think we have a slightly expanded mimetype fo

[jira] [Commented] (TIKA-1481) TikaJAXRS get metadata calls give different results

2014-11-25 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224417#comment-14224417 ] Nick Burch commented on TIKA-1481: -- If you drop an email to user-su

[jira] [Comment Edited] (TIKA-1469) Upgrade to POI 3.11-beta3 when available

2014-11-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216238#comment-14216238 ] Nick Burch edited comment on TIKA-1469 at 11/23/14 9:3

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-11-23 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222510#comment-14222510 ] Nick Burch commented on TIKA-1445: -- I quite like Tim's idea. We can have th

Subsets of tika parsers redux

2014-11-23 Thread Nick Burch
Hi All During ApacheCon, I had a chance to chat with Sergey about the "subset of Tika Parsers" issue that bubbles up from time to time. It seemed to work well, and I think we both now have a better idea of the other's needs and concerns, which is good :) As is shown on our list from time to

[jira] [Commented] (TIKA-1485) Wrong mimetype detection

2014-11-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219617#comment-14219617 ] Nick Burch commented on TIKA-1485: -- If Tika doesn't know what something is from

[jira] [Commented] (TIKA-1485) Wrong mimetype detection

2014-11-20 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219561#comment-14219561 ] Nick Burch commented on TIKA-1485: -- Does that still happen if you don't pa

Re: svn commit: r1640535 - /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java

2014-11-19 Thread Nick Burch
On Wed, 19 Nov 2014, Tyler Palsulich wrote: It looks like imports are being reordered here. I think we decided (can't find an archive link right now) on java and javax imports before others. Everything we wrote down is here: http://tika.apache.org/contribute.html#Code_Formatting Nothing there

[jira] [Commented] (TIKA-1482) ForkParser throws exceptions when process some large pdf files

2014-11-19 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217609#comment-14217609 ] Nick Burch commented on TIKA-1482: -- That looks like a pdfbox issue Can you try a

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-11-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216466#comment-14216466 ] Nick Burch commented on TIKA-1445: -- Anyone using tika-parser OOTB has two par

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-11-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216444#comment-14216444 ] Nick Burch commented on TIKA-1445: -- I think it's fairly common for people to

[jira] [Commented] (TIKA-1469) Upgrade to POI 3.11-beta3 when available

2014-11-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216238#comment-14216238 ] Nick Burch commented on TIKA-1469: -- 3.11 beta 3 has finally hit maven central (to

[jira] [Commented] (TIKA-1482) ForkParser throws exceptions when process some large pdf files

2014-11-18 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216034#comment-14216034 ] Nick Burch commented on TIKA-1482: -- What's the full exception and s

[jira] [Commented] (TIKA-1445) Figure out how to add Image metadata extraction to Tesseract parser

2014-11-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215292#comment-14215292 ] Nick Burch commented on TIKA-1445: -- > +1 to respect the order of parsers in the

Re: Move definitively from SVN to Git ?

2014-11-17 Thread Nick Burch
On Mon, 17 Nov 2014, Hong-Thai Nguyen wrote: Yes, that's exactly I'm doing. If we move to Git, we'll avoid all SVN stuff. Anyway, this concerns commiters only. If we move to git, people who currently use SVN have to change though! Given that non-committers can already work with Git, could you

Re: Move definitively from SVN to Git ?

2014-11-17 Thread Nick Burch
On Mon, 17 Nov 2014, Hong-Thai Nguyen wrote: I didn't realize that we could commit/push directly into git repo. Could we ? Master source is still SVN. However, committers can (and at least some do) work on a clone of the Git repo, and use GitSVN to push their changes to the SVN repo as commit

Re: Move definitively from SVN to Git ?

2014-11-17 Thread Nick Burch
On Mon, 17 Nov 2014, Hong-Thai Nguyen wrote: Git is implemented everywhere and profit many new features. Should we abandon SVN repo and move to Git forever to facility apply fixes and contribution ? We already have a git mirror - http://git.apache.org/tika.git/ - and a GitHub mirror which acc

[jira] [Commented] (TIKA-1473) Apache Tika is not working for .docx documents

2014-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207694#comment-14207694 ] Nick Burch commented on TIKA-1473: -- Any chance of the file that triggers the problem,

[jira] [Updated] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"

2014-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1472: - Summary: Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"

[jira] [Updated] (TIKA-1472) Warning on Tika Server startup - Failed to load class "org.slf4j.impl.StaticLoggerBinder"

2014-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch updated TIKA-1472: - Component/s: server > Warning on Tika Server startup - Failed to load cl

[jira] [Commented] (TIKA-1470) Error installing Tika

2014-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206349#comment-14206349 ] Nick Burch commented on TIKA-1470: -- I believe you can safely ignore that warning, un

[jira] [Commented] (TIKA-1470) Error installing Tika

2014-11-11 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206184#comment-14206184 ] Nick Burch commented on TIKA-1470: -- If all you want to do is use Tika, then there

[jira] [Commented] (TIKA-1468) Symbol character handling in WordExtractor

2014-11-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203923#comment-14203923 ] Nick Burch commented on TIKA-1468: -- Any chance of a small junit unit test for

[jira] [Commented] (TIKA-1464) Too many open files in system when parsing thousands of files

2014-11-03 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194685#comment-14194685 ] Nick Burch commented on TIKA-1464: -- Firstly, make sure you're closing the In

RE: PDF test failing on trunk

2014-10-30 Thread Nick Burch
On Thu, 30 Oct 2014, Allison, Timothy B. wrote: I think so. Would you like the honors? You're more of a pdf expert than I am, so maybe you'd be best :) Nick

RE: PDF test failing on trunk

2014-10-30 Thread Nick Burch
On Thu, 30 Oct 2014, Allison, Timothy B. wrote: Ha. Works with an older version of 1.6: java version "1.6.0_30" OpenJDK Runtime Environment (IcedTea6 1.13.1) (rhel-3.1.13.1.el6_5-x86_64) OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode) Joy. Full stacktrace below, maybe one that needs r

RE: PDF test failing on trunk

2014-10-30 Thread Nick Burch
On Thu, 30 Oct 2014, Allison, Timothy B. wrote: The build is working for me on linux and Windows with Java 1.7. Can you tell which file is causing the problem? I wonder if the upgrade to PDFBox 1.8.7 caused the issue? I've just tried with Java 7, and that passes! The JVM it's failing on is:

PDF test failing on trunk

2014-10-29 Thread Nick Burch
Hi All Just tried to build trunk, and got a test failure: Tests in error: testSequentialParser(org.apache.tika.parser.pdf.PDFParserTest): Unable to extract PDF content Tests run: 547, Failures: 0, Errors: 1, Skipped: 7 The exception in the log is: Caused by: java.io.IOException: javax.cr

[jira] [Resolved] (TIKA-1461) Bad mime detection of certain JAR file

2014-10-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Burch resolved TIKA-1461. -- Resolution: Fixed Fix Version/s: 1.7 Fixed in r1635263. To be a valid PE file, it needs to start

[jira] [Commented] (TIKA-1461) Bad mime detection of certain JAR file

2014-10-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188555#comment-14188555 ] Nick Burch commented on TIKA-1461: -- I've just tried with a recent snapshot b

[jira] [Commented] (TIKA-1461) Bad mime detection of certain JAR file

2014-10-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188479#comment-14188479 ] Nick Burch commented on TIKA-1461: -- Do you know the license of that file? And/or
