[jira] [Created] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
Tamara created TIKA-1533: Summary: PDF parse failing to capture right order of text (2 columns) Key: TIKA-1533 URL: https://issues.apache.org/jira/browse/TIKA-1533 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295159#comment-14295159 ] Tim Allison commented on TIKA-1533: In the first document, printed page 303/pdf page 152

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295165#comment-14295165 ] Tim Allison commented on TIKA-1511: Ok, great. We just added the RecursiveParserWrapper

[jira] [Updated] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamara updated TIKA-1533: Description: When I am converting a document with two columns the order of the columns is inverted in the text fi

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295290#comment-14295290 ] Tamara commented on TIKA-1533: No, not yet. Only tika 1.6, 1.7 and the PDFXStream. I have a

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295340#comment-14295340 ] Tim Allison commented on TIKA-1533: I'm getting the same "mis"-ordering with PDFBox 1.8.

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295376#comment-14295376 ] Tyler Palsulich commented on TIKA-1521: Ah, I missed that comment. The test also pas

[jira] [Comment Edited] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295376#comment-14295376 ] Tyler Palsulich edited comment on TIKA-1521 at 1/28/15 4:46 PM:

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295380#comment-14295380 ] Tim Allison commented on TIKA-1521: That's why I opened COMPRESS-299. :) Not sure, yet

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295460#comment-14295460 ] Tamara commented on TIKA-1533: Thank you for the help Tim, next time I will post directly to

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295459#comment-14295459 ] Tamara commented on TIKA-1533: Thank you for the help Tim, next time I will post directly to

[jira] [Issue Comment Deleted] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tamara (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tamara updated TIKA-1533: Comment: was deleted (was: Thank you for the help Tim, next time I will post directly to them. Here is the issue o

[jira] [Reopened] (TIKA-1518) Docker with Tika Server

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tyler Palsulich reopened TIKA-1518: Reopening as suggested above. 1. I'm thinking we can place the Dockerfile in trunk/tika-server? Th

[jira] [Commented] (TIKA-1533) PDF parse failing to capture right order of text (2 columns)

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295489#comment-14295489 ] Tim Allison commented on TIKA-1533: Always happy to pass the buck. ;) But seriously, th

[jira] [Commented] (TIKA-1532) DIF Parser

2015-01-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295498#comment-14295498 ] Nick Burch commented on TIKA-1532: For the mimetype part, do you have a small sample file

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295499#comment-14295499 ] Tim Allison commented on TIKA-1521: Take a look at [comment 14295473|https://issues.apa

[jira] [Comment Edited] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295499#comment-14295499 ] Tim Allison edited comment on TIKA-1521 at 1/28/15 6:02 PM: Tak

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295619#comment-14295619 ] Tim Allison commented on TIKA-1521: Added conditional testing in r1655431. Build worked

[jira] [Created] (TIKA-1534) Upgrade to Commons Compress 1.9

2015-01-28 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1534: Summary: Upgrade to Commons Compress 1.9 Key: TIKA-1534 URL: https://issues.apache.org/jira/browse/TIKA-1534 Project: Tika Issue Type: Improvement Repo

[jira] [Resolved] (TIKA-1534) Upgrade to Commons Compress 1.9

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1534. Resolution: Fixed r1655433.

[jira] [Commented] (TIKA-1534) Upgrade to Commons Compress 1.9

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295629#comment-14295629 ] Tim Allison commented on TIKA-1534: While waiting for 1.10, may as well upgrade to 1.9

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295750#comment-14295750 ] Hudson commented on TIKA-1521: SUCCESS: Integrated in tika-trunk-jdk1.7 #457 (See [https://b

[jira] [Commented] (TIKA-1534) Upgrade to Commons Compress 1.9

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295749#comment-14295749 ] Hudson commented on TIKA-1534: SUCCESS: Integrated in tika-trunk-jdk1.7 #457 (See [https://b

[jira] [Resolved] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1329. Resolution: Fixed r1655449. Added a few examples.

[jira] [Reopened] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-28 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-1329: Wait, do I need to update the webpage, too? Or is that done automatically from tika-examples?

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295791#comment-14295791 ] Hudson commented on TIKA-1521: SUCCESS: Integrated in tika-trunk-jdk1.6 #442 (See [https://b

[jira] [Commented] (TIKA-1534) Upgrade to Commons Compress 1.9

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295789#comment-14295789 ] Hudson commented on TIKA-1534: SUCCESS: Integrated in tika-trunk-jdk1.6 #442 (See [https://b

[jira] [Commented] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295790#comment-14295790 ] Hudson commented on TIKA-1329: SUCCESS: Integrated in tika-trunk-jdk1.6 #442 (See [https://b

[jira] [Commented] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-28 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295800#comment-14295800 ] Nick Burch commented on TIKA-1329: Website still needs updating - just use the snippet to

[jira] [Commented] (TIKA-1329) Add RecursiveParserWrapper aka Jukka's (and Nick's) RecursiveMetadataParser

2015-01-28 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295819#comment-14295819 ] Hudson commented on TIKA-1329: SUCCESS: Integrated in tika-trunk-jdk1.7 #458 (See [https://b

[jira] [Created] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Luke sh (JIRA)
Luke sh created TIKA-1535: Summary: Inheritance modification for the class MIMETypes Key: TIKA-1535 URL: https://issues.apache.org/jira/browse/TIKA-1535 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1517: Summary: MIME type selection with probability (was: MIME type detection with probability)

[jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295922#comment-14295922 ] Luke sh commented on TIKA-1535: TIKA-1517, the mime type selection mechanism with probabilit

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295928#comment-14295928 ] Luke sh commented on TIKA-1517: the probability selection will inherit the class MIMETypes,

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295928#comment-14295928 ] Luke sh edited comment on TIKA-1517 at 1/28/15 11:06 PM: the proba

[jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296084#comment-14296084 ] Tyler Palsulich commented on TIKA-1535: Maybe someone else can comment on this too.

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296103#comment-14296103 ] Tyler Palsulich commented on TIKA-1517: Hi [~Lukeliush]. Thanks for raising this ide

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Tyler Palsulich (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296103#comment-14296103 ] Tyler Palsulich edited comment on TIKA-1517 at 1/29/15 12:04 AM:

[jira] [Commented] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296129#comment-14296129 ] Lewis John McGibbney commented on TIKA-1423: I am working on this and think I h

RE: [jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Luke
Hi Professor and all, A Bayesian or machine-learning Detector is different from the Bayesian selection mechanism reported in TIKA-1517. It would make sense to implement a machine-learning algorithm in a separate Detector class; I have not gone too far with this design thought, as I am still on th

Re: [jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Mattmann, Chris A (3980)
Hi Luke, -----Original Message----- From: Luke Date: Wednesday, January 28, 2015 at 7:15 PM To: Chris Mattmann, Chris Mattmann, "dev@tika.apache.org" Cc: NSF Polar CyberInfrastructure DR Students Subject: RE: [jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes >H

RE: [jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Luke
Thanks professor for the prompt and kind response, will keep you updated on the progress and findings. -----Original Message----- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Wednesday, January 28, 2015 8:17 PM To: Luke; 'Christian Alan Mattmann'; dev@tika.apache.o

[jira] [Commented] (TIKA-1518) Docker with Tika Server

2015-01-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296439#comment-14296439 ] Chris A. Mattmann commented on TIKA-1518: Thanks Tyler. Can you raise #2 on infras

[jira] [Comment Edited] (TIKA-1518) Docker with Tika Server

2015-01-28 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296439#comment-14296439 ] Chris A. Mattmann edited comment on TIKA-1518 at 1/29/15 6:15 AM:

Re: multiple detect call -> different results (tika 1.7)

2015-01-28 Thread Mattmann, Chris A (3980)
Dear Gabriele, Thanks for your question. It should be sent to dev@tika.apache.org (moving dev-ow...@tika.apache.org to BCC). I’ll take a look tomorrow if someone else hasn’t answered yet. Cheers, Chris

[jira] [Updated] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1423: Attachment: TIKA-1423v2.patch Patch for trunk which passes all tests including issues e

[jira] [Comment Edited] (TIKA-1423) Build a parser to extract data from GRIB formats

2015-01-28 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296541#comment-14296541 ] Lewis John McGibbney edited comment on TIKA-1423 at 1/29/15 7:54 AM: