[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Attachment: cbor_tika.mimetypes.xml.jpg
rfc_cbor.jpg
CBOR Parser and detection improvement
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Description:
CBOR is a data format whose design goals include the possibility of extremely
small code size,
Luke sh created TIKA-1610:
-
Summary: CBOR Parser and detection improvement
Key: TIKA-1610
URL: https://issues.apache.org/jira/browse/TIKA-1610
Project: Tika
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Attachment: 142440269.html
cbor file dumped by the nutch tool.
CBOR Parser and detection improvement
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Summary: CBOR Parser and detection [improvement] (was: CBOR Parser and
detection improvement)
CBOR Parser and
[
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1611.
---
Resolution: Fixed
r1675159.
Nothing like testing to see behavior, rather than assumptions. :(
Allow
[
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1611:
--
Description:
While parsing embedded documents, currently, if a parser hits an
[
https://issues.apache.org/jira/browse/TIKA-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505335#comment-14505335
]
Tim Allison commented on TIKA-1612:
---
Not sure how we want to fix this. To make this
[
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505358#comment-14505358
]
Hudson commented on TIKA-1611:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #639 (See
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505368#comment-14505368
]
Luis Filipe Nassif commented on TIKA-879:
-
Yes, thank you very much for testing with
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505367#comment-14505367
]
Luis Filipe Nassif commented on TIKA-879:
-
Yes, thank you very much for testing with
[
https://issues.apache.org/jira/browse/TIKA-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1611:
--
Description:
While parsing embedded documents, currently, if a parser hits an Exception, the
Exception
Hi Folks,
Whilst addressing NUTCH-1994, I've experienced a dependency problem
(related to unpublished artifacts on Maven Central) which I am working
through right now.
When making the upgrade in Nutch, I get the following
[ivy:resolve] -- artifact edu.ucar#udunits;4.5.5!udunits.jar:
Hi Tika friends,
I am currently engaged in a project funded by National Science Foundation. Our
goal is to develop a research-friendly environment where geoscientists, like
me, can easily find source codes they need. According to a survey, scientists
spend a considerable amount of their time
[
https://issues.apache.org/jira/browse/TIKA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505377#comment-14505377
]
Luis Filipe Nassif commented on TIKA-1601:
--
Great! Give me 3 more days to submit
Tim Allison created TIKA-1612:
-
Summary: Exceptions getting image data in PPT files
Key: TIKA-1612
URL: https://issues.apache.org/jira/browse/TIKA-1612
Project: Tika
Issue Type: Bug
[
https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504904#comment-14504904
]
Konstantin Gribov commented on TIKA-1532:
-
{{text/\*+xml}} is quite unusual type.
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505057#comment-14505057
]
Luis Filipe Nassif commented on TIKA-1513:
--
No, I did not give a try to 0x03. How
[
https://issues.apache.org/jira/browse/TIKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505051#comment-14505051
]
Hudson commented on TIKA-1501:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #638 (See
[
https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated TIKA-1607:
---
Summary: Introduce new arbitrary object key/values data structure for
persistence of
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeremy B. Merrill updated TIKA-1608:
Attachment: 1534-attachment.doc
document failing under this bug
RuntimeException on
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505102#comment-14505102
]
Jeremy B. Merrill commented on TIKA-1608:
-
POI bug:
[
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505008#comment-14505008
]
Tim Allison commented on TIKA-1315:
---
Ha. Ok, but your patch is really well done. Let me
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504996#comment-14504996
]
Luis Filipe Nassif commented on TIKA-1513:
--
Hi Tim,
I am ok with 1) and 2). But I
[
https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505054#comment-14505054
]
Ray Gauss II commented on TIKA-1607:
We've had a few discussions on structured metadata
[
https://issues.apache.org/jira/browse/TIKA-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luis Filipe Nassif closed TIKA-1554.
Resolution: Fixed
Fix Version/s: 1.8
Resolved in r4608ff5. Thanks.
Improve EMF file
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505093#comment-14505093
]
Jeremy B. Merrill commented on TIKA-1608:
-
Hi Tim,
I added the document. I'm
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505092#comment-14505092
]
Tim Allison commented on TIKA-1513:
---
Completely agree.
Only 2,386 files.
This is the
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeremy B. Merrill updated TIKA-1608:
Description:
Extracting text from the Word 97-2004 document attached here fails with the
[
https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504999#comment-14504999
]
Sergey Beryozkin commented on TIKA-1607:
Hi,
IMHO it indeed makes sense to keep
[
https://issues.apache.org/jira/browse/TIKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1501.
---
Resolution: Fixed
Fix Version/s: 1.9
r1675121.
Thank you, [~bobpaulin]!
Fix the disabled
Tim Allison created TIKA-1611:
-
Summary: Allow RecursiveParserWrapper to catch exceptions from
embedded documents
Key: TIKA-1611
URL: https://issues.apache.org/jira/browse/TIKA-1611
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505004#comment-14505004
]
Moritz Dorka commented on TIKA-1315:
Well, the original patch by Filip is essentially
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505006#comment-14505006
]
Tim Allison commented on TIKA-1513:
---
Y, I was concerned by that generally. Are you
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504951#comment-14504951
]
Tim Allison commented on TIKA-1513:
---
From govdocs1, it looks like first byte of 0X03 is a
Yay thanks Tyler!
++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email:
[
https://issues.apache.org/jira/browse/TIKA-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505042#comment-14505042
]
Moritz Dorka commented on TIKA-1315:
I believe I could speed up the process by
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504871#comment-14504871
]
Tim Allison commented on TIKA-1608:
---
[~jeremybmerrill], thank you for raising this issue.
Thank you, Tyler!
-Original Message-
From: Tyler Palsulich [mailto:tpalsul...@apache.org]
Sent: Monday, April 20, 2015 5:09 PM
To: dev@tika.apache.org; u...@tika.apache.org; annou...@apache.org
Subject: [ANNOUNCE] Apache Tika 1.8 Released
The Apache Tika project is pleased to announce
[
https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504884#comment-14504884
]
Tim Allison commented on TIKA-1295:
---
[~lewismc], +1 to adding potential for hierarchical
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505113#comment-14505113
]
Tim Allison commented on TIKA-1608:
---
In govdocs1, there are 24 of these:
{noformat}
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505132#comment-14505132
]
Luis Filipe Nassif commented on TIKA-879:
-
Maybe we could keep the original magics
[
https://issues.apache.org/jira/browse/TIKA-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505172#comment-14505172
]
Luis Filipe Nassif commented on TIKA-1554:
--
Actually r1667661
Improve EMF file
[
https://issues.apache.org/jira/browse/TIKA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505178#comment-14505178
]
Jeremy B. Merrill commented on TIKA-1608:
-
It's the only one I've found so far out
Hi Lewis,
I also tried upgrading Tika in Nutch, but ran into the same issue
(although udunits is found, as expected):
[ivy:retrieve] ::
[ivy:retrieve] :: UNRESOLVED DEPENDENCIES ::
[ivy:retrieve]
GitHub user LukeLiush opened a pull request:
https://github.com/apache/tika/pull/42
add entry for cbor glob extension in the tika-mimetypes.xml
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/LukeLiush/tika cborExtension
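For readers unfamiliar with glob entries, here is a rough Java sketch of what an extension-based lookup amounts to. It is not Tika's implementation and does not reproduce the contents of the pull request: the class `GlobSketch`, the method `typeForName`, the single-entry table, and the mapping of a `cbor` extension to `application/cbor` are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not Tika code.
class GlobSketch {
    // Assumed table: file extension (without the dot) -> media type.
    static final Map<String, String> GLOBS = new HashMap<>();
    static {
        GLOBS.put("cbor", "application/cbor");
    }

    // Returns the media type for a file name, or a generic fallback
    // when the name has no extension or the extension is unknown.
    static String typeForName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return GLOBS.getOrDefault(ext, "application/octet-stream");
    }
}
```

A name-based lookup like this is cheap but easy to fool, which is presumably why it is usually combined with a check of the file's content.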
[
https://issues.apache.org/jira/browse/TIKA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505633#comment-14505633
]
Tim Allison commented on TIKA-1601:
---
I don't. That's half the fun of a patch, right. :)
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506214#comment-14506214
]
Tim Allison commented on TIKA-1513:
---
In looking at
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann reassigned TIKA-1610:
---
Assignee: Chris A. Mattmann
CBOR Parser and detection [improvement]
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506359#comment-14506359
]
Chris A. Mattmann commented on TIKA-1610:
-
Applied Pull request #42 thanks
Github user asfgit closed the pull request at:
https://github.com/apache/tika/pull/42
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Thanks Lewis!
Thanks Luke.
So I guess all I was asking was could you try it out. Thanks for the
lesson in the RFC.
Cheers,
Chris
Chris Mattmann
chris.mattm...@gmail.com
-Original Message-
From: Luke hanson311...@gmail.com
Date: Wednesday, April 22, 2015 at 1:46 AM
To:
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506414#comment-14506414
]
Hudson commented on TIKA-1610:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #640 (See
Hi professor,
I think it depends heavily on the content being read by Tika. For example, if
the file contains a sequence of bytes that matches the byte pattern defined for
one or more of the MIME types in our tika-mimetypes.xml, I guess that Tika will
put those types in its estimation list,
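The byte matching described in this message can be sketched roughly as follows. This is a minimal Java illustration, not Tika's actual detector: the class `MagicSketch`, the method `candidates`, and the two-entry table are hypothetical, and the sketch only compares leading bytes. The CBOR entry assumes the file starts with the optional self-describe tag 55799 from RFC 7049, which encodes as the bytes 0xD9 0xD9 0xF7; CBOR data written without that tag would not match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Tika code.
class MagicSketch {
    // Assumed table: media type -> bytes expected at the start of the file.
    static final Map<String, byte[]> MAGICS = new LinkedHashMap<>();
    static {
        // CBOR self-describe tag 55799 (optional per RFC 7049).
        MAGICS.put("application/cbor",
                new byte[] {(byte) 0xD9, (byte) 0xD9, (byte) 0xF7});
        // ASCII "%PDF".
        MAGICS.put("application/pdf",
                new byte[] {0x25, 0x50, 0x44, 0x46});
    }

    // Returns every media type whose magic bytes prefix the given header,
    // in table order; an empty list means nothing matched.
    static List<String> candidates(byte[] head) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : MAGICS.entrySet()) {
            byte[] magic = entry.getValue();
            if (head.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(head, magic.length), magic)) {
                out.add(entry.getKey());
            }
        }
        return out;
    }
}
```

Returning a list rather than a single type mirrors the "estimation list" idea: several definitions may match the same leading bytes, and choosing among them is a separate step.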
On Tue, 21 Apr 2015, Oh, Ji-Hyun (329F-Affiliate) wrote:
For the first step, I listed the file formats that are widely used in
climate science.
FORTRAN (.f, .f90, .f77)
Python (.py)
R (.R)
Matlab (.m)
GrADS (Grid Analysis and Display System)
(.gs)
NCL (NCAR Command Language) (.ncl)
IDL
Hi Folks,
OK, so the final part of this jigsaw is as follows
I've requested a staging area [0] on Sonatype OSSRH to release the MIT
licensed 3rd party bzip2 artifacts.
I had to Mavenize the project. I will submit this patch to the bzip2
project and hopefully they will pull it in. If not then I
Hi Ji-Hyun,
On Tue, Apr 21, 2015 at 4:15 PM, dev-digest-h...@tika.apache.org wrote:
FORTRAN (.f, .f90, .f77)
Python (.py)
R (.R)
Matlab (.m)
GrADS (Grid Analysis and Display System)
(.gs)
NCL (NCAR Command Language) (.ncl)
IDL (Interactive Data Language) (.pro)
NICE list
I checked
Hi Folks,
Update
On Tue, Apr 21, 2015 at 10:50 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
[ivy:resolve] ::
[ivy:resolve] :: edu.ucar#jj2000;5.2: not found
[ivy:resolve] :: edu.ucar#udunits;4.5.5: not found
Patch for Mavenizing the bzip2 project
https://code.google.com/p/jbzip2/issues/detail?id=3
Lewis
On Tue, Apr 21, 2015 at 4:14 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Folks,
OK, so the final part of this jigsaw is as follows
I've requested a staging area [0] on Sonatype
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505269#comment-14505269
]
Tim Allison commented on TIKA-879:
--
Y, will do. Results probably tomorrow.
Detection
[
https://issues.apache.org/jira/browse/TIKA-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505269#comment-14505269
]
Tim Allison edited comment on TIKA-879 at 4/21/15 5:04 PM:
---
Y,