[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-10 Thread Tilman Hausherr (Jira)
ong sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https:/

[jira] [Closed] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-10 Thread Tilman Hausherr (Jira)
ika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-10 Thread Jitin Jindal (Jira)
ces of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apac

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Dave Fisher (Jira)
://getcreditcardnumbers.com/] produces invalid numbers. In JSON and JavaScript, numbers are always double-precision floating point. See [https://www.w3schools.com/js/js_numbers.asp]   > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected r
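The double-precision point in the comment above can be illustrated with a minimal Python sketch. It is not taken from the thread; the helper name `survives_double` is hypothetical, and the only assumption is that a long digit sequence is stored as an IEEE 754 double, which has a 53-bit significand.

```python
# Integers are represented exactly in an IEEE 754 double only up to 2**53.
MAX_EXACT = 2 ** 53  # 9007199254740992, a 16-digit number


def survives_double(digits: str) -> bool:
    """Return True if the digit string round-trips through a double unchanged.

    Hypothetical helper for illustration only.
    """
    return str(int(float(digits))) == digits


# 2**53 itself is exactly representable; 2**53 + 1 is not and is rounded.
print(survives_double("9007199254740992"))
print(survives_double("9007199254740993"))
```

Any digit sequence longer than 16 digits is above this limit, so it is at risk of being altered once it is treated as a number rather than as text.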

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
s from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Dave Fisher (Jira)
. Use strings, just as you have to for US ZIP codes because of the leading '0'. > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected r
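As a rough sketch of why the advice is to use strings, the following Python snippet shows what happens to a leading zero when an identifier passes through a numeric type. The function name `via_number` and the sample values are hypothetical and do not come from the issue.

```python
def via_number(text: str) -> str:
    """Simulate storing an identifier as a number and printing it back.

    Hypothetical helper for illustration only.
    """
    return str(int(text))


zip_code = "02134"              # hypothetical ZIP code with a leading zero
card_like = "0012345678901234"  # hypothetical 16-character identifier

print(via_number(zip_code))   # the leading zero is lost
print(zip_code)               # kept as text, the value is unchanged
print(via_number(card_like))  # two leading zeros are lost
```

Keeping such values as text end to end avoids both this problem and the precision limit of floating-point numbers.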

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
be wrong 90% of the time... I'm now inclined to propose that we not do anything here. Note: This is Excel for Mac (16.52), your mileage may vary. > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected r

[jira] [Comment Edited] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
for numbers that might start with leading zeros, like credit card #s, etc. You have to be really careful to enter them as strings or, better yet, use an actual database. > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected r

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
uences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
dit Card Numbers (Source: http://www.getcreditcardnumbers.com/) 6480195344642780 30295201231669 30082494556063 344850003945824 358338792630 3587385370593640 > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tim Allison (Jira)
. > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 >

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Nick Burch (Jira)
ing Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tilman Hausherr (Jira)
cell". > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 >

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Nick Burch (Jira)
ing Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tilman Hausherr (Jira)
elvetica,Regular"12K00P http://www.getcreditcardnumbers.com/;>http://www.getcreditcardnumbers.com/ {noformat} > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1.20 doesn’t yiel

[jira] [Comment Edited] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Tilman Hausherr (Jira)
egular"12K00P http://www.getcreditcardnumbers.com/;>http://www.getcreditcardnumbers.com/ {noformat} > Extraction of long sequences of digits from Excel spreadsheets using Tika > 1

[jira] [Updated] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-07 Thread Jitin Jindal (Jira)
ets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Updated] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-07 Thread Jitin Jindal (Jira)
ets using Tika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Updated] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-07 Thread Jitin Jindal (Jira)
ika > 1.20 doesn’t yield the expected results > - > > Key: TIKA-3544 > URL: https://issues.apache.org/jira/browse/TIKA-3544 >

[jira] [Created] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-07 Thread Jitin Jindal (Jira)
Jitin Jindal created TIKA-3544: -- Summary: Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results Key: TIKA-3544 URL: https://issues.apache.org/jira/browse

[jira] [Resolved] (TIKA-2877) Tika 1.20 suffer from 3 separate CVE vulnerabilities

2019-05-19 Thread Tim Allison (JIRA)
to site shortly and announce release of 1.21. > Tika 1.20 suffer from 3 separate CVE vulnerabilities > > > Key: TIKA-2877 > URL: https://issues.apache.org/jira/browse/TIKA-2877 >

[jira] [Commented] (TIKA-2877) Tika 1.20 suffer from 3 separate CVE vulnerabilities

2019-05-16 Thread Tim Allison (JIRA)
/2c027535156cc6862149490b289552d72ba5a9bff985fb7cce794e21@%3Cdev.tika.apache.org%3E I can add a new table for dependency vulnerabilities on our security page. Thank you. > Tika 1.20 suffer from 3 separate CVE vulnerabilit

[jira] [Created] (TIKA-2877) Tika 1.20 suffer from 3 separate CVE vulnerabilities

2019-05-16 Thread Pat cashman (JIRA)
Pat cashman created TIKA-2877: - Summary: Tika 1.20 suffer from 3 separate CVE vulnerabilities Key: TIKA-2877 URL: https://issues.apache.org/jira/browse/TIKA-2877 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-13 Thread Tim Allison (JIRA)
://lists.apache.org/thread.html/36529c7df113e81ace51301175528120884af73b78edd40764a88cf8@%3Cdev.tika.apache.org%3E > Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object > truncated by

[jira] [Resolved] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-10 Thread Tim Allison (JIRA)
pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object > truncated by 465479) > > > Key: TIKA-2869 > URL: https://issues.apache.org/jira/browse/TIKA-2869 >

[jira] [Commented] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-10 Thread Tim Allison (JIRA)
branch. I'll take a look. Thank you for opening this issue and sharing a triggering file! > Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object > truncated by

[jira] [Updated] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-10 Thread Edans Sandes (JIRA)
-app-1.20.jar, it stopped working. {{java -jar tika-app-1.20.jar 0001.127_342_5_7955.pdf}} {{mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem}} {{ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files

[jira] [Updated] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-10 Thread Edans Sandes (JIRA)
-app-1.20.jar, it stopped working. {{java -jar tika-app-1.20.jar 0001.127_342_5_7955.pdf}} mai 10, 2019 11:36:23 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem ADVERTÊNCIA: J2KImageReader not loaded. JPEG2000 files will not be processed. See

[jira] [Created] (TIKA-2869) Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479)

2019-05-10 Thread Edans Sandes (JIRA)
Edans Sandes created TIKA-2869: -- Summary: Can't parse pdf in version 1.20 - Pkcs7Parser (DEF length 465542 object truncated by 465479) Key: TIKA-2869 URL: https://issues.apache.org/jira/browse/TIKA-2869

[jira] [Resolved] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable

2019-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2855. --- Resolution: Duplicate Thank you! > pdfbox version used by both Apache Tika 1.19.1 and 1

[jira] [Created] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable

2019-04-18 Thread Abhijit Rajwade (JIRA)
Abhijit Rajwade created TIKA-2855: - Summary: pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable Key: TIKA-2855 URL: https://issues.apache.org/jira/browse/TIKA-2855 Project: Tika

Re: [VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-22 Thread Tim Allison
> >> Hi Tim, > >> > >> Thanks for rolling the release. > >> > >> Built & validated on Mac OS X 10.12 > >> > >> Updated flink-crawler, all tests pass. > >> > >> So here’s my +1 > >> > >> — Ken

[ANNOUNCE] Apache Tika 1.20 released

2018-12-22 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika 1.20. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs. Apache Tika is a toolkit for detecting

[RESULT][VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-22 Thread Tim Allison
> >> So here’s my +1 > >> > >> — Ken > >> > >> > >> > On Dec 17, 2018, at 6:14 PM, Tim Allison wrote: > >> > > >> > A candidate for the Tika 1.20 release is available at: > >> > > >

Re: [VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-22 Thread Oleg Tikhonov
> Updated flink-crawler, all tests pass. >> >> So here’s my +1 >> >> — Ken >> >> >> > On Dec 17, 2018, at 6:14 PM, Tim Allison wrote: >> > >> > A candidate for the Tika 1.20 release is available at: >> > >> > https:/

Re: [VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-22 Thread Oleg Tikhonov
n Dec 17, 2018, at 6:14 PM, Tim Allison wrote: > > > > A candidate for the Tika 1.20 release is available at: > > > > https://dist.apache.org/repos/dist/dev/tika/ > > > > The release candidate is a zip archive of the sources in: > > h

Re: [VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-21 Thread Ken Krugler
Hi Tim, Thanks for rolling the release. Built & validated on Mac OS X 10.12 Updated flink-crawler, all tests pass. So here’s my +1 — Ken > On Dec 17, 2018, at 6:14 PM, Tim Allison wrote: > > A candidate for the Tika 1.20 release is available at: > > https://dist.ap

Re: 1.20?

2018-12-18 Thread Tim Allison
we're now suppressing the style markup that our parser >> > > was (incorrectly, IMHO, inserting) -- check the values in >> > > "top_10_unique_token_diffs_a", e.g.: rgb: 15 | color: 14 | font: 9 | >> > > 0,0,0: 4 | background: 4 | 147,147,147: 3 | 247,247,247: 3 |

[VOTE] Release Apache Tika 1.20 Candidate #1

2018-12-17 Thread Tim Allison
A candidate for the Tika 1.20 release is available at: https://dist.apache.org/repos/dist/dev/tika/ The release candidate is a zip archive of the sources in: https://github.com/apache/tika/tree/1.20-rc1/ The SHA-512 checksum of the archive
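Checking a downloaded release candidate against a published SHA-512 checksum could be sketched as follows in Python. This is not part of the vote email: the function names are hypothetical, and the expected checksum has to be copied from the announcement, because the snippet above is truncated before the value.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value.

    Whitespace and letter case in the published value are ignored.
    """
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of(path) == normalized
```

A caller would pass the local path of the downloaded archive and the checksum string from the announcement; a `False` result means the file should not be trusted.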

Re: 1.20?

2018-12-14 Thread Tim Allison
> >> > > I also see that we're losing content in x-java and x-groovy, etc., but >> > > that's because we're now suppressing the style markup that our parser >> > > was (incorrectly, IMHO, inserting) -- check the values in >> > > "top_10_u

Re: 1.20?

2018-12-13 Thread Tim Allison
ting) -- check the values in > > > "top_10_unique_token_diffs_a", e.g.: rgb: 15 | color: 14 | font: 9 | > > > 0,0,0: 4 | background: 4 | 147,147,147: 3 | 247,247,247: 3 | bold: 3 | > > > weight: 3 | family: 2 > > > > > > In short, I think we'r

Re: 1.20?

2018-12-13 Thread Tim Allison
rting) -- check the values in > > "top_10_unique_token_diffs_a", e.g.: rgb: 15 | color: 14 | font: 9 | > > 0,0,0: 4 | background: 4 | 147,147,147: 3 | 247,247,247: 3 | bold: 3 | > > weight: 3 | family: 2 > > > > In short, I think we're good to go.

Re: 1.20?

2018-12-13 Thread Luís Filipe Nassif
0,0,0: 4 | background: 4 | 147,147,147: 3 | 247,247,247: 3 | bold: 3 | > weight: 3 | family: 2 > > In short, I think we're good to go. Will roll rc1 later today or > (more likely) tomorrow unless there are objections. > On Mon, Dec 10, 2018 at 9:37 PM Tim Allison wrote: >

Re: 1.20?

2018-12-13 Thread Chris Mattmann
Roll forward! Yay! From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Thursday, December 13, 2018 at 7:02 AM To: "dev@tika.apache.org" Subject: Re: 1.20? Reports are here: http://162.242.228.174/reports/tika_1_20-pre-rc1.zip I'm going to r

Re: 1.20?

2018-12-13 Thread Tim Allison
ood to go. Will roll rc1 later today or (more likely) tomorrow unless there are objections. On Mon, Dec 10, 2018 at 9:37 PM Tim Allison wrote: > > Any blockers on 1.20? I'm going to kick off the regression tests shortly. > On Fri, Nov 30, 2018 at 7:39 PM wrote: > > > >

Re: 1.20?

2018-12-10 Thread Tim Allison
Any blockers on 1.20? I'm going to kick off the regression tests shortly. On Fri, Nov 30, 2018 at 7:39 PM wrote: > > Hi, > On Wed, 21 Nov 2018 at 13:00, Tim Allison wrote: > > > Dave, > > Should I try to get the Docker plugin working again? > > > > That wou

Re: 1.20?

2018-11-30 Thread loompa
Hi, On Wed, 21 Nov 2018 at 13:00, Tim Allison wrote: > Dave, > Should I try to get the Docker plugin working again? > That would be great. I think I may have gone down the wrong path building an image at package time, as there doesn't seem to be an easy way to publish it as an Apache labelled

Re: 1.20?

2018-11-28 Thread Lewis John McGibbney
+1 would be nice to get the recent ENVI work released as well folks. On 2018/11/20 23:04:29, Tim Allison wrote: > All, > POI 4.0.1 will be out shortly with some important bug fixes. What would > you all think of targeting 1st/2nd week of December for 1.20? > > Cheers, > Tim >

Re: 1.20?

2018-11-21 Thread Tim Allison
ay, November 20, 2018 at 3:04 PM > To: "dev@tika.apache.org" > Subject: 1.20? > > > > All, > >POI 4.0.1 will be out shortly with some important bug fixes. What would > > you all think of targeting 1st/2nd week of December for 1.20? > > > > Cheers, > > Tim > > > >

Re: 1.20?

2018-11-20 Thread Chris Mattmann
Love it and I can align tika-python with that too ☺ From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Tuesday, November 20, 2018 at 3:04 PM To: "dev@tika.apache.org" Subject: 1.20? All, POI 4.0.1 will be out shortly with some important bug fixes.

1.20?

2018-11-20 Thread Tim Allison
All, POI 4.0.1 will be out shortly with some important bug fixes. What would you all think of targeting 1st/2nd week of December for 1.20? Cheers, Tim