Fwd: Error when processing doap file http://svn.apache.org/repos/asf/tika/site/src/site/resources/doap.rdf:

2021-12-25 Thread Dave Fisher
FYI

Begin forwarded message:
> From: Projects
> Date: December 25, 2021 at 6:01:14 PM PST
> To: Site Development
> Subject: Error when processing doap file
> http://svn.apache.org/repos/asf/tika/site/src/site/resources/doap.rdf:
> Reply-To: site-...@apache.org
>
> URL:

Re: 2.2.0 JARs not pushed to Maven Central

2021-12-17 Thread Dave Fisher
Get on #asfinfra in Slack and look at the scrollback.

> On Dec 17, 2021, at 2:59 PM, Lewis John McGibbney wrote:
>
> I’ve been waiting on the M2 Central repository being updated with the 2.2.0
> jars…
> I checked repository.apache.org and they are NOT staged, which I assume
> means that the

Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Dave Fisher
You’ll need to evaluate that yourself.

> On Dec 13, 2021, at 4:56 PM, Tim Allison wrote:
>
> Do we have to do a respin of the release candidate or is this marginally
> better?
>
>> On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher wrote:
>>
>>

Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Thread Dave Fisher
https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Dave Fisher (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412084#comment-17412084 ] Dave Fisher commented on TIKA-3544: --- The OP's source [https://getcreditcardnumbers.com|https

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Thread Dave Fisher (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17412035#comment-17412035 ] Dave Fisher commented on TIKA-3544: --- See [https://en.wikipedia.org/wiki/Double-precision_floating
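The comment above points to double-precision floating point as the explanation. The Java sketch below shows why that matters for long digit sequences. It assumes the digits sit in a numeric spreadsheet cell whose value reaches Java as a `double`; the snippet does not show how Tika 1.20 reads the cell. The class `DoublePrecisionDemo` and its `roundTrip` helper are hypothetical illustrations, not Tika or POI APIs.

```java
/**
 * Hypothetical demo (not part of Tika or Apache POI): what happens to a long
 * digit sequence once it is held in an IEEE 754 double-precision value.
 */
public class DoublePrecisionDemo {

    // Every integer with magnitude up to 2^53 is exactly representable as a double.
    static final long MAX_EXACT = 1L << 53; // 9007199254740992

    /** Converts to double and back; the result equals the input only if the double was exact. */
    static long roundTrip(long value) {
        double asDouble = (double) value; // rounds to the nearest representable double
        return (long) asDouble;
    }

    public static void main(String[] args) {
        long[] samples = {
            MAX_EXACT,          // still exact
            MAX_EXACT + 1,      // smallest positive integer a double cannot hold
            9999999999999999L   // a 16-digit run of nines
        };
        for (long n : samples) {
            long back = roundTrip(n);
            System.out.println(n + " -> " + back + (n == back ? " (exact)" : " (changed)"));
        }
        // Default double formatting of a value this large uses scientific notation.
        System.out.println(Double.toString((double) (MAX_EXACT + 1)));
    }
}
```

Past 2^53 the gap between adjacent doubles exceeds 1, so some 16-digit values cannot survive the round trip. One common workaround, stated here as an assumption rather than advice from the thread, is to store identifiers such as card numbers as text cells rather than numbers.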

Re: Tika lib is huge.. why?

2020-09-26 Thread Dave Fisher
IIRC, if you only need PDF extraction, take a look at Apache PDFBox: pdfbox.apache.org

> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe
> wrote:
>
> Thanks for the explanation.
>
> I understand the approach... but in my particular use case, I cannot
>

Re: HTML to PDF conversion

2019-10-16 Thread Dave Fisher
Hi,

You may want to take a look at Apache FOP, which is part of the Apache XML Graphics project. My team had success using it to generate PDF from XML.

Regards,
Dave

> On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin wrote:
>
> Ken, thanks for the feedback, I meant to reply to your comments,

Re: Guidance to avoid Tika's integration with Solr's ExtractingRequestHandler in production

2018-05-29 Thread Dave Fisher
When you run a Solr service, you are striving for quick query responses and want to avoid anything that can pause the JVM. You work hard to keep your updates quick and NRT (near-real-time). Text extraction from XML-based documents like Office files, and from big objects like PDFs, is memory intensive and should be