Re: HTML to PDF conversion

2019-10-16 Dave Fisher
Hi - You may want to take a look at Apache FOP, which is part of the Apache XML Graphics project. My team had success with it in generating PDF from XML.
Regards,
Dave
> On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin wrote:
>
> Ken, thanks for the feedback, I meant to reply to your comments,

Re: Tika lib is huge.. why?

2020-09-26 Dave Fisher
IIRC, if you know you only want PDF extraction, then take a look at Apache PDFBox: PDFBox.apache.org
> On Sep 26, 2020, at 10:00 AM, Laurence Vanhelsuwe wrote:
>
> Thanks for the explanation.
>
> I understand the approach.. but in my particular use case, I cannot
> re

Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Dave Fisher
https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4

Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-13 Dave Fisher
You’ll need to evaluate that yourself.
> On Dec 13, 2021, at 4:56 PM, Tim Allison wrote:
>
> Do we have to do a respin of the release candidate or is this marginally
> better?
>
>> On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher wrote:

Re: 2.2.0 JARs not pushed to Maven Central

2021-12-17 Dave Fisher
Get on #afinfra in Slack and look at the scrollback.
> On Dec 17, 2021, at 2:59 PM, lewis john mcgibbney wrote:
>
> I’ve been waiting on the M2 central Repository being updated with the 2.2.0
> jars…
> I checked repository.Apache.org and they are NOT staged which I assume
> means that the stagi

Fwd: Error when processing doap file http://svn.apache.org/repos/asf/tika/site/src/site/resources/doap.rdf:

2021-12-25 Dave Fisher
FYI
Begin forwarded message:
> From: Projects
> Date: December 25, 2021 at 6:01:14 PM PST
> To: Site Development
> Subject: Error when processing doap file
> http://svn.apache.org/repos/asf/tika/site/src/site/resources/doap.rdf:
> Reply-To: site-...@apache.org
>
> URL:

Re: Guidance to avoid Tika's integration with Solr's ExtractingRequestHandler in production

2018-05-29 Dave Fisher
From experience running a Solr service: you strive for quick responses to queries and want to avoid anything that can pause the JVM. You work hard to keep your updates quick and near-real-time (NRT). Text extraction from XML-based documents such as Office files, and from large files such as PDFs, is memory-intensive and should be

[jira] [Commented] (TIKA-2939) Figure out how to allow OCR'ing of large PDFs via tika-server

2020-10-23 Dave Fisher (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219857#comment-17219857 ]
Dave Fisher commented on TIKA-2939:
---
On your client side, PDFBox tools might help

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Dave Fisher (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412035#comment-17412035 ]
Dave Fisher commented on TIKA-3544:
---
See [https://en.wikipedia.org/wiki/Do

[jira] [Commented] (TIKA-3544) Extraction of long sequences of digits from Excel spreadsheets using Tika 1.20 doesn’t yield the expected results

2021-09-08 Dave Fisher (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412084#comment-17412084 ]
Dave Fisher commented on TIKA-3544:
---
The OP's source [https://getcreditcardnu
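The TIKA-3544 comments are truncated here, but the issue title points at a well-known limitation worth keeping in mind. If long digit sequences such as card numbers are stored in a spreadsheet as numeric cells rather than text, they pass through IEEE 754 double precision, which represents every integer exactly only up to 2^53 (9,007,199,254,740,992). Excel also keeps at most 15 significant digits when such a number is typed in. The Python sketch that follows is illustrative only: the helper names are hypothetical (they are not Tika or Apache POI APIs), and the Excel rule is a simplified model of its documented entry behaviour.

```python
# Hypothetical helpers (not Tika or Apache POI APIs) illustrating why a long
# digit sequence stored as a *numeric* spreadsheet cell may not come back intact.

DOUBLE_EXACT_LIMIT = 2 ** 53   # every integer up to this is exactly representable as a double
EXCEL_SIGNIFICANT_DIGITS = 15  # Excel keeps at most 15 significant digits on entry


def fits_in_double(digits: str) -> bool:
    """True if the integer is guaranteed to survive conversion to a double unchanged."""
    return int(digits) <= DOUBLE_EXACT_LIMIT


def round_trip_as_double(digits: str) -> str:
    """Store a digit string as an IEEE 754 double (as a numeric cell does) and read it back."""
    as_double = float(int(digits))  # rounds to the nearest representable double
    return str(int(as_double))


def excel_style_entry(digits: str) -> str:
    """Simplified model of Excel's entry rule: digits after the 15th become zeros.

    Assumes a plain string of decimal digits with no leading zeros.
    """
    if len(digits) <= EXCEL_SIGNIFICANT_DIGITS:
        return digits
    kept = digits[:EXCEL_SIGNIFICANT_DIGITS]
    return kept + "0" * (len(digits) - EXCEL_SIGNIFICANT_DIGITS)


if __name__ == "__main__":
    for n in ["4111111111111111", "9007199254740993", "9999999999999999"]:
        print(n, "-> as double:", round_trip_as_double(n),
              "| typed into Excel:", excel_style_entry(n))
```

Storing such values as text in the spreadsheet (for example, formatting the column as Text before entering them) sidesteps both effects, since the digits are then kept as a string rather than a number.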