[jira] [Commented] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708146#comment-17708146 ] Chris Mattmann commented on TIKA-4009: -- ugh, one more time, not `geo.topic`, ins

[jira] [Commented] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708144#comment-17708144 ] Chris Mattmann commented on TIKA-4009: -- Forgot the config, file, fixed in

[jira] [Resolved] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann resolved TIKA-4009. -- Resolution: Fixed Fixed:   {noformat} (base) mattmann@proscuitto:~/git/tika$ git commit -m

[jira] [Commented] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708070#comment-17708070 ] Chris Mattmann commented on TIKA-4009: -- OK, I have a patch and commit forthco

[jira] [Created] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
Chris Mattmann created TIKA-4009: Summary: GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic Key: TIKA-4009 URL: https://issues.apache.org/jira/browse/TIKA-4009

[jira] [Assigned] (TIKA-4009) GeoTopic Parser package changed incorrectly from o.a.t.parser.geo from o.a.t.parser.geo.topic

2023-04-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann reassigned TIKA-4009: Assignee: Chris Mattmann > GeoTopic Parser package changed incorrectly f

[jira] [Updated] (TIKA-3439) Create new TensorFlow2 backed Tika NLP docker for SentimentAnalysis

2021-06-07 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann updated TIKA-3439: - Issue Type: New Feature (was: Bug) > Create new TensorFlow2 backed Tika NLP docker

[jira] [Assigned] (TIKA-3439) Create new TensorFlow2 backed Tika NLP docker for SentimentAnalysis

2021-06-07 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann reassigned TIKA-3439: Assignee: Chris Mattmann > Create new TensorFlow2 backed Tika NLP docker

[jira] [Created] (TIKA-3439) Create new TensorFlow2 backed Tika NLP docker for SentimentAnalysis

2021-06-07 Thread Chris Mattmann (Jira)
Chris Mattmann created TIKA-3439: Summary: Create new TensorFlow2 backed Tika NLP docker for SentimentAnalysis Key: TIKA-3439 URL: https://issues.apache.org/jira/browse/TIKA-3439 Project: Tika

Re: Question on custom tika-python configs for OMB PDF

2021-05-26 Thread Chris Mattmann
Hannah, I am pushing your question upstream to the dev@tika list. I think what you need is for them to look at your config file, which I’ve pasted again below, and see if it looks OK. Then in Tika Python you need to give it this config file before your server starts up or outside of Pyth

[jira] [Commented] (TIKA-94) Speech-to-text transcription

2021-05-03 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338675#comment-17338675 ] Chris Mattmann commented on TIKA-94: [~lewismc] congratulations! What an accomplish

[jira] [Resolved] (TIKA-3329) RTG Translator with many-to-eng translation

2021-05-01 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann resolved TIKA-3329. -- Resolution: Fixed Merged into main! Thanks [~thammegowda]!   {noformat} (base) mattmann

[jira] [Updated] (TIKA-3329) RTG Translator with many-to-eng translation

2021-05-01 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann updated TIKA-3329: - Fix Version/s: 2.0.0 > RTG Translator with many-to-eng translat

[jira] [Updated] (TIKA-3329) RTG Translator with many-to-eng translation

2021-05-01 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann updated TIKA-3329: - Labels: memex (was: ) > RTG Translator with many-to-eng translat

[jira] [Assigned] (TIKA-3329) RTG Translator with many-to-eng translation

2021-05-01 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Mattmann reassigned TIKA-3329: Assignee: Chris Mattmann (was: Thamme Gowda) > RTG Translator with many-to-

Re: Python-tika: issues related to memory consumption

2021-03-15 Thread Chris Mattmann
Hi Manish, I think you should ask this one upstream on the Tika Dev lists. I’ve cc’ed them for you. From: manish mathur Date: Monday, March 15, 2021 at 4:41 AM To: Subject: Re: Python-tika: issues related to memory consumption Hi Chris, I am using python-tika library to

Re: Help in tika-python

2021-01-15 Thread Chris Mattmann
l.com" Subject: Help in tika-python Hello Chris Mattmann, I installed your library, it works perfectly. I wonder if it is possible to find the position (bounding boxes) of the texts and images in ppt files, and to discover which page of the slides the texts come from. Thanks Nilton

FW: [EXTERNAL] Tika - problem with Polish encoding

2020-12-16 Thread Chris Mattmann
Copying the Tika dev list where I think you will find the help you are looking for 😊 From: Mariusz G Date: Wednesday, December 16, 2020 at 7:04 AM To: "Mattmann, Chris A (US 1740)" Subject: [EXTERNAL] Tika - problem with Polish encoding Hello Sir, I'm writing to you because I tri

Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer

2020-11-25 Thread Chris Mattmann
Welcome Peter! 😊 From: Peter Lee Reply-To: Date: Wednesday, November 25, 2020 at 6:08 PM To: "dev@tika.apache.org" , "talli...@apache.org" Cc: "u...@tika.apache.org" Subject: Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer Many thanks to you, Tim. :) Hi, all

Re: [EXTERNAL] Tika - Issues extracting Arabic script

2020-11-24 Thread Chris Mattmann
Christian thank you for reaching out. I am copying dev@tika.apache.org as I think your question is best directed there since tika python is downstream of the processing that happens there. Best of luck! Cheers Chris From: Christian Faggionato Date: Tuesday, November 24, 2020 at 1

Re: [EXTERNAL] I have some questions about tika-python

2020-08-29 Thread Chris Mattmann
Thanks for reaching out Aditya and for using Tika Python. This issue is best solved upstream in dev@tika.apache.org so I am copying that list and making it the reply to. The issue likely lies in the PDFBox algorithm. There are PDFBox folks on this list. They can help you. Hopefully there is a

Re: [EXTERNAL] Tika 2.0 modularization

2020-08-14 Thread Chris Mattmann
Haha  I’m down and supportive! Time’s TIME FOR 2.x 😊 From: Tim Allison Reply-To: "dev@tika.apache.org" , "Allison, Tim (US 174B-Affiliate)" Date: Friday, August 14, 2020 at 6:06 AM To: "" Subject: [EXTERNAL] Tika 2.0 modularization All, I _think_ I might have some time to

[jira] [Commented] (TIKA-3119) General upgrades for 1.25

2020-06-19 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140963#comment-17140963 ] Chris Mattmann commented on TIKA-3119: -- [~agibsonccc] can you help see a

Re: [EXTERNAL] renaming master?

2020-06-16 Thread Chris Mattmann
How about just development? We use that  on OODT … though we have a master too that  needs to get removed … From: Tim Allison Reply-To: "dev@tika.apache.org" , "Allison, Tim (US 1740-Affiliate)" Date: Tuesday, June 16, 2020 at 10:31 AM To: "" Subject: [EXTERNAL] renaming master?

[jira] [Commented] (TIKA-3093) Enable tika-server to forward parse results to another endpoint

2020-04-24 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091708#comment-17091708 ] Chris Mattmann commented on TIKA-3093: -- yea we have lots of pipelines with OODT

Re: [EXTERNAL] Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Chris Mattmann
Yes, some of us have been developing an Elastic scaling stack for Tika server… That does just that with AWS. Don’t have it ready to push upstream yet. Cheers, Chris From: Eric Pugh Reply-To: "dev@tika.apache.org" Date: Thursday, April 16, 2020 at 7:09 AM To: "dev@tika.apache.org" S

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2020-04-06 Thread Chris Mattmann (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076659#comment-17076659 ] Chris Mattmann commented on TIKA-2368: -- I have a TensorFlow version of Senti

Re: [EXTERNAL] Re: JDK 12 build issues

2020-03-18 Thread Chris Mattmann
eleases. Keep you updated. Cheers, Oleg On Wed, Mar 18, 2020 at 4:35 PM Chris Mattmann wrote: So I was able to get past my issues with Tesseract by reinstalling the latest version with Brew. I have a new issue! I’ve tried in JDK12 and JDK13 to build tika-dl, but

Re: [EXTERNAL] Re: JDK 12 build issues

2020-03-18 Thread Chris Mattmann
Date: Wednesday, March 18, 2020 at 2:35 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] Re: JDK 12 build issues Haven’t tried...we should add java 12-14 to Jenkins. Wait, are we up to 18 yet... Will look into it... On Tue, Mar 17, 2020 at 10:07 PM Chris Mattmann wrote:

JDK 12 build issues

2020-03-17 Thread Chris Mattmann
Hey Tim et al., Do the tests fail for you with Java 12? [INFO] Running org.apache.tika.parser.pkg.GzipParserTest [INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.397 s - in org.apache.tika.parser.pkg.GzipParserTest [INFO] Running org.apache.tika.TestXMLEntityExpa

Re: [EXTERNAL] question about Tika

2020-02-10 Thread Chris Mattmann
Thanks.  Please make sure dev@tika.apache.org is where you are addressing  these questions to. From: Max Franklin Date: Monday, February 10, 2020 at 10:59 AM To: Chris Mattmann Subject: Re: [EXTERNAL] question about Tika Hi Chris, The Tika Server seems to work okay for me

FW: [EXTERNAL] question about Tika

2020-02-10 Thread Chris Mattmann
Max,  does Tika Server work OK for you? Is there a different behavior with Tika Python than simply posting the PDF to Tika server? Try first and then I am redirecting you to the Tika dev list for help. Thanks, Chris From: Max Franklin Date: Monday, February 10, 2020 at 9:37 AM T

Re: [EXTERNAL] Regarding unicodeencode Error

2020-01-08 Thread Chris Mattmann
OK can you please post an issue http://issues.apache.org/jira/browse/TIKA and attach your document and specific error? Thanks! From: "Gowda,Sumanth" Date: Wednesday, January 8, 2020 at 9:36 PM To: Chris Mattmann Subject: RE: [EXTERNAL] Regarding unicodeencode Error T

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-01-08 Thread Chris Mattmann
browse/TIKA-3010 And a WIP PR is at https://github.com/apache/tika/pull/305

Re: [EXTERNAL] Regarding unicodeencode Error

2020-01-08 Thread Chris Mattmann
Hi Sumanth, Are you using Tika Python? Or plain Tika in Java? Can you file a ticket and share the PDF? Cheers, Chris From: "Gowda,Sumanth" Date: Wednesday, January 8, 2020 at 12:58 AM To: "Mattmann, Chris A (US 1760)" Subject: [EXTERNAL] Regarding unicodeencode Error

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2019-12-04 Thread Chris Mattmann
Thanks for bringing this conversation up Eric. Historically if you look over the last 5 years, I think what you are asking below has sort of already become the de facto truth. Most people are in fact using Tika server, whether they are individual devs, govvies, commercial folk and the like.

Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Chris Mattmann
the existing Dockerfile that LogicalSpark has published. I don’t know how other projects at ASF handle the image publishing. On Nov 20, 2019, at 7:02 PM, Chris Mattmann wrote: Nick, TBH, I don’t get it. If we ship the “Dockerfile” we are simply shipping text file, code. Under a l

Re: [EXTERNAL] Re: Docker image along with 1.23?

2019-11-20 Thread Chris Mattmann
Nick, TBH, I don’t get it. If we ship the “Dockerfile” we are simply shipping text file, code. Under a license. If we create a “docker image” and then publish it to the ASF hub then I agree with you. My suggestion and my interpretation of Tim’s is to ship a standard “Dockerfile”. Do you ag

Re: [EXTERNAL] Tika 1.23?

2019-11-20 Thread Chris Mattmann
+1 ship it From: Tim Allison Reply-To: "dev@tika.apache.org" , "Allison, Timothy B (US 1760-Affiliate)" Date: Wednesday, November 20, 2019 at 9:07 AM To: "" Subject: [EXTERNAL] Tika 1.23? All, I've abandoned hope of getting the contenthandler factory configuration stuff into

Re: [EXTERNAL] How to set the page segmentation for TIKA python

2019-11-13 Thread Chris Mattmann
Hi Aswathi, Please check with dev@tika.apache.org. Cheers, Chris From: Aswathi Nambiar Date: Wednesday, November 13, 2019 at 7:39 AM To: "Mattmann, Chris A (US 1760)" Subject: [EXTERNAL] How to set the page segmentation for TIKA python Hi Chris, I am using Apache TI

Re: [EXTERNAL] Extracting font information from xml

2019-10-15 Thread Chris Mattmann
Hi Jay, yes, I believe so. Tika Python is just a thin client to Tika Server and it provides this functionality. CC’ing dev@tika From: Jay Chuk Date: Tuesday, October 15, 2019 at 3:47 PM To: "Mattmann, Chris A (US 1761)" Subject: [EXTERNAL] Extracting font information from xml Hi Ch

Re: [EXTERNAL] Extracting font information from xml

2019-10-15 Thread Chris Mattmann
When you do a parse, do this:
    from tika import parser
    parsed = parser.from_file('/path/to/file', xmlContent=True)
    xmlContent = parsed["content"]
    print(xmlContent)
G’luck! Cheers Chris From: Jay Chuk Date: Tuesday, October 15, 2019 at 3:54 PM To: Chris Mattmann Cc

Re: [EXTERNAL] Tika Python questions

2019-10-08 Thread Chris Mattmann
Hi, Thanks for your question. Yes, the same way you set the byte size property in Tika-App (I think through parser configuration) is how you would do it for Tika-Server. You would just start the Tika Server yourself with a custom config file that set this property and then start it on the d

Re: [EXTERNAL] Urgent!!! Tika-python

2019-08-19 Thread Chris Mattmann
I was able to compress the files into a single zip file and extract it. This worked, but the extracted text was saved in a single file; I need each file's text saved in its own file so I can use them as input to another program. Please, what is the best method to go about this? Thank

Re: [EXTERNAL] TIKA

2019-08-11 Thread Chris Mattmann
Victor, please send your email to dev@tika.apache.org, which I’ve CC’ed… From: Victor Olaiya Date: Tuesday, August 6, 2019 at 1:37 PM To: "Mattmann, Chris A (US 1761)" Subject: [EXTERNAL] TIKA Hello Chris, I am building an information retrieval system and I need Apache Tika to auto

Re: [EXTERNAL] Re: Merge flow

2019-07-10 Thread Chris Mattmann
I’ve also got some new stuff I’m getting ready to contribute, in the following ML/Deep Learning areas: Some Basic models using Tensorflow stable 1.13 CIFAR-10 image classifier using a CNN ~86% accuracy – obviously different than Inception-v3/v4 and VGG-16 which we currently have available, but

Re: [EXTERNAL] Re: Tika 1.22?

2019-06-25 Thread Chris Mattmann
Looks good… From: Oleg Tikhonov Reply-To: "dev@tika.apache.org" Date: Tuesday, June 25, 2019 at 7:57 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] Re: Tika 1.22? Would be great!!! Cheers, Oleg On Tue, Jun 25, 2019, 17:45 Tim Allison wrote: All, The vote for the ne

Re: [EXTERNAL] Re: DL4JVGG16NetTest failures

2019-05-08 Thread Chris Mattmann
ling to confirm that my commit/fix is sane, I'd appreciate it. Thank you!!! Cheers, Tim On Wed, May 8, 2019 at 11:32 AM Chris Mattmann wrote: Thejan, Thamme any ideas? From: Tim Allison R

Re: [EXTERNAL] Re: DL4JVGG16NetTest failures

2019-05-08 Thread Chris Mattmann
On Wed, May 8, 2019 at 11:32 AM Chris Mattmann wrote: Thejan, Thamme any ideas? From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Wednesday, May 8, 2019 at 7:50 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] Re: DL4JVGG16N

Re: [EXTERNAL] DL4JVGG16NetTest failures

2019-05-08 Thread Chris Mattmann
I will test this out From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Wednesday, May 8, 2019 at 6:58 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] DL4JVGG16NetTest failures All, Apologies for the broken builds...I'm not able to reproduce this test failure on my mac or

Re: [EXTERNAL] Re: DL4JVGG16NetTest failures

2019-05-08 Thread Chris Mattmann
Thejan, Thamme any ideas? From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Wednesday, May 8, 2019 at 7:50 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] Re: DL4JVGG16NetTest failures Any recommendations? java.lang.IllegalStateException: Number of indices (got 2) must

Re: [EXTERNAL] Tika script

2019-04-26 Thread Chris Mattmann
Hi, This would be a good question to ask on the dev@tika.a.o list so I’m CC’ing them. Cheers, Chris From: Djari Imene Date: Friday, April 26, 2019 at 9:45 AM To: "Mattmann, Chris A (1761)" Subject: [EXTERNAL] Tika script Good evening sir I am writing you to request more infor

Re: [EXTERNAL] Wiki migration

2019-03-21 Thread Chris Mattmann
+1 from me! From: Konstantin Gribov Reply-To: "dev@tika.apache.org" Date: Thursday, March 21, 2019 at 10:02 AM To: "dev@tika.apache.org" Subject: [EXTERNAL] Wiki migration Hi, folks What do you think about starting wiki migration (from moin to confluence)? I can try it via

Re: 1.20?

2018-12-13 Thread Chris Mattmann
Roll forward! Yay! From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Thursday, December 13, 2018 at 7:02 AM To: "dev@tika.apache.org" Subject: Re: 1.20? Reports are here: http://162.242.228.174/reports/tika_1_20-pre-rc1.zip I'm going to revert the mp4 parser, and comm

Re: 1.20?

2018-11-20 Thread Chris Mattmann
Love it and I can align tika-python with that too ☺ From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Tuesday, November 20, 2018 at 3:04 PM To: "dev@tika.apache.org" Subject: 1.20? All, POI 4.0.1 will be out shortly with some important bug fixes. What would you all thin

Re: ***UNCHECKED*** Fwd: MODERATE for annou...@apache.org

2018-09-26 Thread Chris Mattmann
+1 from me please update the wiki once you do From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Wednesday, September 26, 2018 at 5:47 AM To: "dev@tika.apache.org" Cc: Craig Russell Subject: Re: ***UNCHECKED*** Fwd: MODERATE for annou...@apache.org All, It is ok to includ

Re: 1.19.1?

2018-09-25 Thread Chris Mattmann
Sounds great! From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Tuesday, September 25, 2018 at 9:40 AM To: "dev@tika.apache.org" Subject: Re: 1.19.1? Given the mp3 issue and some other items, let's go with 1.19.1 rc1 today or tomorrow? On Mon, Sep 24, 2018 at 3:07 PM Nick B

Re: 1.19.1?

2018-09-21 Thread Chris Mattmann
Let’s roll it…. From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Wednesday, September 19, 2018 at 12:14 PM To: "dev@tika.apache.org" Subject: 1.19.1? The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly clear on this but I did some self-hand-waving to

FW: Tika DjVu?

2018-07-31 Thread Chris Mattmann
From: KamilD Date: Tuesday, July 31, 2018 at 11:37 PM To: "dev-ow...@tika.apache.org" Subject: Tika DjVu? Hello, I'm trying to use Tika for DjVu but there is a problem. When using app version 1.14 I get an empty result, but in version 1.18 I get: C:\Users\>java -jar D:\djvu\tika-app-1.1

Re: image recognition...how do the parts play together?

2018-07-06 Thread Chris Mattmann
s REST + Docker? The upkeep in tika-dl is nontrivial. On Fri, Jul 6, 2018 at 6:15 PM Chris Mattmann wrote: Tim, Thanks. There are multiple modes of integrating deep learning with Tika: The original mode: uses Thamme’s work on REST exposing Tensorflow and Docker to provi

Re: image recognition...how do the parts play together?

2018-07-06 Thread Chris Mattmann
Tim, Thanks. There are multiple modes of integrating deep learning with Tika: The original mode: uses Thamme’s work on REST exposing Tensorflow and Docker to provide a REST Service to Tika to allow for running Tensorflow DL models. We initially did Inception_v3, and a model by Madhav Sharan

Re: Tika 1.19?

2018-07-06 Thread Chris Mattmann
Once tika-dl works again with Inception v4, I’m good ☺ I’m working on adding some more models to tika-dl and other things but those can come after 1.19. Cheers, Chris From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Friday, July 6, 2018 at 8:40 AM To: "dev@tika.apache.or

Re: Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
ect: Re: Branch_1x build broke? Hey Chris, This is happening to me with Tesseract enabled but only on my MacBook. Are you running this on OSX? Been trying to get some time to dig into it as it works perfectly on my Windows and Linux setups. Cheers, Dave On Thu, 24

Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
Tim, Are you seeing this? Results : Failed tests: PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in: http://www.w3.org/1999/xhtml";> Outer_hayst

Welcome Thejan Wijesinghe as an Apache Tika PMC and committer!

2018-05-07 Thread Chris Mattmann
Welcome to Thejan Wijesinghe who has joined as a new Tika PMC member and committer! Please say a bit about yourself…thanks! Cheers, Chris

Re: rfc822 updates and 1.18

2018-04-06 Thread Chris Mattmann
Awesomeness From: "Allison, Timothy B." Reply-To: "dev@tika.apache.org" Date: Friday, April 6, 2018 at 11:30 AM To: "dev@tika.apache.org" Subject: rfc822 updates and 1.18 All, I made two updates to our handling of rfc822 files and reran the eval against what Tika 1.18-SNAPSHOT th

Re: message/news; charset=windows-1252 -> message/rfc822

2018-03-28 Thread Chris Mattmann
+1 From: Nick Burch Reply-To: "dev@tika.apache.org" Date: Wednesday, March 28, 2018 at 8:01 AM To: "dev@tika.apache.org" Subject: Re: message/news; charset=windows-1252 -> message/rfc822 On Wed, 28 Mar 2018, Allison, Timothy B. wrote: With the new mime patterns, we've gotten quite

R-Tika API Binding

2018-03-20 Thread Chris Mattmann
Hey Folks, Just found this R-Tika API binding: https://ropensci.github.io/rtika/articles/rtika_introduction.html Very cool! Updated the wiki with it. Cheers, Chris

Re: TIKA-1509 (2.x breaking parser change) - ready for first review!

2018-03-18 Thread Chris Mattmann
Completely agree, awesome job Nick. I will definitely try this week as well. Thank you! Sincerely, Chris On 3/18/18, 2:47 PM, "David Meikle" wrote: Nice one Nick! Will take a look this week. Cheers, Dave On 14 March 2018 at 17:38, Nick Burch wrote: > Hi

Re: Tika 1.18?

2018-03-07 Thread Chris Mattmann
Sounds good to me thanks Tim. Happy to line it up with PDF Box 2.0.9 On 3/7/18, 1:16 PM, "Allison, Timothy B." wrote: All, I think I've made the updates that I wanted to make sure got in to 1.18. It looks like PDFBox is going to start their release cycle shortly. Should we w

Re: Tika 1.18?

2018-03-01 Thread Chris Mattmann
Same: makes perfect sense to me and let's do it. I just updated (finally) Tika Python downstream to be based on Tika 1.16; I guess I should get it based on 1.17 soon too: https://github.com/chrismattmann/tika-python/blob/master/tika/__init__.py#L17 Cheers, Chris On 3/1/18, 5:16 AM, "Ni

Re: RE : Re: Issue with apache Tika

2018-02-24 Thread Chris Mattmann
No clue - Radhia - perhaps you can enlighten everyone..? On 2/23/18, 6:45 AM, "Allison, Timothy B." wrote: Um, no, that's not great. What's wrong with our current version? 😊 -Original Message----- From: Chris Mattmann [mailto:mattm...@apache.org]

Re: RE : Re: Issue with apache Tika

2018-02-22 Thread Chris Mattmann
Great to hear! From: radhia bezzine Date: Thursday, February 22, 2018 at 12:28 PM To: Chris Mattmann Subject: Re: RE : Re: Issue with apache Tika Hi Chris! I fixed the issue! It was not so complicated: a version problem. The recent version doesn't work for me but the

Re: Issue with apache Tika

2018-02-22 Thread Chris Mattmann
Try UTF-8 encoding the URLs or the parameters themselves. If you are using Tika-Python, then use the Python encode library… Cheers, Chris From: radhia bezzine Date: Thursday, February 22, 2018 at 6:03 AM To: "Mattmann, Chris A (1761)" Subject: Issue with apache Tika Hello Dear
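As a rough illustration of the advice above, here is a minimal sketch using only the Python standard library. `urllib.parse.quote` and `urllib.parse.urlencode` percent-encode text as UTF-8 by default. The base URL and file name are hypothetical and do not come from the thread, and whether this matches the original poster's problem is an assumption.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL and document name, for illustration only.
base = "http://example.com/docs/"
name = "résumé été.pdf"

# quote() encodes the text as UTF-8 and percent-escapes unsafe characters.
url = base + quote(name)
print(url)

# urlencode() does the same for query parameters (spaces become '+').
query = urlencode({"file": name})
print(query)
```

The encoded URL or parameter string can then be handed to whichever client is making the request.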

Re: Requesting Tika Wiki Page Edit Access

2018-02-17 Thread Chris Mattmann
Added! https://wiki.apache.org/tika/ContributorsGroup Feel free to edit the page From: Prerana Teligi Harapanahalli Math Date: Thursday, February 15, 2018 at 8:35 PM To: "dev@tika.apache.org" , "Mattmann, Chris A (1761)" Subject: Requesting Tika Wiki Page Edit Access preranathm

Re: Not-yet-broken breaking changes for Tika 2?

2018-02-07 Thread Chris Mattmann
eamfactory() method in TikaInputStream, so the user can implement an InputStreamFactory interface with a getInputStream method, if he does not want to pay a performance hit with temp files for everything. Luis Em 5 de fev de 2018 4:52 PM, "Chris Mattmann" escreveu:

Re: Not-yet-broken breaking changes for Tika 2?

2018-02-05 Thread Chris Mattmann
head, but as a start, why not? In short just run through the stream 2x ++++++ Chris Mattmann, Ph.D. Associate Chief Technology and Innovation Officer, OCIO Manager, Advanced IT Research and Open Source Proje

Re: Not-yet-broken breaking changes for Tika 2?

2018-02-05 Thread Chris Mattmann
wrote: On Thu, 26 Oct 2017, Chris Mattmann wrote: On collision, the precedence order defines what key takes precedence and _overwrites_ the other. Overwrite is but one option (you could save *all* the values it’s a multi-valued key structure so…)

Re: relying on a non-Maven central repo?

2018-02-05 Thread Chris Mattmann
ed to OSSRH and synced On 2/5/18, 9:01 AM, "Chris Mattmann" wrote: Hmmm...the problem here is that Sonatype won't let us publish to Central with the below. It's not even an ASF policy thing - it's a Sonatype thing On 2/5/18, 5:55 AM

Re: relying on a non-Maven central repo?

2018-02-05 Thread Chris Mattmann
Hmmm...the problem here is that Sonatype won't let us publish to Central with the below. It's not even an ASF policy thing - it's a Sonatype thing On 2/5/18, 5:55 AM, "Allison, Timothy B." wrote: Sorry for the duplication, but I wanted to check on this and didn't want it to get lost in a

Re: steps for Tika 2.0

2017-12-13 Thread Chris Mattmann
LGTM On 12/13/17, 5:51 AM, "Allison, Timothy B." wrote: All, I just created branch_1x, where we can put bug fixes and anything else we want to go into 1.17.1 or 1.18. Unless there are objections, I’m going to start making some radical changes to master to prep for 2.0.0-BETA ov

Re: [RESULT] [VOTE] Release Apache Tika 1.17 Candidate #2

2017-12-13 Thread Chris Mattmann
Great job Tim. Sorry I didn’t have time to test it. I’d like to get a simple Tika-Python integration test as some validation of 1.17 I’ll try today and see if I can post results too. Would be great to have this become a standard part of the release process like the regression tests have also bec

Re: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Chris Mattmann
d two repos in nexus?! Do we expect only the src to be in nexus, not the jar artifacts (with sigs and digests) for app, server, eval? -Original Message----- From: Chris Mattmann [mailto:mattm...@apache.org] Sent: Friday, December 8, 2017 5:07 PM To: dev@tika.apache.org S

Re: 1.17 rc1 and two repos in nexus?!

2017-12-08 Thread Chris Mattmann
Hey Tim, probably just upload errors on the first one and so it tried again. No worries. Drop and close the first, and just use the 2nd. Cheers, Chris On 12/8/17, 12:05 PM, "Allison, Timothy B." wrote: Not sure what happened, but two repos were created in Nexus: https://reposit

Re: Tika 1.17?

2017-11-29 Thread Chris Mattmann
vember 2017 at 15:19, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote: Let’s make it so ++ Chris Mattmann, Ph.D.

Re: Not-yet-broken breaking changes for Tika 2?

2017-10-26 Thread Chris Mattmann
On collision, the precedence order defines what key takes precedence and _overwrites_ the other. Overwrite is but one option (you could save *all* the values it’s a multi-valued key structure so…) Cheers, Chris On 10/26/17, 9:43 AM, "Nick Burch" wrote: On Thu, 26 Oct 2
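The two collision policies mentioned above (overwrite by precedence, or keep all values in a multi-valued structure) can be sketched roughly as follows. This is not Tika's API: the function names, the parser names, and the sample metadata are all hypothetical, and plain Python dicts stand in for Tika's Metadata object.

```python
from collections import defaultdict

def merge_overwrite(results, precedence):
    """Merge per-parser metadata dicts; on a key collision the parser
    listed earlier in `precedence` wins and overwrites the others."""
    merged = {}
    # Apply the lowest precedence first so higher precedence overwrites it.
    for parser_name in reversed(precedence):
        merged.update(results.get(parser_name, {}))
    return merged

def merge_multivalued(results, precedence):
    """Alternative: keep every distinct value per key, ordered by precedence."""
    merged = defaultdict(list)
    for parser_name in precedence:
        for key, value in results.get(parser_name, {}).items():
            if value not in merged[key]:
                merged[key].append(value)
    return dict(merged)

# Hypothetical output of two parsers run over the same stream.
results = {
    "pdf": {"title": "Report", "pages": "12"},
    "ocr": {"title": "REPORT (scan)", "lang": "en"},
}
order = ["pdf", "ocr"]
print(merge_overwrite(results, order))
print(merge_multivalued(results, order))
```

The overwrite variant loses the lower-precedence value on collision, while the multi-valued variant keeps both and leaves the choice to the caller.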

Re: Not-yet-broken breaking changes for Tika 2?

2017-10-26 Thread Chris Mattmann
maybe in tika-config.xml would be a fine start. On 10/26/17, 9:14 AM, "Nick Burch" wrote: On Thu, 26 Oct 2017, Chris Mattmann wrote: > Why don’t we just store N copies of the stream, and parse it twice? I'm not sure that's the challenge though? Using

Re: Not-yet-broken breaking changes for Tika 2?

2017-10-26 Thread Chris Mattmann
Why don’t we just store N copies of the stream, and parse it twice? Of course that’s the ugly way, but currently the way I’ve hacked this in all of my projects is simply to call Tika N times OUTSIDE of Tika. Why don’t we just use that as the weakest baseline and work backwards from there? Chris

Re: [DISCUSS] Enable specific ContentHandler for tika-server

2017-10-24 Thread Chris Mattmann
This makes sense to me, +1 Giuseppe! On 10/24/17, 6:12 PM, "Giuseppe Totaro" wrote: Hi folks, I am developing the proposed solutions within tika-server for enabling specific ContentHandlers. Basically, I am working to provide the ability of giving the name of the ContentHa

Re: Announcing go-tika, a Go package for Tika

2017-10-06 Thread Chris Mattmann
I saw this, Tyler, and it’s awesome. I forked it already, though I’m not a Go programmer. Thank you for increasing the community here. CC’ing Jim Jag, who I know has done some Go programming. Jim, spread the word ;) Cheers, Chris On 10/6/17, 10:12 AM, "Tyler Bui-Palsulich" wrote: (Bumping

Re: [DISCUSS] Enable specific ContentHandler for tika-server

2017-09-28 Thread Chris Mattmann
anyway, having a way to specify a handler there can be handy too... Cheers, Sergey On 28/09/17 22:17, Chris Mattmann wrote: I am +1 for this. Option #2 sounds like a slick way to handle this for me that would remain back compat with tika-python which is of stron

Re: [DISCUSS] Enable specific ContentHandler for tika-server

2017-09-28 Thread Chris Mattmann
I am +1 for this. Option #2 sounds like a slick way to handle this for me that would remain back compat with tika-python which is of strong interest to me. Cheers, Chris On 9/28/17, 1:35 PM, "Giuseppe Totaro" wrote: Hi folks, if I am not wrong, currently you cannot configure a

Re: TikaIO concerns

2017-09-22 Thread Chris Mattmann
[dropping Beam on this] Tim, another thing is that you can finally download the TREC-DD Polar data either from the NSF Arctic Data Center (70GB zip), or from Amazon S3, as described here: http://github.com/chrismattmann/trec-dd-polar/ In case we want to use as part of our regression. Cheers,

Re: TikaIO concerns

2017-09-21 Thread Chris Mattmann
Hi all, One other thing is that Tika extracts metadata and language information, for which order doesn’t matter (keys can be out of order). Would this be useful? Cheers, Chris On 9/21/17, 2:10 PM, "Sergey Beryozkin" wrote: Hi Eugene Thank you, very helpful, let me read it few

Re: Integrating Tika with Apache Beam

2017-09-21 Thread Chris Mattmann
te a new instance of TikaIO pipeline, and point it to the new temp folder where a new batch of files has been dropped to. Thanks, Sergey On 11/09/17 22:41, Mattmann, Chris A (3010) wrote: Amazing work, thank you Sergey!!

Re: Tika 2.0?

2017-09-12 Thread Chris Mattmann
0 branch is so I defer to Tim on the risk of going with #1. - Bob On 9/11/2017 5:15 PM, Chris Mattmann wrote: +1000 On 9/11/17, 12:03 PM, "Allison, Timothy B." wrote: Y, well, I didn't say _

Re: Tika 2.0?

2017-09-11 Thread Chris Mattmann
+1000 On 9/11/17, 12:03 PM, "Allison, Timothy B." wrote: Y, well, I didn't say _which_ September... Given my limited availability to work on this in Sept and POI's decision to move to Java 1.8, I propose releasing Tika 1.17 after the release of POI 3.17 and PDFBox 2.0.8. This w

Re: [ANNOUNCE] Welcome Madhav Sharan as Tika Committer and PMC Member

2017-08-31 Thread Chris Mattmann
Welcome Madhav! Cheers, Chris On 8/31/17, 12:29 PM, "loo...@gmail.com on behalf of Dave Meikle" wrote: Hello Everyone, Please join me in welcoming Madhav Sharan as a PMC Members and Committer to the project! Welcome to the team, Madhav. Feel free to say a bit about

Re: Query related to Apache Tika dependencies

2017-08-08 Thread Chris Mattmann
From: Deepanshu Bhardwaj Date: Tuesday, August 8, 2017 at 2:53 AM To: "dev-ow...@tika.apache.org" Subject: Query related to Apache Tika dependencies Hi Team, I need one help. I need to know the list of libraries (jar files) that are being used in apache tika app 1.14 jar as t

Re: [VOTE] Release Apache Tika 1.16 Candidate #1

2017-07-08 Thread Chris Mattmann
+1 from me SIGS and CHECKSUMS look good. Thanks Tim! Cheers, Chris LMC-053601:apache-tika-1.16-rc1 mattmann$ for type in "" \-app \-eval \-server; do $HOME/bin/stage_apache_rc tika$type 1.16 https://dist.apache.org/repos/dist/dev/tika/; done % Total% Received % Xferd Average Speed Ti
