[
https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708146#comment-17708146
]
Chris Mattmann commented on TIKA-4009:
--
ugh, one more time, not `geo.topic`, ins
[
https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708144#comment-17708144
]
Chris Mattmann commented on TIKA-4009:
--
Forgot the config file, fixed in
[
https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann resolved TIKA-4009.
--
Resolution: Fixed
Fixed:
{noformat}
(base) mattmann@proscuitto:~/git/tika$ git commit -m
[
https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17708070#comment-17708070
]
Chris Mattmann commented on TIKA-4009:
--
OK, I have a patch and commit forthco
Chris Mattmann created TIKA-4009:
Summary: GeoTopic Parser package changed incorrectly from
o.a.t.parser.geo from o.a.t.parser.geo.topic
Key: TIKA-4009
URL: https://issues.apache.org/jira/browse/TIKA-4009
[
https://issues.apache.org/jira/browse/TIKA-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann reassigned TIKA-4009:
Assignee: Chris Mattmann
> GeoTopic Parser package changed incorrectly f
[
https://issues.apache.org/jira/browse/TIKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann updated TIKA-3439:
-
Issue Type: New Feature (was: Bug)
> Create new TensorFlow2 backed Tika NLP docker
[
https://issues.apache.org/jira/browse/TIKA-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann reassigned TIKA-3439:
Assignee: Chris Mattmann
> Create new TensorFlow2 backed Tika NLP docker
Chris Mattmann created TIKA-3439:
Summary: Create new TensorFlow2 backed Tika NLP docker for
SentimentAnalysis
Key: TIKA-3439
URL: https://issues.apache.org/jira/browse/TIKA-3439
Project: Tika
Hannah, I am pushing your question upstream to the dev@tika list. I think what
you need is for them to look at your config file, which I’ve re-attached and
pasted below, and see if it looks OK. Then in Tika Python you need
to give it this config file before your server starts up or outside of Pyth
[
https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338675#comment-17338675
]
Chris Mattmann commented on TIKA-94:
[~lewismc] congratulations! What an accomplish
[
https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann resolved TIKA-3329.
--
Resolution: Fixed
Merged into main! Thanks [~thammegowda]!
{noformat}
(base) mattmann
[
https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann updated TIKA-3329:
-
Fix Version/s: 2.0.0
> RTG Translator with many-to-eng translat
[
https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann updated TIKA-3329:
-
Labels: memex (was: )
> RTG Translator with many-to-eng translat
[
https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Mattmann reassigned TIKA-3329:
Assignee: Chris Mattmann (was: Thamme Gowda)
> RTG Translator with many-to-
Hi Manish, I think you should ask this one upstream on the Tika dev list. I’ve
cc’ed them for you.
From: manish mathur
Date: Monday, March 15, 2021 at 4:41 AM
To:
Subject: Re: Python-tika: issues related to memory consumption
Hi Chris,
I am using python-tika library to
l.com"
Subject: Help in tika-python
Hello Chris Mattmann,
I installed your library and it works perfectly. I wonder if it is possible to
find the position (bounding boxes) of the text and images in PPT files,
and to discover which slide the text comes from.
Thanks
Nilton
Copying the Tika dev list where I think you will find the help you are looking
for 😊
From: Mariusz G
Date: Wednesday, December 16, 2020 at 7:04 AM
To: "Mattmann, Chris A (US 1740)"
Subject: [EXTERNAL] Tika - problem with Polish encoding
Hello Sir,
I'm writing to you because I tri
Welcome Peter! 😊
From: Peter Lee
Reply-To:
Date: Wednesday, November 25, 2020 at 6:08 PM
To: "dev@tika.apache.org" , "talli...@apache.org"
Cc: "u...@tika.apache.org"
Subject: Re: [ANNOUNCE] Welcome Peter Lee as Tika PMC member and committer
Many thanks to you, Tim. :)
Hi, all
Christian, thank you for reaching out. I am copying dev@tika.apache.org, as
I think your question is best directed there since Tika Python is downstream
of the processing that happens there.
Best of luck!
Cheers
Chris
From: Christian Faggionato
Date: Tuesday, November 24, 2020 at 1
Thanks for reaching out, Aditya, and for using Tika Python. This issue is
best solved upstream in dev@tika.apache.org, so I am copying that list
and making it the Reply-To.
The issue likely lies in the PDFBox algorithm. There are PDFBox folks on
this list. They can help you. Hopefully there is a
Haha I’m down and supportive!
Time’s TIME FOR 2.x 😊
From: Tim Allison
Reply-To: "dev@tika.apache.org" , "Allison, Tim (US
174B-Affiliate)"
Date: Friday, August 14, 2020 at 6:06 AM
To: ""
Subject: [EXTERNAL] Tika 2.0 modularization
All,
I _think_ I might have some time to
[
https://issues.apache.org/jira/browse/TIKA-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140963#comment-17140963
]
Chris Mattmann commented on TIKA-3119:
--
[~agibsonccc] can you help see a
How about just development?
We use that on OODT … though we have a master too that needs to get
removed …
From: Tim Allison
Reply-To: "dev@tika.apache.org" , "Allison, Tim (US
1740-Affiliate)"
Date: Tuesday, June 16, 2020 at 10:31 AM
To: ""
Subject: [EXTERNAL] renaming master?
[
https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091708#comment-17091708
]
Chris Mattmann commented on TIKA-3093:
--
yea we have lots of pipelines with OODT
Yes, some of us have been developing an elastic scaling stack for Tika Server
that does just that with AWS. We don’t have it ready to push upstream yet.
Cheers,
Chris
From: Eric Pugh
Reply-To: "dev@tika.apache.org"
Date: Thursday, April 16, 2020 at 7:09 AM
To: "dev@tika.apache.org"
S
[
https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076659#comment-17076659
]
Chris Mattmann commented on TIKA-2368:
--
I have a TensorFlow version of Senti
eleases.
Keep you updated.
Cheers,
Oleg
On Wed, Mar 18, 2020 at 4:35 PM Chris Mattmann wrote:
So I was able to get past my issues with Tesseract by reinstalling the
latest version with Brew.
I have a new issue!
I’ve tried in JDK12 and JDK13 to build tika-dl, but
Date: Wednesday, March 18, 2020 at 2:35 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] Re: JDK 12 build issues
Haven’t tried... we should add Java 12-14 to Jenkins.
Wait, are we up to 18 yet...
Will look into it...
On Tue, Mar 17, 2020 at 10:07 PM Chris Mattmann wrote:
Hey Tim et al.,
Do the tests fail for you with Java 12?
[INFO] Running org.apache.tika.parser.pkg.GzipParserTest
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.397 s - in org.apache.tika.parser.pkg.GzipParserTest
[INFO] Running org.apache.tika.TestXMLEntityExpa
Thanks. Please make sure you are addressing these questions to
dev@tika.apache.org.
From: Max Franklin
Date: Monday, February 10, 2020 at 10:59 AM
To: Chris Mattmann
Subject: Re: [EXTERNAL] question about Tika
Hi Chris,
The Tika Server seems to work okay for me
Max, does Tika Server work OK for you? Is there different behavior with Tika
Python than when simply posting the PDF to Tika Server? Try that first; I am
also redirecting you to the Tika dev list for help.
Thanks,
Chris
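For comparison, "simply posting the PDF to Tika Server" can be sketched with only the Python standard library. This is a minimal sketch, assuming a Tika Server running locally on its default port 9998 and its `/tika` endpoint, which accepts a PUT of the raw document and returns plain text when asked for `text/plain`. `build_tika_request` and `extract_text` are hypothetical helper names, not part of Tika Python.

```python
import urllib.request

# Assumption: a Tika Server started locally on its default port.
TIKA_SERVER = "http://localhost:9998"


def build_tika_request(doc: bytes, server: str = TIKA_SERVER) -> urllib.request.Request:
    """Build a PUT of the raw document bytes to Tika Server's /tika endpoint."""
    return urllib.request.Request(
        server.rstrip("/") + "/tika",
        data=doc,
        method="PUT",
        # Ask for plain text back; Tika Server detects the input type itself.
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, server: str = TIKA_SERVER) -> str:
    """Send a file to Tika Server and return the extracted text.

    Requires a running server; nothing is sent until this is called.
    """
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), server)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

If `extract_text("example.pdf")` returns the expected text but Tika Python does not, the difference is likely on the client side rather than in Tika itself.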
From: Max Franklin
Date: Monday, February 10, 2020 at 9:37 AM
T
OK, can you please file an issue at http://issues.apache.org/jira/browse/TIKA and
attach your document and the specific error? Thanks!
From: "Gowda,Sumanth"
Date: Wednesday, January 8, 2020 at 9:36 PM
To: Chris Mattmann
Subject: RE: [EXTERNAL] Regarding unicodeencode Error
T
browse/TIKA-3010
>
> And a work-in-progress PR is at https://github.com/apache/tika/pull/305
Hi Sumanth,
Are you using Tika Python? Or plain Tika in Java?
Can you file a ticket and share the PDF?
Cheers,
Chris
From: "Gowda,Sumanth"
Date: Wednesday, January 8, 2020 at 12:58 AM
To: "Mattmann, Chris A (US 1760)"
Subject: [EXTERNAL] Regarding unicodeencode Error
Thanks for bringing this conversation up Eric.
Historically if you look over the last 5 years, I think what you are asking
below has sort of already become the de facto
truth. Most people are in fact using Tika server, whether they are individual
devs, govvies, commercial folk and the like.
the existing Dockerfile that LogicalSpark has published.
I don’t know how other projects at ASF handle the image publishing.
On Nov 20, 2019, at 7:02 PM, Chris Mattmann wrote:
Nick, TBH, I don’t get it. If we ship the “Dockerfile”, we are simply shipping
a text file: code, under a license. If we create a “Docker image” and then
publish it to the ASF hub, then I agree with you.
My suggestion, and my interpretation of Tim’s, is to ship a standard
“Dockerfile”. Do you
ag
+1 ship it
From: Tim Allison
Reply-To: "dev@tika.apache.org" , "Allison, Timothy B (US
1760-Affiliate)"
Date: Wednesday, November 20, 2019 at 9:07 AM
To: ""
Subject: [EXTERNAL] Tika 1.23?
All,
I've abandoned hope of getting the contenthandler factory configuration
stuff into
Hi Aswathi,
Please check with dev@tika.apache.org.
Cheers,
Chris
From: Aswathi Nambiar
Date: Wednesday, November 13, 2019 at 7:39 AM
To: "Mattmann, Chris A (US 1760)"
Subject: [EXTERNAL] How to set the page segmentation for TIKA python
Hi Chris,
I am using Apache TI
Hi Jay, yes, I believe so. Tika Python is just a thin client to Tika Server,
and Tika Server provides this functionality. CC’ing dev@tika
From: Jay Chuk
Date: Tuesday, October 15, 2019 at 3:47 PM
To: "Mattmann, Chris A (US 1761)"
Subject: [EXTERNAL] Extracting font information from xml
Hi Ch
When you do a parse, do this:
from tika import parser
# xmlContent=True asks Tika Server for XHTML output instead of plain text
parsed = parser.from_file('/path/to/file', xmlContent=True)
xmlContent = parsed["content"]  # the XHTML string
print(xmlContent)
G’luck!
Cheers
Chris
From: Jay Chuk
Date: Tuesday, October 15, 2019 at 3:54 PM
To: Chris Mattmann
Cc
Hi,
Thanks for your question. Yes, the same way you set the byte size property in
Tika-App (I think through parser configuration) is how you would do it for
Tika-Server. You would just start the Tika Server yourself with a custom config
file that sets this property and then start it on the
d
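A minimal sketch of that setup from the Python side. It assumes a tika-server JAR whose command line accepts `--config` and `--port` (check `--help` for your version), and Tika Python's `TIKA_SERVER_ENDPOINT` and `TIKA_CLIENT_ONLY` environment variables, which must be set before `tika` is imported. The JAR and config file names are placeholders, and both helper functions are hypothetical.

```python
import os
from typing import Dict, List


def tika_server_command(jar: str, config: str, port: int = 9998) -> List[str]:
    """Command line to start Tika Server yourself with a custom tika-config.xml.

    Assumes the server JAR accepts --config and --port; verify with --help.
    """
    return ["java", "-jar", jar, "--config", config, "--port", str(port)]


def tika_python_env(port: int = 9998) -> Dict[str, str]:
    """Environment that points Tika Python at the already-running server
    instead of letting it download and start its own copy."""
    env = dict(os.environ)
    env["TIKA_SERVER_ENDPOINT"] = f"http://localhost:{port}"
    env["TIKA_CLIENT_ONLY"] = "True"
    return env
```

For example, start the server with `subprocess.Popen(tika_server_command("tika-server.jar", "tika-config.xml"))`, wait for it to come up, then run the program that imports `tika` with `env=tika_python_env()`.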
I was able to compress the files into a single zip file and extract them. This
worked, but the extracted text was saved in a single file. I need each file's
text saved in its own file so I can use them as input to another program.
What is the best way to go about this?
Thank
Victor, please send your email to dev@tika.apache.org, which I’ve CC’ed…
From: Victor Olaiya
Date: Tuesday, August 6, 2019 at 1:37 PM
To: "Mattmann, Chris A (US 1761)"
Subject: [EXTERNAL] TIKA
Hello Chris,
I am building an information retrieval system and I need Apache Tika to auto
I’ve also got some new stuff I’m getting ready to contribute in the following
ML/Deep Learning areas:
- Some basic models using TensorFlow stable 1.13
- A CIFAR-10 image classifier using a CNN (~86% accuracy), obviously different
  from Inception-v3/v4 and VGG-16, which we currently have available, but
Looks good…
From: Oleg Tikhonov
Reply-To: "dev@tika.apache.org"
Date: Tuesday, June 25, 2019 at 7:57 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] Re: Tika 1.22?
Would be great!!!
Cheers,
Oleg
On Tue, Jun 25, 2019, 17:45 Tim Allison wrote:
All,
The vote for the ne
ling to confirm
that my commit/fix is sane, I'd appreciate it. Thank you!!!
Cheers,
Tim
On Wed, May 8, 2019 at 11:32 AM Chris Mattmann wrote:
Thejan, Thamme any ideas?
From: Tim Allison
R
On Wed, May 8, 2019 at 11:32 AM Chris Mattmann wrote:
Thejan, Thamme any ideas?
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, May 8, 2019 at 7:50 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] Re: DL4JVGG16N
I will test this out
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, May 8, 2019 at 6:58 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] DL4JVGG16NetTest failures
All,
Apologies for the broken builds... I'm not able to reproduce this
test failure on my Mac or
Thejan, Thamme any ideas?
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, May 8, 2019 at 7:50 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] Re: DL4JVGG16NetTest failures
Any recommendations?
java.lang.IllegalStateException: Number of indices (got 2) must
Hi,
This would be a good question to ask on the dev@tika.a.o list so I’m CC’ing
them.
Cheers,
Chris
From: Djari Imene
Date: Friday, April 26, 2019 at 9:45 AM
To: "Mattmann, Chris A (1761)"
Subject: [EXTERNAL] Tika script
Good evening, sir. I am writing to you to request more infor
+1 from me!
From: Konstantin Gribov
Reply-To: "dev@tika.apache.org"
Date: Thursday, March 21, 2019 at 10:02 AM
To: "dev@tika.apache.org"
Subject: [EXTERNAL] Wiki migration
Hi, folks
What do you think about starting the wiki migration (from MoinMoin to Confluence)?
I can try it via
Roll forward! Yay!
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Thursday, December 13, 2018 at 7:02 AM
To: "dev@tika.apache.org"
Subject: Re: 1.20?
Reports are here:
http://162.242.228.174/reports/tika_1_20-pre-rc1.zip
I'm going to revert the mp4 parser, and comm
Love it and I can align tika-python with that too ☺
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Tuesday, November 20, 2018 at 3:04 PM
To: "dev@tika.apache.org"
Subject: 1.20?
All,
POI 4.0.1 will be out shortly with some important bug fixes. What would
you all thin
+1 from me; please update the wiki once you do.
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, September 26, 2018 at 5:47 AM
To: "dev@tika.apache.org"
Cc: Craig Russell
Subject: Re: ***UNCHECKED*** Fwd: MODERATE for annou...@apache.org
All,
It is ok to includ
Sounds great!
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Tuesday, September 25, 2018 at 9:40 AM
To: "dev@tika.apache.org"
Subject: Re: 1.19.1?
Given the mp3 issue and some other items, let's go with 1.19.1 rc1
today or tomorrow?
On Mon, Sep 24, 2018 at 3:07 PM Nick B
Let’s roll it….
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, September 19, 2018 at 12:14 PM
To: "dev@tika.apache.org"
Subject: 1.19.1?
The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly
clear on this but I did some self-hand-waving to
From: KamilD
Date: Tuesday, July 31, 2018 at 11:37 PM
To: "dev-ow...@tika.apache.org"
Subject: Tika DjVu?
Hello,
I'm trying to use Tika for DjVu but there is a problem.
When using app version 1.14 I get an empty result, but in version 1.18 I get:
C:\Users\>java -jar D:\djvu\tika-app-1.1
s REST + Docker? The upkeep in tika-dl
is nontrivial.
On Fri, Jul 6, 2018 at 6:15 PM Chris Mattmann wrote:
Tim,
Thanks. There are multiple modes of integrating deep learning with Tika:
The original mode uses Thamme’s work on REST, exposing TensorFlow via Docker
to provide a REST service that allows Tika to run TensorFlow DL models.
We initially did Inception_v3, and a model by Madhav Sharan
Once tika-dl works again with Inception v4, I’m good ☺
I’m working on adding some more models to tika-dl and other things
but those can come after 1.19.
Cheers,
Chris
From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Friday, July 6, 2018 at 8:40 AM
To: "dev@tika.apache.or
ect: Re: Branch_1x build broke?
Hey Chris,
This is happening to me with Tesseract enabled but only on my MacBook.
Are you running this on OSX?
Been trying to get some time to dig into it as it works perfectly on my
Windows and Linux setups.
Cheers,
Dave
On Thu, 24
Tim,
Are you seeing this?
Results :
Failed tests:
PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103
pdf_haystack not found in:
<html xmlns="http://www.w3.org/1999/xhtml">
Outer_hayst
Welcome to Thejan Wijesinghe, who has joined as a new Tika PMC member and
committer!
Please say a bit about yourself…thanks!
Cheers,
Chris
Awesomeness
From: "Allison, Timothy B."
Reply-To: "dev@tika.apache.org"
Date: Friday, April 6, 2018 at 11:30 AM
To: "dev@tika.apache.org"
Subject: rfc822 updates and 1.18
All,
I made two updates to our handling of rfc822 files and reran the eval against
what Tika 1.18-SNAPSHOT th
+1
From: Nick Burch
Reply-To: "dev@tika.apache.org"
Date: Wednesday, March 28, 2018 at 8:01 AM
To: "dev@tika.apache.org"
Subject: Re: message/news; charset=windows-1252 -> message/rfc822
On Wed, 28 Mar 2018, Allison, Timothy B. wrote:
With the new mime patterns, we've gotten quite
Hey Folks,
Just found this R-Tika API binding:
https://ropensci.github.io/rtika/articles/rtika_introduction.html
Very cool! Updated the wiki with it.
Cheers,
Chris
Completely agree, awesome job Nick.
I will definitely try this week as well.
Thank you!
Sincerely,
Chris
On 3/18/18, 2:47 PM, "David Meikle" wrote:
Nice one Nick! Will take a look this week.
Cheers,
Dave
On 14 March 2018 at 17:38, Nick Burch wrote:
> Hi
Sounds good to me, thanks Tim. Happy to line it up with PDFBox 2.0.9.
On 3/7/18, 1:16 PM, "Allison, Timothy B." wrote:
All,
I think I've made the updates that I wanted to make sure got in to 1.18.
It looks like PDFBox is going to start their release cycle shortly. Should we
w
Same: makes perfect sense to me, and let's do it. I just (finally) updated Tika
Python downstream to be based on Tika 1.16; I guess I should get it based on
1.17 soon too:
https://github.com/chrismattmann/tika-python/blob/master/tika/__init__.py#L17
Cheers,
Chris
On 3/1/18, 5:16 AM, "Ni
No clue. Radhia, perhaps you can enlighten everyone?
On 2/23/18, 6:45 AM, "Allison, Timothy B." wrote:
Um, no, that's not great. What's wrong with our current version? 😊
-----Original Message-----
From: Chris Mattmann [mailto:mattm...@apache.org]
Great to hear!
From: radhia bezzine
Date: Thursday, February 22, 2018 at 12:28 PM
To: Chris Mattmann
Subject: Re: RE : Re: Issue with apache Tika
Hi Chris !
I fixed the issue! It was not so complicated, just a version problem! The
recent version doesn't work for me, but the
Try UTF-8 encoding the URLs or the parameters themselves. If you are using
Tika-Python, then use the Python
encode library…
Cheers,
Chris
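One way to follow that advice in Python, assuming the problem is non-ASCII characters in a file name or query parameter, is the standard library's `urllib.parse`, which percent-encodes text as UTF-8 by default. The two helper names below are hypothetical.

```python
from typing import Dict
from urllib.parse import quote, urlencode


def encode_path_segment(name: str) -> str:
    """Percent-encode one URL path segment as UTF-8 (slashes are encoded too)."""
    return quote(name, safe="")


def encode_query(params: Dict[str, str]) -> str:
    """Build a UTF-8, percent-encoded query string (spaces become '+')."""
    return urlencode(params)


print(encode_path_segment("r\u00e9sum\u00e9.pdf"))  # r%C3%A9sum%C3%A9.pdf
print(encode_query({"name": "caf\u00e9 menu"}))      # name=caf%C3%A9+menu
```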
From: radhia bezzine
Date: Thursday, February 22, 2018 at 6:03 AM
To: "Mattmann, Chris A (1761)"
Subject: Issue with apache Tika
Hello Dear
Added! https://wiki.apache.org/tika/ContributorsGroup
Feel free to edit the page
From: Prerana Teligi Harapanahalli Math
Date: Thursday, February 15, 2018 at 8:35 PM
To: "dev@tika.apache.org" , "Mattmann, Chris A (1761)"
Subject: Requesting Tika Wiki Page Edit Access
preranathm
eamfactory() method in TikaInputStream, so the
user can implement an InputStreamFactory interface with a getInputStream
method, if he does not want to pay a performance hit with temp files for
everything.
Luis
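The InputStreamFactory idea is a Java interface, but its shape can be sketched in Python: instead of spooling every document to a temp file so it can be read more than once, the caller hands over a zero-argument function that opens a fresh stream each time it is called. This is an illustrative analogue only; every name below is hypothetical, not Tika's API.

```python
import io
from typing import BinaryIO, Callable, List

# Hypothetical analogue of an InputStreamFactory: each call returns a new,
# independent stream positioned at the start of the same content.
StreamFactory = Callable[[], BinaryIO]


def file_stream_factory(path: str) -> StreamFactory:
    """Re-open the original file on demand instead of copying it to a temp file."""
    return lambda: open(path, "rb")


def bytes_stream_factory(data: bytes) -> StreamFactory:
    """Serve fresh in-memory streams over content that is already buffered."""
    return lambda: io.BytesIO(data)


def read_n_times(factory: StreamFactory, n: int) -> List[bytes]:
    """Consume n independent streams, e.g. one per parsing pass."""
    results = []
    for _ in range(n):
        with factory() as stream:
            results.append(stream.read())
    return results
```

The design point is that the caller, who knows where the bytes live, decides how a fresh stream is produced, so the temp-file cost is only paid when nothing cheaper is available.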
On Feb 5, 2018, 4:52 PM, "Chris Mattmann" wrote:
head, but as a start, why not?
In short, just run through the stream twice.
++++++
Chris Mattmann, Ph.D.
Associate Chief Technology and Innovation Officer, OCIO Manager, Advanced
IT Research and Open Source Proje
wrote:
> On Thu, 26 Oct 2017, Chris Mattmann wrote:
>> On collision, the precedence order defines what key takes precedence and
>> _overwrites_ the other. Overwrite is but one option (you could save *all*
>> the values it’s a multi-valued key structure so…)
ed to OSSRH and synced
On 2/5/18, 9:01 AM, "Chris Mattmann" wrote:
Hmmm...the problem here is that Sonatype won't let us publish to Central with
the below. It's not even an ASF policy thing - it's a Sonatype thing
On 2/5/18, 5:55 AM, "Allison, Timothy B." wrote:
Sorry for the duplication, but I wanted to check on this and didn't want it
to get lost in a
LGTM
On 12/13/17, 5:51 AM, "Allison, Timothy B." wrote:
All,
I just created branch_1x, where we can put bug fixes and anything else we
want to go into 1.17.1 or 1.18. Unless there are objections, I’m going to
start making some radical changes to master to prep for 2.0.0-BETA ov
Great job, Tim. Sorry I didn’t have time to test it. I’d like to get a simple
Tika-Python integration test as some validation of 1.17; I’ll try today and see
if I can post results too. Would be great to have this become a standard part
of the release process like the regression tests have also bec
d two repos in nexus?!
Do we expect only the src to be in nexus, not the jar artifacts (with sigs
and digests) for app, server, eval?
-----Original Message-----
From: Chris Mattmann [mailto:mattm...@apache.org]
Sent: Friday, December 8, 2017 5:07 PM
To: dev@tika.apache.org
S
Hey Tim, probably just upload errors on the first one and so it tried again. No
worries. Drop and close
the first, and just use the 2nd.
Cheers,
Chris
On 12/8/17, 12:05 PM, "Allison, Timothy B." wrote:
Not sure what happened, but two repos were created in Nexus:
https://reposit
vember 2017 at 15:19, Mattmann, Chris A (3010) <chris.a.mattm...@jpl.nasa.gov> wrote:
> > Let’s make it so (
On collision, the precedence order defines which key takes precedence and
_overwrites_ the other. Overwrite is but one option (you could save *all* the
values; it’s a multi-valued key structure, so…)
Cheers,
Chris
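The two collision policies described above (precedence-ordered overwrite versus keeping every value in a multi-valued structure) can be sketched over plain dictionaries. This is an illustration only, assuming metadata sources are listed from lowest to highest precedence; it is not Tika's `Metadata` class, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def merge_overwrite(sources: Sequence[Dict[str, str]]) -> Dict[str, str]:
    """Sources are ordered lowest to highest precedence; on a key collision
    the higher-precedence value overwrites the earlier one."""
    merged: Dict[str, str] = {}
    for metadata in sources:
        merged.update(metadata)
    return merged


def merge_multivalued(sources: Sequence[Dict[str, str]]) -> Dict[str, List[str]]:
    """Keep *all* values per key, in precedence order, instead of overwriting."""
    merged: Dict[str, List[str]] = {}
    for metadata in sources:
        for key, value in metadata.items():
            merged.setdefault(key, []).append(value)
    return merged


first = {"title": "Draft", "author": "A. Writer"}
second = {"title": "Final"}
print(merge_overwrite([first, second]))    # {'title': 'Final', 'author': 'A. Writer'}
print(merge_multivalued([first, second]))  # {'title': ['Draft', 'Final'], 'author': ['A. Writer']}
```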
On 10/26/17, 9:43 AM, "Nick Burch" wrote:
On Thu, 26 Oct 2
maybe in tika-config.xml
would be a fine
start.
On 10/26/17, 9:14 AM, "Nick Burch" wrote:
On Thu, 26 Oct 2017, Chris Mattmann wrote:
> Why don’t we just store N copies of the stream, and parse it twice?
I'm not sure that's the challenge though? Using
Why don’t we just store N copies of the stream, and parse it twice?
Of course that’s the ugly way, but currently the way I’ve hacked this in all of
my projects is simply to call Tika N times OUTSIDE of Tika. Why don’t we just
use
that as the weakest baseline and work backwards from there?
Chris
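The "call Tika N times" baseline can be sketched as: buffer the input once, then give each pass its own fresh stream over those bytes. The pass functions here are hypothetical stand-ins, not Tika parsers or content handlers; in practice each might be one Tika call with a different handler.

```python
import io
from typing import BinaryIO, Callable, Dict

# Stand-in for "one Tika call": takes a stream, returns some extracted result.
ParsePass = Callable[[BinaryIO], str]


def parse_n_times(stream: BinaryIO, passes: Dict[str, ParsePass]) -> Dict[str, str]:
    """Weakest-baseline approach: read the stream once, then run every pass
    over its own fresh stream so no pass sees a half-consumed input."""
    data = stream.read()
    return {name: run(io.BytesIO(data)) for name, run in passes.items()}


results = parse_n_times(
    io.BytesIO(b"hello"),
    {
        "text": lambda s: s.read().decode("utf-8").upper(),
        "length": lambda s: str(len(s.read())),
    },
)
print(results)  # {'text': 'HELLO', 'length': '5'}
```

The trade-off is memory: the whole document is held in memory for the duration, which is why temp files or a stream factory come up as alternatives for large inputs.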
This makes sense to me, +1 Giuseppe!
On 10/24/17, 6:12 PM, "Giuseppe Totaro" wrote:
Hi folks,
I am developing the proposed solutions within tika-server for enabling
specific ContentHandlers. Basically, I am working to provide the ability to
give the name of the ContentHa
I saw this, Tyler, and it’s awesome. I forked it already, though I’m not a Go
programmer. Thank you
for increasing the community here (
CC’ing Jim Jag, who I know has done some Go programming. Jim, spread the word ;)
Cheers,
Chris
On 10/6/17, 10:12 AM, "Tyler Bui-Palsulich"
wrote:
(Bumping
anyway, having a way to specify a handler there can be handy too...
Cheers, Sergey
On 28/09/17 22:17, Chris Mattmann wrote:
I am +1 for this. Option #2 sounds like a slick way to handle this for me that
would
remain back compat with tika-python which is of strong interest to me.
Cheers,
Chris
On 9/28/17, 1:35 PM, "Giuseppe Totaro" wrote:
Hi folks,
if I am not wrong, currently you cannot configure a
[dropping Beam on this]
Tim, another thing is that you can finally download the TREC-DD Polar data
either
from the NSF Arctic Data Center (70GB zip), or from Amazon S3, as described
here:
http://github.com/chrismattmann/trec-dd-polar/
In case we want to use it as part of our regression testing.
Cheers,
Hi all,
One other thing: Tika extracts metadata and language information, for which
order doesn’t matter (keys can be out of order).
Would this be useful?
Cheers,
Chris
On 9/21/17, 2:10 PM, "Sergey Beryozkin" wrote:
Hi Eugene
Thank you, very helpful, let me read it few
te a new
> instance of TikaIO pipeline, and point it to the new temp folder where a
> new batch of files has been dropped to.
>
> Thanks, Sergey
> On 11/09/17 22:41, Mattmann, Chris A (3010) wrote:
>> Amazing work, thank you Sergey!!
>>
0 branch is so I defer to Tim on the risk of going with #1.
- Bob
On 9/11/2017 5:15 PM, Chris Mattmann wrote:
> +1000
>
>
>
> On 9/11/17, 12:03 PM, "Allison, Timothy B." wrote:
>
> Y, well, I didn't say _
+1000
On 9/11/17, 12:03 PM, "Allison, Timothy B." wrote:
Y, well, I didn't say _which_ September...
Given my limited availability to work on this in Sept and POI's decision to
move to Java 1.8, I propose releasing Tika 1.17 after the release of POI 3.17
and PDFBox 2.0.8. This w
Welcome Madhav!
Cheers,
Chris
On 8/31/17, 12:29 PM, "loo...@gmail.com on behalf of Dave Meikle"
wrote:
Hello Everyone,
Please join me in welcoming Madhav Sharan as a PMC member and committer to
the project!
Welcome to the team, Madhav. Feel free to say a bit about
From: Deepanshu Bhardwaj
Date: Tuesday, August 8, 2017 at 2:53 AM
To: "dev-ow...@tika.apache.org"
Subject: Query related to Apache Tika dependencies
Hi Team,
I need some help. I need to know the list of libraries (JAR files) that are
being used in the Apache Tika app 1.14 JAR as t
+1 from me; sigs and checksums look good.
Thanks Tim!
Cheers,
Chris
LMC-053601:apache-tika-1.16-rc1 mattmann$ for type in "" \-app \-eval \-server;
do $HOME/bin/stage_apache_rc tika$type 1.16
https://dist.apache.org/repos/dist/dev/tika/; done
% Total% Received % Xferd Average Speed Ti