[jira] [Commented] (TIKA-2434) Language detection slow, cpu intensive, CLI interrupts work

2017-08-18 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133468#comment-16133468 ] Mattmann, Chris A (388J) commented on TIKA-2434: Hi Everyone, I will be on paternity

[jira] [Updated] (TIKA-1804) Tika use no free json.org

2017-05-30 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mattmann, Chris A (388J) updated TIKA-1804: Hi Everyone, I will be out of the office 5/29 – 6/6 on Vacation. During

[jira] [Commented] (TIKA-1885) Tika MIME updates for *.cdf and *.xar and custom zero length file detector based on TREC-DD-Polar

2016-05-02 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268058#comment-15268058 ] Mattmann, Chris A (388J) commented on TIKA-1885: Hello, I am on vacation and will return

[jira] [Commented] (TIKA-1696) Language Identification with Text Processing Toolkit from MITLL

2015-07-23 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639531#comment-14639531 ] Mattmann, Chris A (388J) commented on TIKA-1696: It's fine to discuss

[jira] [Commented] (TIKA-1619) SHA1 and MD5 verification hashes for v1.8 still show old v1.7 hashes

2015-04-29 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520536#comment-14520536 ] Mattmann, Chris A (388J) commented on TIKA-1619: Hey Rishi yes it's under

[jira] [Commented] (TIKA-605) Tika GDAL parser

2014-10-11 Thread Mattmann, Chris A (388J) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168238#comment-14168238 ] Mattmann, Chris A (388J) commented on TIKA-605: Great +1 please update in SVN

Google Summer of Code: GDAL parser

2013-03-23 Thread Mattmann, Chris A (388J)
Hey Guys, I tagged TIKA-605 GDAL parser [1] as a Google Summer of Code 2013 project. I'm available to help mentor. I'm copying the SIS list on this since people like Adam Estrada (SIS, VP), and Joe White have offered to help mentor as well. Note: Uli is going to send the Google Summer of Code

FW: GSoC 2013

2013-03-18 Thread Mattmann, Chris A (388J)
[Apologies for cross post] Guys, to play in the GSoC 2013 spec, we just need to tag issues in JIRA with the gsoc2013 tag. I'll try and come up with few projects soon :) Cheers, Chris On 3/15/13 11:15 AM, Luciano Resende luckbr1...@gmail.com wrote: On Fri, Mar 15, 2013 at 11:01 AM, Manish

FW: [Tika Wiki] Update of RecursiveMetadata by domtheo

2013-03-06 Thread Mattmann, Chris A (388J)
Guys I reverted this spammer but don't know how to block him. Help? Cheers, Chris On 3/6/13 7:12 PM, Apache Wiki wikidi...@apache.org wrote: Dear Wiki user, You have subscribed to a wiki page or wiki category on Tika Wiki for change notification. The RecursiveMetadata page has been changed by

Re: [DISCUSS] Should Tika require Java6? (was Re: Build failed in Jenkins: Tika-trunk #977)

2013-02-12 Thread Mattmann, Chris A (388J)
, Feb 8, 2013 at 6:54 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, Just to summarize, the question on the table is whether or not Tika should require Java6. We had some discussions on this previously (if I get time, will dig up the threads -- ok found time

Re: Build failed in Jenkins: Tika-trunk #980

2013-02-12 Thread Mattmann, Chris A (388J)
Thanks Mike! On 2/12/13 10:14 AM, Michael McCandless luc...@mikemccandless.com wrote: Hmm, that didn't work. It looks like we have to fix our JAVA_HOME to point to a 1.6+ java: http://stackoverflow.com/questions/11328677/error-when-using-javac-javac-invalid-flag-s OK I managed to log in to

Re: Jenkins build is back to normal : Tika-trunk #981

2013-02-12 Thread Mattmann, Chris A (388J)
Yay! On 2/12/13 12:01 PM, Michael McCandless luc...@mikemccandless.com wrote: Yay, Java 1.6 :) Mike McCandless http://blog.mikemccandless.com On Tue, Feb 12, 2013 at 2:59 PM, Apache Jenkins Server jenk...@builds.apache.org wrote: See https://builds.apache.org/job/Tika-trunk/981/

FW: [GSoC Mentors] Google Summer of Code 2013

2013-02-11 Thread Mattmann, Chris A (388J)
[Sorry for cross posting] Guys, FYI please note that you can participate as a mentor from a PMC via Apache as they are a GSoC org. ComDev will coordinate our participation but start thinking about what projects we may want to do. Cheers, Chris From: Carol Smith

Re: svn commit: r1443963 - in /tika/trunk/tika-server/src/main/java/org/apache/tika/server: CSVMessageBodyWriter.java JSONMessageBodyWriter.java

2013-02-08 Thread Mattmann, Chris A (388J)
Thanks Mike! On 2/8/13 3:54 AM, mikemcc...@apache.org mikemcc...@apache.org wrote: Author: mikemccand Date: Fri Feb 8 11:54:26 2013 New Revision: 1443963 URL: http://svn.apache.org/r1443963 Log: comment out @Overrides Modified:

Re: Build failed in Jenkins: Tika-trunk #977

2013-02-07 Thread Mattmann, Chris A (388J)
Hey Mike, Weird. I did notice in the patch for: https://issues.apache.org/jira/browse/TIKA-1047 That there were some JDK7 stuff -- I went ahead and fixed it to be JDK6 compat and updated the patch and committed that version as I noted in the issue comments. I wonder if there was something I

Re: Crawler-Commons 0.2 released

2013-02-03 Thread Mattmann, Chris A (388J)
Thanks Ken! Cheers, Chris On 2/3/13 7:56 AM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi Chris, On Feb 2, 2013, at 7:34pm, Mattmann, Chris A (388J) wrote: Awesome thanks Ken. Any pointers to the release? Sorry, should have included those details... - Project is at: https

Re: Crawler-Commons 0.2 released

2013-02-02 Thread Mattmann, Chris A (388J)
Awesome thanks Ken. Any pointers to the release? Cheers, Chris On 2/2/13 7:08 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Just a heads-up that we released version 0.2. This might be of interest to the Tika community, since it contains parsers for both robots.txt and sitemaps. -- Ken

Re: buildbot failure in ASF Buildbot on tika-trunk

2013-01-27 Thread Mattmann, Chris A (388J)
The latest SVN commit in r1439145 fixes this. Cheers, Chris On 1/27/13 11:19 AM, build...@apache.org build...@apache.org wrote: The Buildbot has detected a new failure on builder tika-trunk while building ASF Buildbot. Full details are available at:

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
Hey Dave, On 1/18/13 8:30 PM, Dave Meikle loo...@gmail.com wrote: Hi Guys, A candidate for the Tika 1.3 release is available at: http://people.apache.org/~dmeikle/apache-tika-1.3-rc1/ The release candidate is a zip archive of the sources in:

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
Hey Mike, I found the same thing -- scope the KEYS file here in case you need it: curl -O http://www.apache.org/dist/tika/KEYS gpg --import KEYS Cheers, Chris On 1/20/13 3:35 AM, Michael McCandless luc...@mikemccandless.com wrote: +1, but I think you need to add the KEYS file? Tests passed

Re: [VOTE] Apache Tika 1.3 Release Candidate #1

2013-01-20 Thread Mattmann, Chris A (388J)
push the release out, we include KEYS. Mike McCandless http://blog.mikemccandless.com On Sun, Jan 20, 2013 at 3:52 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Mike, I found the same thing -- scope the KEYS file here in case you need it: curl -O http://www.apache.org

Re: KEYS file and dist.apache.org (Re: [VOTE] Apache Tika 1.3 Release Candidate #1)

2013-01-20 Thread Mattmann, Chris A (388J)
Thanks Jukka for the FYI... Cheers, Chris On 1/20/13 10:11 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: Hi, On Sun, Jan 20, 2013 at 11:24 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: +1 to that -- Dave feel free to simply copy the one out of dist into the RC dir

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-17 Thread Mattmann, Chris A (388J)
Hey Jukka, I'll roll an RC #1 for 1.3 by the week-end if that works for everyone. Dave, I know you mentioned you wanted to give it a go. If you do that's fine too. Just saying I have time to do it if you'd like. To start, I've created a 1.4 version in JIRA and moved all unresolved 1.3s to 1.4.

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-17 Thread Mattmann, Chris A (388J)
Hey Dave, No worries! There is more value in getting more people doing this. So all yours this weekend! :) If you need any help let me know. Cheers, Chris On 1/17/13 7:40 AM, Dave Meikle loo...@gmail.com wrote: Hi Chris, On 17 Jan 2013, at 15:31, Mattmann, Chris A (388J) chris.a.mattm

Re: [DISCUSS] Release Candidate for 1.3?

2013-01-08 Thread Mattmann, Chris A (388J)
Agreed +1 from me. Dave, I think it would be great for you to rock the release too. Any help I can provide I'd be happy to! Here's the last Tika 1.2 release ANNOUNCE email for pointers: http://s.apache.org/UEA Cheers, Chris On 1/8/13 5:21 PM, Michael McCandless luc...@mikemccandless.com

Re: [jira] [Updated] (TIKA-1048) XMLParser should add whitespace between elements

2012-12-20 Thread Mattmann, Chris A (388J)
+1... Cheers, Chris On 12/20/12 4:23 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Oleg, UIMA could be useful for extracting text from XML (I'm not familiar enough with it...), but I think we should still fix Tika's own XML extraction. Mike McCandless

Re: Contribution of parser for FITS file format to Apache Tika

2012-12-06 Thread Mattmann, Chris A (388J)
Hey Rahul, This is great and I'm totally willing to work with you to shepherd this in. The first step would be to create a JIRA issue for your parser, and then to submit a patch to incorporate it into the tika-parsers module. Of course, you can start with changing the namespace to org.apache.*

Re: MimeTypes.java final?

2012-10-29 Thread Mattmann, Chris A (388J)
Hi Ryan, I think #1 has been suggested before, in a thread called Appending MIME Types: http://s.apache.org/TVe As for #2, I think that's the type of information we're trying to hide through the class interface. I like the adding more URL information and URI stuff to the MIME registry though

Re: MimeTypes.java final?

2012-10-29 Thread Mattmann, Chris A (388J)
Thanks Ryan you the man. Appreciate it. I will take a look at the issues and try to help shepherd them in! Cheers, Chris On Oct 29, 2012, at 6:52 PM, Ryan McKinley wrote: On Mon, Oct 29, 2012 at 2:03 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi Ryan, I think #1 has

Re: Standard practice with @author in comments

2012-08-30 Thread Mattmann, Chris A (388J)
Hey Ken, I personally don't care too much about having @author tags, or not having them, but I know there are others more passionate (for example about NOT having them) :) Cheers, Chris On Aug 30, 2012, at 2:03 PM, Ken Krugler wrote: Hi all, I'm wondering if we've got any convention for

Welcome to our new Tika PMC chair!

2012-08-19 Thread Mattmann, Chris A (388J)
Hey Folks, I decided to step down as chair of the Apache Tika PMC. We have a new chair, who graciously volunteered to step up and handle the chair duties, Dave Meikle. Dave's nomination was recently confirmed at the last Apache board meeting, on recommendation from the Tika PMC. Dave, welcome!

[VOTE] Graduate Apache Any23 from the Apache Incubator

2012-08-03 Thread Mattmann, Chris A (388J)
Hi Folks, Based on prior positive discussions: http://s.apache.org/W1C http://s.apache.org/dw4 http://s.apache.org/xN I'm now going to call for a community VOTE (before heading to the Incubator to make it official) for Any23 to graduate from the Incubator. VOTEs are open to Any23 and Tika

[DISCUSS] Any23 Graduation to TLP

2012-07-26 Thread Mattmann, Chris A (388J)
Hey Tika PMC'ers, The Any23 podling is preparing to hold a graduation VOTE. The community feels that it would be best to graduate to a TLP. We've made a release, added new committers, communicated on list and in the spirit of the Apache way. Since the Tika PMC agreed to sponsor the Any23

[DISCUSS] Including tika-server WAR in 1.3 artifacts?

2012-07-20 Thread Mattmann, Chris A (388J)
Hey Guys, Now that we have tika-server, etc., I was thinking of including it like we do tika-app as a release artifact in 1.3-on. That sound OK? Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion

[DISCUSS] Tika Hardener?

2012-07-19 Thread Mattmann, Chris A (388J)
Hey Jerome, I noticed on TIKA-815 that you mentioned you had a Tika hardener -- would you be willing to contribute that upstream to the Tika project? We appreciate your contributions to date and were just wondering? Thanks! Cheers, Chris

Fwd: Call for Papers for ApacheCon Europe 2012 now open!

2012-07-19 Thread Mattmann, Chris A (388J)
FYI... Begin forwarded message: From: Nick Burch nick.bu...@alfresco.com Date: July 19, 2012 1:14:57 PM CDT To: committ...@apache.org Subject: Call for Papers for ApacheCon Europe 2012 now open! Reply-To: apachecon-disc...@apache.org Hi All We're pleased to announce that the Call for

Re: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Mattmann, Chris A (388J)
-Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Tuesday, 17 July 2012 07:39 To: dev@tika.apache.org Subject: Can't build javadocs for 1.2 API site docs Hey Guys, When I run mvn javadoc:aggregate which normally works fine and builds

Re: Can't build javadocs for 1.2 API site docs

2012-07-17 Thread Mattmann, Chris A (388J)
, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Tuesday, 17 July 2012 07:39 To: dev@tika.apache.org Subject: Can't build javadocs for 1.2 API site docs Hey Guys, When I run mvn javadoc:aggregate which normally works fine and builds the API docs for the website for me

[RESULT] [VOTE] Apache Tika 1.2 release rc #1

2012-07-16 Thread Mattmann, Chris A (388J)
Hi Everyone, This VOTE has passed with the following tallies: +1 Chris Mattmann* Alex Ott Mike McCandless* Zabrane Mickael Joerg Ehrlich Dave Meikle* Jukka Zitting* Oleg Tikhonov* Ken Krugler* I'll push the bits out and announce the release. Thanks to all who VOTEd! Cheers, Chris * -

[ANNOUNCE] Apache Tika 1.2 released

2012-07-16 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 1.2. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

Can't build javadocs for 1.2 API site docs

2012-07-16 Thread Mattmann, Chris A (388J)
Hey Guys, When I run mvn javadoc:aggregate which normally works fine and builds the API docs for the website for me to push up to the site publish directory, in 1.2 I now get an error: /Users/mattmann/tmp/tika1.2/tika-xmp/src/main/java/org/apache/tika/xmp/XMPMetadata.java:75: warning - Tag

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-12 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 11, 2012, at 4:48 PM, Jukka Zitting wrote: Hi, On Wed, Jul 11, 2012 at 4:27 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: On Jul 11, 2012, at 6:43 AM, Michael McCandless wrote: Why are there original-tika-app* files in the RC directory? Good

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Mattmann, Chris A (388J)
Thanks Mike! On Jul 11, 2012, at 6:43 AM, Michael McCandless wrote: +1 I smoke tested, extracting text for the Lucene in Action PDF (looked good), and verified TIKA-948 is fixed. Why are there original-tika-app* files in the RC directory? Good question: this is the first time I've seen

[VOTE] Apache Tika 1.2 release rc #1

2012-07-10 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.2 release is available at: http://people.apache.org/~mattmann/apache-tika-1.2/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.2/ The SHA1 checksum of the archive is
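The vote message above refers to the SHA1 checksum of the release archive. As a minimal sketch, not taken from the original thread, this is one way a voter could compare a downloaded archive against a published checksum using only the JDK. The class name `Sha1Check` and its command-line arguments are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika: computes a SHA-1 digest as lowercase hex.
public class Sha1Check {

    /** Streams the input through SHA-1 and returns the digest as a hex string. */
    public static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded archive (assumed to exist locally)
        // args[1]: the checksum copied from the vote email
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = sha1Hex(in);
            System.out.println(actual.equalsIgnoreCase(args[1].trim()) ? "OK" : "MISMATCH: " + actual);
        }
    }
}
```

A checksum match only shows the download is intact; the PGP signature check against the project's KEYS file is a separate step.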

Re: svn commit: r1355877 - in /tika/trunk: ./ tika-dll/ tika-dll/src/ tika-dll/src/main/ tika-dll/src/main/csharp/ tika-dll/src/main/csharp/Apache/

2012-07-01 Thread Mattmann, Chris A (388J)
WOW nice Jukka, you did it! Cheers, Chris On Jul 1, 2012, at 6:04 AM, ju...@apache.org ju...@apache.org wrote: Author: jukka Date: Sun Jul 1 13:04:00 2012 New Revision: 1355877 URL: http://svn.apache.org/viewvc?rev=1355877&view=rev Log: TIKA-773: .NET version of Tika Add a basic

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Jukka, On Jul 1, 2012, at 12:01 PM, Jukka Zitting wrote: Hi, On Sun, Jul 1, 2012 at 6:27 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: On Jul 1, 2012, at 5:09 AM, Jukka Zitting wrote: Sergey Beryozkin (who I'm CC'ing on this email since I'm not sure he's

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
Hey Nick, On Jul 1, 2012, at 2:52 PM, Nick Burch wrote: On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: I also plan to spin a 1.2 release candidate at some point in the next week or so. I realize the metadata stuff isn't done yet, but it's better to simply release early and often

Re: JAX-RS overhead in tika-server

2012-07-01 Thread Mattmann, Chris A (388J)
: On Sun, 1 Jul 2012, Mattmann, Chris A (388J) wrote: It can be a big pain if an in-progress API is suddenly effectively frozen by the need to be compatible into the future... Agreed -- so, what do you think? How much longer do we need to wrap up the API changes or whatever going

Re: svn commit: r1343137 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/pkg/PackageExtractor.java test/java/org/apache/tika/parser/pkg/ArParserTest.java

2012-05-27 Thread Mattmann, Chris A (388J)
s/Josh/John/ Sorry John! Cheers, Chris On May 27, 2012, at 9:18 PM, mattm...@apache.org mattm...@apache.org wrote: Author: mattmann Date: Mon May 28 04:18:21 2012 New Revision: 1343137 URL: http://svn.apache.org/viewvc?rev=1343137&view=rev Log: - fix for TIKA-935 TikaException thrown

Re: A plan to improve the metadata property definitions

2012-05-16 Thread Mattmann, Chris A (388J)
Thanks Nick, +1. I'll try and follow and see if I can help in places. Cheers, Chris On May 16, 2012, at 5:50 AM, Nick Burch wrote: Hi All I've just been brainstorming with Ray Gauss, and we think we've come up with a way to move towards cleaner and clearer metadata property definitions

Re: [metadata] Input on reorganization of Metadata interfaces

2012-05-08 Thread Mattmann, Chris A (388J)
Hi Jörg, On May 8, 2012, at 5:39 AM, Joerg Ehrlich wrote: Hi Chris, I'm OK with the code-level implications of that, but I will just have to scope out the patch and so forth. Thanks for pushing this. I really appreciate your help here. Sorry, I am not a native speaker: Does that you

Re: [metadata] Input on reorganization of Metadata interfaces

2012-05-04 Thread Mattmann, Chris A (388J)
Hi Jörg, On May 4, 2012, at 6:43 AM, Joerg Ehrlich wrote: Hi, I wanted to start submitting patches for the following and would like your input on that: Create one Core Properties interface for the Metadata class which contains just the keys for the properties which should be directly

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
Hi Jörg, Thanks for your email, comments below: On Apr 26, 2012, at 3:35 AM, Joerg Ehrlich wrote: Hi Chris, Those are all valid points and I agree that you could do everything with a Hashmap. Having the parsers fill the Metadata class and its Hashmap with all needed information which

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
structured object. That approach should be able to maintain backwards compatibility for existing implementations and allow for structured and namespaced metadata. Just a thought, Ray On Apr 26, 2012, at 11:37 AM, Mattmann, Chris A (388J) wrote: Hi Jörg, Thanks for your email, comments

Re: [metadata] roadmap proposal available on the wiki

2012-04-26 Thread Mattmann, Chris A (388J)
, at 2:30 PM, Antoni Mylka wrote: 2012/04/26 Mattmann, Chris A (388J) wrote: Hi Guys, One comment RE: the below too -- this is precisely where I see Any23 coming into play and why there is a strong relationship between it and Tika: http://incubator.apache.org/any23/ I'm

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, First off, thanks for taking the time to put your thoughts down on the Wiki. I will try to leverage that for helping push these ideas forward. I am +1 on most of the things you proposed. Regarding: {quote} Use XMP instead of Hashmap in Metadata class The idea is to have just one

Git Pull question?

2012-04-25 Thread Mattmann, Chris A (388J)
Hey Guys, I saw a Git pull request come through the other day and followed it to: http://s.apache.org/l9t I commented there asking Kyle if he would be interested in joining our dev list and telling him I'd be happy to figure out how to get his patch in from there. I know Jukka has been working

Re: [metadata] roadmap proposal available on the wiki

2012-04-25 Thread Mattmann, Chris A (388J)
Hi Jörg, On Apr 25, 2012, at 10:27 AM, Joerg Ehrlich wrote: I am not strongly supportive of changing the HashMap internal representation in Metadata out. A couple of things I like about the HashMap: * It's simple. * It doesn't require dependency on any external libraries and helps

Re: Server component in Jira

2012-04-24 Thread Mattmann, Chris A (388J)
Done! Cheers, Chris On Apr 24, 2012, at 4:00 PM, Ingo Renner wrote: Hi all, could we add server as a component in Jira? thanks Ingo -- Ingo Renner TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code TYPO3 Open Source Enterprise Content Management

Re: Pluggable language detection

2012-04-08 Thread Mattmann, Chris A (388J)
Hi Jan, It probably makes sense to provide pluggable language detection in Tika, since it's the lower level library, so I am +1 for figuring out a solution to implement it in Tika ville. If no one has started on this in the next few weeks I'll give it a go. Cheers, Chris On Apr 8, 2012, at

Re: Metadata situation and XMP support in Tika

2012-04-05 Thread Mattmann, Chris A (388J)
Hi Jörg, Great summary! I would be in favor of option #2 as well, with the caveat that if we take it slow, I think there might be a way to not really have as much of a client/API impact, using deprecations and other techniques as you suggested. Looking forward to your participation! Cheers,

Re: PUT vs. POST in tika-server

2012-04-05 Thread Mattmann, Chris A (388J)
Hi Guys, Yeah, I am happy to annotate the code with @POST too like Max suggested. I opened https://issues.apache.org/jira/browse/TIKA-891 to track this. Thanks! Cheers, Chris On Apr 5, 2012, at 1:29 AM, Jukka Zitting wrote: Hi, I notice the tika-server component (nice work documenting

Re: Build failed in Jenkins: Tika-trunk #820

2012-03-26 Thread Mattmann, Chris A (388J)
Hi Max, I will hopefully have a patch in the next day or so that migrates us to CXF with little to no changes (except for the Server and test components of tika-server, as you mentioned). I think this will help out in this regard. Cheers, Chris On Mar 26, 2012, at 9:26 AM, Maxim Valyanskiy

[RESULT] [VOTE] Apache Tika 1.1 release rc #1

2012-03-23 Thread Mattmann, Chris A (388J)
Hi Everyone, OK, this VOTE has passed with the following tallies: +1 PMC Chris Mattmann Ken Krugler Markus Jelsma Jukka Zitting Mike McCandless Dave Meikle +1 Community Zabrane Mickael Alex Ott Sorry took me a while to tally! :) I'll now push the dists out, and then push to Maven Central and

[VOTE] Apache Tika 1.1 release rc #1

2012-03-07 Thread Mattmann, Chris A (388J)
Hi Folks, A candidate for the Tika 1.1 release is available at: http://people.apache.org/~mattmann/apache-tika-1.1/rc1/ The release candidate is a zip archive of the sources in: http://svn.apache.org/repos/asf/tika/tags/1.1/ The SHA1 checksum of the archive is

Re: [VOTE] Apache Tika 1.1 release rc #1

2012-03-07 Thread Mattmann, Chris A (388J)
Hey Ken, Sorry about that! Forgot to include the link to the staged Maven2 repo, here: https://repository.apache.org/content/repositories/orgapachetika-066/ There ya go. Cheers, Chris On Mar 7, 2012, at 4:36 PM, Ken Krugler wrote: Hi Chris, On Mar 7, 2012, at 1:35pm, Mattmann, Chris

Fwd: Google Summer of Code 2012 upcoming

2012-03-04 Thread Mattmann, Chris A (388J)
Guys, FYI...in case anyone is thinking of GSoC, deadlines are approaching. Process is described below... Thanks! Cheers, Chris Begin forwarded message: From: Ulrich Stärk u...@apache.org Date: March 4, 2012 9:01:07 AM PST To: p...@apache.org p...@apache.org Cc: d...@community.apache.org

Re: Tika 1.1 release

2012-03-01 Thread Mattmann, Chris A (388J)
Guys, +1 here. I'll create a 1.1 RC this weekend if no one beats me to it. Thanks! Cheers, Chris On Mar 1, 2012, at 11:51 AM, Jukka Zitting wrote: Hi, On Thu, Mar 1, 2012 at 7:01 PM, Daniel Malmer daniel.mal...@markit.com wrote: First, thanks for all the hard work you've put into this

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
Hi Joe, Awesome! Thanks for picking this up and getting interested in this work. Right now, the only use cases we've had so far is to represent lats and lons (WGS84). It would be great to extract more information and come up with a policy for representing more WKTs and so forth. We should

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
, Mattmann, Chris A (388J) wrote: Hi Joe, Awesome! Thanks for picking this up and getting interested in this work. Right now, the only use cases we've had so far is to represent lats and lons (WGS84). It would be great to extract more information and come up with a policy for representing

Re: Gdal Integration (TIKA 605)

2012-02-26 Thread Mattmann, Chris A (388J)
geospatial imagery. Has there been any discussion about using Tika on any of the geospatial vector formats? I would think they would go hand in hand, and OGR recognizes many of them. Joe On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote: Hi Joe, Awesome! Thanks for picking

TF-IDF parser and ContentHandler?

2012-02-07 Thread Mattmann, Chris A (388J)
Hey Guys, I've been toying around with the idea of writing a simple Tika Parser Decorator that extends the Text Parser, but that generates TF-IDF metadata maybe top word count (summarized) and frequencies/term map. I was also thinking of then writing a similar ContentHandler as well so it could
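For readers unfamiliar with the term, TF-IDF weights a word by how often it appears in one document (term frequency) and discounts it by how many documents in the collection contain it (inverse document frequency). The sketch below is only an illustration of that arithmetic in plain Java 8+; it is not the decorator proposed in the message and does not use any Tika API. The class name `TfIdfSketch`, the whitespace/punctuation tokenizer, and the natural-log IDF variant are all assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of TF-IDF scoring; not part of Tika.
public class TfIdfSketch {

    /** Counts lower-cased word occurrences in one document (naive tokenizer). */
    public static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * tf-idf(t, d) = (count of t in d / total terms in d) * ln(N / df(t)),
     * where N is the corpus size and df(t) the number of documents containing t.
     * The term is expected in lower case; returns 0 when it is absent.
     */
    public static double tfIdf(String term, String doc, List<String> corpus) {
        Map<String, Integer> counts = termCounts(doc);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        int count = counts.getOrDefault(term, 0);
        if (total == 0 || count == 0) {
            return 0.0;
        }
        long docFreq = corpus.stream().filter(d -> termCounts(d).containsKey(term)).count();
        if (docFreq == 0) {
            return 0.0;
        }
        double tf = (double) count / total;
        double idf = Math.log((double) corpus.size() / docFreq);
        return tf * idf;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "tika parses documents",
            "tika detects languages",
            "lucene indexes documents");
        for (String term : new String[] {"tika", "parses"}) {
            System.out.printf("%s -> %.4f%n", term, tfIdf(term, corpus.get(0), corpus));
        }
    }
}
```

In this toy corpus "parses" scores higher than "tika" for the first document because it occurs in fewer documents. A real parser decorator would presumably accumulate the counts while the text streams through rather than re-tokenizing every document per query.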

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
Anyone interested in mentoring a GSoC student for Tika? Begin forwarded message: From: Luciano Resende luckbr1...@gmail.com Date: February 4, 2012 10:40:03 AM PST To: d...@community.apache.org d...@community.apache.org, code-awards code-awa...@apache.org Subject: Fwd: [Announce] Google

Fwd: [Announce] Google Summer of Code 2012

2012-02-05 Thread Mattmann, Chris A (388J)
FYI Begin forwarded message: From: Ross Gardler rgard...@opendirective.com Date: February 5, 2012 1:45:18 PM PST To: d...@community.apache.org d...@community.apache.org Subject: RE: [Announce] Google Summer of Code 2012 Reply-To: d...@community.apache.org d...@community.apache.org For

Re: % of different content types out there on the web

2012-01-31 Thread Mattmann, Chris A (388J)
, we also explicitly filter out all/most unwanted suffixes. We do have a lot of suffixes that we encountered so far. On Saturday 28 January 2012 03:01:26 Mattmann, Chris A (388J) wrote: (sorry for the cross post) Hey Guys, I'm trying to find a good citation or estimate (if anyone has

Re: [ANNOUNCEMENT][THANKS] Apache ODF Toolkit(Incubating) 0.5-incubating Release

2012-01-16 Thread Mattmann, Chris A (388J)
Congrats guys! Cheers, Chris On Jan 16, 2012, at 4:59 AM, Devin Han wrote: Hi all, Thanks all of the voters from this list. Now there is a result ;) The Apache ODF Toolkit(Incubating) team is pleased to announce the release of 0.5-incubating. This is our first Apache release. The

Re: Pushing parsers upstream

2011-12-13 Thread Mattmann, Chris A (388J)
Hey Jukka, For places like POI and PDFBox I think this could definitely work. And then for places where we have Parsers, but aren't ready to push upstream yet (I can think of two examples of this relevant to me, NetCDF/HDF and GDAL), we can just leave the Parser in tika-parsers I think. In

Re: Tesseract OCR engine

2011-11-30 Thread Mattmann, Chris A (388J)
fonts and languages. In addition, we also have to add an option for image preprocessing (skewing + filtering etc). BR, Oleg On Wed, Nov 30, 2011 at 8:59 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, FYI: http://code.google.com/p/tesseract-ocr/ I

Re: Possible re-opening of resolved issue TIKA-738?

2011-11-26 Thread Mattmann, Chris A (388J)
On Sat, Nov 26, 2011 at 1:54 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Guys, Just an FYI my personal preference on things like this are to leave the original issue closed, open up a new issue and to link back to the original one. This is mainly from a release

Re: Multilingual Tika

2011-11-14 Thread Mattmann, Chris A (388J)
Hi Ingo, Great meeting you at ApacheCon and we'd love to have your PHP skillz on board! Contribute away :-) The best start would probably be to file an issue along the lines of TIKA-773 [1], and get a Tika wrapper for PHP going. Cheers, Chris [1]

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Mattmann, Chris A (388J)
P.S. Here's my +1. Cheers, Chris On Nov 4, 2011, at 8:42 AM, Mattmann, Chris A (388J) wrote: Hi Folks, A candidate for the Tika 1.0 release is available at: http://people.apache.org/~mattmann/apache-tika-1.0/rc1/ The release candidate is a zip archive of the sources in: http

Re: Tika 1.0 RC?

2011-11-01 Thread Mattmann, Chris A (388J)
I'll get going on the RC then thanks guys! Cheers, Chris On Nov 1, 2011, at 11:32 AM, Jukka Zitting wrote: Hi, On Thu, Oct 27, 2011 at 6:42 PM, Jukka Zitting jukka.zitt...@gmail.com wrote: How about if we leave the trunk open still for the weekend, and cut the 1.0 release candidate at

Tika 1.0 RC?

2011-10-25 Thread Mattmann, Chris A (388J)
Hey Guys, I created a 1.1 version in JIRA and pushed all open (~13) issues for 1.0 to 1.1. We now have 32 issues resolved in the current 1.0. WDYT? Good enough for a 1.0 release? I'm happy to spin the RC tonight or in the next day (PDT). Any objections? Cheers, Chris

Re: Google's Compact Language Detector

2011-10-24 Thread Mattmann, Chris A (388J)
Hi Jerome, Nice to hear from you my friend! I haven't taken a look at Mike's blog post or the LD code, but it sounds interesting and worth a look. I'll check it out! Cheers, Chris On Oct 24, 2011, at 6:18 AM, Jérôme Charron wrote: Hi, I just find this blog post from Mike McCandless about

Re: Updating CHANGES.txt?

2011-10-19 Thread Mattmann, Chris A (388J)
Yah I agree with Jukka here, but don't worry too much Mike if you're verbose (or anyone else for that matter). The RM (aka moi ;) ) always can take a look at CHANGES.txt at the end of a release cycle (speaking of which, 1.0, ApacheCon, 1.0, ApacheCon ;) ) and distill the information to a

[RESULT] [VOTE] Add Any23 to the Apache Incubator

2011-10-01 Thread Mattmann, Chris A (388J)
Hi Everyone, This VOTE has passed with the following tallies: +1 IPMC (binding) Chris Mattmann Christian Grobmeier Tommaso Teofili Julien Nioche Jukka Zitting Bertrand Delacretaz Nick Kew Olivier Lamy Paul Ramirez +1 Community Lewis John McGibbney Raffaele P. Guidi Andy Seaborne Michele

[HEADS UP] Added Tika ApacheCon NA 2011 news item

2011-10-01 Thread Mattmann, Chris A (388J)
Hey Guys, I updated the Tika website to mention the ApacheCon NA 2011 talk I'm giving. Just a heads up, thanks! Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA

[ANNOUNCE] Apache Tika 0.10 released

2011-09-30 Thread Mattmann, Chris A (388J)
(...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 0.10 The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

Re: [ANNOUNCE] Apache Tika 0.10 released

2011-09-30 Thread Mattmann, Chris A (388J)
, Chris A (388J) wrote: (...apologies for the cross posting...) The Apache Tika project is pleased to announce the release of Apache Tika 0.10 The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon

Re: [PROPOSAL] Any23 to join the incubator

2011-09-26 Thread Mattmann, Chris A (388J)
Hi All, OK, since the chatter about this proposal has died down and since I've agreed to champion it, I'll call a formal VOTE tomorrow afternoon and let it run through the rest of the week. The Tika PMC has not registered any objections to sponsoring the proposal, so I will go ahead and update

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Mattmann, Chris A (388J)
file and then respinning a new RC? I have no problem doing it, and I can use it as an opportunity to address the other small nits brought up. Cheers, Chris On Sep 26, 2011, at 5:41 AM, Nick Burch wrote: On Sun, 25 Sep 2011, Mattmann, Chris A (388J) wrote: A first release candidate

Re: [VOTE] Apache Tika 0.10 release rc #1

2011-09-26 Thread Mattmann, Chris A (388J)
there. Thanks guys. Cheers, Chris On Sep 26, 2011, at 8:35 AM, Jukka Zitting wrote: Hi, [x] +1 Release this package as Apache Tika 0.10 [ ] -1 Do not release this package because... On Mon, Sep 26, 2011 at 3:37 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Would

Re: Support for Open Graph meta tags

2011-09-23 Thread Mattmann, Chris A (388J)
Hey Jukka, This sounds like a good approach. Cheers, Chris On Sep 23, 2011, at 3:24 AM, Jukka Zitting wrote: Hi, On Fri, Sep 23, 2011 at 2:23 AM, Ken Krugler kkrugler_li...@transpac.com wrote: The reason why is that Open Graph uses RDFa Instead of mapping the RDFa meta tags to Tika's

Re: Support for Open Graph meta tags

2011-09-22 Thread Mattmann, Chris A (388J)
Hey Ken, Super +1, this sounds like a great idea. Cheers, Chris On Sep 22, 2011, at 6:23 PM, Ken Krugler wrote: We were recently using Tika to process HTML pages that might have Open Graph meta tags. The issue is that these tags get stripped out, and also aren't put into the metadata

Re: Release date of tika 1.0 or 0.10

2011-09-21 Thread Mattmann, Chris A (388J)
Hey Jukka, If everyone is cool with me doing it over the weekend, I'll bust it out, no worries. Thanks for getting the RC all prepped up and thanks to everyone for the hard work. Cheers, Chris On Sep 21, 2011, at 11:19 AM, Jukka Zitting wrote: Hi, On Wed, Sep 21, 2011 at 2:28 PM,

Re: 1.0 RC in next 2 weeks

2011-09-16 Thread Mattmann, Chris A (388J)
+1 Jukka, sounds great. Cheers, Chris On Sep 16, 2011, at 1:32 AM, Jukka Zitting wrote: Hi, On Fri, Sep 16, 2011 at 5:09 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: That said, I'm happy when the dev community of Tika is ready to cut a release, and will gladly RC

Re: 1.0 RC in next 2 weeks

2011-09-15 Thread Mattmann, Chris A (388J)
, 2011, at 3:32 PM, Kevin Clark wrote: In light of the recent file handle bug (via parseToString) would it be possible to get a point release in the meantime? On Thu, Sep 15, 2011 at 3:06 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hi there Jan, I was hoping to have

Re: svn commit: r1163970 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/extractor/ tika-core/src/main/java/org/apache/tika/io/ tika-core/src/main/java/org/apache/tika/parser/ tika-core/src/

2011-09-01 Thread Mattmann, Chris A (388J)
On Sep 1, 2011, at 8:08 AM, Michael McCandless wrote: OK thanks Jukka. We might want to mark APIs like TemporaryResources internal in the javadocs, ie, that we reserve the right to suddenly change them and they are just public so that the sub-packages in Tika can use them. In Lucene we

Re: Failed test: testBMP(org.apache.tika.parser.image.ImageParserTest)

2011-08-19 Thread Mattmann, Chris A (388J)
(NEON) 1685 38th Street Boulder, CO 80301 (720) 746-4855 saulenb...@neoninc.org On Tue, Aug 16, 2011 at 7:02 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: Hey Steve, Interesting -- can you confirm your JDK version, and your Maven version by typing: mvn

Re: Failed test: testBMP(org.apache.tika.parser.image.ImageParserTest)

2011-08-16 Thread Mattmann, Chris A (388J)
Hey Steve, Interesting -- can you confirm your JDK version, and your Maven version by typing: mvn --version java --version And then showing the output? Cheers, Chris On Aug 16, 2011, at 2:38 PM, Steve Aulenbach wrote: Hi, After updating to revision 1158448 and running a maven clean
