Re: 0.8 release: latest status

2010-11-09 Thread Jan Høydahl / Cominvent
No prob Chriß :) And thanks for including 537. Makes me wanna continue contributing! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 7. nov. 2010, at 01.01, Mattmann, Chris A (388J) wrote: > Hi Jan, > > Sorry I misspelled your name below. Apologies. In the rush to

[jira] Commented: (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date.

2010-11-09 Thread samraj (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930108#action_12930108 ] samraj commented on TIKA-545: - Hi all, for your reference I am posting the current output I am

Re: [ANNOUNCE] Welcome Maxim Valyanskiy as Tika PMC/Committer

2010-11-09 Thread Maxim Valyanskiy
Hello! 08.11.2010 10:20, Mattmann, Chris A (388J) wrote: A while back the Tika PMC nominated Maxim Valyanskiy for Tika committership and PMC membership. The VOTE tallies in Tika PMC-ville have occurred and I'm happy to announce that Max is now Tika committer! Max, feel free to say a little bit

[jira] Resolved: (TIKA-510) Use POI API for text extraction from XSLF shape

2010-11-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-510. --- Resolution: Fixed Committed revision 1032924. > Use POI API for text extraction from XSLF sha

[jira] Commented: (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date.

2010-11-09 Thread samraj (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930107#action_12930107 ] samraj commented on TIKA-545: - Thanks Nick, I am using Tika 0.7. In that the java code prints

[jira] Resolved: (TIKA-511) NPE when POI is configured to prefer event extractors

2010-11-09 Thread Maxim Valyanskiy (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Valyanskiy resolved TIKA-511. --- Resolution: Fixed Committed revision 1032925. > NPE when POI is configured to prefer event ex

[jira] Commented: (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date.

2010-11-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930109#action_12930109 ] Nick Burch commented on TIKA-545: - Looks like it's a bug that's in 0.7 but has subsequently b

[jira] Commented: (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date.

2010-11-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930104#action_12930104 ] Nick Burch commented on TIKA-545: - I've just tried your two files using tika-app, and it says

[jira] Commented: (TIKA-545) While trying to extract meta data(Created date,Modified date) from .docx,.xlsx files it returns only current date.

2010-11-09 Thread samraj (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930110#action_12930110 ] samraj commented on TIKA-545: - Thanks Nick, I will wait until the release of 0.8.. > While try

[jira] Updated: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Igor Spasic (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Spasic updated TIKA-547: - Description: I have created a simple PDF using the Bullzip PDF printer (a virtual Windows printer). Tika is n

[jira] Created: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Igor Spasic (JIRA)
Can't extract PDF text -- Key: TIKA-547 URL: https://issues.apache.org/jira/browse/TIKA-547 Project: Tika Issue Type: Bug Components: parser Affects Versions: 0.7, 0.8 Reporter: Igor Spasic I h

[jira] Updated: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Igor Spasic (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Spasic updated TIKA-547: - Attachment: test.pdf > Can't extract PDF text > -- > > Key: TIKA-547 >

[jira] Commented: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Daan de Wit (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930128#action_12930128 ] Daan de Wit commented on TIKA-547: -- Can you try to extract the text using PDFBox (http://htt

[jira] Commented: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Igor Spasic (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930138#action_12930138 ] Igor Spasic commented on TIKA-547: -- thanx, feel free to close this issue. > Can't extract P

[jira] Commented: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Igor Spasic (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930132#action_12930132 ] Igor Spasic commented on TIKA-547: -- Tried, PDFBox returns some strange chars. Will fire issu

[jira] Commented: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930136#action_12930136 ] Chris A. Mattmann commented on TIKA-547: Hi Igor: no, please don't download it yet. I

[jira] Resolved: (TIKA-547) Can't extract PDF text

2010-11-09 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-547. Resolution: Won't Fix Assignee: Chris A. Mattmann - reporter will file issue in PDFBOX

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930175#action_12930175 ] Nick Burch commented on TIKA-461: - Julien - Did you have any luck knocking up some tests to g

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930180#action_12930180 ] Julien Nioche commented on TIKA-461: Nope. I was planning to refactor the parser first al

[jira] Commented: (TIKA-461) RFC822 messages not parsed

2010-11-09 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930182#action_12930182 ] Nick Burch commented on TIKA-461: - No worries, we'll look forward to the new patch and the te

XML parsing hang

2010-11-09 Thread Ken Krugler
Hi all, Just a heads-up that we tracked down a serious issue we were having, while parsing about 100M docs. A handful of these documents caused Tika's parsing to hang. We've got a FutureTask that we use to detect and (try to) terminate hung parses. But for some of these parse attempts, we
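
For readers unfamiliar with the approach mentioned in the message above, here is a minimal sketch of wrapping a long-running task in a FutureTask with a timeout. It is not the poster's code: the class name ParseWatchdog, the method runWithTimeout, and the choice of a daemon worker thread are assumptions made for illustration, and the Callable stands in for whatever parse call is being guarded.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: guards a task with a timeout.
final class ParseWatchdog {

    private ParseWatchdog() {}

    /**
     * Runs the task on its own daemon thread and waits up to timeoutMillis.
     * Returns the result, or Optional.empty() if the task did not finish in time.
     * Exceptions thrown by the task surface as ExecutionException.
     */
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis)
            throws Exception {
        FutureTask<T> future = new FutureTask<>(task);

        // Daemon thread (an assumption of this sketch): a task that never
        // returns will not keep the JVM alive on shutdown.
        Thread worker = new Thread(future, "parse-worker");
        worker.setDaemon(true);
        worker.start();

        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread. A task that never
            // checks the interrupt flag or blocks interruptibly keeps running.
            future.cancel(true);
            return Optional.empty();
        }
    }
}
```

A call such as ParseWatchdog.runWithTimeout(() -> "ok", 1000) returns the result wrapped in an Optional, while a task that outlives the timeout yields an empty Optional. Note that cancellation in Java is cooperative: the timeout reliably unblocks the caller, but it cannot by itself stop a worker that ignores interruption.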

Re: 0.8 release: latest status

2010-11-09 Thread Mattmann, Chris A (388J)
Hey Guys, I'm just going to call the VOTE anyways right now. That way, if it passes, we'll just leave 0.8 on central as is. Stand by... Cheers, Chris On 11/9/10 4:47 AM, "Jan Høydahl / Cominvent" wrote: No prob Chriß :) And thanks for including 537. Makes me wanna continue contributing! -

[VOTE] Apache Tika 0.8 Release Candidate #1

2010-11-09 Thread Mattmann, Chris A (388J)
Hi Folks, I have posted a candidate for the Apache Tika 0.8 release. The source code is at: http://people.apache.org/~mattmann/apache-tika-0.8/rc1/ See the included CHANGES.txt file for details on release contents and latest changes. The release was made using the Maven2 release plugin, accordin