Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Alex Ott
66B, Mailstop: 171-246 > Email: chris.a.mattm...@nasa.gov > WWW: http://sunset.usc.edu/~mattmann/ > ++ > Adjunct Assistant Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > +++++

Re: [VOTE] Apache Tika 1.2 release rc #1

2012-07-11 Thread Alex Ott
looks ok to me, +1 On Wed, Jul 11, 2012 at 12:05 PM, Alex Ott wrote: > downloaded sources, compiled, tests passed, tested on several files, > works ok. OS: Debian Testing, JVM Sun 1.7.0.05 > > On Tue, Jul 10, 2012 at 10:29 PM, Mattmann, Chris A (388J) > wrote: >> Hi Fol

toString in BoilerpipeContentHandler

2014-07-02 Thread Alex Ott
ame way as BodyContentHandler -> call toString to get the extracted text. Something like: @Override public String toString() { return delegate.toString(); } P.S. I can create a JIRA issue and provide a patch if necessary -- With best wishes, Alex Ott http://ale
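The delegation idea in the message above can be sketched without any Tika dependency. This is a minimal illustration only: `TextCollector` and `DelegatingHandler` are hypothetical stand-ins (not Tika classes) for a text-collecting handler and a wrapper around it, and the sketch is not the patch that was offered.

```java
import java.io.StringWriter;

// Hypothetical stand-in for a handler that accumulates extracted text.
class TextCollector {
    private final StringWriter writer = new StringWriter();

    // Appends a slice of characters, in the style of a SAX characters() callback.
    void characters(char[] ch, int start, int length) {
        writer.write(ch, start, length);
    }

    // Returns everything collected so far.
    @Override
    public String toString() {
        return writer.toString();
    }
}

// Hypothetical wrapper: forwards events to the delegate and exposes its text.
class DelegatingHandler {
    private final TextCollector delegate;

    DelegatingHandler(TextCollector delegate) {
        this.delegate = delegate;
    }

    void characters(char[] ch, int start, int length) {
        delegate.characters(ch, start, length);
    }

    // The point of the suggestion: toString() simply asks the delegate.
    @Override
    public String toString() {
        return delegate.toString();
    }
}
```

With this shape, a caller that only holds the wrapper can still obtain the extracted text by calling toString() on it.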

Re: Starting Advice

2014-08-06 Thread Alex Ott
like to > learn more. If anyone is willing to provide recommendations for resources > or detail their experiences in learning Tika, I would be most grateful. > > Thanks, > Roger > -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian) Skype: alex.ott

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

2010-05-31 Thread Alex Ott
ile, which makes it much easier to determine the format. This is the only method to distinguish these mime types - I use the same approach in my media type detection library -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net http://alexott-ru.blogspot.com/

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

2010-05-31 Thread Alex Ott
??.pages/QuickLook/Thumbnail.jpg - --- 112681 6 files -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net http://alexott-ru.blogspot.com/

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

2010-05-31 Thread Alex Ott
code. I can add them to you -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net http://alexott-ru.blogspot.com/

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

2010-05-31 Thread Alex Ott
of links (or files) to descriptions of the different formats supported by it? -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net/ http://alexott-ru.blogspot.com/

Re: [jira] Updated: (TIKA-402) Support for Keynote and Pages documents

2010-06-01 Thread Alex Ott
s links to libraries that implement support for concrete formats, not the formats themselves. I mean something like http://msdn.microsoft.com/en-us/library/cc313118%28office.12%29.aspx for MS Office file formats, a pointer to the ODF spec, etc. -- With best wishes, Alex Ott, MBA http://alexott.blogspot.co

Re: Detecting container formats

2010-06-15 Thread Alex Ott
ful to force media type detection by magic only, not by extension (for example, the file could have been renamed)... -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net/ http://alexott-ru.blogspot.com/ Skype: alex.ott

Re: Detecting container formats

2010-06-15 Thread Alex Ott
ature somewhere, and then use a mix of getByte(offset) to check other values). For source code it's better to use something like naive Bayes - it works well (as I remember from tests that we made 6 years ago)... -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net/ http://alexott-ru.blogspot.com/ Skype: alex.ott
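The approach described in these messages (match a signature, then check individual bytes at known offsets, and ignore the file extension) might look roughly like the sketch below. Everything here is an assumption for illustration: `MagicCheck`, its methods, and the "DEMO" format with a version byte at offset 4 are made up and are not part of Tika or of any real file format.

```java
// Hypothetical helper for content-based ("magic") detection; no file names involved.
final class MagicCheck {
    private MagicCheck() {}

    // True if the signature bytes appear in data starting exactly at offset.
    static boolean matchesAt(byte[] data, int offset, byte[] signature) {
        if (offset < 0 || data.length - offset < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if (data[offset + i] != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Unsigned value of the byte at offset, or -1 if the offset is out of range.
    static int byteAt(byte[] data, int offset) {
        return (offset >= 0 && offset < data.length) ? (data[offset] & 0xFF) : -1;
    }

    // Made-up format: ASCII "DEMO" at offset 0 and a version byte 0x02 at offset 4.
    static boolean isDemoV2(byte[] data) {
        byte[] signature = {'D', 'E', 'M', 'O'};
        return matchesAt(data, 0, signature) && byteAt(data, 4) == 0x02;
    }
}
```

Because the check looks only at the bytes, a renamed file is classified the same way as the original.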

Re: Detecting container formats

2010-06-16 Thread Alex Ott
Re Nick Burch at "Wed, 16 Jun 2010 12:01:48 +0100 (BST)" wrote: NB> On Tue, 15 Jun 2010, Alex Ott wrote: >> Hmmm, WordDocument stream in .doc could be only under / directory entry, >> but yes - it >> could anywhere in list of OLE2 entries... NB> And

Re: Packages and attributes

2010-07-12 Thread Alex Ott
plementations, something like: a collector of metadata for all embedded objects, or a collector of only top-level metadata, etc. This could improve performance in some cases (imho), because for some tasks people need only the top-level metadata. -- With best wishes, Ale

MS Lectures on office file formats

2010-11-12 Thread Alex Ott
-presentations.aspx -- With best wishes, Alex Ott, MBA http://alexott.net/ Twitter: alexott_en (English), alexott (Russian) Skype: alex.ott

Re: [VOTE] Apache Tika 0.9 Release Candidate #1

2011-02-15 Thread Alex Ott
word docs, but otherwise everything ran fine and the extracted MM> text looks good. -- With best wishes, Alex Ott, MBA http://alexott.blogspot.com/ http://alexott.net/ http://alexott-ru.blogspot.com/ Skype: alex.ott

Re: [VOTE] Apache Tika 1.0 release rc #1

2011-11-04 Thread Alex Ott
ckage as Apache Tika 1.0 > [ ] -1 Do not release this package because... > > Signatures, build, etc. OK. Thanks! > > BR, > > Jukka Zitting > -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian) Skype: alex.ott

review board?

2011-11-28 Thread Alex Ott
Hello, I see that some Apache projects are using ReviewBoard (https://reviews.apache.org) to review incoming patches. Do Tika's developers plan to use it? It could be especially useful for patches submitted by non-core developers. -- With best wishes, Alex Ott

Re: Tesseract OCR engine

2011-12-01 Thread Alex Ott
>>>> >>>> >> >> >> ++ >> Chris Mattmann, Ph.D. >> Senior Computer Scientist >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA >> Office: 171-266B, Mailstop: 171-246 >> Email: chris.a.mattm...@nasa.gov >> WWW: http://sunset.usc.edu/~mattmann/ >> ++ >> Adjunct Assistant Professor, Computer Science Department >> University of Southern California, Los Angeles, CA 90089 USA >> ++ >> >> > > > > -- > > Sincerely, > Albert Law > Senior Software Engineer > Logik.com -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian) Skype: alex.ott

Re: [VOTE] Apache Tika 1.1 release rc #1

2012-03-08 Thread Alex Ott
t; ++ > Adjunct Assistant Professor, Computer Science Department > University of Southern California, Los Angeles, CA 90089 USA > ++ > -- With best wishes, Alex Ott http://alexott.net/ Twitter: alexott_en (English), alexott (Russian) Skype: alex.ott

[jira] [Commented] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-06 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408180#comment-13408180 ] Alex Ott commented on TIKA-948: --- Maybe you also reuse information from prop stream nearb

[jira] [Comment Edited] (TIKA-948) Embedded PDF extracted incorrectly as MS Works file from Word 97-2003 doc

2012-07-06 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408180#comment-13408180 ] Alex Ott edited comment on TIKA-948 at 7/6/12 5:49 PM: --- Maybe

[jira] Created: (TIKA-441) Sometimes, tika not working (crashed) because of null classloader

2010-06-15 Thread Alex Ott (JIRA)
Components: general Environment: MS Windows with tika running under Apache Commons Daemon (procrun) Reporter: Alex Ott Priority: Minor Fix For: 0.8 Attachments: classloader-fix.diff I used Tika inside an application that should run as MS Windows
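A defensive way to cope with a null class loader, of the general kind this issue is about, is to fall back through a chain of candidates. The sketch below is an assumption for illustration and is not the content of classloader-fix.diff; `ClassLoaders` and `resolve` are hypothetical names.

```java
// Hypothetical helper: never hands back a null ClassLoader.
final class ClassLoaders {
    private ClassLoaders() {}

    // Prefers the supplied loader (for example the thread context class loader),
    // then the loader of fallbackOwner, then the system class loader.
    static ClassLoader resolve(ClassLoader preferred, Class<?> fallbackOwner) {
        if (preferred != null) {
            return preferred;
        }
        // getClassLoader() may itself return null for bootstrap classes.
        ClassLoader owner = fallbackOwner.getClassLoader();
        return owner != null ? owner : ClassLoader.getSystemClassLoader();
    }
}
```

A caller could pass Thread.currentThread().getContextClassLoader() as the preferred loader, which is the value that can be null when code runs on threads created by a native service wrapper.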

[jira] Updated: (TIKA-441) Sometimes, tika not working (crashed) because of null classloader

2010-06-15 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Ott updated TIKA-441: -- Attachment: classloader-fix.diff proposed patch to fix this issue > Sometimes, tika not working (cras

[jira] Commented: (TIKA-441) Sometimes, tika not working (crashed) because of null classloader

2010-06-16 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879463#action_12879463 ] Alex Ott commented on TIKA-441: --- Thank you. I understand, that my patch isn't pe

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894501#action_12894501 ] Alex Ott commented on TIKA-447: --- 2Nick: will this allow implementing support for

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894507#action_12894507 ] Alex Ott commented on TIKA-447: --- It's better to have some flag, that will say "

[jira] Commented: (TIKA-447) Container aware mimetype detection

2010-08-02 Thread Alex Ott (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894511#action_12894511 ] Alex Ott commented on TIKA-447: --- Ah, sorry Nick - I hadn't looked into the code yet.

[jira] [Created] (TIKA-2964) Upgrade Jackson Databind dependency to 2.9.10.1 or 2.10.0 to fix latest CVEs

2019-10-13 Thread Alex Ott (Jira)
Alex Ott created TIKA-2964: -- Summary: Upgrade Jackson Databind dependency to 2.9.10.1 or 2.10.0 to fix latest CVEs Key: TIKA-2964 URL: https://issues.apache.org/jira/browse/TIKA-2964 Project: Tika

[jira] [Commented] (TIKA-2960) Detected 1 vulnerable components: [ERROR] com.fasterxml.jackson.core:jackson-databind:jar:2.9.8

2019-10-13 Thread Alex Ott (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950248#comment-16950248 ] Alex Ott commented on TIKA-2960: the changes are already in master > Det

[jira] [Commented] (TIKA-697) Tika reports the content type of AR archives as "text/plain"

2011-11-07 Thread Alex Ott (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145308#comment-13145308 ] Alex Ott commented on TIKA-697: --- I think, that following magic in tika-mimetypes.xml wil

[jira] [Updated] (TIKA-697) Tika reports the content type of AR archives as "text/plain"

2011-11-07 Thread Alex Ott (Updated) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Ott updated TIKA-697: -- Attachment: tika-697.diff This patch adds a signature for Unix Archive files (.a). I think that the signature for
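For readers unfamiliar with the format: Unix ar archives begin with the 8-byte ASCII string "!<arch>" followed by a newline. The sketch below checks for that signature in plain Java. It is only an illustration of the idea; `ArMagic` is a hypothetical name, and this is not the content of tika-697.diff, which the comment on this issue places in tika-mimetypes.xml rather than in Java code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: recognizes the global header of a Unix ar archive.
final class ArMagic {
    // The 8-byte ar signature: "!<arch>" plus a line feed.
    private static final byte[] MAGIC = "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    private ArMagic() {}

    // True if the given leading bytes of a file start with the ar signature.
    static boolean isArArchive(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Checking the signature instead of falling through to a text heuristic is what keeps such files from being reported as "text/plain".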

[jira] [Commented] (TIKA-697) Tika reports the content type of AR archives as "text/plain"

2011-11-07 Thread Alex Ott (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145496#comment-13145496 ] Alex Ott commented on TIKA-697: --- No problem, just add: after ... But I really never

[jira] [Commented] (TIKA-789) Microsoft Project (MPP) basic support

2011-11-25 Thread Alex Ott (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157175#comment-13157175 ] Alex Ott commented on TIKA-789: --- Detection should be pretty straightforward. MS Projec

[jira] [Commented] (TIKA-806) MS Word Detection magics are a bit overzealous

2011-12-09 Thread Alex Ott (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166262#comment-13166262 ] Alex Ott commented on TIKA-806: --- The only reliable method to determine .doc/.xls/

[jira] [Commented] (TIKA-823) Detect StarOffice files

2011-12-21 Thread Alex Ott (Commented) (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173940#comment-13173940 ] Alex Ott commented on TIKA-823: --- for .sdw and .sdc you can just look at the names of stream