looking to contribute

2015-12-16 Thread Joey Hong
Hi Tika Developers, My name is Joey. I am a college freshman with programming experience looking to get into the world of open-source. I was hoping to contribute to the Tika project, and was wondering if there were any tasks that a beginner like me could tackle. I am willing to do anything,

[jira] [Created] (TIKA-1814) Add a standalone XMPScannerParser

2015-12-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1814: Summary: Add a standalone XMPScannerParser Key: TIKA-1814 URL: https://issues.apache.org/jira/browse/TIKA-1814 Project: Tika Issue Type: Improvement

Re: looking to contribute

2015-12-16 Thread Nick Burch
On Wed, 16 Dec 2015, Joey Hong wrote: My name is Joey. I am a college freshman with programming experience looking to get into the world of open-source. I was hoping to contribute to the Tika project, and was wondering if there were any tasks that a beginner like me could tackle. I am willing

[jira] [Updated] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1813: Attachment: 225HYXAEU2DKSBNQ3SVD3HXCYMSHXVTB 25JIANLV77U645GUSJ2E67YSM4B2TNSP

[jira] [Created] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1813: Summary: Figure out file types for several unknown OLE files in Common Crawl Key: TIKA-1813 URL: https://issues.apache.org/jira/browse/TIKA-1813 Project: Tika

[jira] [Commented] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1505#comment-1505 ] Tim Allison commented on TIKA-1813: {{file}} yields: {{Composite Document File V2 Document, corrupt:

[jira] [Commented] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060029#comment-15060029 ] Tim Allison commented on TIKA-1799: [~kiwiwings], any recommendations for what I need to change in our

[GitHub] tika pull request: fix for TIKA-1803 contributed by msha...@usc.ed...

2015-12-16 Thread smadha
GitHub user smadha opened a pull request: https://github.com/apache/tika/pull/65 fix for TIKA-1803 contributed by msha...@usc.edu You can merge this pull request into a Git repository by running: $ git pull https://github.com/smadha/tika TIKA-1803 Alternatively you can

[jira] [Commented] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060203#comment-15060203 ] Nick Burch commented on TIKA-1813: My best guess is that these have been truncated. Having a look with

[jira] [Commented] (TIKA-1803) Use lucene-geo-gazetteer REST API in GeoTopicParser

2015-12-16 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060182#comment-15060182 ] ASF GitHub Bot commented on TIKA-1803: GitHub user smadha opened a pull request:

[jira] [Comment Edited] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060203#comment-15060203 ] Nick Burch edited comment on TIKA-1813 at 12/16/15 3:58 PM: My best guess is

[jira] [Updated] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1813: Attachment: 27BYDLE36XWCDZXA3PPV6MF524UQ6KAF This looks like a Revit project file (block 2). > Figure

[jira] [Commented] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060254#comment-15060254 ] Tim Allison commented on TIKA-1813: Duh...I initially posted the exceptions on the theory that we may be

[jira] [Updated] (TIKA-1813) Figure out file types for several unknown OLE files in Common Crawl

2015-12-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1813: Attachment: unidentified_ole_docs_in_common_crawl_slice.csv Rather than posting files, here's the list