[ 
https://issues.apache.org/jira/browse/TIKA-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532211
 ] 

Keith R. Bennett commented on TIKA-35:
--------------------------------------

Rida -

Please close the RereadableInputStream after you're finished using it.  That 
deletes the temporary file, if one was created.  Without this, the user's disk 
could fill up while processing documents.
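
A minimal sketch of the pattern, assuming a stream that spools its input to a 
temporary file and removes that file in close().  TempFileBackedStream and 
CloseExample are hypothetical names made up for illustration; this is not the 
attached RereadableInputStream, whose constructor and methods may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in: copies the source to a temp file, then reads from that file.
class TempFileBackedStream extends FilterInputStream {
    private final Path tempFile;

    TempFileBackedStream(InputStream source) throws IOException {
        super(null);
        tempFile = Files.createTempFile("spool", ".tmp");
        try {
            // createTempFile already created the file, so it must be replaced.
            Files.copy(source, tempFile, StandardCopyOption.REPLACE_EXISTING);
            in = Files.newInputStream(tempFile);
        } catch (IOException e) {
            Files.deleteIfExists(tempFile); // do not leak the file if setup fails
            throw e;
        }
    }

    Path tempFile() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close(); // release the file handle first
        } finally {
            Files.deleteIfExists(tempFile); // then remove the temporary file
        }
    }
}

class CloseExample {
    // Reads the whole stream and returns the path of the temp file that was used.
    static Path readAndClose(byte[] data) {
        Path spooled;
        // try-with-resources calls close() even if reading throws.
        try (TempFileBackedStream stream =
                 new TempFileBackedStream(new ByteArrayInputStream(data))) {
            spooled = stream.tempFile();
            while (stream.read() != -1) {
                // consume the content
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return spooled;
    }
}
```

After readAndClose returns, the temporary file should no longer exist.  On a 
Java version without try-with-resources, a try/finally block that calls 
close() in the finally clause does the same job.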

Sorry, I should have included the close() call in the unit test I provided.  
(Would you modify that too please?)

- Keith


> Extract MsOffice properties
> ---------------------------
>
>                 Key: TIKA-35
>                 URL: https://issues.apache.org/jira/browse/TIKA-35
>             Project: Tika
>          Issue Type: Improvement
>    Affects Versions: 0.1-incubator
>            Reporter: Rida Benjelloun
>            Assignee: Rida Benjelloun
>             Fix For: 0.1-incubator
>
>         Attachments: RereadableInputStream.java, 
> RereadableInputStreamTest.java, tika35.patch, tika35.patch
>
>
> Hi,
> I have developed a patch that allows extraction of MsOffice properties. I 
> wasn't able to extract the MsOffice properties and the full text from a single 
> InputStream; I always get this error:
> java.io.IOException: Unable to read entire header; -1 bytes read;
> expected 512 bytes. 
> I don't know how they make it work in Nutch (any ideas?).
> To get it to work, I have added a "filePath" variable to the parser class, and 
> I populate it from the ParseUtils class. I then create an InputStream from the 
> filePath or URL and use it to extract the properties, and I use the default 
> InputStream to extract the full text.
> I haven't committed this modification; I would like to have your opinions first.
> Regards.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
