[ https://issues.apache.org/jira/browse/TIKA-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-870:
------------------------------------

    Attachment: TIKA-870.patch

Patch, with the sample code plus a test case.

The test case failed at first: the returned string was longer than the
specified limit. I dug in and discovered that WriteOutContentHandler wasn't
overriding ignorableWhitespace, so those characters weren't being counted
against the limit. I added that override and now the test passes.
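
To illustrate the kind of fix involved, here is a minimal sketch that uses only
the JDK's SAX classes. LimitedTextHandler and its method names are hypothetical
stand-ins for illustration; this is not Tika's WriteOutContentHandler and not
the code in the patch. The point is only that both SAX text callbacks go
through the same counting path.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical length-limited text collector (illustration only).
 * It stops accepting text once `limit` characters have been collected.
 */
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int limit;
    private boolean limitReached = false;

    LimitedTextHandler(int limit) {
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        // Route whitespace through the same counter as characters(),
        // so it cannot push the output past the limit.
        append(ch, start, length);
    }

    private void append(char[] ch, int start, int length) throws SAXException {
        int remaining = limit - text.length();
        if (length > remaining) {
            // Keep only what still fits, then signal that the limit was hit.
            text.append(ch, start, remaining);
            limitReached = true;
            throw new SAXException("Write limit of " + limit + " characters reached");
        }
        text.append(ch, start, length);
    }

    boolean isLimitReached() {
        return limitReached;
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

With a limit of 5, feeding "abc" through characters() and then four spaces
through ignorableWhitespace() leaves exactly five characters in the buffer and
raises the limit exception on the second call.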

I think it's ready...
                
> Allow calling parseToString with an additional maxStringLength parameter,
> so it can be changed per call
> --------------------------------------------------------------------------
>
>                 Key: TIKA-870
>                 URL: https://issues.apache.org/jira/browse/TIKA-870
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Shay Banon
>            Assignee: Michael McCandless
>         Attachments: TIKA-870.patch
>
>
> It would be great to be able to call parseToString with an additional
> maxStringLength parameter, instead of having to set it on the Tika
> instance. This would allow setting it per parse call. Sample code:
> {code}
> public String parseToString(InputStream stream, Metadata metadata,
>         int maxStringLength)
>         throws IOException, TikaException {
>     WriteOutContentHandler handler =
>         new WriteOutContentHandler(maxStringLength);
>     try {
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>         parser.parse(
>                 stream, new BodyContentHandler(handler), metadata, context);
>     } catch (SAXException e) {
>         if (!handler.isWriteLimitReached(e)) {
>             // This should never happen with BodyContentHandler...
>             throw new TikaException("Unexpected SAX processing failure", e);
>         }
>     } finally {
>         stream.close();
>     }
>     return handler.toString();
> }
> {code}
