[ https://issues.apache.org/jira/browse/TIKA-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529676 ]
Keith R. Bennett commented on TIKA-20:
--------------------------------------
A patch addressing this issue has been attached to TIKA-17. That patch
addresses TIKA-17, TIKA-20, and TIKA-24.
> A convenience method for getting a document's text in a single method would
> be helpful.
> ---------------------------------------------------------------------------------------
>
> Key: TIKA-20
> URL: https://issues.apache.org/jira/browse/TIKA-20
> Project: Tika
> Issue Type: New Feature
> Components: general
> Affects Versions: 0.1-incubator
> Reporter: Keith R. Bennett
> Priority: Minor
> Fix For: 0.1-incubator
>
>
> A convenience method that returns a document's text in a single call would
> be helpful.
> This would address the common use case of wanting the string content but not
> the document metadata.
> Sample methods are below:
> ------------------------------------------------------------------
> /**
>  * Gets the full text (but not other properties) of the document
>  * at the specified URL.
>  *
>  * @param documentUrl URL of the resource to parse
>  * @param configUrl URL of the Tika configuration
>  * @return the document's full text, or null if documentUrl is null
>  */
> public static String getStrContent(URL documentUrl, URL configUrl)
>         throws LiusException, IOException {
>     return getStrContent(documentUrl,
>             LiusConfig.getInstance(configUrl));
> }
>
> /**
>  * Gets the full text (but not other properties) of the document
>  * at the specified URL.
>  *
>  * @param documentUrl URL of the resource to parse
>  * @param config Tika configuration object
>  * @return the document's full text, or null if documentUrl is null
>  */
> public static String getStrContent(URL documentUrl, LiusConfig config)
>         throws LiusException, IOException {
>     String fulltext = null;
>     if (documentUrl != null) {
>         Parser parser = ParserFactory.getParser(documentUrl, config);
>         fulltext = parser.getStrContent();
>     }
>     return fulltext;
> }
> =========================
> This code assumes changes to the code base, not yet committed, that will
> let us use URLs as input document specifiers. (See TIKA-17.)
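
For readers who want to try the idea without the uncommitted TIKA-17
changes, here is a minimal, self-contained sketch of the same convenience
pattern using only the JDK. It is an illustration under stated assumptions,
not the attached patch: TextExtractionSketch, TextParser, and parserFor are
hypothetical stand-ins for the Parser and ParserFactory types in the sample
above, the configuration argument is left out, and the stand-in parser simply
reads the resource as UTF-8 text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TextExtractionSketch {

    /** Hypothetical stand-in for the Parser type in the sample above. */
    interface TextParser {
        String getStrContent() throws IOException;
    }

    /**
     * Hypothetical stand-in for ParserFactory.getParser: it ignores the
     * document type and always returns a parser that reads UTF-8 text.
     */
    static TextParser parserFor(URL documentUrl) {
        return () -> {
            try (InputStream in = documentUrl.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        };
    }

    /**
     * Gets the full text of the document at the specified URL in one call.
     * Mirrors the sample's behavior of returning null for a null URL.
     */
    public static String getStrContent(URL documentUrl) throws IOException {
        if (documentUrl == null) {
            return null;
        }
        return parserFor(documentUrl).getStrContent();
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample document, then fetch its text through a URL.
        Path sample = Files.createTempFile("tika-20-sample", ".txt");
        try {
            Files.write(sample, "Hello, Tika".getBytes(StandardCharsets.UTF_8));
            System.out.println(getStrContent(sample.toUri().toURL()));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The caller never touches a parser or metadata object, which is the point of
the request. Returning null for a null URL follows the sample; throwing an
exception instead would be an equally reasonable choice.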
--
This message is automatically generated by JIRA.