Re:Re: proceed with the limitation of character length

2014-10-14 Thread imyuka
= new ParseContext(); context.set(Parser.class, parser); parser.parse(is, handler, metadata, context); return new TikaBox(metadata,handler);{ At 2014-10-14 17:56:20, "Nick Burch" wrote: >On Tue, 14 Oct 2014, imyuka wrote:

proceed with the limitation of character length

2014-10-13 Thread imyuka
Hi all, I get a 'more than 10 characters' exception while processing a document. To avoid this, I can either use the truncated text or increase the maximum limit. In either case, how can I increase the limit or retrieve only the first 10 characters of the document without throwing
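
For what it's worth, both options in the question come down to the same pattern: the content handler stops collecting text at a limit and aborts the parse with a SAXException, and the caller catches that exception and keeps whatever was collected. As far as I know, Tika's BodyContentHandler takes a write limit as a constructor argument (with -1 disabling the limit). The sketch below is not Tika code; it uses only the JDK's SAX classes, and LimitedTextHandler and LimitedTextDemo are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not a Tika class): collects character data up to
// a limit, then aborts the parse by throwing a SAXException.
class LimitedTextHandler extends DefaultHandler {
    private final int limit;
    private final StringBuilder text = new StringBuilder();
    private boolean limitReached = false;

    LimitedTextHandler(int limit) {
        this.limit = limit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        int room = limit - text.length();
        if (length > room) {
            // Keep only what still fits, then stop the parser.
            text.append(ch, start, room);
            limitReached = true;
            throw new SAXException("write limit of " + limit + " characters reached");
        }
        text.append(ch, start, length);
    }

    boolean limitReached() {
        return limitReached;
    }

    String text() {
        return text.toString();
    }
}

class LimitedTextDemo {
    // Returns at most `limit` characters of the text content of `xml`.
    static String firstChars(String xml, int limit) {
        LimitedTextHandler handler = new LimitedTextHandler(limit);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Swallow the exception only if it is our own "limit reached" signal.
            if (!handler.limitReached()) {
                throw new IllegalArgumentException("could not parse input", e);
            }
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.text();
    }

    public static void main(String[] args) {
        System.out.println(firstChars("<doc>Hello, limited world</doc>", 10));
    }
}
```

The point of the limitReached flag is that the caller can tell "stopped on purpose" apart from a genuine parse error, so a malformed document is still reported instead of being returned as a silently truncated string.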

Re:Re: Formatted Content Extraction and Title Detection

2014-10-09 Thread imyuka
, while I have no idea how to implement this in Java. Are there any instructions or tutorials I can refer to? Thanks! At 2014-10-09 20:46:01, "Nick Burch" wrote: >On Thu, 9 Oct 2014, imyuka wrote: >> Here is my problem: I have extracted plain text from a s

Formatted Content Extraction and Title Detection

2014-10-09 Thread imyuka
Hi all, Here is my problem: I have extracted plain text from a series of doc(x) documents, and their titles via the "dc:title" metadata field, but I'm not sure this is the right way to obtain a document's title. In many cases, a title inside a document could be of the largest font-s