I got the sense from Paul's post that he wanted a solution that didn't
require changing his index, although I'm not sure there is one. Paul, if
you're willing to re-index, you could also store the length of the text
as a numeric field, retrieve that and use it to drive the decision about
whethe
Simply have two fields, "full_body" and "limited_body". The former would
index but not store the full document text from Tika (the "content"
metadata). The latter would store but not necessarily index the first 10K or
so characters of the full text. Do searches on the full body field and
highli
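
As a rough illustration of the two-field idea, here is a minimal plain-Java sketch of how the value for "limited_body" could be derived from the full text. The helper name limitedBody and the maxChars parameter are hypothetical, and the 10,000 cutoff is just the "10K or so" figure from above. How the two values are then added to the document (indexed-only versus stored-only) depends on the Lucene version and is deliberately left out.

```java
final class LimitedBody {

    // Hypothetical helper: returns at most maxChars leading characters of the
    // full text, intended as the stored value of the "limited_body" field.
    static String limitedBody(String fullText, int maxChars) {
        if (fullText.length() <= maxChars) {
            return fullText;
        }
        int end = maxChars;
        // Avoid cutting between the two halves of a surrogate pair.
        if (end > 0 && Character.isHighSurrogate(fullText.charAt(end - 1))) {
            end--;
        }
        return fullText.substring(0, end);
    }

    public static void main(String[] args) {
        // Stand-in for the "content" text extracted by Tika.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25000; i++) {
            sb.append('x');
        }
        String fullBody = sb.toString();               // value for "full_body"
        String limited = limitedBody(fullBody, 10000); // value for "limited_body"
        System.out.println(fullBody.length() + " -> " + limited.length());
    }
}
```

The full text would go into "full_body" unchanged, so searches still see the whole document while only the short prefix is stored.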
Uwe,
thank you for the advice. I updated my code.
On Sat, Jun 23, 2012 at 3:15 AM, Uwe Schindler wrote:
>> I found the main issue.
>> I was using ByteRef without the length. This fixed the problem.
>>
>> String word = new String(ref.bytes,ref.offset,ref.length);
>
> Pleas
> I found the main issue.
> I was using ByteRef without the length. This fixed the problem.
>
> String word = new String(ref.bytes,ref.offset,ref.length);
Please see my other mail: using no character set here is the second problem
with your code. This is the correct way to do it:
Don't ever do this:
String word = new String(ref.bytes);
This has the following problems:
- ignores character set!!! (in general: never ever use new String(byte[])
without specifying the 2nd charset parameter!). byte[] != String. Depending
on the default charset on your computer this would return bul
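
To make the two points in this thread concrete (use offset and length, and always name the charset), here is a small self-contained sketch using only the JDK. The class name, the decode helper, and the sample buffer are hypothetical. It assumes the bytes are UTF-8, which is how Lucene encodes terms.

```java
import java.nio.charset.StandardCharsets;

final class BytesSliceDecode {

    // Decode only the valid slice of a (possibly shared) buffer,
    // with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "h\u00e9llo" contains one non-ASCII character, so it is 6 bytes in UTF-8.
        byte[] term = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        // Simulate a larger buffer where the term occupies only a slice.
        byte[] buffer = new byte[16];
        int offset = 3;
        System.arraycopy(term, 0, buffer, offset, term.length);

        // Correct: slice plus explicit charset.
        String word = decode(buffer, offset, term.length);
        System.out.println(word.equals("h\u00e9llo"));

        // Wrong: decoding the whole array also picks up the unused bytes
        // before and after the slice.
        String whole = new String(buffer, StandardCharsets.UTF_8);
        System.out.println(whole.equals("h\u00e9llo"));
    }
}
```

If ref is a Lucene BytesRef, its utf8ToString() method should perform the same slice-aware UTF-8 decoding, so the manual constructor call may not be needed at all.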