OK Jukka, I'll give that a try. I'm assuming what I'd need to do is send the
entire JSON string to Lucene as a single token, along the lines of the sketch
below.
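
For my own notes, here's roughly the kind of analyzer I have in mind. It's
untested, written against the old-style Analyzer.tokenStream() API, and the
class name is just something I made up; the idea is simply to emit the whole
property value as one token instead of letting the default analyzer split it
and silently drop any token longer than 255 chars:

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.KeywordTokenizer;
    import org.apache.lucene.analysis.TokenStream;

    // Treats the entire property value as a single token, so a long
    // JSON string is not discarded by the default token-length limit.
    public class SingleTokenAnalyzer extends Analyzer {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // KeywordTokenizer returns the whole input as one token.
            return new KeywordTokenizer(reader);
        }
    }

(Lucene's stock KeywordAnalyzer may well do the same job out of the box.)
If I read the wiki page right, I'd then map this class to the relevant
property in the indexing configuration, per the Index Analyzers section
Jukka pointed to.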
Assuming I have anywhere from 100 to 10,000 nodes (10,000 would be rare),
where each node can contain one of these big string properties (anywhere
from 1,000 to 1,000,000 chars), do you think it is feasible to use Lucene
in this fashion? In other words, will each query essentially take forever,
or will the index space be huge? (Worst case that's 10,000 x 1,000,000
chars, roughly 10 GB of raw string data.) I figure asking can only make me
look silly :)

Also, I'm pretty sure I understand a lot of why this works the way it does,
but it seems like the implementation doesn't meet the spec for jcr:like.
Am I misreading it?

On 3/29/11, Paco Avila <[email protected]> wrote:
> OK, I understand. So, the only problem are "very big words". Nice to know :)
>
> On Tue, Mar 29, 2011 at 3:34 PM, Jukka Zitting <[email protected]> wrote:
>> Hi,
>>
>> Paco Avila asked:
>>> this means that we can't index string properties bigger than
>>> 255 characters, isn't it?
>>
>> No, just that a single token (word, number, etc.) won't be included in
>> the index if it's longer than that. Most normal string properties
>> consist of many smaller tokens.
>>
>> If you do have such very long tokens and you need them to be searchable,
>> you can configure Jackrabbit to use a custom analyzer for such
>> properties. See the Index Analyzers section in [1] for more details.
>>
>> [1] http://wiki.apache.org/jackrabbit/IndexingConfiguration
>>
>> --
>> Jukka Zitting
>
> --
> OpenKM
> http://www.openkm.com
