Thank you, Toke, for your quick response. All your suggestions seem to be very 
good ideas. I also found the capital-letter rule strange because of names and 
places, so I will skip that part, as I do not need an absolute measure, just a 
ranked order among my documents.
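
For the archive, here is a rough Python sketch of the ranking I have in mind, 
outside Solr. It assumes the usual LIX formula, A/B + 100*C/A, where A is the 
number of words, B the number of periods and C the number of words longer than 
six letters. Periods are counted from '.' and ':' only, so the capital-letter 
rule is skipped. The function names lix and rank_by_lix are made up for the 
illustration, and the word splitting is deliberately naive.

```python
import re


def lix(text):
    """Approximate LIX: A/B + 100*C/A, with punctuation-only period counting."""
    # A: words, naively taken as runs of word characters (Unicode-aware).
    words = re.findall(r"\w+", text)
    a = len(words)
    if a == 0:
        return 0.0
    # B: periods, counted as runs of '.' or ':' so that "..." counts once.
    # The capital-first-letter rule is intentionally left out.
    b = len(re.findall(r"[.:]+", text)) or 1  # avoid division by zero
    # C: long words, i.e. more than six characters.
    c = sum(1 for w in words if len(w) > 6)
    return a / b + 100.0 * c / a


def rank_by_lix(docs):
    """Return document ids ordered from hardest to easiest to read."""
    return sorted(docs, key=lambda doc_id: lix(docs[doc_id]), reverse=True)


if __name__ == "__main__":
    docs = {
        "easy": "The cat sat. The dog ran.",
        "hard": "Institutional considerations necessitate comprehensive documentation.",
    }
    for doc_id in rank_by_lix(docs):
        print(doc_id, lix(docs[doc_id]))
```

Since only the order matters to me, a systematic bias from ignoring capitals 
should affect all documents in a similar way.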

cheers,
Roland



On 21 Oct 2015, at 11:25, Toke Eskildsen 
<t...@statsbiblioteket.dk> wrote:

> Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>> My use case is that I have to calculate the LIX readability index for my
>> documents.
> [...]
>> *B* = Number of periods (defined by period, colon or capital first letter)
> [...]
>> Does anybody have an idea how to get the number of "periods"?
> 
> As the positions do not matter, you could make a copyField containing only 
> punctuation, maybe extended with a replace filter so that you have dot, 
> comma, colon, bang, question, etc. instead of .,:!?
> 
> The capital first letter seems a bit strange to me - what about names? But 
> anyway, you could do it with a PatternReplaceCharFilter, matching on 
> something like 
> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
> and replacing with 'capital' (the regexp above probably fails - it was just 
> from memory).
> 
> - Toke Eskildsen
