Can you reload all the content? If so, I would calculate this in an update request processor and put the result in its own field.
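
For what it's worth, here is a minimal sketch of the calculation such a processor would run at index time. It is written in Python only for brevity; a real update request processor would be Java. It assumes the usual LIX definition, LIX = A/B + 100*C/A, with A = number of words, B = number of periods, and C = number of words longer than six letters. It skips the capital-letter rule and counts only runs of `.`, `:`, `!` and `?` as periods. The function name `lix` and both regexes are illustrative assumptions, not anything Solr provides.

```python
import re


def lix(text: str) -> float:
    """Approximate LIX = A/B + 100*C/A for a plain-text document."""
    # A: number of words (runs of word characters; an approximation,
    # since apostrophes and hyphens split words).
    words = re.findall(r"\w+", text)
    a = len(words)
    if a == 0:
        return 0.0

    # B: number of "periods", counted here as runs of . : ! ?
    # Fall back to 1 so text without punctuation does not divide by zero.
    b = len(re.findall(r"[.:!?]+", text)) or 1

    # C: number of long words (more than six characters).
    c = sum(1 for w in words if len(w) > 6)

    return a / b + 100.0 * c / a


if __name__ == "__main__":
    print(lix("One two. Three four."))  # 4 words, 2 periods, 0 long words -> 2.0
```

The result would go into its own numeric field, so documents can be sorted or ranked by it without recomputing anything at query time.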

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 21, 2015, at 2:53 AM, Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>
> Thank you, Toke, for your quick response. All your suggestions seem to be
> very good ideas. I also found the capital letters strange because of names
> and places, so I will skip this part, as I do not need an absolute measure,
> just a ranked order among my documents.
>
> cheers,
> Roland
>
> On Oct 21, 2015, at 11:25, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
>> Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>>> My use case is that I have to calculate the LIX readability index for my
>>> documents.
>> [...]
>>> *B* = Number of periods (defined by period, colon or capital first letter)
>> [...]
>>> Does anybody have an idea how to get the number of "periods"?
>>
>> As the positions do not matter, you could make a copyField containing only
>> punctuation. And maybe extend it with a replace filter so that you have dot,
>> comma, colon, bang, question etc. instead of .,:!?
>>
>> The capital first letter seems a bit strange to me - what about names? But
>> anyway, you could do it with a PatternReplaceCharFilter, matching on
>> something like
>> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
>> and replacing with 'capital' (the regexp above probably fails - it was just
>> from memory).
>>
>> - Toke Eskildsen