Hi Wunder,

Yes, I can reload the documents; it takes 2-3 hours at most. I have never used an update request processor, but I will look it up on the Solr Wiki. Thanks for your help.
Cheers,
Roland

On Oct 21, 2015, at 17:25, Walter Underwood <wun...@wunderwood.org> wrote:

> Can you reload all the content?
>
> If so, I would calculate this in an update request processor and put the
> result in its own field.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
>> On Oct 21, 2015, at 2:53 AM, Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>>
>> Thanks, Toke, for your quick response. All your suggestions seem to be very
>> good ideas. I also found the capital letters strange because of names and
>> places, so I will skip that part, as I do not need an absolute measure, just
>> a ranked order among my documents.
>>
>> Cheers,
>> Roland
>>
>>
>>
>> On Oct 21, 2015, at 11:25, Toke Eskildsen
>> <t...@statsbiblioteket.dk> wrote:
>>
>>> Roland Szűcs <roland.sz...@booknwalk.com> wrote:
>>>> My use case is that I have to calculate the LIX readability index for my
>>>> documents.
>>> [...]
>>>> *B* = Number of periods (defined by period, colon or capital first letter)
>>> [...]
>>>> Does anybody have an idea how to get the number of "periods"?
>>>
>>> As the positions do not matter, you could make a copyField containing
>>> only punctuation. And maybe extend it with a replace filter so that you have
>>> dot, comma, colon, bang, question, etc. instead of .,:!?
>>>
>>> The capital first letter seems a bit strange to me - what about names? But
>>> anyway, you could do it with a PatternReplaceCharFilter, matching on
>>> something like
>>> ([^.,:!?]\p{Space}*\p{Upper})|(^\p{Upper})
>>> and replacing with 'capital' (the regexp above probably fails - it was just
>>> from memory).
>>>
>>> - Toke Eskildsen
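
For reference, the calculation discussed in this thread could be sketched outside Solr roughly as follows. This is only a minimal illustration, not an update request processor: it assumes the commonly cited LIX definition, LIX = A/B + (C * 100)/A, where A is the number of words, B the number of periods, and C the number of long words (more than six letters). The thread itself only spells out B, so A and C are assumptions here. Following the decision above, the capital-first-letter rule is skipped and only periods and colons are counted. The function name `lix` and both regular expressions are hypothetical.

```python
import re

# Assumption: a "word" is a run of Unicode letters (no digits, no underscore).
WORD_RE = re.compile(r"[^\W\d_]+")
# Assumption: a "period" is a run of '.' or ':' characters, so that an
# ellipsis ("...") counts once. The capital-first-letter rule is skipped.
PERIOD_RE = re.compile(r"[.:]+")


def lix(text: str, long_word_len: int = 7) -> float:
    """Return an approximate LIX score for `text`.

    LIX = A / B + (C * 100) / A, with
      A = number of words,
      B = number of periods (at least 1, to avoid division by zero),
      C = number of words with `long_word_len` or more letters.
    """
    words = WORD_RE.findall(text)
    a = len(words)
    if a == 0:
        return 0.0  # nothing to score
    b = max(len(PERIOD_RE.findall(text)), 1)
    c = sum(1 for w in words if len(w) >= long_word_len)
    return a / b + (c * 100.0) / a


if __name__ == "__main__":
    sample = "The cat sat. The elephant wandered slowly."
    print(round(lix(sample), 2))
```

Abbreviations such as "e.g." would inflate B in this sketch. Since only a ranked order among documents is needed, that may be acceptable, but it is worth checking on real data. The same counting logic could be moved into an update request processor that writes the score to its own field.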