I'm using ExtractingRequestHandler to do the indexing. Do I have to implement the UpdateProcessor in the ExtractingRequestHandler, or separately?
Regards,
Edwin

On 26 May 2015 at 23:42, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> I think this is still on topic.
> Assuming we are using the extract update handler, I think the update
> processor approach still applies.
> But is it not possible to strip them directly with some extract request
> handler parameter?
>
> 2015-05-26 16:33 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>
> > Neither: it removes the characters before indexing. The distinction is
> > that if you remove them during indexing, they will still appear in the
> > stored field values even though they are removed from the indexed values,
> > but if you remove them before indexing, they will not appear in the
> > stored field values either. Again, the distinction is between indexed
> > field values and stored field values.
> >
> > -- Jack Krupansky
> >
> > On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >
> > > It is showing up in the search results. Just to confirm, does this
> > > UpdateProcessor method remove the characters during indexing, or only
> > > after indexing has been done?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 26 May 2015 at 21:30, Upayavira <u...@odoko.co.uk> wrote:
> > >
> > > > On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
> > > > > Hi,
> > > > >
> > > > > Is there a way to remove special characters like \n during
> > > > > indexing of rich text documents?
> > > > >
> > > > > I have quite a lot of leading \n \n in front of the indexed content
> > > > > of my rich text documents, caused by spaces and empty lines in the
> > > > > original documents. The content is flooded with '\n \n' at the
> > > > > start before the actual content begins. This makes the content look
> > > > > ugly and also takes up unnecessary bandwidth in the system.
> > > >
> > > > Where is this showing up?
> > > >
> > > > If it is in search results, you must use an UpdateProcessor, as these
> > > > run before fields are stored (e.g. RegexReplaceProcessorFactory).
> > > >
> > > > If you are concerned about facet results, then you can do it in an
> > > > analysis chain, for example with a PatternReplaceFilterFactory.
> > > >
> > > > Upayavira
>
> --
> Benedetti Alessandro
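Below is a minimal plain-Java sketch, with no Solr dependency, of the kind of regex replacement suggested in the thread. The pattern `^\s+` and the class name `StripLeadingWhitespace` are assumptions chosen for illustration. If the same pattern were given to an update processor (for example as the `pattern` of RegexReplaceProcessorFactory, with an empty `replacement`, assuming those parameter names), it would run before the field is stored. That means both the stored and the indexed values would lose the leading blank lines.

```java
import java.util.regex.Pattern;

public class StripLeadingWhitespace {
    // Hypothetical pattern, not taken from any real Solr config: "^\s+" matches
    // the whole run of whitespace (spaces, \n, \r, \t) at the start of the value.
    // Without the MULTILINE flag, ^ only matches at the start of the input.
    private static final Pattern LEADING_WHITESPACE = Pattern.compile("^\\s+");

    /** Returns the value with its leading whitespace run removed. */
    public static String clean(String extracted) {
        return LEADING_WHITESPACE.matcher(extracted).replaceFirst("");
    }

    public static void main(String[] args) {
        // Simulated extracted text for a document that begins with blank lines.
        String raw = "\n \n \n \nQuarterly report\nRevenue grew.";
        System.out.println(clean(raw));
    }
}
```

The pattern is anchored to the start of the input, so newlines inside the body ("Quarterly report\nRevenue grew.") are left untouched. Only the leading run that floods the start of the field is removed.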