I tried to follow the example at https://wiki.apache.org/solr/UpdateRequestProcessor by putting the updateRequestProcessorChain in my solrconfig.xml.
But I'm getting the following error when I try to reload the core:

Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.CustomUpdateRequestProcessorFactory'

Is there anything I might have missed? I'm using Solr 5.1.

Regards,
Edwin

On 27 May 2015 at 10:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> I'm using ExtractingRequestHandler to do the indexing. Do I have to
> implement the UpdateProcessor method at the ExtractingRequestHandler or
> as a separate method?
>
> Regards,
> Edwin
>
> On 26 May 2015 at 23:42, Alessandro Benedetti <benedetti.ale...@gmail.com>
> wrote:
>
>> I think this is still on topic.
>> Assuming we are using the Extract Update handler, I think the update
>> processor approach still applies.
>> But is it not possible to strip them directly with some extract request
>> handler param?
>>
>>
>> 2015-05-26 16:33 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>>
>> > Neither - it removes the characters before indexing. The distinction is
>> > that if you remove them during indexing they will still appear in the
>> > stored field values even if they are removed from the indexed values,
>> > but by removing them before indexing, they will not appear in the
>> > stored field values. Again, the distinction is between indexed field
>> > values and stored field values.
>> >
>> > -- Jack Krupansky
>> >
>> > On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo <
>> > edwinye...@gmail.com>
>> > wrote:
>> >
>> > > It is showing up in the search results. Just to confirm, does this
>> > > UpdateProcessor method remove the characters during indexing or only
>> > > after indexing has been done?
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > > On 26 May 2015 at 21:30, Upayavira <u...@odoko.co.uk> wrote:
>> > >
>> > > >
>> > > >
>> > > > On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
>> > > > > Hi,
>> > > > >
>> > > > > Is there a way to remove special characters like \n during
>> > > > > indexing of rich text documents?
>> > > > >
>> > > > > I have quite a lot of leading \n \n in front of my indexed content
>> > > > > of rich text documents due to the spaces and empty lines in the
>> > > > > original documents, and it's causing the content to be flooded
>> > > > > with '\n \n' at the start before the actual content comes in.
>> > > > > This causes the content to look ugly, and also takes up
>> > > > > unnecessary bandwidth in the system.
>> > > >
>> > > > Where is this showing up?
>> > > >
>> > > > If it is in search results, you must use an UpdateProcessor, as
>> > > > these happen before fields are stored (E.g.
>> > > > RegexpReplaceProcessorFactory).
>> > > >
>> > > > If you are concerned about facet results, then you can do it in an
>> > > > analysis chain, for example with a RegexpFilterFactory.
>> > > >
>> > > > Upayavira
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> --------------------------
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
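
A note on the class-loading error reported at the top of this message: 'solr.CustomUpdateRequestProcessorFactory' appears to be a placeholder name in the wiki example rather than a class that ships with Solr, so the core cannot load it unless you write that class yourself and put it on the classpath. For stripping leading newlines, a chain built only from stock processors may be enough. The fragment below is a minimal sketch, not a tested configuration: the chain name "strip-whitespace" and the field name "content" are assumptions, so substitute whatever your schema and ExtractingRequestHandler mapping actually use. Note that the stock regex processor is spelled solr.RegexReplaceProcessorFactory.

```xml
<!-- Goes inside <config> in solrconfig.xml. -->
<!-- Chain name "strip-whitespace" is hypothetical. -->
<updateRequestProcessorChain name="strip-whitespace">
  <!-- Collapse every run of whitespace (including \n) into one space. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <!-- "content" is an assumed field name; use your own. -->
    <str name="fieldName">content</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <!-- Trim the single leading/trailing space left by the step above. -->
  <processor class="solr.TrimFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- RunUpdateProcessorFactory must come last so the document is indexed. -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Point the extract handler at the chain via the update.chain parameter. -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="update.chain">strip-whitespace</str>
  </lst>
</requestHandler>
```

Because the processors run before the document reaches the index, the cleaned text is what ends up in the stored field values, which matches the distinction Jack describes in the quoted thread. Documents indexed earlier would need to be re-indexed to pick up the change.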