Re: Removing characters like '\n \n' from indexing

Zheng Lin Edwin Yeo Tue, 26 May 2015 21:17:23 -0700

I tried to follow the example at
https://wiki.apache.org/solr/UpdateRequestProcessor by putting
the updateRequestProcessorChain in my solrconfig.xml.


But I get the following error when I try to reload the core:

Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.CustomUpdateRequestProcessorFactory'

Is there anything I might have missed? I'm using Solr 5.1.


Regards,
Edwin
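As far as I can tell, Solr does not ship a class called `CustomUpdateRequestProcessorFactory`. On the wiki page it appears to stand in for a processor you would write and package yourself, which would explain the class-loading error. For this particular problem, the bundled `RegexReplaceProcessorFactory` (configured with a field selector such as `fieldName`, plus `pattern` and `replacement`) may be enough without custom code, and `TrimFieldUpdateProcessorFactory` covers plain leading/trailing whitespace. Because the `pattern` is a Java regular expression, a candidate pattern can be checked with `java.util.regex` before it goes into solrconfig.xml. The sketch below assumes the goal is to drop the leading run of whitespace and collapse interior runs of blank lines; both patterns and the class name are illustrative, not taken from the thread.

```java
import java.util.regex.Pattern;

class NewlineCleanupCheck {
    // Leading run of whitespace (spaces, tabs, \r, \n) at the very start of the value.
    static final Pattern LEADING_WS = Pattern.compile("^\\s+");
    // Two or more line breaks, possibly separated by spaces/tabs, e.g. "\n \n \n".
    static final Pattern BLANK_LINES = Pattern.compile("\\n[ \\t\\r]*(?:\\n[ \\t\\r]*)+");

    static String clean(String raw) {
        // Step 1: remove the leading "\n \n ..." block entirely.
        String noLeading = LEADING_WS.matcher(raw).replaceAll("");
        // Step 2: collapse each run of blank lines to a single newline.
        return BLANK_LINES.matcher(noLeading).replaceAll("\n");
    }

    public static void main(String[] args) {
        String raw = "\n \n \n \nQuarterly report\n\n \nRevenue grew.";
        System.out.println("[" + clean(raw) + "]");
    }
}
```

Each regex processor takes a single pattern, so applying both cleanups would likely mean two processors in the chain. In solrconfig.xml the backslashes are written once (for example `^\s+`), not doubled as in a Java string literal.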


On 27 May 2015 at 10:13, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:

> I'm using ExtractingRequestHandler to do the indexing. Do I have to
> configure the UpdateProcessor in the ExtractingRequestHandler, or
> set it up separately?
>
> Regards,
> Edwin
>
> On 26 May 2015 at 23:42, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
>
>> I think this is still on topic.
>> Assuming we are using the extracting update handler, I think the update
>> processor approach still applies.
>> But is it not possible to strip them directly with some extract request
>> handler parameter?
>>
>>
>> 2015-05-26 16:33 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>>
>> > Neither: it removes the characters before indexing. The distinction is
>> > that if you remove them during indexing, they will still appear in the
>> > stored field values even though they are removed from the indexed values,
>> > but if you remove them before indexing, they will not appear in the stored
>> > field values either. Again, the distinction is between indexed field
>> > values and stored field values.
>> >
>> > -- Jack Krupansky
>> >
>> > On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >
>> > > It is showing up in the search results. Just to confirm, does this
>> > > UpdateProcessor approach remove the characters during indexing, or only
>> > > after indexing has been done?
>> > >
>> > > Regards,
>> > > Edwin
>> > >
>> > > On 26 May 2015 at 21:30, Upayavira <u...@odoko.co.uk> wrote:
>> > >
>> > > >
>> > > >
>> > > > On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
>> > > > > Hi,
>> > > > >
>> > > > > Is there a way to remove special characters like \n during indexing
>> > > > > of rich text documents?
>> > > > >
>> > > > > I have a lot of leading \n \n in front of the indexed content of my
>> > > > > rich text documents, due to spaces and empty lines in the original
>> > > > > documents, and it's causing the content to be flooded with '\n \n' at
>> > > > > the start before the actual content comes in. This makes the content
>> > > > > look ugly, and also takes up unnecessary bandwidth in the system.
>> > > >
>> > > > Where is this showing up?
>> > > >
>> > > > If it is in search results, you must use an UpdateProcessor, as these
>> > > > happen before fields are stored (e.g. RegexReplaceProcessorFactory).
>> > > >
>> > > > If you are concerned about facet results, then you can do it in an
>> > > > analysis chain, for example with a PatternReplaceFilterFactory.
>> > > >
>> > > > Upayavira
>> > > >
>> > >
>> >
>>
>>
>
>
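Jack's distinction between stored and indexed values can be illustrated with a toy model. This is not Solr code: the hypothetical `updateProcessor` method stands in for an update request processor, which rewrites the field value before the document reaches the index, so both the stored copy and the indexed terms see the cleaned text. The hypothetical `analyze` method stands in for an analysis chain, which only shapes the indexed tokens and never touches the stored value.

```java
import java.util.Arrays;
import java.util.List;

class StoredVsIndexedDemo {
    // Hypothetical stand-in for an update request processor: it runs before the
    // document is stored, so its output is what gets stored AND analyzed.
    static String updateProcessor(String raw) {
        return raw.replaceAll("^\\s+", "");
    }

    // Hypothetical stand-in for an analysis chain: it produces indexed tokens only.
    static List<String> analyze(String value) {
        String trimmed = value.strip();
        return trimmed.isEmpty() ? List.of() : Arrays.asList(trimmed.split("\\s+"));
    }

    // What ends up in the index for one field: the stored value plus its tokens.
    record Doc(String stored, List<String> indexed) {}

    // Cleanup only in the analysis chain: the stored value keeps the leading "\n \n".
    static Doc analysisOnly(String raw) {
        return new Doc(raw, analyze(raw));
    }

    // Cleanup in an update processor first: both copies see the cleaned text.
    static Doc withUpdateProcessor(String raw) {
        String cleaned = updateProcessor(raw);
        return new Doc(cleaned, analyze(cleaned));
    }

    public static void main(String[] args) {
        String raw = "\n \n \nHello world";
        System.out.println(analysisOnly(raw));
        System.out.println(withUpdateProcessor(raw));
    }
}
```

In both cases the indexed tokens are identical; only the update-processor path changes what a search result displays from the stored field.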

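On where the processor goes: an updateRequestProcessorChain is defined once in solrconfig.xml and then selected per request, either in a request handler's `defaults` or with the `update.chain` request parameter, so nothing needs to be implemented inside the ExtractingRequestHandler itself. The sketch below only builds the request URL for `/update/extract`; the host, the core name `mycore`, the chain name `strip-newlines` and the document id are hypothetical placeholders, and the chain is assumed to already exist in solrconfig.xml under that name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ExtractRequestUrl {
    // Builds a /update/extract URL that routes the extracted document through a
    // named update processor chain via the update.chain request parameter.
    static URI extractUri(String baseUrl, String core, String chain, String id) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("update.chain", chain); // which updateRequestProcessorChain to run
        params.put("literal.id", id);      // unique key for the extracted document
        params.put("commit", "true");
        String query = params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/" + core + "/update/extract?" + query);
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical host, core, chain and id.
        System.out.println(extractUri("http://localhost:8983/solr", "mycore", "strip-newlines", "doc1"));
    }
}
```

The same `update.chain` value can instead be set once in the `/update/extract` handler's `defaults` list in solrconfig.xml, so that clients do not have to pass it on every request.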