I'm using ExtractingRequestHandler to do the indexing. Should the
UpdateProcessor be configured as part of the ExtractingRequestHandler, or
set up separately?

Regards,
Edwin

On 26 May 2015 at 23:42, Alessandro Benedetti <benedetti.ale...@gmail.com>
wrote:

> I think this is still on topic.
> Assuming we are using the Extract Update handler, I think the update
> processor approach still applies.
> But is it not possible to strip them directly with some extract request
> handler param?
>
>
> 2015-05-26 16:33 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>
> > Neither - it removes the characters before indexing. The distinction is
> > that if you remove them during indexing they will still appear in the
> > stored field values even if they are removed from the indexed values, but
> > by removing them before indexing, they will not appear in the stored
> > field values. Again, the distinction is between indexed field values and
> > stored field values.
> >
> > -- Jack Krupansky
> >
> > On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo <
> > edwinye...@gmail.com>
> > wrote:
> >
> > > It is showing up in the search results. Just to confirm, does this
> > > UpdateProcessor method remove the characters during indexing or only
> > > after indexing has been done?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On 26 May 2015 at 21:30, Upayavira <u...@odoko.co.uk> wrote:
> > >
> > > >
> > > >
> > > > On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote:
> > > > > Hi,
> > > > >
> > > > > > Is there a way to remove special characters like \n during indexing
> > > > > > of rich text documents?
> > > > > >
> > > > > > I have quite a lot of leading \n \n in front of my indexed content of
> > > > > > rich text documents due to the spaces and empty lines in the original
> > > > > > documents, and it's causing the content to be flooded with '\n \n' at
> > > > > > the start before the actual content comes in. This causes the content
> > > > > > to look ugly, and also takes up unnecessary bandwidth in the system.
> > > >
> > > > Where is this showing up?
> > > >
> > > > > If it is in search results, you must use an UpdateProcessor, as these
> > > > > run before fields are stored (e.g. RegexReplaceProcessorFactory).
> > > >
> > > > > If you are concerned about facet results, then you can do it in an
> > > > > analysis chain, for example with a PatternReplaceFilterFactory.
> > > >
> > > > Upayavira
> > > >
> > >
> >
>
>
>
> --
> --------------------------
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
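
The thread's conclusion is that the leading '\n \n' run has to be stripped
before the field is stored, typically with a regex-replace update processor.
The sketch below is not Solr code; it only illustrates, in plain Java, the
kind of pattern such a processor could be given and what it does to the
extracted text. The class name, method name and sample string are
hypothetical, chosen for illustration. In Solr itself the usual arrangement,
as far as I know, is to define the processor in an update chain in
solrconfig.xml and point the extract handler at that chain with the
update.chain parameter, rather than writing custom code.

```java
import java.util.regex.Pattern;

public class LeadingWhitespaceStripper {
    // \A anchors at the very start of the input; \s+ then consumes the
    // leading run of spaces, tabs and newlines (e.g. "\n \n \n ").
    private static final Pattern LEADING_WHITESPACE = Pattern.compile("\\A\\s+");

    // Returns the content without its leading whitespace run.
    // Whitespace inside or at the end of the content is left untouched.
    public static String stripLeading(String content) {
        if (content == null) {
            return null;
        }
        return LEADING_WHITESPACE.matcher(content).replaceFirst("");
    }

    public static void main(String[] args) {
        // Hypothetical sample of what an extraction step might hand over.
        String extracted = "\n \n \n Actual content starts here.\nSecond line.";
        System.out.println(stripLeading(extracted));
    }
}
```

Because the replacement happens on the raw value, the stored field and the
indexed field would both see the cleaned text, which is the distinction Jack
draws above between removing characters before indexing and during analysis.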
