Hi Alex

The business use case for the field is:
- exact match
- singular-plural stemming on each term in the field
E.g. a search for "dvd cases" must match "dvd case" and "dvds case".

This is the current field type, and it satisfies the business use case.
The one drawback is that whenever users report words that EnglishMinimalStemFilter cannot stem correctly between singular and plural, I need to add those words to the 'plural_singular.txt' dictionary of StemmerOverrideFilter.

<fieldType class="solr.TextField" name="gs_keyword_pattern" positionIncrementGap="100">
   <analyzer>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="z01x $1 z01x" />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StemmerOverrideFilterFactory" dictionary="plural_singular.txt" />
      <filter class="solr.EnglishMinimalStemFilterFactory" />
   </analyzer>
</fieldType>

I am wondering whether it is advisable to let Solr add the code 'z01x' during indexing, or to add the code to the source data before feeding it to Solr. On the query side, I will let Solr add the code to the search keywords.

On 3/30/2017 7:28 PM, Alexandre Rafalovitch wrote:
What's your actual business use case?

On 30 Mar 2017 1:53 AM, "Derek Poh" <d...@globalsources.com> wrote:

Hi Erick

So I could also skip the query analyzer stage for appending the code to the
search keyword, and have the front-end application append the code to every
query it issues instead?


On 3/30/2017 12:20 PM, Erick Erickson wrote:

I generally prefer index-time work to query-time work on the theory
that the index-time work is done once and the query time work is done
for each query.

That said, for a corpus this size (and presumably without a large
query rate) I doubt you'd be able to measure any difference.

So basically choose the easiest to implement IMO.

Best,
Erick

On Wed, Mar 29, 2017 at 8:43 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:

I am not sure I can tell how to decide on one or another. However, I
wanted to mention that you also have an option of doing it in the
UpdateRequestProcessor chain. That's still within Solr (and therefore
is consistent with multiple clients feeding into Solr) but is before
individual field processing (so will survive - for example - a
copyField).
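
A minimal sketch of what such a chain could look like in solrconfig.xml. The chain name `add-code` and the field name `keyword` are hypothetical, and RegexReplaceProcessorFactory is only one stock processor that might fit; the `literalReplacement` parameter is assumed to be available in the Solr version in use (it has to be false for `$1` to be treated as a group reference).

```xml
<!-- Sketch only: chain name and field name are assumptions, not from the thread -->
<updateRequestProcessorChain name="add-code">
   <!-- Wrap the incoming field value in the code before any field analysis -->
   <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">keyword</str>
      <str name="pattern">^(.*)$</str>
      <str name="replacement">z01x $1 z01x</str>
      <!-- false: interpret $1 as a capture group instead of literal text -->
      <bool name="literalReplacement">false</bool>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory"/>
   <!-- RunUpdateProcessorFactory must come last so the document is indexed -->
   <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would be selected per request with `update.chain=add-code`, or made the default for the update handler. Because the processor rewrites the document itself, the stored value of the field would also carry the code, unlike a charFilter, which only affects the indexed tokens.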

Regards,
     Alex.
----
http://www.solr-start.com/ - Resources for Solr users, new and
experienced


On 29 March 2017 at 23:38, Derek Poh <d...@globalsources.com> wrote:

Hi

I need to create a field that will be prefixed and suffixed with the code
'z01x'. This field needs to have the code in the index and during query.
I can either
1.
have the source data of the field formatted with the code before indexing
(outside Solr), and
use a charFilter in the query stage of the field type to add the code
during query.

<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="^(.*)$"
replacement="z01x $1 z01x" />

OR

2.
use the charFilter before the tokenizer class during both the index and
query analyzer stages of the field type.
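
For comparison, option 1 could be sketched as a field type with separate index and query analyzers, where only the query side carries the charFilter. The field type name `gs_keyword_code` is hypothetical, and the tokenizer and lowercase filter are assumptions chosen for illustration.

```xml
<!-- Sketch only: the name and the tokenizer/filter choices are assumptions -->
<fieldType class="solr.TextField" name="gs_keyword_code" positionIncrementGap="100">
   <!-- Index side: no charFilter, the source data already carries z01x -->
   <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <!-- Query side: wrap the query text in the code before tokenizing -->
   <analyzer type="query">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^(.*)$" replacement="z01x $1 z01x"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType>
```

Option 2 would instead use a single `<analyzer>` element containing the same charFilter, so that index and query text go through identical processing. One thing to check with either option: if the query parser splits the query on whitespace before analysis, the charFilter would see each word separately rather than the whole query string.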

The collection has between 100k and 200k documents currently, but it may
increase in the future.
The indexing time with option 2 is almost the same as the current indexing
time, between 2 and 3 minutes.

Which option would you advise?

Derek

----------------------
CONFIDENTIALITY NOTICE
This e-mail (including any attachments) may contain confidential and/or
privileged information. If you are not the intended recipient or have
received this e-mail in error, please inform the sender immediately and
delete this e-mail (including any attachments) from your computer, and
you
must not use, disclose to anyone else or copy this e-mail (including any
attachments), whether in whole or in part.
This e-mail and any reply to it may be monitored for security, legal,
regulatory compliance and/or other appropriate reasons.
