[ 
https://issues.apache.org/jira/browse/SOLR-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498492
 ] 

Ryan McKinley commented on SOLR-248:
------------------------------------

> 
> 1) would it make sense for the keep option to refer to a file, using the same 
> format as StopFilter ... that way it's easy to reuse the same file (which 
> seems like it would be a common case).
> 

Probably. That is a good idea.


> 2) what is the point of forceFirstLetter="true" ? ... if you want to force 
> capitalization, what's the point of making the keep list?
> 

This is one that came out of necessity!

With keep="the ..." and input:
 "Grand army of the Republic", "the arts"

I want: "Grand Army of the Republic" and "The Arts"

"forceFirstLetter" only applies to the first character in the token, not to 
each word.
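
To make the interaction between the keep list and "forceFirstLetter" concrete, here is a rough Python sketch of the behavior described above. It is not the code in the patch: the function name, the whitespace splitting, and the assumption that the keep list contains both "of" and "the" are illustrative only.

```python
def capitalize_token(token, keep=frozenset(), force_first_letter=True):
    """Capitalize each word of a token, leaving words in `keep` as written.

    Hypothetical helper for illustration; `keep` is assumed to hold
    lowercase words and is matched case-insensitively.
    """
    out = []
    for word in token.split(" "):
        if word.lower() in keep:
            out.append(word)  # keep-list word: left untouched
        else:
            out.append(word[:1].upper() + word[1:].lower())
    result = " ".join(out)
    if force_first_letter and result:
        # Applies only to the first character of the whole token,
        # not to the first character of every word.
        result = result[0].upper() + result[1:]
    return result


keep = frozenset({"of", "the"})  # assumed contents of keep="the ..."
print(capitalize_token("Grand army of the Republic", keep))
print(capitalize_token("the arts", keep))
print(capitalize_token("the arts", keep, force_first_letter=False))
```

Without "forceFirstLetter", a token that starts with a keep-list word would stay lowercase at the front, which is the case the option is meant to cover.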


> 3) is okPrefix going to force the case for things that have that prefix in an 
> alternate case, or only allow that casing to remain (ie: if i index McKeen, 
> Mckeen, mckeen and MCKEEN what tokens do i wind up with?)
> 

As written, if the prefix matches, it assumes the word's capitalization is 
correct and leaves it unchanged.  For my input data, this is sufficient -- but 
it should probably do something smarter.

So, if you index "McKeen, Mckeen, mckeen, MCKEEN and McKEEN", you would get:

 "McKeen, Mckeen, Mckeen, Mckeen And McKEEN"

If "okPrefix" was treated as *the* capitalization for input where the lowercase 
prefix matches "mck", it would give:

 "McKeen, McKeen, McKeen, McKeen And McKeen"
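
The two behaviors can be compared side by side with a small Python sketch. This is only an illustration of the examples above, not the patch itself: the function name, the `canonical` flag, and the choice of "McK" as the configured prefix are assumptions.

```python
def capitalize_word(word, ok_prefixes=("McK",), canonical=False):
    """Capitalize one word, with two possible treatments of okPrefix.

    canonical=False: if the word starts with the prefix exactly as
    configured, trust its existing capitalization (behavior as written).
    canonical=True: if the word matches the prefix case-insensitively,
    rewrite it to the prefix's casing followed by lowercase
    (the alternative discussed above).
    """
    for prefix in ok_prefixes:
        if canonical:
            if word.lower().startswith(prefix.lower()):
                return prefix + word[len(prefix):].lower()
        elif word.startswith(prefix):
            return word
    # No prefix matched: plain capitalization.
    return word[:1].upper() + word[1:].lower()


words = ["McKeen", "Mckeen", "mckeen", "MCKEEN", "and", "McKEEN"]
print([capitalize_word(w) for w in words])
print([capitalize_word(w, canonical=True) for w in words])
```

The first call reproduces the current result, including the untouched "McKEEN"; the second normalizes every variant to "McKeen".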



> Capitalization Filter Factory
> -----------------------------
>
>                 Key: SOLR-248
>                 URL: https://issues.apache.org/jira/browse/SOLR-248
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Ryan McKinley
>            Priority: Minor
>         Attachments: SOLR-248-CapitalizationFilter.patch
>
>
> For tokens that are used in faceting, it is nice to have standard 
> capitalization.  
> I want "Aerial views" and "Aerial Views" to both be: "Aerial Views"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
