What do you mean by "url case"? No, I'm not being snarky.....

The value returned in a doc is very different from the value searched.
The stored data is the original input, without going through any
filters.
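A toy Python model of that distinction may help. It is purely illustrative and is not how Lucene stores anything internally. Each field keeps the original value for display and a separately analyzed copy (here, just lowercased) for matching. The names index_field and search are made up for this sketch.

```python
def index_field(value: str) -> dict:
    # The stored copy is kept exactly as sent; only the indexed terms are analyzed.
    return {"stored": value, "indexed_terms": [value.lower()]}


def search(docs: list, term: str) -> list:
    # The query is analyzed the same way (lowercased) before matching indexed
    # terms, but what comes back is the untouched stored value.
    wanted = term.lower()
    return [d["stored"] for d in docs if wanted in d["indexed_terms"]]


docs = [index_field("https://Example.com/ABOUT")]
print(search(docs, "HTTPS://EXAMPLE.COM/about"))  # ['https://Example.com/ABOUT']
```

The search is case-insensitive, yet the result still carries the original mixed case.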

If you mean the value _returned_ by Solr from a stored field, then the
case is exactly whatever was input originally. To get a consistent
case, I'd change it on the client side before sending it to Solr, or
use, say, a ScriptUpdateProcessor to change it on the way in to Solr.
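For the client-side route, here is a minimal Python sketch. It assumes documents are built in your own indexing code before being posted as JSON to Solr's update handler. The functions normalize_url and to_solr_doc and the field names id and url are hypothetical, so adjust them to your schema.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str, lowercase_path: bool = True) -> str:
    """Lowercase the scheme and host and, optionally, the path.

    Scheme and host are case-insensitive per RFC 3986. The path is not in
    general, so lowercasing it is a business decision, not a URL rule.
    The query string and fragment are left untouched. Note that netloc also
    holds any user:password@ part and port, which this sketch lowercases too.
    """
    parts = urlsplit(url)
    path = parts.path.lower() if lowercase_path else parts.path
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path,
                       parts.query, parts.fragment))


def to_solr_doc(url: str) -> dict:
    # Hypothetical schema: "id" is the uniqueKey and "url" a stored field.
    # Using the normalized URL as the id means the upper- and lower-case
    # crawls of the same page overwrite each other instead of duplicating.
    normalized = normalize_url(url)
    return {"id": normalized, "url": normalized}


crawled = ["https://Example.com/About/Team", "https://example.com/about/team"]
docs = {d["id"]: d for d in map(to_solr_doc, crawled)}  # also dedupe client-side
payload = json.dumps(list(docs.values()))
print(payload)
```

One design note: keying documents on the normalized URL is what actually removes the duplicates on re-index. Lowercasing only the url field would still leave two documents if their ids differ.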

If you're talking about _searching_ the URL, you need to put the
appropriate filters in your analysis chain. Most distributions have a
"lowercase" field type that is a KeywordTokenizer plus a LowerCaseFilter.
That still treats the searchable text as a single token, so, for
instance, you wouldn't be able to match url:com without adding leading
and trailing wildcards (url:*com*), which is not a good pattern. If you
want to search sub-parts of a URL, you'll want one of the text-based
types that break it up into tokens.
Even in this case, though, the returned data is still the original
case since it's the stored data that's returned.
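To make the single-token point concrete, here is a rough Python model of the two kinds of analysis. It only approximates what Solr does. keyword_lowercase stands in for KeywordTokenizer plus LowerCaseFilter, and split_lowercase is a hypothetical stand-in for a tokenizing text type. Real tokenizers differ in detail (StandardTokenizer, for example, keeps "example.com" together as one token), so check your actual field type on Solr's Analysis screen.

```python
import re


def keyword_lowercase(text: str) -> list:
    # Rough model of KeywordTokenizer + LowerCaseFilter: the whole value is one token.
    return [text.lower()]


def split_lowercase(text: str) -> list:
    # Rough stand-in for a tokenizing text type: lowercase, then split on any
    # run of non-alphanumeric characters, dropping empty pieces.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


url = "https://Example.COM/About"
print("com" in keyword_lowercase(url))  # False: only the full URL matches
print("com" in split_lowercase(url))    # True: sub-parts are searchable
```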

Best,
Erick
On Tue, Dec 11, 2018 at 8:38 AM Moyer, Brett <bmo...@tiaa.org> wrote:
>
> Hello, I'm new to Solr and have been using it for a few months. A recent
> question came up from our business partners about URL casing. Previously
> their URLs were upper case; they made a change and now they are all lower
> case. Both pages/URLs are still accessible, so there are duplicates in
> Solr. They are requesting that all URLs be evaluated as lowercase. What is
> the best practice on URL case? Is there a negative to making them all
> lowercase? I know I can drop the index and re-crawl to fix it, but long
> term, how should URL case be treated? Thanks!
>
> Brett Moyer
>
