Lowercasing might work, or it might not.

Hostnames originally were case-insensitive, but that might have changed with 
I18N hostnames.
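
A minimal sketch of lowercasing only the host, assuming Python and its standard urllib.parse module. The function name lowercase_host is made up for illustration, and this does no IDN/punycode conversion, so treat it as a starting point rather than a complete normalizer:

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_host(url: str) -> str:
    """Lowercase the scheme and host of a URL; leave the rest untouched."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # netloc may look like "user@Host:port"; only the host:port part is lowercased.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    # Path, query, and fragment keep their original case.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(lowercase_host("https://WWW.Nuveen.COM/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund"))
```

The path is deliberately left alone here, since whether its case matters is up to the server.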

Paths are interpreted by the web server. On Windows, file paths are 
typically case-insensitive. On Unix, they are case-sensitive. Either way, a 
web server might be configured to treat paths as case-insensitive.
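
Since path case depends on the server, one cautious approach is to find the URLs that differ only by case and check them before deciding to merge. A rough sketch, assuming Python and a plain list of crawled URLs; group_case_variants is a hypothetical helper, not anything from Solr:

```python
from collections import defaultdict


def group_case_variants(urls):
    """Return {lowercased_url: [variants]} for URLs that differ only by case."""
    groups = defaultdict(set)
    for url in urls:
        # The lowercased URL is used only as a grouping key, not as a replacement.
        groups[url.lower()].add(url)
    # Keep only keys with more than one distinct spelling.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    crawled = [
        "https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund",
        "https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund",
    ]
    for key, variants in group_case_variants(crawled).items():
        print(key, variants)
```

If every group turns out to serve the same page, lowercasing the whole URL is probably safe for that site.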

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 11, 2018, at 10:33 AM, Moyer, Brett <bmo...@tiaa.org> wrote:
> 
> https://www.nuveen.com/mutual-funds/nuveen-high-yield-municipal-bond-fund
> https://www.nuveen.com/mutual-funds/Nuveen-High-Yield-Municipal-Bond-Fund
> 
> Is there any issue if we just lowercase all URLs? I can't think of an issue 
> that would be caused, but that's why I'm asking the gurus!
> 
> Brett Moyer
>    
> 
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Tuesday, December 11, 2018 12:41 PM
> To: solr-user
> Subject: Re: URL Case Sensitive/Insensitive
> 
> What do you mean by "url case"? No, I'm not being snarky.....
> 
> The value returned in a doc is very different than the value searched.
> The stored data is the original input without going through any
> filters.
> 
> If you mean the value _returned_ by Solr from a stored field, then the
> case is exactly whatever was input originally. To get it into a consistent
> case, I'd change it on the client side before sending it to Solr, or
> use, say, a ScriptUpdateProcessor to change it on the way in to Solr.
> 
> If you're talking about _searching_ the URL, you need to put the
> appropriate filters in your analysis chain. Most distributions have a
> "lowercase" type that is a KeywordTokenizer plus a LowerCaseFilter. That
> still treats the searchable text as a single token, so for instance
> you wouldn't be able to search for url:com except with leading and
> trailing wildcards, which is not a good pattern. If you want to search
> sub-parts of a URL, you'll use one of the text-based types to break it
> up into tokens.
> Even in this case, though, the returned data is still the original
> case since it's the stored data that's returned.
> 
> Best,
> Erick
> On Tue, Dec 11, 2018 at 8:38 AM Moyer, Brett <bmo...@tiaa.org> wrote:
>> 
>> Hello, I'm new to Solr; I've been using it for a few months. A recent 
>> question came up from our business partners about URL casing. Previously 
>> their URLs were upper case; they made a change and now they are all lower 
>> case. Both pages/URLs are still accessible, so there are duplicates in 
>> Solr. They are requesting that all URLs be evaluated as lowercase. What is 
>> the best practice on URL case? Is there a negative to making them all 
>> lowercase? I know I can drop the index and re-crawl to fix it, but long 
>> term, how should URL case be treated? Thanks!
>> 
>> Brett Moyer
>> 
>> *************************************************************************
>> This e-mail may contain confidential or privileged information.
>> If you are not the intended recipient, please notify the sender immediately 
>> and then delete it.
>> 
>> TIAA
>> *************************************************************************
