I can't help you with the implementation issues, but...

You may want to do something a little different from keeping all-uppercase tokens in upper case. You may simply want to special-case all-uppercase stopwords, so that they are not ignored.

The poster boy for that is IT, which in my last search application was *extremely common* and important. On the corpus side, [it] and [IT] are very distinct. But on the query side, most users will write [it], so it's fine to have it in the index as [it] and not [IT]. Similarly for ON (Ontario) and ME (Maine).

A nasty one is OR: if you are using all-uppercase OR for the Boolean operator, how do users enter OR meaning Operations Research? We know that not many users will write ["OR"]. So you may simply want to allow lowercase [or] in the query to match uppercase [OR] in the corpus, and reserve uppercase OR for the Boolean operator.

Other cases are much rarer (Dijkstra's THE operating system is of historical interest only...). For non-stopwords, there doesn't seem to be much of a problem.
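To make the idea concrete, here is a rough Python sketch of the logic I have in mind. It is not Solr configuration, just an illustration: the stopword list, the function names `index_tokens` and `parse_query`, and the crude punctuation trimming are all made up for the example. It also assumes the query side does not strip stopwords, so that a lowercase [it] or [or] in the query can still match.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"it", "on", "me", "or", "the", "a", "and", "is", "in"}


def index_tokens(text):
    """Whitespace-tokenize and lowercase, dropping ordinary stopwords.

    A stopword written entirely in upper case (IT, ON, OR, ...) is kept,
    but is still indexed in lower case.
    """
    out = []
    for raw in text.split():
        token = raw.strip(".,;:!?()[]\"'")  # crude punctuation trim (assumption)
        if not token:
            continue
        lowered = token.lower()
        # Require length > 1 so a sentence-initial "A" is not treated as special.
        all_upper = token.isupper() and len(token) > 1
        if lowered in STOPWORDS and not all_upper:
            continue  # ordinary stopword: ignore it
        out.append(lowered)
    return out


def parse_query(query):
    """Split a query into operators and terms.

    Uppercase OR is reserved for the Boolean operator; every other token,
    including lowercase "or", becomes a lowercased search term.
    """
    parts = []
    for raw in query.split():
        if raw == "OR":
            parts.append(("OP", "OR"))
        else:
            parts.append(("TERM", raw.lower()))
    return parts
```

With this, "She works in IT and it is fun" indexes [it] once (from IT), while the pronoun is dropped, and a query such as [or OR it] is read as the term [or], the operator OR, and the term [it].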
-s

On Wed, Sep 9, 2020 at 2:59 PM Dunham-Wilkie, Mike CITZ:EX <mike.dunham-wil...@gov.bc.ca> wrote:

> Hi SOLR list,
>
> I'm currently using the White Space tokenizer and the Lower Case filter
> with SOLR 7.3. I'd like to modify the logic to keep any tokens that are
> entirely upper case as upper case, and just apply the Lower Case filter (or
> something equivalent) to the remaining tokens. Is there a way to do this
> using tokenizers and filters?
>
> Thanks
> Mike
>
> Mike Dunham-Wilkie | Senior Spatial Data Administration Analyst | PHONE...
> 778-676-1791
> Data Systems & Services - Digital Platforms and Data Division - Ministry
> of Citizens' Services
>
> For faster response and/or future inquires, the following email addresses
> are monitored continuously:
> BC Geographic Warehouse (BCGW) and Replication/ETL | DataBC Data
> Architecture Services (databc...@gov.bc.ca)
> BC Data Catalogue (BCDC) and Open Data | DataBC Catalogue Services
> (data...@gov.bc.ca)