At Infoseek, we used “glue words” to build phrase tokens. It was really
effective. Phrase IDF is powerful stuff.

Luckily for you, the patent on that has expired. :-)

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 17, 2020, at 10:46 AM, David Hastings <hastings.recurs...@gmail.com> 
> wrote:
> 
> I use stop words for building shingles into "interesting phrases" for my
> machine teacher/students, so I wouldn't say there's no reason; however, my
> use case is very specific. Otherwise yeah, they're gone for all practical
> reasons/search scenarios.
> 
> On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> Why are you using stopwords? I would need a really, really good reason to
>> use those.
>> 
>> Stopwords are an obsolete technique from 16-bit processors. I’ve never
>> used them and I’ve been a search engineer since 1997.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Feb 17, 2020, at 7:31 AM, Thomas Corthals <tho...@klascement.net>
>>> wrote:
>>> 
>>> Hi
>>> 
>>> I've run into an issue with creating a Managed Stopwords list that has the
>>> same name as a previously deleted list. Going through the same flow with
>>> Managed Synonyms doesn't result in this unexpected behaviour. Am I missing
>>> something or did I discover a bug in Solr?
>>> 
>>> On a newly started Solr with the techproducts core:
>>> 
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> 
>>> The second PUT request results in a status 500 with error
>>> msg "java.util.LinkedHashMap cannot be cast to java.util.List".
>>> 
>>> Similar requests for synonyms work fine, no matter how many times I repeat
>>> the CREATE/DELETE/RELOAD cycle:
>>> 
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> 
>>> Reloading after creating the Stopwords list but not after deleting it works
>>> without error too on a fresh techproducts core (you'll have to remove the
>>> directory from disk and create the core again after running the previous
>>> commands).
>>> 
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> 
>>> And even curiouser, when doing a CREATE/DELETE for Stopwords, then a
>>> CREATE/DELETE for Synonyms, and only then a RELOAD of the core, the cycle
>>> can be completed twice. (Again, on a freshly created techproducts core.)
>>> Only the third attempt to create a list results in an error. Synonyms can
>>> still be created and deleted repeatedly after this.
>>> 
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary \
>>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
>>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> 
>>> The same successes/errors occur when running each cycle against a different
>>> core if the cores share the same configset.
>>> 
>>> Any ideas on what might be going wrong?
>> 
>> 
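
For anyone trying to reproduce the first stopwords case quoted above, the create/delete/reload/create cycle can be scripted roughly as follows. This is only a sketch: the endpoints and the class name are copied from the report, but the helper names (stopwords_url, status, run_cycle) and the SOLR_URL, CORE and RUN variables are made up for illustration, and nothing here has been checked against a particular Solr version. It prints the HTTP status of each step, so the failing step should show up as a 500 if the behaviour matches the report.

```shell
#!/bin/sh
# Sketch only: replays the stopwords create/delete/reload/create cycle
# from the report and prints the HTTP status of each step.
# SOLR_URL, CORE and RUN are hypothetical knobs, not Solr settings.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-techproducts}"
WORDSET_CLASS='org.apache.solr.rest.schema.analysis.ManagedWordSetResource'

# Build the managed stopwords endpoint for a given list name.
stopwords_url() {
  printf '%s/%s/schema/analysis/stopwords/%s\n' "$SOLR_URL" "$CORE" "$1"
}

# Run curl quietly and print only the HTTP status code.
status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

create_list() {
  status -X PUT -H 'Content-type:application/json' \
    --data-binary "{\"class\":\"$WORDSET_CLASS\"}" "$(stopwords_url "$1")"
}

delete_list() {
  status -X DELETE "$(stopwords_url "$1")"
}

reload_core() {
  status "$SOLR_URL/admin/cores?action=RELOAD&core=$CORE"
}

run_cycle() {
  echo "create:   $(create_list "$1")"
  echo "delete:   $(delete_list "$1")"
  echo "reload:   $(reload_core)"
  # In the report, this second create is the request that came back as 500.
  echo "recreate: $(create_list "$1")"
}

# Only talk to the server when explicitly asked to.
if [ "${RUN:-0}" = 1 ]; then
  run_cycle testlist
fi
```

Run it as `RUN=1 sh repro.sh` (the file name is arbitrary) against a freshly created techproducts core; without RUN=1 it only defines the helpers and sends nothing.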
