At Infoseek, we used “glue words” to build phrase tokens. It was really effective. Phrase IDF is powerful stuff.
Luckily for you, the patent on that has expired. :-)

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 17, 2020, at 10:46 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>
> I use stop words for building shingles into "interesting phrases" for my
> machine teacher/students, so I wouldn't say there's no reason; however, my
> use case is very specific. Otherwise, yeah, they're gone for all practical
> reasons/search scenarios.
>
> On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood <wun...@wunderwood.org> wrote:
>
>> Why are you using stopwords? I would need a really, really good reason to
>> use those.
>>
>> Stopwords are an obsolete technique from 16-bit processors. I’ve never
>> used them, and I’ve been a search engineer since 1997.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>> On Feb 17, 2020, at 7:31 AM, Thomas Corthals <tho...@klascement.net> wrote:
>>>
>>> Hi
>>>
>>> I've run into an issue with creating a Managed Stopwords list that has the
>>> same name as a previously deleted list. Going through the same flow with
>>> Managed Synonyms doesn't result in this unexpected behaviour. Am I missing
>>> something, or did I discover a bug in Solr?
>>>
>>> On a newly started Solr with the techproducts core:
>>>
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>>
>>> The second PUT request results in a status 500 with the error
>>> message "java.util.LinkedHashMap cannot be cast to java.util.List".
>>>
>>> Similar requests for synonyms work fine, no matter how many times I repeat
>>> the CREATE/DELETE/RELOAD cycle:
>>>
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>>
>>> Reloading after creating the Stopwords list, but not after deleting it,
>>> also works without error on a fresh techproducts core (you'll have to
>>> remove the directory from disk and create the core again after running the
>>> previous commands).
>>>
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>>
>>> And even curiouser, when doing a CREATE/DELETE for Stopwords, then a
>>> CREATE/DELETE for Synonyms, and only then a RELOAD of the core, the cycle
>>> can be completed twice. (Again, on a freshly created techproducts core.)
>>> Only the third attempt to create a list results in an error. Synonyms can
>>> still be created and deleted repeatedly after this.
>>>
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl -X DELETE http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
>>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
>>> curl -X PUT -H 'Content-type:application/json' --data-binary '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
>>>
>>> The same successes/errors occur when running each cycle against a different
>>> core if the cores share the same configset.
>>>
>>> Any ideas on what might be going wrong?
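To rerun the three curl sequences from the report without retyping them, here is a minimal sketch using only Python's standard library (urllib). It assumes Solr at http://localhost:8983 with the techproducts core, exactly as in the report; the step encoding ("create:stopwords", "delete:synonyms", "reload"), the SCENARIOS table, and the build_request/run helpers are hypothetical names introduced for illustration, not anything Solr provides. It only prints the HTTP status of each step, so you can see where the 500 appears.

```python
import json
import urllib.error
import urllib.request

# Assumed Solr location and core, matching the curl commands in the thread.
SOLR = "http://localhost:8983/solr"
CORE = "techproducts"

# Managed resource classes and resource names, copied from the thread.
CLASSES = {
    "stopwords": "org.apache.solr.rest.schema.analysis.ManagedWordSetResource",
    "synonyms": "org.apache.solr.rest.schema.analysis."
                "ManagedSynonymGraphFilterFactory$SynonymManager",
}
NAMES = {"stopwords": "testlist", "synonyms": "testmap"}

# Hypothetical step encoding: "create:<kind>", "delete:<kind>" or "reload".
SCENARIOS = {
    "1": ["create:stopwords", "delete:stopwords", "reload", "create:stopwords"],
    "2": ["create:stopwords", "reload", "delete:stopwords", "create:stopwords"],
    "3": ["create:stopwords", "delete:stopwords",
          "create:synonyms", "delete:synonyms", "reload"] * 2
         + ["create:stopwords"],
}


def build_request(step):
    """Translate one step into the request the matching curl command sends."""
    if step == "reload":
        # Plain GET; the shell needed `\&`, Python does not.
        return urllib.request.Request(
            f"{SOLR}/admin/cores?action=RELOAD&core={CORE}", method="GET")
    action, kind = step.split(":")
    url = f"{SOLR}/{CORE}/schema/analysis/{kind}/{NAMES[kind]}"
    if action == "create":
        body = json.dumps({"class": CLASSES[kind]}).encode("utf-8")
        return urllib.request.Request(
            url, data=body, method="PUT",
            headers={"Content-Type": "application/json"})
    if action == "delete":
        return urllib.request.Request(url, method="DELETE")
    raise ValueError(f"unknown step: {step}")


def run(steps):
    """Send the steps in order and print each HTTP status (plus error text)."""
    for step in steps:
        try:
            with urllib.request.urlopen(build_request(step), timeout=30) as resp:
                print(f"{step:18} {resp.status}")
        except urllib.error.HTTPError as err:
            # Non-2xx answers, e.g. the 500 with the ClassCastException message.
            detail = err.read().decode("utf-8", "replace")[:200]
            print(f"{step:18} {err.code} {detail}")
        except urllib.error.URLError as err:
            # Solr not reachable at all.
            print(f"{step:18} failed: {err.reason}")
```

Call run(SCENARIOS["1"]), run(SCENARIOS["2"]) or run(SCENARIOS["3"]), each against a freshly created techproducts core as the report describes; the last step of each scenario is the one reported to fail.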
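Walter's "glue words" and David's stopword shingles at the top of the thread point at the same idea: keep the common words, but use them to bind neighboring content words into phrase tokens that carry their own IDF. The thread does not describe Infoseek's actual method, so the following is only a minimal Python sketch under assumptions: the GLUE set and the phrase_tokens/idf helpers are hypothetical, tokenization is a plain lowercase whitespace split, and IDF is the textbook ln(N / df). In Lucene/Solr, CommonGramsFilter is a related built-in that combines a configured list of common words with their neighbors into bigram tokens.

```python
import math
from collections import Counter

# Hypothetical glue-word list: the thread does not say which words Infoseek used.
GLUE = {"a", "and", "for", "in", "of", "on", "the"}


def phrase_tokens(text, glue=GLUE):
    """Return the content words of `text` plus one phrase token for every
    run content-word, glue-word(s), content-word, e.g. "bank of america"
    -> "bank_of_america". Glue words are never emitted on their own."""
    tokens = text.lower().split()
    out = []
    for i, tok in enumerate(tokens):
        if tok in glue:
            continue
        out.append(tok)
        j = i + 1
        while j < len(tokens) and tokens[j] in glue:
            j += 1
        # A phrase needs at least one glue word and a content word after it.
        if j > i + 1 and j < len(tokens):
            out.append("_".join(tokens[i:j + 1]))
    return out


def idf(docs):
    """Classic idf(t) = ln(N / df(t)), computed over words and phrase tokens."""
    df = Counter()
    for doc in docs:
        df.update(set(phrase_tokens(doc)))
    return {t: math.log(len(docs) / n) for t, n in df.items()}


docs = ["Bank of America", "bank holiday", "the state of the art", "America Online"]
weights = idf(docs)
print(phrase_tokens("the state of the art"))  # ['state', 'state_of_the_art', 'art']
print(round(weights["bank_of_america"], 3), round(weights["bank"], 3))  # 1.386 0.693
```

On these four toy documents, bank_of_america appears once and gets ln(4), while bank and america each appear twice and get ln(2): the phrase token is rarer, and therefore more discriminating, than either of its parts.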