Interesting, I can't seem to find anything on Phrase IDF. Don't suppose you have a link or two I could look at, by chance?

On Mon, Feb 17, 2020 at 1:48 PM Walter Underwood <wun...@wunderwood.org> wrote:

> At Infoseek, we used “glue words” to build phrase tokens. It was really
> effective.
> Phrase IDF is powerful stuff.
>
> Luckily for you, the patent on that has expired. :-)
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Feb 17, 2020, at 10:46 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > I use stop words for building shingles into "interesting phrases" for my
> > machine teacher/students, so I wouldn't say there's no reason; however, my
> > use case is very specific. Otherwise yeah, they're gone for all practical
> > reasons/search scenarios.
> >
> > On Mon, Feb 17, 2020 at 1:41 PM Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> Why are you using stopwords? I would need a really, really good reason to
> >> use those.
> >>
> >> Stopwords are an obsolete technique from 16-bit processors. I’ve never
> >> used them and I’ve been a search engineer since 1997.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
> >>
> >>> On Feb 17, 2020, at 7:31 AM, Thomas Corthals <tho...@klascement.net> wrote:
> >>>
> >>> Hi
> >>>
> >>> I've run into an issue with creating a Managed Stopwords list that has the
> >>> same name as a previously deleted list. Going through the same flow with
> >>> Managed Synonyms doesn't result in this unexpected behaviour. Am I missing
> >>> something or did I discover a bug in Solr?
> >>>
> >>> On a newly started solr with the techproducts core:
> >>>
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>>
> >>> The second PUT request results in a status 500 with error
> >>> msg "java.util.LinkedHashMap cannot be cast to java.util.List".
> >>>
> >>> Similar requests for synonyms work fine, no matter how many times I repeat
> >>> the CREATE/DELETE/RELOAD cycle:
> >>>
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>>
> >>> Reloading after creating the Stopwords list but not after deleting it works
> >>> without error too on a fresh techproducts core (you'll have to remove the
> >>> directory from disk and create the core again after running the previous
> >>> commands).
> >>>
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>>
> >>> And even curiouser, when doing a CREATE/DELETE for Stopwords, then a
> >>> CREATE/DELETE for Synonyms, and only then a RELOAD of the core, the cycle
> >>> can be completed twice. (Again, on a freshly created techproducts core.)
> >>> Only the third attempt to create a list results in an error. Synonyms can
> >>> still be created and deleted repeatedly after this.
> >>>
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedSynonymGraphFilterFactory$SynonymManager"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl -X DELETE \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/synonyms/testmap
> >>> curl http://localhost:8983/solr/admin/cores?action=RELOAD\&core=techproducts
> >>> curl -X PUT -H 'Content-type:application/json' --data-binary \
> >>>   '{"class":"org.apache.solr.rest.schema.analysis.ManagedWordSetResource"}' \
> >>>   http://localhost:8983/solr/techproducts/schema/analysis/stopwords/testlist
> >>>
> >>> The same successes/errors occur when running each cycle against a different
> >>> core if the cores share the same configset.
> >>>
> >>> Any ideas on what might be going wrong?
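
For anyone who wants to replay the first failing sequence from the report (create, delete, reload, create again) and see the HTTP status of each step, here is a minimal sketch. The function names and the SOLR_URL, CORE and DRY_RUN variables are made up for illustration and are not part of Solr. By default it only prints the requests; set DRY_RUN=0 to send them to a running Solr. It assumes the same techproducts core and URLs as the quoted commands, and whether the last PUT actually returns the 500 will depend on the Solr version.

```shell
#!/bin/sh
# Sketch only: helper names and variables below are hypothetical.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CORE="${CORE:-techproducts}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the requests, 0 = send them with curl

WORDSET_CLASS='org.apache.solr.rest.schema.analysis.ManagedWordSetResource'

# Build a managed-resource URL, e.g. managed_url stopwords testlist
managed_url() {
  printf '%s/%s/schema/analysis/%s/%s' "$SOLR_URL" "$CORE" "$1" "$2"
}

# Print one request (dry run) or send it and report the HTTP status code.
request() {
  method="$1"; url="$2"; body="${3:-}"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s %s%s\n' "$method" "$url" "${body:+ $body}"
    return 0
  fi
  if [ -n "$body" ]; then
    status=$(curl -s -o /dev/null -w '%{http_code}' -X "$method" \
      -H 'Content-type:application/json' --data-binary "$body" "$url")
  else
    status=$(curl -s -o /dev/null -w '%{http_code}' -X "$method" "$url")
  fi
  printf '%s %s -> HTTP %s\n' "$method" "$url" "$status"
}

create_stopwords() {
  request PUT "$(managed_url stopwords "$1")" "{\"class\":\"$WORDSET_CLASS\"}"
}
delete_stopwords() { request DELETE "$(managed_url stopwords "$1")"; }
reload_core()      { request GET "$SOLR_URL/admin/cores?action=RELOAD&core=$CORE"; }

# The sequence reported as failing: create, delete, reload, create again.
failing_cycle() {
  create_stopwords "$1"
  delete_stopwords "$1"
  reload_core
  create_stopwords "$1"
}

failing_cycle testlist
```

Running it with DRY_RUN=0 against a freshly created techproducts core should make it easy to compare the status of the first and second PUT, and to paste the output into a JIRA issue if the behaviour reproduces.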