Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
IDF and stopword removal are different approaches to the same thing. Removing stopwords is a binary decision about how important common words are for search. It says some words are completely useless. IDF is a proportional measure of how important common words are for search. Instead of removing a
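The binary-versus-proportional contrast can be made concrete. The following is a minimal Python sketch that assumes the BM25 IDF formula used by Lucene's BM25Similarity, log(1 + (N - n + 0.5) / (n + 0.5)). The document counts and the stopword set are hypothetical values chosen only for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of BM25 as written in Lucene's BM25Similarity:
    log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def weight_with_stop_filter(term: str, stopwords: set, idf: float) -> float:
    """Binary approach: a removed stopword contributes nothing; every other term keeps its IDF."""
    return 0.0 if term in stopwords else idf


# Hypothetical corpus statistics and stopword set, for illustration only.
N = 1_000_000
DOC_FREQS = {"the": 990_000, "be": 600_000, "solr": 1_200}
STOPWORDS = {"the", "be"}

for term, df in DOC_FREQS.items():
    idf = bm25_idf(N, df)
    print(f"{term:5} idf={idf:.3f}  with stop filter={weight_with_stop_filter(term, STOPWORDS, idf):.3f}")
```

In this formula, IDF pushes the weight of very common terms toward zero but never all the way to zero, so those terms can still break ties between documents. The stop filter, by contrast, sets their weight to exactly zero.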

Re: stored=true what should I see from stem fields

2020-04-24 Thread Chris Hostetter
: Is what is shown in "analysis" the same as what is stored in a field? https://lucene.apache.org/solr/guide/8_5/analyzers.html The output of an Analyzer affects the terms indexed in a given field (and the terms used when parsing queries against those fields) but it has no impact on the

Re: Stopwords impact on search

2020-04-24 Thread Steven White
Hi everyone, I get why and when not indexing stopwords is a bad idea and can give you zero or incomplete results. But what about the quality of search results when stopwords are indexed vs. not indexed? 1) Stopwords are removed and I do a word search, not a phrase search, for "solr and lucene are so

Re: stored=true what should I see from stem fields

2020-04-24 Thread Shawn Heisey
On 4/24/2020 5:48 PM, matthew sporleder wrote: Is what is shown in "analysis" the same as what is stored in a field? The stored data (what you see in search results) is always exactly what was sent to Solr, modified by any update processors that are in use. The index (what you are actually
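Here is a toy Python sketch of the stored-versus-indexed split described above. The analyzer is hypothetical: it only lowercases and strips a plural "s", and stands in for a real stemming chain. It is not Solr's analyzer. The stored value keeps the original input, while matching uses only the analyzed terms.

```python
import re


def toy_analyzer(text):
    """Hypothetical analyzer chain: split on non-letters, lowercase, strip a plural 's'.
    A stand-in for a real stemming filter, not Solr's actual stemmer."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def index_document(doc_id, text, stored, inverted):
    # stored=true keeps the original input verbatim; this is what search results return.
    stored[doc_id] = text
    # indexed=true records only the analyzer's output terms, which are used for matching.
    for term in toy_analyzer(text):
        inverted.setdefault(term, set()).add(doc_id)


def search(query, inverted):
    # The query goes through the same analyzer, so "cores" matches the indexed term "core".
    postings = [inverted.get(term, set()) for term in toy_analyzer(query)]
    return set.intersection(*postings) if postings else set()


stored, inverted = {}, {}
index_document(1, "Solr Cores", stored, inverted)
print(stored[1])                      # Solr Cores
print(sorted(inverted))               # ['core', 'solr']
print(search("SOLR core", inverted))  # {1}
```

The Analysis screen shows the equivalent of `toy_analyzer` output. A search result shows the equivalent of `stored[doc_id]`.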

stored=true what should I see from stem fields

2020-04-24 Thread matthew sporleder
Is what is shown in "analysis" the same as what is stored in a field? I am confusing myself pretty thoroughly: I have some fields: And I have this: I run this through the analyzer for stuff_stems:

IdleTimeout setting in Jetty (Solr 7.7.1)

2020-04-24 Thread Kommu, Vinodh K.
Hi, Our clients are running streaming expressions on a 140M-doc collection with a relatively large amount of data; however, the query does not complete and times out after 120 secs (the default timeout in the jetty*.xml files). We changed the default timeout from 120s to 300s, which worked fine. To

Re: using S3 as the Directory for Solr

2020-04-24 Thread dhurandar S
It's 10 PB of source data, but we do have indexes on most of the attributes, 80% or so. We need to support data this large, and we have needle-in-a-haystack use cases. Most of our users are used to a search query language or Solr in addition to SQL. So we would have

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
I’m astonished that the default still has that. It was a bad idea in Solr 1.3, when it bit my ass. We help people with this about once a month and the advice is always the same. Imagine all the poor people who never ask about it and run with that default! wunder Walter Underwood

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
Turns out there is already a JIRA for this, SOLR-10992, where both you and I have already commented :) But it’s 3 years old... > On 24 Apr 2020, at 16:34, Erick Erickson wrote: > > +1 to removing stopword filters. > >> On Apr 24, 2020, at 10:28 AM,

Re: Stopwords impact on search

2020-04-24 Thread Rohan Kasat
So do we use the stopword filter as part of the query analyzer to avoid highlighting these stop words? Regards, Rohan On Fri, Apr 24, 2020 at 7:45 AM Walter Underwood wrote: > Agreed. Here is an article from 13 years ago when I accidentally turned on > stopword removal at Netflix. It caused bad

Re: Stopwords impact on search

2020-04-24 Thread Walter Underwood
Agreed. Here is an article from 13 years ago when I accidentally turned on stopword removal at Netflix. It caused bad problems. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ Infoseek was not removing stopwords when I joined them in 1996. Since then, I’ve always left
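This toy Python sketch shows why all-stopword queries are a problem. It uses a hypothetical stopword list and a naive whitespace tokenizer. A title made entirely of stopwords is reduced to nothing, while a mixed title keeps only its rare word. The sketch does not model how any particular Solr query parser reacts to an empty query.

```python
# Hypothetical query-side stopword list, for illustration only.
STOPWORDS = {"a", "and", "be", "not", "or", "the", "to"}


def filter_query(query, stopwords):
    """Lowercase, split on whitespace, and drop stopwords, as a stop filter would."""
    return [t for t in query.lower().split() if t not in stopwords]


for title in ["To Be or Not to Be", "The Matrix"]:
    terms = filter_query(title, STOPWORDS)
    if terms:
        print(f"{title!r} -> searches for {terms}")
    else:
        print(f"{title!r} -> nothing left to search for")
```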

Re: Stopwords impact on search

2020-04-24 Thread Erick Erickson
+1 to removing stopword filters. > On Apr 24, 2020, at 10:28 AM, Jan Høydahl wrote: > > I tend to agree. Should we simply remove the stopword filters from the > default configsets shipping with Solr? > > Jan > >> On 24 Apr 2020, at 14:44, David Hastings wrote: >> >> you should never use the

Re: Stopwords impact on search

2020-04-24 Thread Jan Høydahl
I tend to agree. Should we simply remove the stopword filters from the default configsets shipping with Solr? Jan > On 24 Apr 2020, at 14:44, David Hastings wrote: > > you should never use the stopword filter unless you have a very specific > purpose > > On Fri, Apr 24, 2020 at 8:33 AM Steven

Re: "Error creating core. No system property or default value specified for X"

2020-04-24 Thread Erick Erickson
Screenshots do not come through; the mail server pretty aggressively strips them. Ok, this is weird. There’s no way you should be getting results from an index that has only a write.lock file. So the fact that you were suggests that you are looking at one thing and Solr is looking at another.

SPLITSHARD and Blockjoin in Solr 8.2

2020-04-24 Thread katjaz
I tried to run SPLITSHARD on an index with child documents (block join): /admin/collections?action=SPLITSHARD=block=shard1=1000 and most of the child documents are missing after splitting. Old shard: 3 360 836 parents, 39 956 824 children. New shard 1: 1 679 363 parents, 4 995 371 children. New shard 2:

"Error creating core. No system property or default value specified for X"

2020-04-24 Thread Teresa McMains
Some background: I have made some changes to schema.xml to define a new field type and have a few string fields use this field type. I reloaded the core. Upon suggestion from this group, rather than just re-indexing (because it might not refresh

Re: How to update dataImportHandler config in solr version 5.3

2020-04-24 Thread matthew sporleder
Are you 100% sure it is using SolrCloud and that the config is not simply on disk? On Fri, Apr 24, 2020 at 7:11 AM Lewin Joy (TMNA) wrote: > > ll PROTECTED (Confidential: authorized personnel only) > Hi, > > We have an old collection running on a very old Solr version, 5.3. > Now, we need to update the URL string inside

Re: Stopwords impact on search

2020-04-24 Thread David Hastings
you should never use the stopword filter unless you have a very specific purpose. On Fri, Apr 24, 2020 at 8:33 AM Steven White wrote: > Hi everyone, > > What is the impact, if any, of stopwords on my search ranking quality? > Will my ranking improve if I do not index stopwords? > > I'm trying

Stopwords impact on search

2020-04-24 Thread Steven White
Hi everyone, What is the impact, if any, of stopwords on my search ranking quality? Will my ranking improve if I do not index stopwords? I'm trying to figure out whether I should use the stopword filter or not. Thanks in advance. Steve

How to update dataImportHandler config in solr version 5.3

2020-04-24 Thread Lewin Joy (TMNA)
ll PROTECTED (Confidential: authorized personnel only) Hi, We have an old collection running on a very old Solr version, 5.3. Now we need to update the URL string inside db-data-config.xml for the DataImportHandler. I see that this version does not support downconfig and upconfig as well as current versions do. I was

Re: How to Password Protect Apache Solr Server Admin Pages in Solr cloud mode

2020-04-24 Thread Jan Høydahl
Yes, please read the Reference Guide https://lucene.apache.org/solr/guide/8_5/overview-of-the-solr-admin-ui.html#login-screen PS: Note that the Admin UI is just static public files and does not need protection as such. But the Admin UI app in your browser will do HTTP requests to Solr APIs,
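In SolrCloud, Basic authentication is configured through a security.json file stored in ZooKeeper, not through jetty.xml (see the Authentication and Authorization Plugins section of the Reference Guide). Below is a minimal Python sketch that generates one. It assumes BasicAuthPlugin's credential format: base64 of a double SHA-256 over salt plus password, then a space, then the base64 salt. It also assumes the predefined "all" permission. The user name and password are placeholders. Please check both assumptions against the Reference Guide for your Solr version before relying on this.

```python
import base64
import hashlib
import json
import os


def solr_basic_auth_credential(password, salt=None):
    """Build a "hash salt" credential string in the format assumed for Solr's
    BasicAuthPlugin: base64(sha256(sha256(salt + password))) + " " + base64(salt)."""
    salt = os.urandom(32) if salt is None else salt
    first = hashlib.sha256(salt + password.encode("utf-8")).digest()
    second = hashlib.sha256(first).digest()
    return base64.b64encode(second).decode("ascii") + " " + base64.b64encode(salt).decode("ascii")


def security_json(user, password):
    """Minimal security.json: require login for everything and give one user the admin role."""
    return {
        "authentication": {
            "blockUnknown": True,  # reject requests that carry no credentials
            "class": "solr.BasicAuthPlugin",
            "credentials": {user: solr_basic_auth_credential(password)},
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            "permissions": [{"name": "all", "role": "admin"}],
            "user-role": {user: "admin"},
        },
    }


if __name__ == "__main__":
    # Hypothetical user name and password; replace before use.
    print(json.dumps(security_json("solr-admin", "change-me"), indent=2))
```

You would then upload the printed JSON to ZooKeeper as /security.json, for example with `bin/solr zk cp file:security.json zk:/security.json -z <zkhost>`. The Admin UI then shows its login screen because the Solr API calls it makes require authentication.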

How to Password Protect Apache Solr Server Admin Pages in Solr cloud mode

2020-04-24 Thread Amy Bai
Hi community, Is there any way to password-protect the Apache Solr admin pages in SolrCloud mode? Something like: when I open the admin pages, a login page asks for a user name and password. I learned that modifying jetty.xml works in standalone mode; how about in solr cloud

LTR - FieldValueFeature Question

2020-04-24 Thread Ashwin Ramesh
Hi everybody, Do we need 'indexed=true' to be able to retrieve the value of a field via FieldValueFeature, or is docValues=true enough? Currently, we have some dynamic fields defined as [dynamicField=true, stored=false, indexed=false, docValues=true]. However, we are noticing that the value