RE: Solr search engine configuration

2018-03-13 Thread PeterKerk
Thanks, will look into all that :-)

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
olled boosting mechanism but it should work more or less. We use payloads everywhere for fine controlled scoring, but that involves a lot of code. Cheers, Markus -Original message- > From:PeterKerk > Sent: Tuesday 13th March 2018 21:35 > To: solr-user@lucene.apache.org > Subject

RE: Solr search engine configuration

2018-03-13 Thread PeterKerk
Cool, will do some more digging around in the analysis GUI first. One last thing then on this comment of yours: "Does the decompounder support emitting the compound word as well? If so, enable it. It should help scoring compounds higher via IDF as they are less common." So I checked the Javadoc:
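To make "emitting the compound word as well" concrete, here is a minimal, naive dictionary decompounder written in plain Java. It is a sketch under stated assumptions and not Lucene's DictionaryCompoundWordTokenFilter or HyphenationCompoundWordTokenFilter. The class name `NaiveDecompounder` and the two-word dictionary are hypothetical. It emits the original compound first and then every dictionary word found inside it, so the rarer compound term can still match (and score via IDF) alongside its parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical class: a naive dictionary decompounder, NOT Lucene's
// DictionaryCompoundWordTokenFilter or HyphenationCompoundWordTokenFilter.
class NaiveDecompounder {

    // Returns the original compound first, then every dictionary word
    // (length minSubword..maxSubword) found inside it.
    static List<String> decompound(String token, Set<String> dictionary,
                                   int minSubword, int maxSubword) {
        List<String> out = new ArrayList<>();
        out.add(token); // keep the compound itself; as a rarer term it gets a higher IDF
        String lower = token.toLowerCase(Locale.ROOT);
        for (int start = 0; start < lower.length(); start++) {
            for (int len = minSubword;
                 len <= maxSubword && start + len <= lower.length(); len++) {
                if (len == lower.length()) {
                    continue; // the whole word was already emitted above
                }
                String candidate = lower.substring(start, start + len);
                if (dictionary.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("dieren", "zaak"); // hypothetical dictionary
        // prints [dierenzaak, dieren, zaak]
        System.out.println(decompound("dierenzaak", dictionary, 2, 15));
    }
}
```

A real decompounder also has to deal with overlapping matches (for example "dier" inside "dieren") and linking elements. That is why the Lucene factories expose options for subword sizes and longest-match behavior, which are documented in the Javadoc linked in this thread.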

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
Inline, cheers. -Original message- > From:PeterKerk > Sent: Tuesday 13th March 2018 18:53 > To: solr-user@lucene.apache.org > Subject: RE: Solr search engine configuration > > You must stay in the Javadoc section, there the examples are good, or the > refe

RE: Solr search engine configuration

2018-03-13 Thread PeterKerk
You must stay in the Javadoc section, there the examples are good, or the reference guide: https://lucene.apache.org/core/6_5_0/analyzers-common/org/apache/lucene/analysis/compound/HyphenationCompoundWordTokenFilterFactory.html https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#filt

Re: Solr search engine configuration

2018-03-13 Thread Shawn Heisey
On 3/13/2018 7:24 AM, PeterKerk wrote: PVK COMMENT: But without a StopFilter, won't stopwords be included in searches? I thought that, for example, Google excluded these words in their algorithms? I just did a Google search for "to be or not to be". It worked flawlessly. If Google were using stop
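Shawn's "to be or not to be" example can be reproduced with a toy analyzer. The sketch below is plain Java, not Solr's StopFilter. The class name and the small stopword set are hypothetical, chosen only to show that a query made up entirely of common words becomes an empty token list once stopwords are removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy analyzer: lowercase, split on whitespace, drop stopwords.
class StopwordDemo {
    // Hypothetical stopword list for illustration; real lists are longer and configurable.
    static final Set<String> STOPWORDS =
        Set.of("a", "an", "and", "be", "not", "of", "or", "the", "to");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("to be or not to be")); // prints []
        System.out.println(analyze("the dieren zaak"));    // prints [dieren, zaak]
    }
}
```

With stopword removal the famous quote leaves nothing to search for, which is the trade-off being discussed: smaller indexes and less noise versus losing queries that consist of common words.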

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
-Original message- > From:PeterKerk > Sent: Tuesday 13th March 2018 14:24 > To: solr-user@lucene.apache.org > Subject: RE: Solr search engine configuration > > Markus, > > Thanks again. Ok, 1 by 1: > > StemmerOverride wants \t separated fields, tha

RE: Solr search engine configuration

2018-03-13 Thread PeterKerk
Markus, Thanks again. Ok, 1 by 1: StemmerOverride wants \t separated fields, that is probably the cause of the AIooBE you get. Regarding schema definitions, each factory JavaDoc [1] has a proper example listed. I recommend putting a decompounder before a stemmer, and have an accent (or ICU) fold

Re: Solr search engine configuration

2018-03-12 Thread Shawn Heisey
On 3/12/2018 4:15 PM, PeterKerk wrote: > I trimmed stemdict_nl.txt for testing to just this: > > aachenaach > aachener aachener According to the example here: https://github.com/apache/lucene-solr/blob/master/solr/core/src/test-files/solr/collection1/c
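The "\t separated fields" remark explains the ArrayIndexOutOfBoundsException (AIooBE). The sketch below is a hypothetical plain-Java parser, not Lucene's actual code. It reproduces the general failure mode: splitting a dictionary line on a tab and reading the second field fails when the separator is a space, as in the "aachener aachener" line above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser illustrating the failure mode; not Lucene's code.
class StemDictParser {

    // Each non-blank line is expected to be "word<TAB>stem". Reading
    // fields[1] without a length check throws ArrayIndexOutOfBoundsException
    // when the line contains no tab.
    static Map<String, String> parseNaive(List<String> lines) {
        Map<String, String> overrides = new HashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            String[] fields = line.split("\t", 2);
            overrides.put(fields[0], fields[1]); // AIOOBE if no tab present
        }
        return overrides;
    }

    public static void main(String[] args) {
        // Tab-separated: parses fine, prints {aachener=aachener}
        System.out.println(parseNaive(List.of("aachener\taachener")));
        try {
            parseNaive(List.of("aachener aachener")); // space, not tab
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOBE: line is not tab-separated");
        }
    }
}
```

In practice this means checking that the stemdict file really contains tab characters. Some editors silently convert tabs to spaces.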

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
.html -Original message- > From:PeterKerk > Sent: Monday 12th March 2018 23:16 > To: solr-user@lucene.apache.org > Subject: RE: Solr search engine configuration > > @Erick: thank you for clarifying! > > @Markus: > I feel like I'm not (or at least shoul

RE: Solr search engine configuration

2018-03-12 Thread PeterKerk
@Erick: thank you for clarifying! @Markus: I feel like I'm not (or at least should not be :-)) the first person to run into these challenges. "You can solve this by adding manual rules to StemmerOverrideFilter, but due to the compound nature of words, you would need to add it for all the mills"

Re: Solr search engine configuration

2018-03-12 Thread Erick Erickson
> terms, and may not always work with compounds of which the head is a plural, > just like dierenzaak, of scholierenkorting. > > Also add a AccentFoldingFilter, or ICUNormalizer to get rid of accents, or > you may have trouble finding a café. > > Regards, > Markus > >

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
ICUNormalizer to get rid of accents, or you may have trouble finding a café. Regards, Markus -Original message- > From:PeterKerk > Sent: Sunday 11th March 2018 23:55 > To: solr-user@lucene.apache.org > Subject: Re: Solr search engine configuration > > Sorry for thi
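The idea behind accent folding can be shown with the JDK alone. This is a minimal sketch, not Lucene's ASCIIFoldingFilter or the ICU-based filters, which cover many more characters. The class name `AccentFold` is hypothetical. It decomposes text to Unicode NFD and strips the combining marks, so "café" and "cafe" become the same term.

```java
import java.text.Normalizer;

// Hypothetical sketch of accent folding using only java.text.Normalizer.
class AccentFold {

    // Decompose to NFD ("é" becomes "e" + combining acute accent),
    // then remove all combining marks (Unicode category M).
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00e9")); // prints cafe
    }
}
```

For the folding to help, it has to run in both the index-time and the query-time analyzer. Note that characters without a canonical decomposition, such as "ø", pass through this sketch unchanged, which is one reason to prefer the dedicated filters.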

Re: Solr search engine configuration

2018-03-11 Thread PeterKerk
Sorry for this lengthy post, but I wanted to be complete. The only occurrence of edismax in solrconfig.xml is this one: edismax explicit 10

Re: Solr search engine configuration

2018-03-11 Thread Erick Erickson
bq: I tried the query with and without the &defType=edismax parameter but I'm getting the EXACT same results. Does that mean some configuration error? Well, not an error at all, this line: ExtendedDismaxQParser Means you're using edismax. If that happens both with or without &defType, that means

Re: Solr search engine configuration

2018-03-11 Thread PeterKerk
Thanks! That provides me with some more insight, I altered the search query to "dieren zaak" to see how queries consisting of more than 1 word are handled. I see that words are tokenized into groups of 3, I think because of my NGramFilterFactory with minGramSize of 3. (title_sear
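The "groups of 3" come from n-gramming each token. The plain-Java sketch below shows what minGramSize=3 (with a maximum of 3) produces for the two tokens of "dieren zaak". It is a hypothetical helper, not Lucene's NGramTokenFilter, and the order in which Lucene emits grams may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: every substring of length minGram..maxGram.
class NGramSketch {

    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram;
                 len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("dieren", 3, 3)); // prints [die, ier, ere, ren]
        System.out.println(ngrams("zaak", 3, 3));   // prints [zaa, aak]
    }
}
```

Because short grams like "ere" or "aak" occur in many unrelated words, n-gramming the same field that is used for precise matching tends to produce surprising hits. Keeping n-grams in a separate field is a common way to limit that.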

Re: Solr search engine configuration

2018-03-10 Thread Erick Erickson
You're mixing two different parsers I think. If you're using edismax (either specify defType=edismax on your query or set it up as the default for, say, the "/select" handler in solrconfig.xml). The "qf" parameter is only relevant if you _are_ using edismax. If you want to use edismax your query co
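As a rough illustration of Erick's point, the sketch below builds a request that selects edismax explicitly with defType and lists the searched fields in qf, so q can stay plain user input. defType, q, qf and debugQuery are standard Solr request parameters. The core name "mycore" and the fields "title" and "description" are hypothetical placeholders, and the class is only a URL-building helper, not a Solr client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that joins URL-encoded key=value pairs onto a base URL.
class EdismaxUrl {

    static String build(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("defType", "edismax");        // select the edismax query parser
        params.put("q", "dieren zaak");          // plain user input, no field prefixes
        params.put("qf", "title^2 description"); // hypothetical fields, title boosted x2
        params.put("debugQuery", "true");        // debug output shows the parser used
        // prints http://localhost:8983/solr/mycore/select?defType=edismax&q=dieren+zaak&qf=title%5E2+description&debugQuery=true
        System.out.println(build("http://localhost:8983/solr/mycore/select", params));
    }
}
```

With debugQuery=true the response should show which query parser handled the request (for example ExtendedDismaxQParser), which is how to confirm whether defType took effect.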