Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
@Vadim Ivanov Thank you!

RE: A question about solr filter cache

2020-02-17 Thread Vadim Ivanov
Hi! Yes, it may depend on the Solr version. In Solr 8.3 the Admin filterCache page stats look like this:
stats:
CACHE.searcher.filterCache.cleanupThread:false
CACHE.searcher.filterCache.cumulative_evictions:0
CACHE.searcher.filterCache.cumulative_hitratio:0.94
CACHE.searcher.filterCache.cumulative_hits:198
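
As far as I know, the `cumulative_hitratio` stat is the fraction of cache lookups that were answered from the cache, i.e. cumulative hits divided by cumulative lookups (the lookups counter is cut off in this excerpt). A minimal sketch of that calculation, assuming two-decimal rounding as displayed; the lookup count of 211 in the usage line is hypothetical, not taken from the thread:

```python
def hit_ratio(hits: int, lookups: int) -> float:
    """Fraction of cache lookups served from the cache, rounded to 2 decimals."""
    if lookups == 0:
        return 0.0  # no lookups yet: report 0 rather than dividing by zero
    return round(hits / lookups, 2)


# Hypothetical: 198 hits out of 211 lookups.
print(hit_ratio(198, 211))
```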

Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
@Erick Erickson and @Mikhail Khludnev, got it; the explanation is very clear. Thank you for your help.

Re: A question about solr filter cache

2020-02-17 Thread Hongxu Ma
Thank you, @Vadim Ivanov. I know that admin page, but I cannot find the memory usage of the filter cache (it only has "CACHE.searcher.filterCache.size", which I think is the number of used slots in the filterCache). Here is my output (Solr version 7.3.1): filterCache

Best Practises around relevance tuning per query

2020-02-17 Thread Ashwin Ramesh
Hi, we are in the process of applying a scoring model to our search results. In particular, we would like to add scores for documents per query and user context. For example, we want to assign scores from 500 down to 1 to the top 500 documents for the query “dog” for users who speak US English. We
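
A minimal sketch of the per-query, per-context table described here, assuming a ranked list of doc IDs already exists for each (query, locale) pair. The doc IDs, the `("dog", "en-US")` key layout, and the function name are hypothetical; how such a table would be fed into Solr is not covered in this excerpt.

```python
def rank_scores(ranked_doc_ids, top_n=500):
    """Give the top_n documents scores top_n..1 by rank (rank 1 gets top_n)."""
    return {
        doc_id: top_n - rank
        for rank, doc_id in enumerate(ranked_doc_ids[:top_n])
    }


# Hypothetical table keyed by (query, user locale).
scores = {("dog", "en-US"): rank_scores(["d17", "d4", "d99"])}
print(scores[("dog", "en-US")])
```

Documents beyond position `top_n` get no entry at all, which keeps the table bounded at 500 rows per query/context pair.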

Re: Metadata info on Stored Fields

2020-02-17 Thread Edward Ribeiro
Sorry, my fault; I overlooked this part of your message: " do I get the file name included in each snippet fragment - this again needs exploring on my end". No, the solution I proposed doesn't address that. :( Edward

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
Make phrases into single tokens at indexing and query time. Let the engine do the rest of the work. For example, “subunits of the army” can become “subunitsofthearmy” or “subunits_of_the_army”. We used patterns to choose phrases, so “word word”, “word glue word”, or “word glue glue word” could
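
A sketch of the "word word", "word glue word", "word glue glue word" patterns, written as a hypothetical helper that would run over the token stream at both index and query time. The glue-word set is an assumption for illustration; this is not the Infoseek implementation.

```python
# Hypothetical glue-word list; a real one would be chosen for the corpus.
GLUE = {"of", "the", "a", "an", "in"}


def phrase_tokens(tokens, glue=GLUE, max_glue=2, sep="_"):
    """Join two content words separated by 0..max_glue glue words into one token."""
    out = []
    n = len(tokens)
    for i, start in enumerate(tokens):
        if start in glue:
            continue  # phrases must start with a content word
        for g in range(max_glue + 1):
            j = i + g + 1  # index of the closing content word
            if j >= n:
                break
            middle = tokens[i + 1:j]
            # Only glue words in between, and a content word at the end.
            if all(t in glue for t in middle) and tokens[j] not in glue:
                out.append(sep.join(tokens[i:j + 1]))
    return out


print(phrase_tokens("subunits of the army".split()))
print(phrase_tokens("subunits of the army".split(), sep=""))
```

Passing `sep=""` gives the “subunitsofthearmy” form instead of “subunits_of_the_army”.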

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
Interesting. I can't seem to find anything on phrase IDF; I don't suppose you have a link or two I could look at?

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
At Infoseek, we used “glue words” to build phrase tokens. It was really effective. Phrase IDF is powerful stuff. Luckily for you, the patent on that has expired. :-) wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
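
Why phrase tokens help can be seen with the classic IDF formula, log(N / df): a phrase token occurs in far fewer documents than its component words, so it gets a much higher weight. A minimal sketch on a made-up corpus (each document is just its set of tokens, with phrase tokens already built); Lucene's default BM25 similarity uses a smoothed variant of this formula, and this is not a description of what Infoseek did.

```python
import math


def idf(term, docs):
    """Classic inverse document frequency: log(N / df)."""
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0  # unseen term: return 0.0 by convention in this sketch
    return math.log(len(docs) / df)


# Made-up corpus of token sets.
docs = [
    {"army", "subunits", "subunits_of_the_army"},
    {"army", "navy"},
    {"army"},
    {"navy"},
]
print(idf("army", docs))                  # log(4/3)
print(idf("subunits_of_the_army", docs))  # log(4/1)
```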

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread David Hastings
I use stop words for building shingles into "interesting phrases" for my machine teacher/students, so I wouldn't say there's no reason; however, my use case is very specific. Otherwise, yeah, they're gone for all practical purposes/search scenarios.

Re: Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Walter Underwood
Why are you using stopwords? I would need a really, really good reason to use those. Stopwords are an obsolete technique from 16-bit processors. I’ve never used them and I’ve been a search engineer since 1997.

Re: Metadata info on Stored Fields

2020-02-17 Thread Srijan
You know what, I think I left out a major detail in my earlier email. I want to be able to return additional data from stored fields alongside the snippets during highlighting; in this case, the filename the snippet came from. I'm not sure your approach would address that.

Re: Metadata info on Stored Fields

2020-02-17 Thread Edward Ribeiro
Hi, you could try creating two kinds of docs that form a parent-child relationship without nesting, like: a parent doc (id 894, type parent) ..., and a child doc (id 3213, type child, parent 894) holding "xxx portion of file 1", "remaining portion of file 1" ... Then you can add metadata to each child doc. The search can be done on child docs, but if you need to
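
A minimal sketch of the flat parent/child layout, building the documents as JSON-ready dicts with one child per attached file, so each child can carry its own `filename`. All field names (`doc_type`, `parent_id`, `filename`, `content`) and the child ID scheme are hypothetical. Returning parents whose children match could be done with Solr's join query parser, as in the last line; that is one option, not necessarily what was meant in the (truncated) reply.

```python
import json


def build_docs(obj_id, metadata, files):
    """Return one parent doc plus one flat child doc per (filename, content) pair."""
    parent = {"id": str(obj_id), "doc_type": "parent", **metadata}
    children = []
    for n, (filename, content) in enumerate(files, start=1):
        children.append({
            "id": f"{obj_id}-file{n}",   # hypothetical child ID scheme
            "doc_type": "child",
            "parent_id": str(obj_id),    # points back at the parent's id
            "filename": filename,        # per-file metadata lives on the child
            "content": content,
        })
    return [parent] + children


docs = build_docs(
    894,
    {"title": "Example object"},
    [("file1.txt", "xxx portion of file 1"), ("file2.txt", "yyy")],
)
print(json.dumps(docs, indent=2))

# Match children on content, return their parents (join from child to parent).
parent_query = "{!join from=parent_id to=id}content:xxx"
```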

Re-creating deleted Managed Stopwords lists results in error

2020-02-17 Thread Thomas Corthals
Hi, I've run into an issue when creating a Managed Stopwords list that has the same name as a previously deleted list. Going through the same flow with Managed Synonyms doesn't result in this unexpected behaviour. Am I missing something, or did I discover a bug in Solr? On a newly started Solr

Metadata info on Stored Fields

2020-02-17 Thread Srijan
Hi, I have a data model where the operational "Object" can have one or more files attached. Indexing these objects in Solr means indexing all metadata and the contents of the files. For file contents, what I have right now is a single multi-valued field (one per locale). Example: xxx yyy

Re: A question about solr filter cache

2020-02-17 Thread Erick Erickson
That’s the upper limit for a filter cache entry (maxDoc/8 bytes). For small numbers of hits, more space-efficient structures are used; specifically, a list of doc IDs is kept. So say you have an fq clause that matches 10 docs: the filterCache entry is closer to 40 bytes + sizeof(query object), etc. Still,
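
A rough estimator for the two sizes described here, assuming 4-byte int doc IDs for the small-set case and ignoring the query object key and JVM object overhead. The 8-million-doc core and the 512-entry cache are hypothetical numbers; the entry count corresponds to the `filterCache.size` stat.

```python
def bitset_entry_bytes(max_doc: int) -> int:
    """Upper bound: one bit per document in the index (maxDoc/8), rounded up."""
    return (max_doc + 7) // 8


def id_list_entry_bytes(num_hits: int, bytes_per_id: int = 4) -> int:
    """Small-set case: one int doc ID per matching document (assumes 4-byte IDs)."""
    return num_hits * bytes_per_id


def worst_case_cache_bytes(max_doc: int, entries: int) -> int:
    """All cached entries stored as full bitsets."""
    return entries * bitset_entry_bytes(max_doc)


print(id_list_entry_bytes(10))                    # fq matching 10 docs
print(bitset_entry_bytes(8_000_000))              # hypothetical 8M-doc core
print(worst_case_cache_bytes(8_000_000, 512))     # hypothetical 512 entries
```

The worst case is pessimistic: entries for selective filters fall back to the much smaller ID list.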

RE: A question about solr filter cache

2020-02-17 Thread Vadim Ivanov
You can easily check the amount of RAM used by a core's filterCache in the Admin UI: choose the core, then Plugins/Stats > Cache > filterCache. It shows useful information on configuration, statistics, and current RAM usage by the filter cache, as well as some examples of the filter cache entries currently in RAM. Core, for example, with 10

Re: A question about solr filter cache

2020-02-17 Thread Mikhail Khludnev
Hello, the former: https://github.com/apache/lucene-solr/blob/188f620208012ba1d726b743c5934abf01988d57/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L84 More efficient sets (Roaring and/or Elias-Fano, IIRC) are present in Lucene but are not yet used in Solr.
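
A Python sketch of the idea behind the collector discussed in this thread: collect matching doc IDs into a small int list, and switch to a one-bit-per-document bitset once the list grows past a cutoff. The `small_set_size` value is a hypothetical parameter; Solr's actual cutoff and data structures are defined in the `DocSetCollector` source linked in Mikhail's reply, and this is not a port of that code.

```python
from array import array


class DocSetSketch:
    """Collects doc IDs as an int list, upgrading to a bitset past a cutoff."""

    def __init__(self, max_doc: int, small_set_size: int):
        self.max_doc = max_doc
        self.small_set_size = small_set_size  # hypothetical cutoff
        self.ids = array("i")  # C ints (4 bytes on common platforms)
        self.bits = None       # bytearray of ceil(max_doc / 8) bytes after upgrade

    def collect(self, doc: int) -> None:
        if self.bits is None:
            if len(self.ids) < self.small_set_size:
                self.ids.append(doc)
                return
            # Too many hits: move the collected IDs into a bitset.
            self.bits = bytearray((self.max_doc + 7) // 8)
            for d in self.ids:
                self._set(d)
            self.ids = None
        self._set(doc)

    def _set(self, doc: int) -> None:
        self.bits[doc >> 3] |= 1 << (doc & 7)

    def __contains__(self, doc: int) -> bool:
        if self.bits is None:
            return doc in self.ids
        return bool(self.bits[doc >> 3] & (1 << (doc & 7)))

    def approx_bytes(self) -> int:
        """Payload size only; ignores Python object overhead."""
        if self.bits is None:
            return len(self.ids) * self.ids.itemsize
        return len(self.bits)


s = DocSetSketch(max_doc=1_000_000, small_set_size=100)
for doc in (3, 17, 42):
    s.collect(doc)
print(s.approx_bytes())  # 3 IDs times the C int size
```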

Re: A question about solr filter cache

2020-02-17 Thread Nicolas Franck
If 1GB would make Solr run out of memory because of the filter query cache, that would already have happened during the initial upload of the Solr documents. Imagine the amount of memory you need for one billion documents. A filter cache would be the least of your problems. 1GB is small in
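
For scale, using the maxDoc/8 upper bound from Erick's reply (one bit per document), a single bitset-backed filter cache entry for a one-billion-document index would take about:

```latex
\frac{10^{9}\ \text{docs} \times 1\ \text{bit}}{8\ \text{bits/byte}}
  = 1.25 \times 10^{8}\ \text{bytes} \approx 125\ \text{MB} \ (\approx 119\ \text{MiB})
```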

A question about solr filter cache

2020-02-17 Thread Hongxu Ma
Hi, I want to know the internals of the Solr filter cache, especially its memory usage. I googled some pages: https://teaspoon-consulting.com/articles/solr-cache-tuning.html https://lucene.472066.n3.nabble.com/Solr-Filter-Cache-Size-td4120912.html (Erick Erickson's answer) All of them said its