Position search

2019-10-14 Thread Kaminski, Adi
Hi, what's the recommended way to search in Solr (assuming 8.2 is used) for specific terms/phrases/expressions while limiting the search by position? For example, to search only in the first/last 100 words of the document? Is there any built-in functionality for that? Thanks in a
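To pin down what "search only in the first/last 100 words" means, here is a minimal Python sketch that checks whether a phrase falls entirely inside a position window. It is an illustration under stated assumptions, not a Solr feature: tokenization is a naive lowercase `\w+` split standing in for a real analysis chain, and `match_in_window` and its helpers are hypothetical names. For the "first N positions" case, Lucene's `SpanFirstQuery` is one query-time building block that may be worth checking.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokens; a stand-in for a real Solr analysis chain.
    return re.findall(r"\w+", text.lower())


def phrase_positions(tokens: list[str], phrase: str) -> list[int]:
    """Start positions where all tokens of `phrase` occur consecutively."""
    target = tokenize(phrase)
    n = len(target)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target]


def match_in_window(text: str, phrase: str, window: int, from_end: bool = False) -> bool:
    """True if `phrase` lies entirely within the first (or last) `window` tokens."""
    tokens = tokenize(text)
    n = len(tokenize(phrase))
    starts = phrase_positions(tokens, phrase)
    if from_end:
        # The phrase must start no earlier than `window` tokens before the end.
        lo = max(0, len(tokens) - window)
        return any(p >= lo for p in starts)
    # The phrase must end at or before position `window`.
    return any(p + n <= window for p in starts)


doc = "alpha beta gamma delta epsilon"
print(match_in_window(doc, "beta gamma", 3))
print(match_in_window(doc, "delta epsilon", 2, from_end=True))
```

Requiring the whole phrase to fit inside the window (rather than just its first token) mirrors how a span-style query bounds the end position of a match.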

Facet Advice

2019-10-14 Thread Moyer, Brett
Hello, I'm looking for some advice; I suspect we are doing facets all wrong. We host financial information and recently "tagged" our pages with appropriate facets. We have built a flat design. Are we going at it the wrong way? In Solr we have a "Tags" field, based on some magic we tagged

Lemmatizer for indexing

2019-10-14 Thread Shamik Bandopadhyay
Hi, I'm trying to use a lemmatizer in my analysis chain and am wondering what the recommended way of achieving this is. I've come across a few different implementations, which are listed below: Open NLP --> https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter htt

solr 8.1.1 many times slower returning query results than solr 4.10.4 or solr 6.5.1

2019-10-14 Thread Russell Bahr
Hello, I apologize in advance for the lengthy email; I will try to provide proper details. We currently have 2 SolrCloud deployments that we hope to upgrade to Solr 8.x, but we are running into severe performance problems with Solr 8.1.1. I am hoping for some guidance in

Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Shawn Heisey
On 10/14/2019 7:18 AM, Vassil Velichkov (Sensika) wrote: After the migration from 6.x to 7.6 we kept the default GC for a couple of weeks, then we started experimenting with G1 and managed to achieve less frequent OOM crashes, but not by much. Changing your GC settings will never pre

RE: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Vassil Velichkov (Sensika)
Hi Shawn, My answers are in-line below... Cheers, Vassil -----Original Message----- From: Shawn Heisey Sent: Monday, October 14, 2019 3:56 PM To: solr-user@lucene.apache.org Subject: Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC? On 1

Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Shawn Heisey
On 10/14/2019 6:18 AM, Vassil Velichkov (Sensika) wrote: We have 1 x Replica with 1 x Solr Core per JVM and each JVM runs in a separate VMware VM. We have 32 x JVMs/VMs in total, containing between 50M and 180M documents per replica/core/JVM. With 180 million documents, each filterCache entry

RE: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Vassil Velichkov (Sensika)
Hi Erick, We have 1 x Replica with 1 x Solr Core per JVM and each JVM runs in a separate VMware VM. We have 32 x JVMs/VMs in total, containing between 50M and 180M documents per replica/core/JVM. In our case most filterCache entries (maxDoc/8 + overhead) are typically more than 16MB, which is m
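The 16MB figure matters because of how G1 allocates large objects. A minimal Python sketch of the arithmetic, assuming the thread's maxDoc/8 estimate (one bit per document, per-object overhead ignored) and G1's rule that any object of at least half a heap region is allocated as "humongous". In the JDK versions discussed here (9 through 13), `-XX:G1HeapRegionSize` is capped at 32 MB, so the humongous threshold cannot be raised above 16 MiB. The function names are hypothetical.

```python
MIB = 1024 * 1024
# Largest value accepted for -XX:G1HeapRegionSize in JDK 9 through 13.
G1_MAX_REGION_BYTES = 32 * MIB


def filter_cache_entry_bytes(max_doc: int) -> int:
    """Rough bitset size: one bit per document, rounded up to whole bytes (no overhead)."""
    return (max_doc + 7) // 8


def is_humongous(obj_bytes: int, region_bytes: int = G1_MAX_REGION_BYTES) -> bool:
    """G1 allocates objects of at least half a region as humongous."""
    return obj_bytes >= region_bytes // 2


# Document counts mentioned in the thread.
for docs in (50_000_000, 160_000_000, 180_000_000):
    size = filter_cache_entry_bytes(docs)
    print(f"{docs:>11,} docs -> {size / MIB:5.1f} MiB, humongous: {is_humongous(size)}")
```

Under these assumptions, any core above roughly 134 million documents (16 MiB × 8 bits per byte) produces humongous filterCache entries regardless of how the region size is tuned.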

Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Erick Erickson
The filterCache isn’t a single huge allocation; it’s made up of _size_ entries. Each individual entry shouldn’t be that big: each entry should cap at around maxDoc/8 bytes plus some overhead. I just scanned the e-mail; I’m not clear how many _replicas_ per JVM you have, nor how many JVMs per server y
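The per-entry rule also gives a worst-case bound for the whole cache: _size_ entries times maxDoc/8 bytes. A minimal Python sketch, assuming every entry is a full bitset (so it overestimates when filters match few documents); the cache size of 512 is purely a hypothetical example, not a value from the thread's configs, and the function name is hypothetical too.

```python
def filter_cache_upper_bound_bytes(max_doc: int, cache_size: int, overhead_per_entry: int = 0) -> int:
    """Worst-case heap held by the filterCache if every entry is a full bitset."""
    per_entry = max_doc // 8 + overhead_per_entry  # maxDoc/8 bytes plus optional overhead
    return cache_size * per_entry


GIB = 1024 ** 3
# Hypothetical cache size of 512 entries on a 180M-document core.
bound = filter_cache_upper_bound_bytes(180_000_000, cache_size=512)
print(f"{bound:,} bytes (~{bound / GIB:.1f} GiB)")
```

For these inputs the bound is 11,520,000,000 bytes, about 10.7 GiB, which shows why the cache size matters as much as the size of any single entry.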

Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Vassil Velichkov (DGM)
Hi everyone, since we upgraded our cluster (legacy sharding) from Solr 6.x to Solr 7.6, we have had frequent OOM crashes on specific nodes. All investigations (detailed below) lead to a hard-coded limitation in the G1 garbage collector, and the Java heap is exhausted due to too many filterCache a

RE: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Vassil Velichkov (Sensika)
Thanks Jörn, Yep, we are rebalancing the cluster to keep up to ~100M documents per shard, but that's not quite optimal in our use case. We've tried various ratios between JVM heap / OS RAM (up to 128GB / 256GB) and we get the same Java heap OOM crashes. For example, a BitSet of 160M docum

Re: Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Jörn Franke
I would try JDK11 - it works much better than JDK9 in general. I don’t think JDK13 with ZGC will bring you better results. There seems to be something strange with the JDK version or Solr version and some settings. Then, make sure that you have much more free memory for the OS cache than for the heap.

RE: Using Tesseract OCR to extract PDF files in EML file attachment

2019-10-14 Thread Retro
Hello, thanks for the answer, but let me explain the setup. We are running our own backup solution for emails (messages from Exchange in MSG format). The content of these messages is then indexed in SOLR. But SOLR cannot process attachments within those MSG files, and cannot OCR them. This is what I need - to O

Solr 7.6 frequent OOM with Java 9, G1 and large heap sizes - any tests with Java 13 and the new ZGC?

2019-10-14 Thread Vassil Velichkov (Sensika)
Hi everyone, since we upgraded our cluster (legacy sharding) from Solr 6.x to Solr 7.6, we have had frequent OOM crashes on specific nodes. All investigations (detailed below) lead to a hard-coded limitation in the G1 garbage collector. The Java heap is exhausted due to too many filterCache allo