Re: Handling Locales in Solr

2021-02-24 Thread Markus Jelsma
Hello, We put all our customers in the same core/collection because of this; it is not practical to manage hundreds of cores, each with its own small overhead. Separate cores can be advantageous for relevance tuning, though, because statistics are not skewed by other customers. In your case, an

Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Markus Jelsma
rformant? > > Note:- Assume that I have hundreds of ids to boost like this. > Is there a difference to the answer if docs to be boosted after the sort is > less? > > Thanks! > Mark > > On Wed, Feb 24, 2021 at 4:41 PM Markus Jelsma > wrote: > > > Hello, > >

Re: Overriding Sort and boosting some docs to the top

2021-02-24 Thread Markus Jelsma
Hello, You are probably looking for the Query Elevation Component, check it out: https://lucene.apache.org/solr/guide/8_8/the-query-elevation-component.html Regards, Markus On Wed 24 Feb 2021 at 11:59, Mark Robinson wrote: > Hi, > > I wanted to sort and then boost some docs to the top and these

Re: Using multiple language stop words in Solr Core

2021-02-11 Thread Markus Jelsma
Hello Abhay, Do not enable stopwords unless you absolutely know what you are doing. In general, it is a bad practice that somehow still lingers on. But to answer the question, you must have one field and fieldType for each language, so language specific filters go there. Also, using edismax and

Re: Excessive logging 8.8.0

2021-02-05 Thread Markus Jelsma
t this is a mistake... > > https://issues.apache.org/jira/browse/SOLR-15136 > > > : Date: Thu, 4 Feb 2021 12:45:16 +0100 > : From: Markus Jelsma > : Reply-To: solr-user@lucene.apache.org > : To: solr-user@lucene.apache.org > : Subject: Excessive logging 8.8.0 >

Excessive logging 8.8.0

2021-02-04 Thread Markus Jelsma
Hello all, We upgraded some nodes to 8.8.0 and notice there is excessive logging on INFO when some traffic/indexing is going on: 2021-02-04 11:42:48.535 INFO (qtp261748192-268) [c:data s:shard2 r:core_node4 x:data_shard2_replica_t2] o.a.s.c.c.ZkStateReader already watching , added to s

Re: different score from different replica of same shard

2021-01-13 Thread Markus Jelsma
the same. > > Regards, > Bernd > > > On 13.01.21 at 14:54, Markus Jelsma wrote: > > Hello Bernd, > > > > This is normal for NRT replicas, because the way segments are merged and > > deletes are removed is not synchronized between replicas. In that case > >

Re: different score from different replica of same shard

2021-01-13 Thread Markus Jelsma
Hello Bernd, This is normal for NRT replicas, because the way segments are merged and deletes are removed is not synchronized between replicas. In that case counts for TF and IDF and norms become slightly different. You can either use ExactStatsCache that fetches counts for terms before scoring,

Re: Monitoring Solr for currently running queries

2020-12-29 Thread Markus Jelsma
Hello Ufuk, You can log slow queries [1]. If you would want to see currently running queries you would have to extend SearchHandler and build the custom logic yourself. Watch out for SolrCloud because the main query as well as the per-shard queries can pass through that same SearchHandler. You

RE: Performance issues with CursorMark

2020-10-26 Thread Markus Jelsma
> Sent: Monday 26th October 2020 17:00 > To: solr-user@lucene.apache.org > Subject: Re: Performance issues with CursorMark > > Hey Markus, > > What are you sorting on? Do you have docValues enabled on the sort field ? > > On Mon, Oct 26, 2020 at 5:36 AM Markus Jelsma > wro

Performance issues with CursorMark

2020-10-26 Thread Markus Jelsma
Hello, We have been using a simple Python tool for a long time that eases movement of data between Solr collections, it uses CursorMark to fetch small or large pieces of data. Recently it stopped working when moving data from a production collection to my local machine for testing, the Solr

RE: advice on whether to use stopwords for use case

2020-10-01 Thread Markus Jelsma
Well, when not splitting on whitespace you can use the CharFilter for regex replacements [1] to clear the entire search string if anywhere in the string a banned word is found: .*(cigarette|tobacco).* [1]
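A minimal sketch of what that pattern does to an unsplit search string, written in plain Python rather than as Solr configuration. The function name `clear_if_banned` is made up for illustration; in Solr the same regex would sit in a pattern-replace char filter with an empty replacement.

```python
import re

# Pattern from the message above: it matches the whole string
# as soon as a banned word occurs anywhere in it.
BANNED = re.compile(r".*(cigarette|tobacco).*")


def clear_if_banned(query: str) -> str:
    """Return "" when the query contains a banned word, else the query unchanged."""
    return BANNED.sub("", query)
```

Note that the pattern matches substrings, so "cigarettes" is cleared as well.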

RE: Trailing space issue with indexed data.

2020-08-18 Thread Markus Jelsma
Hello, You can use TrimFieldUpdateProcessorFactory [1] in your URP chain to remove leading or trailing whitespace when indexing. Regards, Markus [1] https://lucene.apache.org/solr/8_6_0//solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html -Original

RE: Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Subject: Re: Drop bad document in update batch > > I think you’re looking for TolerantUpdateProcessor(Factory), added in > SOLR-445. It hung around for a LONG time and didn’t actually get > added until 6.1. > > > On Aug 18, 2020, at 12:51 PM, Markus J

Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Hello, Normally, if a single document is bad, the whole indexing batch is dropped. I think i remember there was an URP(?) that discards bad documents from the batch, but i cannot find it in the manual [1]. Is it possible or am i starting to imagine things? Thanks, Markus [1]

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
> (Or should we be using this extended ExtendedDisMaxQParser class server > side in Solr?) > > Kind regards, > > Edd > > ---- > Edward Turner > > > On Mon, 17 Aug 2020 at 15:06, Markus Jelsma > wrote: > > > Hello Edward, > > > >

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
Hello Edward, Yes, you can, by extending ExtendedDismaxQParser [1] and overriding its parse() method. You get the main Query object through super.parse(). If you need even more fine grained control on how Query objects are created you can extend ExtendedSolrQueryParser's [2] (inner class)

RE: eDismax query syntax question

2020-06-13 Thread Markus Jelsma
Hello, These are special characters, if you don't need them, you must escape them. See top of the article: https://lucene.apache.org/solr/guide/8_5/the-extended-dismax-query-parser.html Markus -Original message- > From:Webster Homer > Sent: Friday 12th June 2020 22:09 > To:
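A minimal sketch of client-side escaping, assuming the usual list of characters that the Lucene/Solr query syntax treats as special. The helper name `escape_query_chars` is only illustrative; SolrJ ships a comparable utility, `ClientUtils.escapeQueryChars`, which also escapes whitespace.

```python
# Characters with a special meaning in the Lucene/Solr query syntax.
# "&" and "|" are only special in pairs, but escaping them singly is harmless.
SPECIAL = set('+-&|!(){}[]^"~*?:/\\')


def escape_query_chars(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)
```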

RE: Building a web based search engine

2020-06-02 Thread Markus Jelsma
really looked at this issue yet, but it would be nice to have an example of > this. Search for a SolrJ tutorial, they are plentiful. Also check out Solr's own extensive manual, everything you need is there. > Jim > > > > On Tue, Jun 2, 2020 at 12:12 PM Markus Jelsma > w

RE: Building a web based search engine

2020-06-02 Thread Markus Jelsma
Hello, We have been building precisely that for over ten years now. The '10,000 foot level overview' is basically: * forget about Lucene for now, Solr uses it under the hood; * get Solr, and start it with the schema.xml file that comes with Nutch; * get Nutch, give it a set of domains or hosts

RE: 8.5.1 LogReplayer extremely slow

2020-05-12 Thread Markus Jelsma
I found the bastard, it was a freaky document that screwed Solr over: indexing kept failing, passing documents between replicas timed out, documents got reindexed and so the document (and others) ended up in the transaction log (many times) and were eligible for reindexing. Reindexing and

8.5.1 LogReplayer extremely slow

2020-05-11 Thread Markus Jelsma
Hello, Our main Solr text search collection broke down last night (search was still working fine), every indexing action timed out with the Solr master spending most of its time in Java regex. One shard has only one replica left for queries and it stays like that. I have copied both shard's

RE: Indexing Korean

2020-05-01 Thread Markus Jelsma
Hello, Although it is not mentioned in Solr's language analysis page in the manual, Lucene has had support for Korean for quite a while now. https://lucene.apache.org/core/8_5_0/analyzers-nori/index.html Regards, Markus -Original message- > From:Audrey Lorberfeld -

RE: heavy reads from disk when off-heap ram is constrained

2020-02-27 Thread Markus Jelsma
Hello Kyle, This is actually what the manual [1] clearly warns about. Snippet copied from the manual: "When setting the maximum heap size, be careful not to let the JVM consume all available physical memory. If the JVM process space grows too large, the operating system will start swapping it, which

RE: Repeatable search term bug in Solr 8?

2020-02-27 Thread Markus Jelsma
Hello Phil, Solr never returns "The website encountered an unexpected error. Please try again later." as an error. To get to the root of the problem, you should at least post error logs that Solr actually throws, if it does at all. You either have an application error, or an actual Solr

Solr 8.x Startup problems when ZK is partially unavailable

2020-01-10 Thread Markus Jelsma
Hello, I have multiple collections, one 7.5.0 and the rest is on 8.3.1. They all share the same ZK ensemble and have the same ZK connection string. The first ZK address in the connection string is one that is not reachable, it seems firewalled, the rest is accessible. The 7.5.0 nodes do not

PreAnalyzedFieldUpdateProcessor issues in Solrcloud

2019-12-20 Thread Markus Jelsma
Hello, We are moving our text analysis to outside of Solr and use PreAnalyzedField to speed up indexing. We also use MLT, but these two don't work together, there is no way for MLT to properly analyze a document using the PreAnalyzedField's analyzer, and it does not pass the code in the MLT

RE: Position search

2019-10-15 Thread Markus Jelsma
that approach work for the other use case of searching from end of > documents ? > For example if I need to perform some term search from the end, e.g. "book" > in the last 30 or 100 words. > > Is there SpanLastQuery ? > > Thanks, > Adi > > -Original Me

RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi, Try SpanFirstQuery. It limits the search to within the Nth term in the field. Regards, Markus -Original message- > From:Kaminski, Adi > Sent: Tuesday 15th October 2019 8:25 > To: solr-user@lucene.apache.org > Subject: Position search > > Hi, > What's the recommended way

RE: Custom update processor not kicking in

2019-09-18 Thread Markus Jelsma
Hello Rahul, I don't know why you don't see your log lines, but if I remember correctly, you must put all custom processors above Log, Distributed and Run; at least I remember reading that somewhere a long time ago. We put all our custom processors on top of the three default processors and

RE: SolrClient from inside processAdd function

2019-09-05 Thread Markus Jelsma
Is there any way to get the information about the current Solr endpoint > from within the custom URP? > > On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma > wrote: > > > Hello Arnold, > > > > Yes, we do this too for several cases. > > > > You can create the So

RE: SolrClient from inside processAdd function

2019-09-04 Thread Markus Jelsma
Hello Arnold, Yes, we do this too for several cases. You can create the SolrClient in the Factory's inform() method, and pass it to the URP when it is created. You must implement SolrCoreAware and close the client when the core closes as well. Use a CloseHook for this. If you do not close the

RE: 8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

2019-08-23 Thread Markus Jelsma
asn't caused any issues. > > I'll make a note to check state.json next time we encounter the > situation to see if I can see what you reported. > > Regards, > Ere > > Markus Jelsma wrote on 22.8.2019 at 16.36: > > Hello, > > > > There is a newly created

8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

2019-08-22 Thread Markus Jelsma
Hello, There is a newly created 8.2.0 all NRT type cluster for which i replaced each NRT replica with a TLOG type replica. Now, the replicas no longer replicate when the leader receives data. The situation is odd, because some shard replicas kept replicating up until eight hours ago, another

StackOverflowError leader election on 8.2.0

2019-08-21 Thread Markus Jelsma
Hello, Looking this up I found SOLR-5692, but that was solved a lifetime ago, so I am just checking whether this is a familiar error and one I am missing in Jira: A client's Solr 8.2.0 cluster gave us the following StackOverflowError while running 8.2.0 on Java 8: Exception in thread

RE: Solr 8 getZkStateReader throwing AlreadyClosedException

2019-07-01 Thread Markus Jelsma
Opened SOLR-13591. https://issues.apache.org/jira/browse/SOLR-13591 -Original message- > From:Markus Jelsma > Sent: Thursday 27th June 2019 13:20 > To: solr-user@lucene.apache.org; solr-user > Subject: RE: Solr 8 getZkStateReader throwing AlreadyClosedException > > This was 8.1.1

RE: refused connection

2019-06-28 Thread Markus Jelsma
Hello, If you get a Connection Refused, then normally the server is just offline. But, something weird is hiding in your stack trace, you should check it out further: > Caused by: java.net.ConnectException: Cannot assign requested address > (connect failed) I have not seen this before.

RE: Solr 8 getZkStateReader throwing AlreadyClosedException

2019-06-27 Thread Markus Jelsma
This was 8.1.1 to be precise. Sorry! -Original message- > From:Markus Jelsma > Sent: Thursday 27th June 2019 13:19 > To: solr-user > Subject: Solr 8 getZkStateReader throwing AlreadyClosedException > > Hello, > > We had two different SolrClients failing on different collections

Solr 8 getZkStateReader throwing AlreadyClosedException

2019-06-27 Thread Markus Jelsma
Hello, We had two different SolrClients failing on different collections and machines just around the same time. After restarting everything was just fine again. The following exception was thrown: 2019-06-27 11:04:28.117 ERROR (qtp203849460-13532) [c:_shard1_replica_t15]

RE: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
sey > Sent: Thursday 13th June 2019 13:42 > To: solr-user@lucene.apache.org > Subject: Re: Increased disk space usage 8.1.1 vs 7.7.1 > > On 6/13/2019 4:19 AM, Markus Jelsma wrote: > > We are upgrading to Solr 8. One of our reindexed collections takes a GB > > more than the pro

RE: Different facet count between 7.7.1 and 8.1.1

2019-06-13 Thread Markus Jelsma
> an "optimize" change anything? Is this DocValues strings? > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > On 12 Jun 2019 at 23:49, Markus Jelsma wrote: > > > > Hello again, > > > > We found

Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
Hello, We are upgrading to Solr 8. One of our reindexed collections takes a GB more than production, which is on 7.7.1. Production also has deleted documents. This means Solr 8 somehow uses more disk space. I have checked both Solr and Lucene's CHANGES but no ticket was immediately

CursorMark, batch size/speed

2019-06-12 Thread Markus Jelsma
Hello, One of our collections hates CursorMark, it really does. When under very heavy load the nodes can occasionally consume GBs additional heap for no clear reason immediately after downloading the entire corpus. Although the additional heap consumption is a separate problem that i hope

Different facet count between 7.7.1 and 8.1.1

2019-06-12 Thread Markus Jelsma
Hello again, We found another oddity when upgrading to Solr 8. For a *:* query, the facet counts for a simple string field do not match at all between these versions. Solr 7.7.1 gives fewer or zero counts, whereas for 8 we see the correct counts. So something seems fixed for a bug that i was

RE: Solr Heap Usage

2019-06-07 Thread Markus Jelsma
Hello, We use VisualVM for making observations. But use Eclipse MAT for in-depth analysis, usually only when there is a suspected memory leak. Regards, Markus -Original message- > From:John Davis > Sent: Friday 7th June 2019 20:30 > To: solr-user@lucene.apache.org > Subject: Re:

RE: Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
ct: Re: Solr 8.1.1, JMX and VisualVM > > Hi, > > This has to do with the new JVM flags that optimise performance, they were > added roughly at the same time when Solr switched to G1GC. > > In ‘bin/solr’ please comment out this flag: '-XX:+PerfDisableSharedMem'. > > >

RE: Query of Death Lucene/Solr 7.6

2019-05-30 Thread Markus Jelsma
22, 2019 at 11:00 AM Gregg Donovan wrote: > > > FWIW: we have also seen serious Query of Death issues after our upgrade to > > Solr 7.6. Are there any open issues we can watch? Is Markus' findings > > around `pf` our best guess? We've seen these issues even with ps=0. We also &

RE: Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
Hello, Slight correction, SolrCLI does become visible in the local applications view. I just missed it before. Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Thursday 30th May 2019 14:47 > To: solr-user > Subject: Solr 8.1.1, JMX and VisualVM > > Hello, > > While

Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
Hello, While upgrading from 7.7 to 8.1.1, i noticed start.jar and SolrCLI no longer pop up in the local applications view of VisualVM! I CTRL-F'ed my way through the changelog for Solr 8.0.0 to 8.1.1 but could not find anything related. I am clueless! Using OpenJDK 11.0.3 2019-04-16 and Solr

Field ByteArrayUtf8CharSequence instead of String

2019-05-30 Thread Markus Jelsma
Hello, When upgrading to 7.7 i got SOLR-13249, when a SolrInputField's value suddenly became ByteArrayUtf8CharSequence instead of a String. That has been addressed. I am now upgrading to 8.1.1 and have a SearchComponent that operates on uses SolrClient to fetch documents from elsewhere

RE: Very low filter cache hit ratio

2019-05-29 Thread Markus Jelsma
Hello, What is missing in that article is you must never use NOW without rounding it down in a filter query. If you have it, round it down to an hour, day or minute to prevent flooding the filter cache. Regards, Markus -Original message- > From:Atita Arora > Sent: Wednesday 29th May
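A minimal sketch of building a rounded date filter, assuming Solr's date math syntax (`NOW/DAY`, `NOW/HOUR` and so on). The helper name `recent_filter` and the field name `timestamp` in the usage note are hypothetical. Because the rounded expression yields the same value for the whole unit, repeated requests can reuse one filter cache entry instead of creating a new one every millisecond.

```python
def recent_filter(field: str, amount: int, unit: str = "DAY") -> str:
    """Build an fq range from `amount` units ago until now, rounded down to `unit`.

    `unit` is a singular Solr date math unit such as MINUTE, HOUR or DAY.
    """
    unit = unit.upper()
    return f"{field}:[NOW/{unit}-{amount}{unit}S TO NOW/{unit}]"
```

For example, `recent_filter("timestamp", 7)` gives `timestamp:[NOW/DAY-7DAYS TO NOW/DAY]`, to be sent as an `fq` parameter.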

Facetting heat map, too many cells

2019-05-03 Thread Markus Jelsma
Hello, With gridlevel set to 3 i have a map of 256 x 128. However, i would really like a higher resolution, preferably twice as high. But with any gridlevel higher than 3, or distErrPct 0.1 or lower, i get the IllegalArgumentException, saying it does not want to give me a 1024x1024 sized map.

RE: Solr-Batch Update

2019-04-25 Thread Markus Jelsma
Hello, There is no definitive rule for this, it depends on your situation such as size of documents, resource constraints and possible heavy analysis chain. And in case of (re)indexing a large amount, your autocommit time/limit is probably more important. In our case, some collections are

NPE in CharsRefBuilder

2019-04-15 Thread Markus Jelsma
Hello, I made a ConditionalTokenFilter filter and factory. Its Lucene based unit tests work really well, and i can see it is doing something, queries are differently analyzed based on some condition. But when debugging through the GUI i get the following: 2019-04-15 12:37:42.219 ERROR

7.7.1 FlattenGraphFilterFactory at query-time?

2019-03-12 Thread Markus Jelsma
Hello, Due to reading 'This filter must be included on index-time analyzer..' in the documentation, i never considered adding it to a query-time analyser. However, we had problems with a set of three two-word synonyms never yielding the same number of results with SynonymGraph. When switching

RE: Re: Suppress stack trace in error response

2019-02-22 Thread Markus Jelsma
Hello, Solr's error responses respect the configured response writer settings, so you could probably remove the element and the stuff it contains using XSLT. It is not too fancy, but it should work. Regards, Markus -Original message- > From:Branham, Jeremy (Experis) > Sent: Friday

RE: Query of Death Lucene/Solr 7.6

2019-02-22 Thread Markus Jelsma
enumerated approach for phrase queries where slop>0, so setting ps=0 would > probably also help. > Michael > > On Fri, Feb 8, 2019 at 5:57 AM Markus Jelsma > wrote: > > > Hello (apologies for cross-posting), > > > > While working on SOLR-12743, using 7.

RE: TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
produce this > should be > a JIRA IMO. > > Best, > Erick > > > On Feb 21, 2019, at 2:33 AM, Markus Jelsma > > wrote: > > > > Hello, > > > > We are moving some replica's to TLOG, one collection runs 7.5, the others > > 7.

TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
Hello, We are moving some replicas to TLOG, one collection runs 7.5, the others 7.7. When indexing, we see UPDATE.updateHandler.errors increment for each document being indexed, but there is nothing in the logs. Is this a known issue? Thanks, Markus

RE: solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-19 Thread Markus Jelsma
Hello, We just witnessed this too with 7.7. No obvious messages in the logs, the replica status would not come out of 'down'. Meanwhile we got another weird exception from a neighbouring collection sharing the same nodes: 2019-02-18 13:47:20.622 ERROR

RE: Solr 7.7 UpdateRequestProcessor broken

2019-02-15 Thread Markus Jelsma
I stumbled upon this too yesterday and created SOLR-13249. In local unit tests we get String but in distributed unit tests we get a ByteArrayUtf8CharSequence instead. https://issues.apache.org/jira/browse/SOLR-13249 -Original message- > From:Andreas Hubold > Sent: Friday 15th

Query of Death Lucene/Solr 7.6

2019-02-08 Thread Markus Jelsma
Hello (apologies for cross-posting), While working on SOLR-12743, using 7.6 on two nodes and 7.2.1 on the remaining four, we stumbled upon a situation where the 7.6 nodes quickly succumb when a 'Query-of-Death' is issued, 7.2.1 up to 7.5 are all unaffected (tested and confirmed). Following

LFUCache

2019-02-04 Thread Markus Jelsma
Hello, Thanks to SOLR-12743 - one of our collections can't use FastLRUCache - we are considering LFUCache instead. But there is SOLR-3393 as well, claiming the current implementation is inefficient. But ConcurrentLRUCache and ConcurrentLFUCache both use ConcurrentHashMap under the hood, the

RE: Re: Delayed/waiting requests

2019-01-16 Thread Markus Jelsma
Hello, There is an extremely undocumented parameter to get the cache's contents displayed. Set showItems="100" on the filter cache. Regards, Markus -Original message- > From:Erick Erickson > Sent: Wednesday 16th January 2019 17:40 > To: solr-user > Subject: Re: Re:

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-29 Thread Markus Jelsma
Hello, Sorry for trying this once more. Is there anyone around who can help me, and perhaps others, on this subject and the linked Jira ticket and failing test? I could really use some help from someone who is really familiar with edismax code and the underlying QueryBuilder parts that are

RE: Delete all, index all, end up with 1 segment with 50% deletes

2018-11-28 Thread Markus Jelsma
and > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ > (Solr 7.5+). > > Best, > Erick > On Tue, Nov 27, 2018 at 4:29 AM Markus Jelsma > wrote: > > > > Hello, > > > > A background batch process compiles a data set, when fi

Delete all, index all, end up with 1 segment with 50% deletes

2018-11-27 Thread Markus Jelsma
Hello, A background batch process compiles a data set, when finished, it sends a delete all to its target collection, then everything gets sent by SolrJ, followed by a regular commit. When inspecting the core i notice it has one segment with 9578 documents, of which exactly half are deleted.

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-22 Thread Markus Jelsma
Hello, I have opened SOLR-13009 describing the problem. The attached patch contains a unit test proving the problem, i.e. the test fails. Any help would be greatly appreciated. Many thanks, Markus https://issues.apache.org/jira/browse/SOLR-13009 -Original message- > From:Markus

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-18 Thread Markus Jelsma
Hello, Apologies for bothering you all again, but i really need some help in this matter. How can we resolve this issue? Are we dealing with a bug here (then i'll open a ticket), or am i doing something wrong? Is there anyone here who has had the same issue or understands the problem? Many thanks, Markus

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
lePositionIncrements="false" for stop word filter but > that parameter only works for lucene version 4.3 or earlier. Looks like > it's an open issue in lucene > https://issues.apache.org/jira/browse/LUCENE-4065 > > For now, I am trying to find a workaround using PatternReplaceFilterFactory.

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
Hello Pratik, We would use ShingleFilter for this indeed. If you only want bigrams/shingles, don't forget to disable outputUnigrams and set both shingle size limits to 2. Regards, Markus -Original message- > From:Pratik Patel > Sent: Thursday 15th November 2018 17:00 > To:
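A minimal sketch, in plain Python, of the token stream one would expect from ShingleFilter with outputUnigrams disabled and minShingleSize and maxShingleSize both set to 2. The function `bigram_shingles` is only an illustration of the output shape, not Lucene's implementation; the space separator mirrors the filter's default token separator.

```python
from typing import List


def bigram_shingles(tokens: List[str], sep: str = " ") -> List[str]:
    """Join every pair of adjacent tokens; single tokens are not emitted."""
    return [sep.join(pair) for pair in zip(tokens, tokens[1:])]
```

For the tokens `["query", "elevation", "component"]` this yields `["query elevation", "elevation component"]`.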

KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-13 Thread Markus Jelsma
Hello, apologies for this long winded e-mail. Our fields have KeywordRepeat and language specific filters such as a stemmer, the final filter at query-time is SynonymGraph. We do not use RemoveDuplicatesFilter for those of you wondering why when you see the parsed queries below, this is due to

RE: Odd Scoring behavior

2018-10-30 Thread Markus Jelsma
Hello Webster, It smells like KeywordRepeat. In general it is not a problem if all terms are scored twice. But you also have RemoveDuplicates, which in some cases causes a term to be scored twice in one field but only once in the other, and then you have a problem. Due to lack of

RE: Merging data from different sources

2018-10-30 Thread Markus Jelsma
Hello Martin, We also use an URP for this in some cases. We index documents to some collection, the URP reads a field from that document which is an ID in another collection. So we fetch that remote Solr document on-the-fly, and use those fields to enrich the incoming document. It is very

RE: Solr Shards down for unknown reason

2018-10-15 Thread Markus Jelsma
Hello, We observed this problem too with older Solr versions. Whenever none of the shard's replicas would come up we would just shut them all down again and restart just one replica and wait. In some cases it won't come up (still true for Solr 7.4), but start a second shard a while later and

RE: Opinions on index optimization...

2018-10-03 Thread Markus Jelsma
There are a few bugs that require you to merge the index, see SOLR-8807 and related bugs. https://issues.apache.org/jira/browse/SOLR-8807 -Original message- > From:Erick Erickson > Sent: Wednesday 3rd October 2018 4:50 > To: solr-user > Subject: Re: Opinions on index

RE: Java version 11 for solr 7.5?

2018-09-26 Thread Markus Jelsma
Indeed, but JDK-8038348 has been fixed very recently for Java 9 or higher. -Original message- > From:Jeff Courtade > Sent: Wednesday 26th September 2018 17:36 > To: solr-user@lucene.apache.org > Subject: Re: Java version 11 for solr 7.5? > > My concern with using g1 is solely based on

RE: Grammatical tenses Stemming in SOLR

2018-09-21 Thread Markus Jelsma
Hello Aishwarya, KStem does a really bad job with the examples you have given, it won't remove the -s and -ing suffixes in some strange cases. Porter/Snowball work just fine for this example. What won't work, of course, are irregular verbs and nouns (plural forms). They always need to be

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-06 Thread Markus Jelsma
ing to 7.4.0 > > I think this is pretty bad. I created > https://issues.apache.org/jira/browse/SOLR-12743. Feel free to add any more > details you have there. > > On Mon, Sep 3, 2018 at 1:50 PM Markus Jelsma > wrote: > > > Hello Björn, > > > > Take great

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
e you able to figure out anything? > Currently thinking about rollbacking to 7.2.1. > > > > > On 3. Sep 2018, at 21:54, Markus Jelsma wrote: > > > > Hello, > > > > Getting an OOM plus the fact you are having a lot of IndexSearcher > > instances

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
Hello, Getting an OOM plus the fact you are having a lot of IndexSearcher instances rings a familiar bell. One of our collections has the same issue [1] when we attempted an upgrade 7.2.1 > 7.3.0. I managed to rule out all our custom Solr code but had to keep our Lucene filters in the schema,

RE: Boost matches occurring early in the field (offset)

2018-08-29 Thread Markus Jelsma
Hello Jan, Many years ago i made an extension of SpanFirstQuery called GradientSpanFirstQuery that did just that, decrease the boost for each advanced position in the text. Then Lucene 4 or 5 came and this code wouldn't compile any more. @Override protected AcceptStatus

RE: 7.4.0 SQL handler throws exception if WHERE clause is present

2018-08-29 Thread Markus Jelsma
Hi, Forget about it, after ten years without SQL, i managed to forget i had to wrap the WHERE value in quotes, single quotes in this case. Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Wednesday 29th August 2018 11:51 > To: solr-user > Subject: 7.4.0 SQL handler

7.4.0 SQL handler throws exception if WHERE clause is present

2018-08-29 Thread Markus Jelsma
Hello, I was, finally, trying the SQL handler on one of our collections. Executing a SELECT * FROM logs LIMIT 10 runs fine, but restricting the set using a WHERE clause gives me the exception below. The type field is a String type, indexed and has DocValues. I must be doing something wrong,

RE: Contextual Synonym Filter

2018-08-17 Thread Markus Jelsma
Hello, If you are using Dismax or Edismax, you can easily extend the QParser and reconstruct your analyzer on-the-fly, based on what you find in the filter query. Be sure to keep a cache of the analyzer because construction can be very heavy. Check the Edismax code, it offers clear examples

RE: Searching by dates

2018-08-16 Thread Markus Jelsma
Hello Christopher, We have a library whose sole purpose is to extract, parse and validate dates found in free text, in all major world languages (and many more) and in every thinkable format/notation. It can also deal with times, timezones (resolve them back to UTC), different eras (e.g.

7.2.1 Solr collection sluggish

2018-08-08 Thread Markus Jelsma
Hello, We've got, again, a little mystery here. Our main text collection is suddenly running at a snail's pace since Monday very early in the morning, the monitoring graph for response time went up. This is not unusual for Solr so the JVM's were all restarted, it always solves a sluggish

RE: Recipe for moving to solr cloud without reindexing

2018-08-07 Thread Markus Jelsma
Subject: Re: Recipe for moving to solr cloud without reindexing > > Thank you, that is of course a way to go, but I would actually like to be > able to shard ... > Could I use your approach and add shards dynamically? > > > 2018-08-07 13:28 GMT+02:00 Markus Jelsma : >

RE: Recipe for moving to solr cloud without reindexing

2018-08-07 Thread Markus Jelsma
Hello Bjarke, If you are not going to shard you can just create a 1 shard/1 replica collection, shut down Solr, copy the data directory into the replica's directory and start up again. Regards, Markus -Original message- > From:Bjarke Buur Mortensen > Sent: Tuesday 7th August 2018

RE: indexing two words, searching single word

2018-08-03 Thread Markus Jelsma
Hello, If your case is English you could use synonyms to work around the problem of the few compound words of the language. However, should you be dealing with a Germanic compound language, the HyphenationCompoundWordTokenFilter [1] or DictionaryCompoundWordTokenFilter are a better choice. The

RE: Solr Server crashes when requesting a result with too large resultRows

2018-07-31 Thread Markus Jelsma
Hello Georg, As you have seen, a high rows parameter is a bad idea. Use cursor mark [1] instead. Regards, Markus [1] https://lucene.apache.org/solr/guide/7_4/pagination-of-results.html -Original message- > From:Georg Fette > Sent: Tuesday 31st July 2018 10:44 > To:
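A minimal sketch of the cursor mark loop described in the linked pagination guide: start with `cursorMark=*`, sort on the unique key, and stop once `nextCursorMark` equals the cursor that was sent. The `fetch` callable is an assumption standing in for whatever HTTP client issues the request and returns the parsed JSON response; the sort field `id` is likewise assumed to be the collection's unique key.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_with_cursor(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: str = "*:*",
    sort: str = "id asc",  # assumed unique key; cursors need it in the sort
    rows: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield all matching documents in small pages instead of one huge `rows` request."""
    cursor = "*"
    while True:
        response = fetch({"q": query, "sort": sort, "rows": rows, "cursorMark": cursor})
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor again when there is nothing left.
            break
        cursor = next_cursor
```

Passing the fetch function in keeps the loop independent of the HTTP library and makes it easy to try out against a stub.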

RE: Recent configuration change to our site causes frequent index corruption

2018-07-26 Thread Markus Jelsma
Hello, Is your maximum number of open files 1024? If so, increase it to a more regular 65536. Some operating systems ship with 1024 for reasons i don't understand. Whenever installing Solr anywhere for the past ten years, we have had to check this each and every time, and still have to!
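A minimal sketch of checking the open files limit from Python, assuming a Unix-like system where the standard `resource` module is available. It inspects the limit of the current process only, so it is meaningful when run as the same user and in the same environment that starts Solr. The threshold of 65536 comes from the message above; the function name is illustrative.

```python
import resource

RECOMMENDED_OPEN_FILES = 65536  # value suggested in the message above


def open_files_limit_ok(minimum: int = RECOMMENDED_OPEN_FILES) -> bool:
    """Return True when the soft limit on open files is at least `minimum`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= minimum
```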

RE: Can I use RegEx function?

2018-07-23 Thread Markus Jelsma
pache.org > Subject: Re: Can I use RegEx function? > > Can I use it in "fl" and "facet.field" as a function > > On Mon, Jul 23, 2018 at 11:33 AM Markus Jelsma > wrote: > > > Hello, > > > > The usual faceting works for all queries, facet.q

RE: Can I use RegEx function?

2018-07-23 Thread Markus Jelsma
> Sent: Monday 23rd July 2018 10:26 > To: solr-user@lucene.apache.org > Subject: Re: Can I use RegEx function? > > can it be used in facets? > > On Mon, Jul 23, 2018, 11:24 Markus Jelsma > wrote: > > > Hello, > > > > It is not really obvious in documenta

RE: Can I use RegEx function?

2018-07-23 Thread Markus Jelsma
Hello, It is not really obvious in documentation, but the standard query parser supports regular expressions. Encapsulate your regex with forward slashes /, q=field:/[a-z]+$/ will work. Regards, Markus -Original message- > From:Peter Sh > Sent: Monday 23rd July 2018 10:09 > To:
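A minimal sketch of putting such a regex query into a request URL. The detail worth showing is URL encoding: an unencoded `+` in the pattern would arrive at Solr as a space. The helper name `regex_query` is made up for illustration, and `field` is the placeholder field name from the message above.

```python
from urllib.parse import urlencode


def regex_query(field: str, pattern: str) -> str:
    """Return a URL-encoded q parameter of the form field:/pattern/."""
    return urlencode({"q": f"{field}:/{pattern}/"})
```

The result can be appended to a select URL after the `?`.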

RE: Cannot index to 7.2.1 collection alias

2018-07-18 Thread Markus Jelsma
MessageHandler.java:784) > > > Thanks, > Markus > > -Original message- > > From:Shawn Heisey > > Sent: Tuesday 17th July 2018 16:39 > > To: solr-user@lucene.apache.org > > Subject: Re: Cannot index to 7.2.1 collection alias > > > > On 7

RE: Cannot index to 7.2.1 collection alias

2018-07-17 Thread Markus Jelsma
dex to 7.2.1 collection alias > > On 7/17/2018 6:28 AM, Markus Jelsma wrote: > > Just attempted to connect and index a bunch of documents to a collection > > alias, got a NPE right away. Can't find this error in Jira, did i overlook > > something? Create new ticket? >

RE: Cannot index to 7.2.1 collection alias

2018-07-17 Thread Markus Jelsma
Additionally, reloading a collection alias also doesn't work. Can't find that one in Jira either, new ticket? Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Tuesday 17th July 2018 14:28 > To: solr-user > Subject: Cannot index to 7.2.1 collection alias > > Hello, >

Cannot index to 7.2.1 collection alias

2018-07-17 Thread Markus Jelsma
Hello, Just attempted to connect and index a bunch of documents to a collection alias, got a NPE right away. Can't find this error in Jira, did i overlook something? Create new ticket? Thanks, Markus

RE: 7.3 appears to leak

2018-07-16 Thread Markus Jelsma
ion is out-of-roof. Where previously 512MB heap was enough, now 6G > aren’t enough to index all files. > > kind regards, > > Thomas > > > On 04.07.2018 at 15:03, Markus Jelsma wrote: > > > > Hello Andrey, > > > > I didn't think of that! I will try i
