Weird time outs 7.2

2018-01-23 Thread Markus Jelsma
Hi, On 7.2 I regularly see this popping up: 2018-01-23 16:16:37.056 ERROR (qtp329611835-117592) [c:logs s:shard1 r:core_node1 x:logs_shard1_replica1] o.a.s.s.HttpSolrCall null:java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 12/12 ms at

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
Hi, Glad to hear you removed the gramming, but Kraaij-Pohlmann isn't going to solve all problems either, for example molens => molen, but molen => mool, and many more like that. You can solve this by adding manual rules to StemmerOverrideFilter, but due to the compound nature of words, you

New payload handling 7.2

2018-02-27 Thread Markus Jelsma
Hello, Our payload handling has been broken since Lucene/Solr 7.2: we sometimes get 0.0 = AveragePayloadFunction.docScore() for some but not all query clauses. We only have payloads on some terms, to signal to the similarity that it needs to 'punish' the term, e.g. for being an article or adjective. I

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
Hi - In that case you need the KeywordRepeat and RemoveDuplicates filters as well; I'd suggest reading their Javadocs. With the docs and the analysis GUI, you can probably figure out their respective place in the tokenizer chain yourself. Relying on IDF is not really a fine controlled

RE: Solr search engine configuration

2018-03-12 Thread Markus Jelsma
Hello Peter, StemmerOverride wants \t separated fields, that is probably the cause of the AIooBE you get. Regarding schema definitions, each factory JavaDoc [1] has a proper example listed. I recommend putting a decompounder before a stemmer, and have an accent (or ICU) folder as one of the
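As a minimal sketch of the tab-separated format mentioned above, the following Python snippet writes override rules to a dictionary file. The file name `stemdict.txt`, the helper name `write_stem_overrides` and the rule pairs are assumptions made for illustration (the pairs echo the molens/molen example from the earlier message in this thread); only the tab separator comes from the message itself.

```python
import os
import tempfile

def write_stem_overrides(path, rules):
    """Write one 'word<TAB>stem' rule per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        for word, stem in rules:
            # A tab inside a field would shift the columns, so reject it early.
            if "\t" in word or "\t" in stem:
                raise ValueError("fields must not contain tabs")
            handle.write(f"{word}\t{stem}\n")

# Illustrative rules: keep 'molen' from being stemmed any further.
rules = [("molens", "molen"), ("molen", "molen")]
path = os.path.join(tempfile.mkdtemp(), "stemdict.txt")
write_stem_overrides(path, rules)

with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```

Generating the file from code rather than by hand avoids the common mistake of an editor silently replacing tabs with spaces.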

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
-Original message- > From:PeterKerk > Sent: Tuesday 13th March 2018 14:24 > To: solr-user@lucene.apache.org > Subject: RE: Solr search engine configuration > > Markus, > > Thanks again. Ok, 1 by 1: > > StemmerOverride wants \t separated fields, that is

RE: How to store files larger than zNode limit

2018-03-13 Thread Markus Jelsma
has: > If this > option is changed, the system property must be set on all servers and > clients otherwise problems will arise > > Other than Zookeeper java property what are the other places this should be > set? > > Thank you > Roopa > > Sent from my iPhone >

RE: How to store files larger than zNode limit

2018-03-13 Thread Markus Jelsma
Hi - For now, the only option is to allow larger blobs via jute.maxbuffer (whatever jute means). Despite ZK being designed for kb sized blobs, Solr demands us to abuse it. I think there was a ticket for compression support, but that only stretches the limit. We are running ZK with 16 MB for

RE: Solr search engine configuration

2018-03-13 Thread Markus Jelsma
Inline, cheers. -Original message- > From:PeterKerk > Sent: Tuesday 13th March 2018 18:53 > To: solr-user@lucene.apache.org > Subject: RE: Solr search engine configuration > > You must stay in the Javadoc section, there the examples are good, or the > reference

RE: PreAnalyzed FieldType, and simultaneously importing JSON

2018-04-03 Thread Markus Jelsma
eriority to the FieldType here: > https://issues.apache.org/jira/browse/SOLR-4619?focusedCommentId=13611191&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13611191 > Sadly, the FieldType is the one that is documented in the ref guide, but > not the URP :-( > >

RE: Storing Ranking Scores And Documents In Separate Indices

2018-04-05 Thread Markus Jelsma
Hello Quynh, Solr has support for external file fields [1]. They are a simple key=float based text file where key is ID, and the float can be used for boosting/scoring documents. This is a much simpler approach than using a separate collection. These files can be reloaded every commit and are
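To make the key=float format concrete, here is a minimal sketch that writes such a file from a dictionary of scores. The field name `rank`, the helper name and the document IDs are hypothetical, and the `external_<fieldname>` file name is how I recall the reference guide describing it, so please verify that against [1] before relying on it.

```python
import os
import tempfile

def write_external_file(data_dir, field_name, scores):
    """Write 'id=float' lines for an external file field (hypothetical helper)."""
    # Assumption: Solr looks for a file named external_<fieldname> in the index data directory.
    path = os.path.join(data_dir, f"external_{field_name}")
    with open(path, "w", encoding="utf-8") as handle:
        for doc_id, score in sorted(scores.items()):
            handle.write(f"{doc_id}={float(score)}\n")
    return path

# Illustrative scores keyed by document ID.
scores = {"doc2": 0.5, "doc1": 1.25}
path = write_external_file(tempfile.mkdtemp(), "rank", scores)

with open(path, encoding="utf-8") as handle:
    content = handle.read()
```

A batch job could regenerate this file whenever the ranking scores change, without reindexing the documents themselves.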

RE: PreAnalyzed URP and SchemaRequest API

2018-04-13 Thread Markus Jelsma
rse it isn't for everybody -- > only when the analysis chain is sufficiently complex. > > On Mon, Apr 9, 2018 at 9:45 AM Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Hello David, > > > > The remote client has everything on the class path but jus

RE: PreAnalyzed URP and SchemaRequest API

2018-04-09 Thread Markus Jelsma
opy PreAnalyzedParser into > your codebase so that you don't have to reinvent any wheels, even though > that's awkward. Perhaps that ought to be in Solrj? But no we don't want > SolrJ depending on Lucene-core, though it'd make a fine "optional" > dependency. > > On Wed,

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
Inline. -Original message- > From:Shawn Heisey <apa...@elyograg.org> > Sent: Tuesday 24th April 2018 21:18 > To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 12:36 PM, Markus Jelsma wrote: > > I

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
-Original message- > From:Shawn Heisey <apa...@elyograg.org> > Sent: Tuesday 24th April 2018 19:12 > To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 9:46 AM, Markus Jelsma wrote: > > Disk space was W

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
19:12 > > To: solr-user@lucene.apache.org > > Subject: Re: IndexFetcher cannot download index file > > > > On 4/24/2018 9:46 AM, Markus Jelsma wrote: > > > Disk space was WARN level. It seems only stack traces of ERROR level > > > messages are visible via

RE: query bag of word with negation

2018-04-22 Thread Markus Jelsma
Hello Nicolas, Yes you can! Check out ComplexPhraseQParser https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser Regards, Markus -Original message- > From:Nicolas Paris > Sent: Sunday 22nd April 2018 20:04 > To:

ClassCastException: o.a.l.d.Field cannot be cast to o.a.l.d.StoredField

2018-04-24 Thread Markus Jelsma
Hello, We have a DocumentTransformer that gets a Field from the SolrDocument and casts it to StoredField (although apparently we don't need to cast). This works well in tests and fine in production, except for some curious, unknown and unreproducible cases that throw the ClassCastException. I

IndexFetcher checksums don't match

2018-04-24 Thread Markus Jelsma
Hello, After a failed log replay (it got a ClassCastException) with 7.2.1 it seems Solr tries to haul over a 50 GB index from another replica. While doing so, it throws a good number of checksum warnings. Why don't the checksums match? Can i safely ignore them? Do i need to do something about

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
Forget about it, recovery got a java.io.IOException: No space left on device but it wasn't clear until I inspected the real logs. The logs in the web admin didn't show the disk space exception, even when I expanded the log line. Maybe that could be changed. Thanks, Markus -Original

IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
Hello, A slightly different question/problem: what is going on here on 7.2.1? During the recovery, none of this node's fellow replicas' indexes were changed but we still got this error. When we got that error, the recovery was restarted, but shortly after the replicas' indexes got updated and

RE: IndexFetcher cannot download index file

2018-04-24 Thread Markus Jelsma
l 2018 17:39 > To: solr-user@lucene.apache.org > Subject: Re: IndexFetcher cannot download index file > > On 4/24/2018 6:52 AM, Markus Jelsma wrote: > > Forget about it, recovery got a java.io.IOException: No space left on > > device but it wasn't clear until i inspected the r

PreAnalyzed FieldType, and simultaneously importing JSON

2018-03-29 Thread Markus Jelsma
Hello, We want to move to PreAnalyzed FieldType to offload our very heavy analysis chain away from the search cluster, so we have to configure our fields to accept pre-analyzed tokens in production. But we use the same schema in development environments too, and that is where we use JSON

PreAnalyzed URP and SchemaRequest API

2018-04-04 Thread Markus Jelsma
Hello, We intend to move to PreAnalyzed URP for analysis offloading. Browsing the Javadocs i came across the SchemaRequest API looking for a way to get a Field object remotely, which i seem to need for JsonPreAnalyzedParser.toFormattedString(Field f). But all i can get from SchemaRequest API

QueryElevator prepare() in distributed search

2018-03-16 Thread Markus Jelsma
Hello, QueryElevator.prepare() runs five times for a single query in distributed search, this is probably not how it should be, but in what phase of distributed search is it supposed to actually run? Many thanks, Markus

RE: QueryElevator prepare() in distributed search

2018-03-20 Thread Markus Jelsma
Anything on this one to share? Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Friday 16th March 2018 18:13 > To: Solr-user > Subject: QueryElevator prepare() in distributed search > > Hello, > >

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
thing. It's > just a tool for me so I didn't want to go too deep into it but sometimes a > must is a must. :) default schema.xml? I just get this managed_schema file > when installing. Do you mean that one? > > > On 27 February 2018 at 11:12:39 AM, Markus Jelsma wrote

RE: Question on "other language" than english stemmers and using both

2018-02-27 Thread Markus Jelsma
Hello, Mixing language specific filters in the same analyzer is not going to give predictable or desirable results. Instead, create separate text_en and text_de fieldTypes and fields. See Solr's default schema.xml, it has many examples of various languages. Depending on what query parser you

RE: Solr Cloud: query elevation + deduplication?

2018-03-06 Thread Markus Jelsma
Hi, I would not use ID (uniqueKey) as the signature field; query elevation would never work properly with such a setup: change a document's content and it'll get a new ID. If I remember correctly this factory still deletes duplicates if signatureField is not uniqueKey. Regarding SOLR-3473,

RE: ClassCastException: o.a.l.d.Field cannot be cast to o.a.l.d.StoredField

2018-04-26 Thread Markus Jelsma
related to updateLog replay. > > On Tue, Apr 24, 2018 at 7:13 AM Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Hello, > > > > We have a DocumentTransformer that gets a Field from the SolrDocument and > > casts it to StoredField (although apa

RE: 7.3 appears to leak

2018-06-28 Thread Markus Jelsma
ke? Any custom > plugins or things we should be aware of? Simple indexing artificial docs, > querying and committing doesn't seem to reproduce the issue for me. > > On Thu, Apr 26, 2018 at 10:13 PM, Markus Jelsma > wrote: > > > Hello, > > > > We just finished upgrad

RE: 7.4.0 changes in DocTransformer behaviour

2018-06-28 Thread Markus Jelsma
-Original message- > From:Shawn Heisey > Sent: Wednesday 27th June 2018 17:40 > To: solr-user@lucene.apache.org > Subject: Re: 7.4.0 changes in DocTransformer behaviour > > On 6/27/2018 8:29 AM, Markus Jelsma wrote: > > I am attempting an upgrade to 7.4.0,

RE: 7.3 appears to leak

2018-06-28 Thread Markus Jelsma
Hello Yonik, If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component, our copy/paste-and-enhance version of the elevator component, would be the root of all problems. It is a direct copy of the 7.2 source where only things like getAnalyzedQuery, the

RE: Collection reload leaves dangling SolrCore instances

2018-06-28 Thread Markus Jelsma
y 2nd May 2018 17:21 > > To: solr-user > > Subject: Re: Collection reload leaves dangling SolrCore instances > > > > Markus: > > > > You may well be hitting SOLR-11882. > > > > On Wed, May 2, 2018 at 8:18 AM, Shawn Heisey wrote: > > > On 5

RE: Solr Shards down for unknown reason

2018-10-15 Thread Markus Jelsma
Hello, We observed this problem too with older Solr versions. Whenever none of the shard's replicas would come up we would just shut them all down again and restart just one replica and wait. In some cases it won't come up (still true for Solr 7.4), but start a second shard a while later and

RE: Merging data from different sources

2018-10-30 Thread Markus Jelsma
Hello Martin, We also use an URP for this in some cases. We index documents to some collection, the URP reads a field from that document which is an ID in another collection. So we fetch that remote Solr document on-the-fly, and use those fields to enrich the incoming document. It is very

RE: Odd Scoring behavior

2018-10-30 Thread Markus Jelsma
Hello Webster, It smells like KeywordRepeat. In general it is not a problem if all terms are scored twice. But you also have RemoveDuplicates, and as a result, in some cases a term is scored twice in one field but only once in the other, and then you have a problem. Due to lack of

KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-13 Thread Markus Jelsma
Hello, apologies for this long winded e-mail. Our fields have KeywordRepeat and language specific filters such as a stemmer, the final filter at query-time is SynonymGraph. We do not use RemoveDuplicatesFilter for those of you wondering why when you see the parsed queries below, this is due to

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
Hello Pratik, We would use ShingleFilter for this indeed. If you only want bigrams/shingles, don't forget to disable outputUnigrams and set both shingle size limits to 2. Regards, Markus -Original message- > From:Pratik Patel > Sent: Thursday 15th November 2018 17:00 > To:
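To illustrate what those settings produce, here is a rough Python sketch of shingling with unigram output disabled and both size limits set to 2. It only imitates the effect on a token list; it is not Lucene's ShingleFilter, and the function and parameter names are made up for this example.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=False):
    """Return word n-grams ("shingles") built from consecutive tokens."""
    result = []
    for start in range(len(tokens)):
        if output_unigrams:
            result.append(tokens[start])
        for size in range(min_size, max_size + 1):
            # Only emit a shingle when enough tokens remain to fill it.
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result

bigrams = shingles(["please", "divide", "this", "sentence"])
with_unigrams = shingles(["please", "divide"], output_unigrams=True)
```

With the unigrams left enabled, every single word would also be emitted next to the bigrams, which is usually not wanted when extracting multi-term phrases.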

RE: Extracting important multi term phrases from the text

2018-11-15 Thread Markus Jelsma
lePositionIncrements="false" for stop word filter but > that parameter only works for lucene version 4.3 or earlier. Looks like > it's an open issue in lucene > https://issues.apache.org/jira/browse/LUCENE-4065 > > For now, I am trying to find a workaround using PatternReplaceFilterFactory.

RE: Opinions on index optimization...

2018-10-03 Thread Markus Jelsma
There are a few bugs that require you to merge the index; see SOLR-8807 and related bugs. https://issues.apache.org/jira/browse/SOLR-8807 -Original message- > From:Erick Erickson > Sent: Wednesday 3rd October 2018 4:50 > To: solr-user > Subject: Re: Opinions on index

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
e you able to figure out anything? > Currently thinking about rolling back to 7.2.1. > > > > > On 3. Sep 2018, at 21:54, Markus Jelsma wrote: > > > > Hello, > > > > Getting an OOM plus the fact you are having a lot of IndexSearcher > > instances

RE: Heap Memory Problem after Upgrading to 7.4.0

2018-09-03 Thread Markus Jelsma
Hello, Getting an OOM plus the fact you are having a lot of IndexSearcher instances rings a familiar bell. One of our collections had the same issue [1] when we attempted an upgrade from 7.2.1 to 7.3.0. I managed to rule out all our custom Solr code but had to keep our Lucene filters in the schema,

RE: Grammatical tenses Stemming in SOLR

2018-09-21 Thread Markus Jelsma
Hello Aishwarya, KStem does a really bad job with the examples you have given, it won't remove the -s and -ing suffixes in some strange cases. Porter/Snowball work just fine for this example. What won't work, of course, are irregular verbs and nouns (plural forms). They always need to be

RE: Java version 11 for solr 7.5?

2018-09-26 Thread Markus Jelsma
Indeed, but JDK-8038348 has been fixed very recently for Java 9 or higher. -Original message- > From:Jeff Courtade > Sent: Wednesday 26th September 2018 17:36 > To: solr-user@lucene.apache.org > Subject: Re: Java version 11 for solr 7.5? > > My concern with using g1 is solely based on

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-18 Thread Markus Jelsma
Hello, Apologies for bothering you all again, but I really need some help in this matter. How can we resolve this issue? Are we dealing with a bug here (then I'll open a ticket), or am I doing something wrong? Is there anyone here who has had the same issue or understands the problem? Many thanks, Markus

RE: Re: Delayed/waiting requests

2019-01-16 Thread Markus Jelsma
Hello, There is an extremely undocumented parameter to get the cache's contents displayed. Set showItems="100" on the filter cache. Regards, Markus -Original message- > From:Erick Erickson > Sent: Wednesday 16th January 2019 17:40 > To: solr-user > Subject: Re: Re:

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-29 Thread Markus Jelsma
Hello, Sorry for trying this once more. Is there anyone around who can help me, and perhaps others, on this subject and the linked Jira ticket and failing test? I could really use some help from someone who is really familiar with edismax code and the underlying QueryBuilder parts that are

RE: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)

2018-11-22 Thread Markus Jelsma
Hello, I have opened SOLR-13009 describing the problem. The attached patch contains a unit test proving the problem, i.e. the test fails. Any help would be greatly appreciated. Many thanks, Markus https://issues.apache.org/jira/browse/SOLR-13009 -Original message- > From:Markus

Delete all, index all, end up with 1 segment with 50% deletes

2018-11-27 Thread Markus Jelsma
Hello, A background batch process compiles a data set, when finished, it sends a delete all to its target collection, then everything gets sent by SolrJ, followed by a regular commit. When inspecting the core i notice it has one segment with 9578 documents, of which exactly half are deleted.

RE: Delete all, index all, end up with 1 segment with 50% deletes

2018-11-28 Thread Markus Jelsma
and > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/ > (Solr 7.5+). > > Best, > Erick > On Tue, Nov 27, 2018 at 4:29 AM Markus Jelsma > wrote: > > > > Hello, > > > > A background  batch process compiles a data set, when fi

RE: solr cloud version upgrade 7.6 to 7.7 collection indexes all marked as down

2019-02-19 Thread Markus Jelsma
Hello, We just witnessed this too with 7.7. No obvious messages in the logs, the replica status would not come out of 'down'. Meanwhile we got another weird exception from a neighbouring collection sharing the same nodes: 2019-02-18 13:47:20.622 ERROR

TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
Hello, We are moving some replicas to TLOG; one collection runs 7.5, the others 7.7. When indexing, we see UPDATE.updateHandler.errors increment for each document being indexed, but there is nothing in the logs. Is this a known issue? Thanks, Markus

RE: TLOG replica, updateHandler errors in metrics, no logs

2019-02-21 Thread Markus Jelsma
produce this > should be > a JIRA IMO. > > Best, > Erick > > > On Feb 21, 2019, at 2:33 AM, Markus Jelsma > > wrote: > > > > Hello, > > > > We are moving some replica's to TLOG, one collection runs 7.5, the others > > 7.

RE: Query of Death Lucene/Solr 7.6

2019-02-22 Thread Markus Jelsma
enumerated approach for phrase queries where slop>0, so setting ps=0 would > probably also help. > Michael > > On Fri, Feb 8, 2019 at 5:57 AM Markus Jelsma > wrote: > > > Hello (apologies for cross-posting), > > > > While working on SOLR-12743, using 7.

RE: Re: Suppress stack trace in error response

2019-02-22 Thread Markus Jelsma
Hello, Solr's error responses respect the configured response writer settings, so you could probably remove the element and the stuff it contains using XSLT. It is not too fancy, but it should work. Regards, Markus -Original message- > From:Branham, Jeremy (Experis) > Sent: Friday

7.7.1 FlattenGraphFilterFactory at query-time?

2019-03-12 Thread Markus Jelsma
Hello, Due to reading 'This filter must be included on index-time analyzer..' in the documentation, i never considered adding it to a query-time analyser. However, we had problems with a set of three two-word synonyms never yielding the same number of results with SynonymGraph. When switching

Query of Death Lucene/Solr 7.6

2019-02-08 Thread Markus Jelsma
Hello (apologies for cross-posting), While working on SOLR-12743, using 7.6 on two nodes and 7.2.1 on the remaining four, we stumbled upon a situation where the 7.6 nodes quickly succumb when a 'Query-of-Death' is issued, 7.2.1 up to 7.5 are all unaffected (tested and confirmed). Following

RE: Solr 7.7 UpdateRequestProcessor broken

2019-02-15 Thread Markus Jelsma
I stumbled upon this too yesterday and created SOLR-13249. In local unit tests we get String but in distributed unit tests we get a ByteArrayUtf8CharSequence instead. https://issues.apache.org/jira/browse/SOLR-13249 -Original message- > From:Andreas Hubold > Sent: Friday 15th

LFUCache

2019-02-04 Thread Markus Jelsma
Hello, Thanks to SOLR-12743 - one of our collections can't use FastLRUCache - we are considering LFUCache instead. But there is SOLR-3393 as well, claiming the current implementation is inefficient. But ConcurrentLRUCache and ConcurrentLFUCache both use ConcurrentHashMap under the hood, the

NPE in CharsRefBuilder

2019-04-15 Thread Markus Jelsma
Hello, I made a ConditionalTokenFilter filter and factory. Its Lucene based unit tests work really well, and i can see it is doing something, queries are differently analyzed based on some condition. But when debugging through the GUI i get the following: 2019-04-15 12:37:42.219 ERROR

RE: Solr Heap Usage

2019-06-07 Thread Markus Jelsma
Hello, We use VisualVM for making observations. But use Eclipse MAT for in-depth analysis, usually only when there is a suspected memory leak. Regards, Markus -Original message- > From:John Davis > Sent: Friday 7th June 2019 20:30 > To: solr-user@lucene.apache.org > Subject: Re:

Field ByteArrayUtf8CharSequence instead of String

2019-05-30 Thread Markus Jelsma
Hello, When upgrading to 7.7 I ran into SOLR-13249, where a SolrInputField's value suddenly became ByteArrayUtf8CharSequence instead of a String. That has been addressed. I am now upgrading to 8.1.1 and have a SearchComponent that uses SolrClient to fetch documents from elsewhere

RE: Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
Hello, Slight correction, SolrCLI does become visible in the local applications view. I just missed it before. Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Thursday 30th May 2019 14:47 > To: solr-user > Subject: Solr 8.1.1, JMX and VisualVM > > Hello, > > While

Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
Hello, While upgrading from 7.7 to 8.1.1, i noticed start.jar and SolrCLI no longer pop up in the local applications view of VisualVM! I CTRL-F'ed my way through the changelog for Solr 8.0.0 to 8.1.1 but could not find anything related. I am clueless! Using OpenJDK 11.0.3 2019-04-16 and Solr

RE: Very low filter cache hit ratio

2019-05-29 Thread Markus Jelsma
Hello, What is missing in that article is you must never use NOW without rounding it down in a filter query. If you have it, round it down to an hour, day or minute to prevent flooding the filter cache. Regards, Markus -Original message- > From:Atita Arora > Sent: Wednesday 29th May
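A small simulation may help show why this matters. The sketch below treats each distinct filter string as one cache entry and compares unrounded timestamps with timestamps rounded down to the hour. The field name `timestamp`, the helper name and the request pattern are assumptions for illustration; in a real query you would let Solr do the rounding with date math such as NOW/HOUR or NOW/DAY instead of resolving dates on the client.

```python
from datetime import datetime, timedelta, timezone

def filter_cache_keys(times, round_to_hour):
    """Collect the distinct filter strings a series of requests would produce."""
    keys = set()
    for moment in times:
        if round_to_hour:
            # Equivalent in spirit to NOW/HOUR: drop everything below the hour.
            moment = moment.replace(minute=0, second=0, microsecond=0)
        keys.add(f"timestamp:[{moment.isoformat()} TO *]")
    return keys

# One request per second for ten minutes, all within the same hour.
start = datetime(2019, 5, 29, 12, 0, 0, tzinfo=timezone.utc)
requests = [start + timedelta(seconds=i) for i in range(600)]

unrounded = filter_cache_keys(requests, round_to_hour=False)
rounded = filter_cache_keys(requests, round_to_hour=True)

# A hypothetical rounded filter query using Solr date math.
fq = "timestamp:[NOW/HOUR-1DAY TO NOW/HOUR+1HOUR]"
```

Every unrounded request yields a filter that will never be reused, while the rounded variant keeps hitting the same cache entry until the hour rolls over.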

RE: Query of Death Lucene/Solr 7.6

2019-05-30 Thread Markus Jelsma
22, 2019 at 11:00 AM Gregg Donovan wrote: > > > FWIW: we have also seen serious Query of Death issues after our upgrade to > > Solr 7.6. Are there any open issues we can watch? Are Markus' findings > > around `pf` our best guess? We've seen these issues even with ps=0. We also >

RE: Solr 8.1.1, JMX and VisualVM

2019-05-30 Thread Markus Jelsma
ct: Re: Solr 8.1.1, JMX and VisualVM > > Hi, > > This has to do with the new JVM flags that optimise performance, they were > added roughly at the same time when Solr switched to G1GC. > > In ‘bin/solr’ please comment out this flag: '-XX:+PerfDisableSharedMem'. > > >

Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
Hello, We are upgrading to Solr 8. One of our reindexed collections takes a GB more than the production uses which is on 7.7.1. Production also has deleted documents. This means Solr 8 somehow uses more disk space. I have checked both Solr and Lucene's CHANGES but no ticket was immediately

RE: Different facet count between 7.7.1 and 8.1.1

2019-06-13 Thread Markus Jelsma
> an "optimize" change anything? Is this DocValues strings? > > -- > Jan Høydahl, search solution architect > Cominvent AS - www.cominvent.com > > > On 12 Jun 2019 at 23:49, Markus Jelsma wrote: > > > > Hello again, > > > > We found

RE: Increased disk space usage 8.1.1 vs 7.7.1

2019-06-13 Thread Markus Jelsma
sey > Sent: Thursday 13th June 2019 13:42 > To: solr-user@lucene.apache.org > Subject: Re: Increased disk space usage 8.1.1 vs 7.7.1 > > On 6/13/2019 4:19 AM, Markus Jelsma wrote: > > We are upgrading to Solr 8. One of our reindexed collections takes a GB > > more than the pro

Different facet count between 7.7.1 and 8.1.1

2019-06-12 Thread Markus Jelsma
Hello again, We found another oddity when upgrading to Solr 8. For a *:* query, the facet counts for a simple string field do not match at all between these versions. Solr 7.7.1 gives lower or zero counts whereas for 8 we see the correct counts. So something seems fixed for a bug that I was

CursorMark, batch size/speed

2019-06-12 Thread Markus Jelsma
Hello, One of our collections hates CursorMark, it really does. When under very heavy load the nodes can occasionally consume GBs additional heap for no clear reason immediately after downloading the entire corpus. Although the additional heap consumption is a separate problem that i hope

Facetting heat map, too many cells

2019-05-03 Thread Markus Jelsma
Hello, With gridlevel set to 3 I have a map of 256 x 128. However, I would really like a higher resolution, preferably twice as high. But with any gridlevel higher than 3, or distErrPct 0.1 or lower, I get the IllegalArgumentException, saying it does not want to give me a 1024x1024 sized map.

RE: refused connection

2019-06-28 Thread Markus Jelsma
Hello, If you get a Connection Refused, then normally the server is just offline. But, something weird is hiding in your stack trace, you should check it out further: > Caused by: java.net.ConnectException: Cannot assign requested address > (connect failed) I have not seen this before.

RE: Solr 8 getZkStateReader throwing AlreadyClosedException

2019-07-01 Thread Markus Jelsma
Opened SOLR-13591. https://issues.apache.org/jira/browse/SOLR-13591 -Original message- > From:Markus Jelsma > Sent: Thursday 27th June 2019 13:20 > To: solr-user@lucene.apache.org; solr-user > Subject: RE: Solr 8 getZkStateReader throwing AlreadyClosedException > > This was 8.1.1

RE: Solr-Batch Update

2019-04-25 Thread Markus Jelsma
Hello, There is no definitive rule for this, it depends on your situation such as size of documents, resource constraints and possible heavy analysis chain. And in case of (re)indexing a large amount, your autocommit time/limit is probably more important. In our case, some collections are

Solr 8 getZkStateReader throwing AlreadyClosedException

2019-06-27 Thread Markus Jelsma
Hello, We had two different SolrClients failing on different collections and machines just around the same time. After restarting everything was just fine again. The following exception was thrown: 2019-06-27 11:04:28.117 ERROR (qtp203849460-13532) [c:_shard1_replica_t15]

RE: Solr 8 getZkStateReader throwing AlreadyClosedException

2019-06-27 Thread Markus Jelsma
This was 8.1.1 to be precise. Sorry! -Original message- > From:Markus Jelsma > Sent: Thursday 27th June 2019 13:19 > To: solr-user > Subject: Solr 8 getZkStateReader throwing AlreadyClosedException > > Hello, > > We had two different SolrClients failing on different collections

8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

2019-08-22 Thread Markus Jelsma
Hello, There is a newly created 8.2.0 all NRT type cluster for which i replaced each NRT replica with a TLOG type replica. Now, the replicas no longer replicate when the leader receives data. The situation is odd, because some shard replicas kept replicating up until eight hours ago, another

RE: 8.2.0 After changing replica types, state.json is wrong and replication no longer takes place

2019-08-23 Thread Markus Jelsma
asn't caused any issues. > > I'll make a note to check state.json next time we encounter the > situation to see if I can see what you reported. > > Regards, > Ere > > Markus Jelsma wrote on 22.8.2019 at 16.36: > > Hello, > > > > There is a newly created

StackOverflowError leader election on 8.2.0

2019-08-21 Thread Markus Jelsma
Hello, Looking this up I found SOLR-5692, but that was solved a lifetime ago, so just checking if this is a familiar error and one I am missing in Jira: A client's Solr 8.2.0 cluster brought us the following StackOverflowError while running 8.2.0 on Java 8: Exception in thread

RE: SolrClient from inside processAdd function

2019-09-04 Thread Markus Jelsma
Hello Arnold, Yes, we do this too for several cases. You can create the SolrClient in the Factory's inform() method, and pass it to the URP when it is created. You must implement SolrCoreAware and close the client when the core closes as well. Use a CloseHook for this. If you do not close the

RE: SolrClient from inside processAdd function

2019-09-05 Thread Markus Jelsma
Is there any way to get the information about the current Solr endpoint > from within the custom URP? > > On Wed, Sep 4, 2019 at 3:10 PM Markus Jelsma > wrote: > > > Hello Arnold, > > > > Yes, we do this too for several cases. > > > > You can create the So

RE: Custom update processor not kicking in

2019-09-18 Thread Markus Jelsma
Hello Rahul, I don't know why you don't see your log lines, but if I remember correctly, you must put all custom processors above Log, Distributed and Run; at least I remember reading it somewhere a long time ago. We put all our custom processors on top of the three default processors and

RE: Position search

2019-10-15 Thread Markus Jelsma
that approach work for the other use case of searching from end of > documents ? > For example if I need to perform some term search from the end, e.g. "book" > in the last 30 or 100 words. > > Is there SpanLastQuery ? > > Thanks, > Adi > > -Original Me

RE: Position search

2019-10-15 Thread Markus Jelsma
Hello Adi, Try SpanFirstQuery. It limits the search to within the Nth term in the field. Regards, Markus -Original message- > From:Kaminski, Adi > Sent: Tuesday 15th October 2019 8:25 > To: solr-user@lucene.apache.org > Subject: Position search > > Hi, > What's the recommended way
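The idea can be sketched in a few lines of plain Python. This only illustrates the semantics for a single term (a match counts only if it ends within the first N positions of the field); it is not how Lucene implements SpanFirstQuery, and the function name and sample text are made up.

```python
def span_first(tokens, term, end):
    """Return True if `term` occurs within the first `end` token positions."""
    # A single-term span at 0-based position p ends at p + 1,
    # so requiring the span end <= `end` means p < `end`.
    return any(token == term for token in tokens[:end])

tokens = "the quick brown fox".split()
near_start = span_first(tokens, "quick", 2)
too_late = span_first(tokens, "fox", 2)
```

Here "quick" sits at the second position and matches with a limit of 2, while "fox" at the fourth position does not.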

PreAnalyzedFieldUpdateProcessor issues in Solrcloud

2019-12-20 Thread Markus Jelsma
Hello, We are moving our text analysis to outside of Solr and use PreAnalyzedField to speed up indexing. We also use MLT, but these two don't work together, there is no way for MLT to properly analyze a document using the PreAnalyzedField's analyzer, and it does not pass the code in the MLT

RE: Repeatable search term bug in Solr 8?

2020-02-27 Thread Markus Jelsma
Hello Phil, Solr never returns "The website encountered an unexpected error. Please try again later." as an error. To get to the root of the problem, you should at least post error logs that Solr actually throws, if it does at all. You either have an application error, or an actual Solr

RE: heavy reads from disk when off-heap ram is constrained

2020-02-27 Thread Markus Jelsma
Hello Kyle, This is actually what the manual [1] clearly warns about. Snippet copied from the manual: "When setting the maximum heap size, be careful not to let the JVM consume all available physical memory. If the JVM process space grows too large, the operating system will start swapping it, which

Solr 8.x Startup problems when ZK is partially unavailable

2020-01-10 Thread Markus Jelsma
Hello, I have multiple collections, one 7.5.0 and the rest is on 8.3.1. They all share the same ZK ensemble and have the same ZK connection string. The first ZK address in the connection string is one that is not reachable, it seems firewalled, the rest is accessible. The 7.5.0 nodes do not

RE: 8.5.1 LogReplayer extremely slow

2020-05-12 Thread Markus Jelsma
I found the bastard: it was a freaky document that screwed Solr over. Indexing kept failing, passing documents between replicas timed out, documents got reindexed, and so the document (and others) ended up in the transaction log (many times) and were eligible for reindexing. Reindexing and

8.5.1 LogReplayer extremely slow

2020-05-11 Thread Markus Jelsma
Hello, Our main Solr text search collection broke down last night (search was still working fine), every indexing action timed out with the Solr master spending most of its time in Java regex. One shard has only one replica left for queries and it stays like that. I have copied both shard's

RE: Indexing Korean

2020-05-01 Thread Markus Jelsma
Hello, Although it is not mentioned in Solr's language analysis page in the manual, Lucene has had support for Korean for quite a while now. https://lucene.apache.org/core/8_5_0/analyzers-nori/index.html Regards, Markus -Original message- > From:Audrey Lorberfeld -

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
> (Or should we be using this extended ExtendedDisMaxQParser class server > side in Solr?) > > Kind regards, > > Edd > > ---- > Edward Turner > > > On Mon, 17 Aug 2020 at 15:06, Markus Jelsma > wrote: > > > Hello Edward, > > > &

Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Hello, Normally, if a single document is bad, the whole indexing batch is dropped. I think i remember there was an URP(?) that discards bad documents from the batch, but i cannot find it in the manual [1]. Is it possible or am i starting to imagine things? Thanks, Markus [1]

RE: Trailing space issue with indexed data.

2020-08-18 Thread Markus Jelsma
Hello, You can use TrimFieldUpdateProcessorFactory [1] in your URP chain to remove leading or trailing whitespace when indexing. Regards, Markus [1] https://lucene.apache.org/solr/8_6_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html -Original

RE: Drop bad document in update batch

2020-08-18 Thread Markus Jelsma
Subject: Re: Drop bad document in update batch > > I think you’re looking for TolerantUpdateProcessor(Factory), added in > SOLR-445. It hung around for a LONGGG time and didn’t actually get > added until 6.1. > > > On Aug 18, 2020, at 12:51 PM, Markus J

RE: Manipulating client's query using a Query object

2020-08-17 Thread Markus Jelsma
Hello Edward, Yes you can by extending ExtendedDismaxQParser [1] and override its parse() method. You get the main Query object through super.parse(). If you need even more fine grained control on how Query objects are created you can extend ExtendedSolrQueryParser's [2] (inner class)

RE: advice on whether to use stopwords for use case

2020-10-01 Thread Markus Jelsma
Well, when not splitting on whitespace you can use the CharFilter for regex replacements [1] to clear the entire search string if anywhere in the string a banned word is found: .*(cigarette|tobacco).* [1]
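The effect of that pattern can be tried out in isolation. The sketch below applies the same regular expression with Python's `re` module, replacing any match with an empty string; it mimics what a regex replacement char filter would do to the raw query text, but it is not Solr code, and the function name and sample queries are invented for this example.

```python
import re

# Same pattern as in the message: the whole string matches as soon as
# one banned word appears anywhere in it.
BANNED = re.compile(r".*(cigarette|tobacco).*")

def clear_if_banned(query):
    """Return an empty string when the query contains a banned word."""
    return BANNED.sub("", query)

blocked = clear_if_banned("cheap tobacco online")
allowed = clear_if_banned("blue shoes")
```

Note that the pattern is case sensitive as written, so a query containing "Tobacco" would pass unless the pattern or the input is normalised first.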
