RE: Debug logging in Maven project

2017-01-10 Thread Markus Jelsma
g > Subject: Re: Debug logging in Maven project > > Seems like you have enabled only console appender. I remember there was a > changed made to disable console appender if Solr is started in background > mode. > > On Jan 10, 2017 5:55 AM, "Markus Jelsma" <markus.je

RE: Debug logging in Maven project

2017-01-10 Thread Markus Jelsma
sole appender. I remember there was a > > changed made to disable console appender if Solr is started in background > > mode. > > > > On Jan 10, 2017 5:55 AM, "Markus Jelsma" <markus.jel...@openindex.io> wrote: > > > > > Hello, > > > >

RE: DocTransformer not always working

2016-12-21 Thread Markus Jelsma
user, or specified as "extra" by the > : > DocTransformer. > : > > : > IIUC you want the stored value of the "minhash" field to be available to > : > you, but the response writer code doesn't know that -- it just knows you > : > want "minh

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hi - We are not having large issues using OpenNLP for POS-tagging in Lucene. But you mention commits, and committing with or without POS payloads is hardly any different, so commits should be unaffected. Maybe you have another issue? Perhaps use a sampler to pinpoint the problem. Markus

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hi - We don't use that OpenNLP patch, nor do we use that kind of lemmatizer. We just rely on POS-tagging via a CharFilter with custom trained maxent models and it is fast enough. So, do you really need that analyzer that is giving you a hard time? I don't know what that lemmatizer does but you

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hello - there is an underlying AIOOBE causing you trouble: at java.lang.Thread.run(Thread.java:745) *Caused by: java.lang.ArrayIndexOutOfBoundsException: 1* at opennlp.tools.lemmatizer.SimpleLemmatizer.<init>(SimpleLemmatizer.java:46) Regards, Markus -Original message- >

RE: Exception while integrating openNLP with Solr

2017-03-22 Thread Markus Jelsma
Hello - you need to increase the heap to work around the out of memory exception. There is not much you can do to increase the indexing speed using OpenNLP. Regards, Markus -Original message- > From:aruninfo100 > Sent: Wednesday 22nd March 2017 12:27 > To:

Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
Hi, One of our nodes went berserk after a restart, Solr went completely nuts! So i opened VisualVM to keep an eye on it and spotted a different problem that occurs in all our Solr 6.4.2 and 6.5.0 nodes. It appears Solr is leaking one SolrZkClient instance per second via

RE: Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
nstance per second? > > Please open a Jira issue. Thanks! > > On Tue, Apr 4, 2017 at 7:16 PM, Markus Jelsma > <markus.jel...@openindex.io> wrote: > > Hi, > > > > One of our nodes went berserk after a restart, Solr went completely nuts! > > So i opene

maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hi, One of our 2 shard collections is rather small and gets all its entries reindexed every 20 minutes or so. Now i just noticed maxDoc is ten times greater than numDoc, the merger is never scheduled but settings are default. We just overwrite the existing entries, all of them. Here are the

RE: maxDoc ten times greater than numDoc

2017-04-12 Thread Markus Jelsma
Hello - i know it includes all those deleted/overwritten documents. But having 89,9 % deleted documents is quite unreasonable, so i would expect the mergeScheduler to kick in at least once in a while. It doesn't with default settings so i am curious what is wrong. Our large regular search
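The relation between maxDoc, numDocs and the share of deleted documents discussed in this thread can be sketched as below. This is only an illustration: the class and method names are made up, and the sample figures are hypothetical rather than taken from the cluster in question. maxDoc counts live plus deleted (overwritten) documents, so a maxDoc of ten times numDocs means roughly 90 % of the index is deleted documents.

```java
// Illustrative sketch only; names and figures are hypothetical.
final class DeletedDocs {

    // Percentage of the index that consists of deleted (overwritten) documents.
    // maxDoc includes deleted documents, numDocs counts only live ones.
    static double deletedPercent(long maxDoc, long numDocs) {
        if (maxDoc <= 0 || numDocs < 0 || numDocs > maxDoc) {
            throw new IllegalArgumentException("expected 0 <= numDocs <= maxDoc and maxDoc > 0");
        }
        return 100.0 * (maxDoc - numDocs) / maxDoc;
    }

    public static void main(String[] args) {
        // Hypothetical core: 8,000 live documents, maxDoc ten times larger.
        System.out.println(deletedPercent(80_000L, 8_000L));
    }
}
```

Both values are shown on the core overview in the Solr admin UI, so the ratio is easy to check per core.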

RE: CommonGrams

2017-04-11 Thread Markus Jelsma
Hi - i cannot think of any real drawback right away. But you probably can expect a slightly different ordered MLT response. It should not be a problem if you select enough terms for MLT lookup. Regards, Markus -Original message- > From:David Hastings

RE: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Markus Jelsma
Hi - We did this on one occasion and Solr started complaining in the logs about a field that is present but not defined. We thought the problem would go away within 30 days - the time within which every document is reindexed or deleted - but it did not, for some reason. Forcing a merge did not solve

RE: Solr Shingle is not working properly in solr 6.5.0

2017-04-05 Thread Markus Jelsma
SOLR-10423: Disable graph query production via schema configuration > . >   This fixes broken queries for ShingleFilter-containing query-time analyzers >when request param sow=false. >   (Steve Rowe) > - > > -- > Steve > www.lucidworks.com > > > On

RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
te-based query may trigger this. I think there was a bug about > > abandoned child or something. > > > > This is pure speculation of course. > > > > Regards, > >Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and e

RE: keywords not found - google like feature

2017-04-13 Thread Markus Jelsma
; To: solr-user@lucene.apache.org > Subject: Re: keywords not found - google like feature > > Another ugly solution would be to use the debugQuery=true option, then > analyze the reults in explain, if the word isnt in the explain, then you > strike it out. > > On Thu, Apr 13, 2017 a

RE: keywords not found - google like feature

2017-04-13 Thread Markus Jelsma
Hi - There is no such feature out-of-the-box in Solr. But you probably could modify a highlighter implementation to return this information, the highlighter is the component that comes closest to that feature. Regards, Markus -Original message- > From:Nilesh Kamani

RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
riginal segments got merged away so > I doubt it's any weirdness with a small index. > > Speaking of small index, why are you sharding with only > 8K docs? Sharding will probably slow things down for such > a small index. This isn't germane to your question though. > > Best, > Er

CloudDescriptor.getNumShards() sometimes returns null

2017-04-14 Thread Markus Jelsma
Hi - I've got this 2 shard/2 replica cluster. In handler i need the number of shards of the cluster. cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor(); return cloudDescriptor.getNumShards(); It is, however, depending on which node is executing this, sometimes null!

RE: Use case for the Shingle Filter

2017-03-05 Thread Markus Jelsma
Hello - we use it for text classification and online near-duplicate document detection/filtering. Using shingles means you want to consider order in the text. It is analogous to using bigrams and trigrams when doing language detection, you cannot distinguish between Danish and Norwegian solely

bin/solr -a doesn't work?

2017-03-01 Thread Markus Jelsma
Hello, Because we upload large files to Zookeeper, i tried: bin/solr restart -c -m 1500m -a "-Djute.maxbuffer=0xF2" But the script keeps hanging, and no Solr is started. The -a parameter doesn't seem to work. Am i missing something very obvious? Thanks, Markus

RE: bin/solr -a doesn't work?

2017-03-02 Thread Markus Jelsma
g > Subject: Re: bin/solr -a doesn't work? > > Hi Markus, > > Maybe you can post the script or error message here, so we can have a > better understanding of the situation. > > Regards, > Edwin > > > On 1 March 2017 at 19:53, Markus Jelsma <markus.jel...

RE: Solr Shingle is not working properly in solr 6.5.0

2017-04-05 Thread Markus Jelsma
Steve - please include a broad description of this feature in the next CHANGES.txt. I will forget about this thread but need to be reminded of why i could need it :) Thanks, Markus -Original message- > From:Steve Rowe > Sent: Wednesday 5th April 2017 23:26 > To:

RE: Is there a way to determine fields available for faceting for a search without doing the faceting?

2017-08-10 Thread Markus Jelsma
solr/search/admin/luke?show=schema=json=true gives you schema information. Look for all fields that are string, int etc and test if they are either indexed or have docValues. -Original message- > From:Michael Joyner > Sent: Thursday 10th August 2017 16:12 > To:
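The selection rule from this reply (simple field types that are indexed or have docValues) could be sketched as follows. Everything here is an assumption made for illustration: the Field class is a hypothetical, simplified view of a schema field and does not mirror the Luke handler's response format, and the list of type names is only a guess at what "string, int etc" would cover in a given schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not based on any Solr API.
final class FacetCandidates {

    // Hypothetical, simplified description of one schema field.
    static final class Field {
        final String name;
        final String type;
        final boolean indexed;
        final boolean docValues;

        Field(String name, String type, boolean indexed, boolean docValues) {
            this.name = name;
            this.type = type;
            this.indexed = indexed;
            this.docValues = docValues;
        }
    }

    // Assumed set of simple (non-analyzed) type names; adjust to the actual schema.
    static final Set<String> SIMPLE_TYPES = new HashSet<>(
            Arrays.asList("string", "int", "long", "float", "double", "date", "boolean"));

    // Keeps the names of fields with a simple type that are indexed or have docValues.
    static List<String> candidates(List<Field> fields) {
        List<String> names = new ArrayList<>();
        for (Field f : fields) {
            if (SIMPLE_TYPES.contains(f.type) && (f.indexed || f.docValues)) {
                names.add(f.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(
                new Field("category", "string", true, false),
                new Field("body", "text_general", true, false),
                new Field("price", "int", false, true),
                new Field("internal", "string", false, false));
        System.out.println(candidates(fields));
    }
}
```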

RE: Token "states" not getting lemmatized by Solr?

2017-08-11 Thread Markus Jelsma
I checked our English analyzer using KStemFilter. To my surprise, both united and states are not affected by the filter. Regards, Markus -Original message- > From:Ahmet Arslan > Sent: Thursday 10th August 2017 21:57 > To: solr-user@lucene.apache.org >

RE: 6.6.0 getNumShards() NPE?!

2017-08-14 Thread Markus Jelsma
t; > On Thu, Aug 10, 2017 at 4:12 PM, Markus Jelsma > <markus.jel...@openindex.io> wrote: > > I can now reproduce it on the two shard, two replica cluster. > > > > It does NOT happen on the collection_shard1_replica1 and > > collection_shard2_replica1

6.6.0 getNumShards() NPE?!

2017-08-10 Thread Markus Jelsma
Hello, Having trouble, again, with CloudDescriptor and friend, getting the number of shards of the collection. It sometimes returns 1 for a collection of two shards. Having this code: cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor(); return

RE: 6.6.0 getNumShards() NPE?!

2017-08-10 Thread Markus Jelsma
I can now reproduce it on the two shard, two replica cluster. It does NOT happen on the collection_shard1_replica1 and collection_shard2_replica1 nodes. It happens consistently on the collection_shard1_replica2 and collection_shard2_replica2 nodes. Any ideas? -Original message- >

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
to Query instances?! > > What changed in the system? > > Has there been a code change, increased QPS or different types of queries > being run? > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Fri, Jul 7, 2017 at 8:07 AM, Markus Jel

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
ss did you know how much total cache > consumption may go based on your current solrconfig.xml settings. Also 2 > shards and 3 replca's are on 6 such machines i assume. > > Thanks, > Susheel > > On Fri, Jul 7, 2017 at 7:01 AM, Markus Jelsma <markus.jel...@openindex.io> &g

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
pool shutdown now with stack trace > > Do you see any errors etc. in solr.log during this time? > > On Tue, Jul 18, 2017 at 7:10 AM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > The problem was never re

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
> (firewall). Yes, that's using the hammer to swat a fly :-) > > > > Regards, > >    Alex. > > > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > > > > On 29 June 2017 at 08:21, Markus Jelsma <ma

RE: SolrJ 6.6.0 Connection pool shutdown now with stack trace

2017-07-18 Thread Markus Jelsma
ests/client/connections happening concurrently. > > Thanks, > Susheel > > On Tue, Jul 18, 2017 at 8:43 AM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Hello Susheel, > > > > No, nothing at all. I've check all six nodes, th

Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
Hello, This morning i spotted our QTime suddenly go up. This has been going on for a few hours by now and coincides with a serious increase in heap consumption. No node ran out of memory so far but either that is going to happen soon, or the nodes become unusable in another manner. I

RE: Slowly running OOM due to Query instances?!

2017-07-07 Thread Markus Jelsma
ote that there were fixes made in Solr > 6.6 with PayloadScoreQuery in this regard. See LUCENE-7808 and LUCENE-7481 > > Erik > > > > On Jul 7, 2017, at 7:01 AM, Markus Jelsma <markus.jel...@openindex.io> > > wrote: > > > > Hello, &g

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-20 Thread Markus Jelsma
again, but even at those moments, GC is minimal and the heap stays at about 55 - 60 % and only peaks every 15 minutes when documents are indexed. Thanks, Markus -Original message- > From:Shawn Heisey <apa...@elyograg.org> > Sent: Wednesday 19th July 2017 16:08 > To

6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Hello, Another peculiarity here, our six node (2 shards / 3 replica's) cluster is going crazy after a good part of the day has passed. It starts eating CPU for no good reason and its latency goes up. Grafana graphs show the problem really well. After restarting 2/6 nodes, there is also quite a

RE: Optimize stalls at the same point

2017-07-25 Thread Markus Jelsma
Upgrade to 6.x and get, in general, decent JVM settings. And decrease your heap, having it so extremely large is detrimental at best. Our shards can be 25 GB in size, but we run fine (apart from other problems recently discovered) with a 900 MB heap, so you probably have a lot of room to

RE: Optimize stalls at the same point

2017-07-25 Thread Markus Jelsma
i.apache.org/solr/ShawnHeisey > GC_TUNE=" \ > -XX:+UseG1GC \ > -XX:+ParallelRefProcEnabled \ > -XX:G1HeapRegionSize=8m \ > -XX:MaxGCPauseMillis=200 \ > -XX:+UseLargePages \ > -XX:+AggressiveOpts \ > “ > > Last week, I benchmarked the 4.x config handling 15,000 requests/minut

RE: High CPU utilization on Upgrading to Solr Version 6.3

2017-07-27 Thread Markus Jelsma
> Max heap is 25G for each Solr Process. (Xms 25g Xmx 25g) You can most likely drop this to Xmx 1g and all your problems are then most likely solved just by doing so. Regards, Markus -Original message- > From:Atita Arora > Sent: Thursday 27th July 2017 9:30 >

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-26 Thread Markus Jelsma
dexing peak? And > also that there free RAM (after heap allocation)? Can it happen that > warming query is unnecessary heavy? Also, explicit commits might cause > issues, consider the best practice with auto-commit and openSearcher=false > and soft commit when necessary. > > >

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
> > Can you expose the stack deeper? > Can they start to sync shards due to some reason? > > On Wed, Jul 19, 2017 at 12:35 PM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Hello, > > > > Another peculiarity here, our six node (2 shards / 3

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
Cheers -- Rick > > On July 19, 2017 5:35:32 AM EDT, Markus Jelsma <markus.jel...@openindex.io> > wrote: > >Hello, > > > >Another peculiarity here, our six node (2 shards / 3 replica's) cluster > >is going crazy after a good part of the day has passed.

RE: 6.6 cloud starting to eat CPU after 8+ hours

2017-07-19 Thread Markus Jelsma
k even from solradmin. Overall, this > behavior looks like typical heavy merge kicking off from time to time. > > On Wed, Jul 19, 2017 at 3:31 PM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Hello, > > > > No i cannot expose the stack, VisualV

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
some sort of unexpected hardware piece of equipment > (firewall). Yes, that's using the hammer to swat a fly :-) > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 29 June 2017 at 08:21, Markus Jelsma &l

RE: OpenNLP and Solr

2017-07-06 Thread Markus Jelsma
Hi - There is no out-of-the-box integration of OpenNLP in Lucene at this moment, but there is an ancient patch if you are adventurous. Regards, LUCENE-2899 -Original message- > From:meenu > Sent: Thursday 6th July 2017 16:26 > To: solr-user@lucene.apache.org >

RE: SolrJ 6.6.0 Connection pool shutdown

2017-06-29 Thread Markus Jelsma
t; > Sent: Tuesday 27th June 2017 23:02 > To: solr-user@lucene.apache.org > Subject: Re: SolrJ 6.6.0 Connection pool shutdown > > On 6/27/2017 6:50 AM, Markus Jelsma wrote: > > We have a process checking presence of many documents in a collection, just > > a simple cli

RE: High disk write usage

2017-07-05 Thread Markus Jelsma
Try mergeFactor of 10 (default) which should be fine in most cases. If you have an extreme case, either create more shards or consider better hardware (SSDs) -Original message- > From:Antonio De Miguel > Sent: Wednesday 5th July 2017 16:48 > To:

RE: CloudDescriptor.getNumShards() sometimes returns null

2017-04-24 Thread Markus Jelsma
Hi - that (RE: Overseer session expires on multiple collection creation) was the wrong thread. I meant, any ideas on this one? Many thanks, Markus -Original message- > From:Markus Jelsma > Sent: Friday 14th April 2017 17:25 > To: solr-user

RE: Overseer session expires on multiple collection creation

2017-04-24 Thread Markus Jelsma
Hi - any ideas on this one? Many thanks, Markus -Original message- > From:apoorvqwerty > Sent: Friday 21st April 2017 15:04 > To: solr-user@lucene.apache.org > Subject: Overseer session expires on multiple collection creation > > Hi, > I am trying to create

RE: CloudDescriptor.getNumShards() sometimes returns null

2017-04-24 Thread Markus Jelsma
ards() sometimes returns null > > What version of Solr? This has been reworked pretty heavily lately no > 6x and trunk. > > On Mon, Apr 24, 2017 at 2:24 AM, Markus Jelsma > <markus.jel...@openindex.io> wrote: > > Hi - that (RE: Overseer session expires on multiple c

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
> This is so we can test this. We think it will help, but we'll see. > > On Tue, Aug 8, 2017 at 3:53 PM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Yes, extend the default Similarity, return 1.0f for idf and probably the > > idfExplain methods, an

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
ty” is not > twice as much about New York. Same for the movie “New York, New York”. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Aug 8, 2017, at 2:18 PM, Markus Jelsma <markus.jel...@openindex.io

RE: Solr 6 and IDF

2017-08-08 Thread Markus Jelsma
Yes, extend the default Similarity, return 1.0f for idf and probably the idfExplain methods, and configure it in your schema, global or per-field. If you think this is a good idea, why not also return 1.0f for tf? And while you're at it, also omitNorms on all fields entirely? I am curious if

RE: Solr uses lots of shared memory!

2017-08-23 Thread Markus Jelsma
Subject: Re: Solr uses lots of shared memory! > > On 8/22/2017 7:24 AM, Markus Jelsma wrote: > > I have never seen this before, one of our collections, all nodes eating > > tons of shared memory! > > > > Here's one of the nodes: > > 10497 solr  20   0 19.439g

RE: 6.5.1. cloud went partially down

2017-05-10 Thread Markus Jelsma
I am not sure this is directly related but we also sometimes see clients losing connections on 6.5.1. This, together with the problem described below, is unique to 6.5.1; i have not seen this many issues with cloud in a short time for a very long time. 2017-05-09 21:30:36.661 ERROR (Document compiler)

RE: Estimating CPU

2017-06-20 Thread Markus Jelsma
To add on Erick, First thing that comes to mind, you also have a huge heap, do you really need it to be that large, if not absolutely necessary, reduce it. If you need it because of FieldCache, consider DocValues instead and reduce the heap again. Use tools like VisualVM to see what the CPU is

RE: How can I enable NER Plugin in Solr 6.x

2017-06-22 Thread Markus Jelsma
Solr hasn't got built in support for NER, but you can try its UIMA integration with external third-party suppliers: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration -Original message- > From:FOTACHE CHRISTIAN > Sent: Thursday 22nd

RE: Proximity searches with a wildcard

2017-06-23 Thread Markus Jelsma
Sure: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser -Original message- > From:Michael Craven > Sent: Friday 23rd June 2017 22:06 > To: solr-user@lucene.apache.org > Subject: Proximity searches with a wildcard >

RE: Questions about typical/simple clustered Solr software and hardware architecture

2017-06-23 Thread Markus Jelsma
Hello, see inline. -Original message- > From:ken edward > Sent: Friday 23rd June 2017 21:07 > To: solr-user@lucene.apache.org > Subject: Questions about typical/simple clustered Solr software and hardware > architecture > > Hello, > > I am brand new to Solr,

RE: Spread SolrCloud across two locations

2017-05-24 Thread Markus Jelsma
Hi - Again, hiring a simple VM at a third location without a Solr cloud sounds like the simplest solution. It keeps the quorum tight and sound. This simple solution is the one i would try first. Or am i completely missing something and sound like an idiot? Could be, of course. Regards, Markus

RE: How to avoid unnecessary query parsing on distributed search in QueryComponent.prepare()?

2017-05-24 Thread Markus Jelsma
I've asked myself this question too sometimes. In this case extending MLT QParser. So far, i've not found a simple means to propagate a parsed top-level Lucene query object over the wire. But, since there is a clear toString for that Query object, if we could retranslate that String to a

RE: Spread SolrCloud across two locations

2017-05-23 Thread Markus Jelsma
I would probably start by renting a VM at a third location to run Zookeeper. Markus -Original message- > From:Jan Høydahl > Sent: Tuesday 23rd May 2017 11:09 > To: solr-user > Subject: Spread SolrCloud across two locations > > Hi, >

RE: Custom Response writer

2017-06-16 Thread Markus Jelsma
Yes, index the employee and item names instead of only their ID's. And if you can't for some reason, i'd implement a DocTransformer instead of a ResponseWriter. Regards, Markus -Original message- > From:mganeshs > Sent: Friday 16th June 2017 16:19 > To:

IndexSchema uniqueKey is not stored

2017-06-16 Thread Markus Jelsma
Hi, Moving over to docValues as stored field i got this: o.a.s.s.IndexSchema uniqueKey is not stored - distributed search and MoreLikeThis will not work But distributed and MLT still work. Is the warning actually obsolete these days? Regards, Markus

6.5.1. cloud went partially down

2017-05-08 Thread Markus Jelsma
Hi, Multiple 6.5.1. clouds / collections went down this weekend around the same time, they share the same ZK quorum. The nodes stayed up but did not rejoin the cluster (find or connect to ZK) This is what the log told us: 2017-05-06 18:58:34.893 WARN

RE: Term no longer matches if PositionLengthAttr is set to two

2017-05-01 Thread Markus Jelsma
Hello again, apologies for cross-posting and having to get back to this unsolved problem. Initially i thought this is a problem i have with, or in Lucene. Maybe not, so is this problem in Solr? Is there anyone who has seen this problem before? Many thanks, Markus -Original message- >

RE: CloudDescriptor.getNumShards() sometimes returns null

2017-05-03 Thread Markus Jelsma
24th April 2017 16:50 > > To: solr-user <solr-user@lucene.apache.org> > > Subject: Re: CloudDescriptor.getNumShards() sometimes returns null > > > > What version of Solr? This has been reworked pretty heavily lately no > > 6x and trunk. > > > >

RE: Term no longer matches if PositionLengthAttr is set to two

2017-05-04 Thread Markus Jelsma
Ok, we decided not to implement PositionLengthAttribute for now because either it is badly applied (how could one even misapply that attribute?) or Solr's QueryBuilder has a weird way of dealing with it or.. well. Thanks, Markus -Original message- > From:Markus Jelsma

SolrJ 6.6.0 Connection pool shutdown

2017-06-27 Thread Markus Jelsma
Hi, We have a process checking presence of many documents in a collection, just a simple client.getById(id). It sometimes begins throwing lots of these exceptions in a row: org.apache.solr.client.solrj.SolrServerException: java.lang.IllegalStateException: Connection pool shut down Then, as

RE: How to remove control characters in stored value at Solr side

2017-09-18 Thread Markus Jelsma
I agree. But, can you then explain why Apache Nutch with SolrJ had this problem? It seems that by default SolrJ does use XML as transport format. We have always used SolrJ which i assumed would default to javabin, but we had this exact problem anyway, and solved it by stripping non-character

RE: How to remove control characters in stored value at Solr side

2017-09-19 Thread Markus Jelsma
Ah, thanks! -Original message- > From:Chris Hostetter > Sent: Monday 18th September 2017 23:11 > To: solr-user@lucene.apache.org > Subject: RE: How to remove control characters in stored value at Solr side > > > : But, can you then explain why Apache Nutch

RE: How to remove control characters in stored value at Solr side

2017-09-14 Thread Markus Jelsma
Hello, You can not do this in Solr, you cannot even send non-character code points in the first place. For Apache Nutch we solved the problem by stripping those non-character code points from Strings before putting them in SolrDocument. Check the ticket, you can easily reuse the strip method.
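For readers who do not have the ticket at hand, stripping on the client side before the value goes into a document could look roughly like the sketch below. It is not the Nutch method referred to above; the class and method names are made up, and the choices of which code points to drop (Unicode noncharacters, C0 control characters other than tab, line feed and carriage return, and unpaired surrogates) are assumptions about what would trip up an XML transport.

```java
// Illustrative sketch only; not the strip method from the Nutch ticket.
final class NonCharStripper {

    // Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points
    // of every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    static boolean isNonCharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // C0 control characters except tab, line feed and carriage return.
    static boolean isDisallowedControl(int cp) {
        return cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
    }

    // codePointAt returns the bare surrogate value when it is not part of a valid pair.
    static boolean isUnpairedSurrogate(int cp) {
        return cp >= 0xD800 && cp <= 0xDFFF;
    }

    // Returns a copy of the input without the code points rejected above.
    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (!isNonCharacter(cp) && !isDisallowedControl(cp) && !isUnpairedSurrogate(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "title\u0000 with\uFFFE debris";
        System.out.println(strip(dirty));
    }
}
```

The cleaned String would then be set on the document field as usual; iterating by code point rather than by char keeps valid supplementary characters intact.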

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
reclaimDeletesWeight in merge settings to some higher value > > than default (I think it is 2) to favor segments with deleted docs when > > merging. > > > > HTH, > > Emir > > -- > > Monitoring - Log Management - Alerting - Anomaly Detection > >

Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Hello, Using a 6.6.0, i just spotted one of our collections having a core of which over 80 % of the total number of documents were deleted documents. It is configured with no non-default settings. Is this supposed to happen? How can i prevent this kind of numbers? Thanks, Markus

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 4 Oct 2017, at 13:31, Markus Jelsma <markus.jel...@openindex.io> wrote: > > > > Hello, > > &

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
rge or full reindexing + aliases. > > HTH, > Emir > > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 4 Oct 2017, at 14:47, Markus Jelsma <markus.je

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
merged until you have 97.5G worth of deleted > docs. > > More here: > https://issues.apache.org/jira/browse/LUCENE-7976 > > Erick > > On Wed, Oct 4, 2017 at 5:47 AM, Markus Jelsma > <markus.jel...@openindex.io> wrote: > > Do you mean a periodic forceMerge?

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
s because this is such a small > index you don't have very many like-sized segments to merge after your > periodic run. Setting segs per tier to a much lower number (like 2) > might kick in the background merging. It'll make more I/O during > indexing happen of course. > >

RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-09-26 Thread Markus Jelsma
Thanks! I'll try it and get back later! -Original message- > From:Chris Hostetter > Sent: Tuesday 26th September 2017 18:52 > To: Solr-user > Subject: Re: Moving to Point, trouble with IntPoint.newRangeQuery() > > > : I have a

RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-09-26 Thread Markus Jelsma
> range queries, so you can’t just use them interchangeably, you have to > reindex your data. I’m guessing your ‘d1’ field here is a TrieIntField or > similar? > > Alan Woodward > www.flax.co.uk > > > > On 26 Sep 2017, at 12:22, Markus Jelsma <markus.jel...@openi

RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-10-03 Thread Markus Jelsma
Ok, i have stripped down the QParser to demonstrate the problem. This is the basic test with only one document in the index: public void testPointRange() throws Exception { assertU(adoc("id", "8", "digest1", "-1820898630")); assertU(commit()); assertQ( req("q", "{!qdigest

RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-10-03 Thread Markus Jelsma
Ok, it has been resolved. I was lucky to have spotted that i was looking at the wrong schema file! The one the test actually used was not yet updated from Trie to Point! Thanks! Markus -Original message- > From:Markus Jelsma > Sent: Tuesday 3rd October 2017

Moving to Point, trouble with IntPoint.newRangeQuery()

2017-09-26 Thread Markus Jelsma
Hello, I have a QParser impl. that transforms text input to one or more integers, it makes a BooleanQuery on a field with all integers in OR-mode. It used to work by transforming the integer using LegacyNumericUtils.intToPrefixCoded, getting a BytesRef. I have now moved it to use

RE: Facet on a Payload field type?

2017-08-23 Thread Markus Jelsma
he correct one for the user's > language, but I'm curious as to how others have solved this. > > On Wed, Aug 23, 2017 at 2:10 PM, Markus Jelsma <markus.jel...@openindex.io> > wrote: > > > Technically they could, facetting is possible on TextField, but it would &

RE: Solr uses lots of shared memory!

2017-08-24 Thread Markus Jelsma
ect: Re: Solr uses lots of shared memory! > > Just an idea, how about taking a dump with jmap and using > MemoryAnalyzerTool to see what is going on? > > Regards > Bernd > > > Am 24.08.2017 um 11:49 schrieb Markus Jelsma: > > Hello Shalin, > > > >

RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
See https://issues.apache.org/jira/browse/SOLR-10506 Fixed for 7.0 Markus -Original message- > From:Hendrik Haddorp > Sent: Monday 28th August 2017 17:42 > To: solr-user@lucene.apache.org > Subject: Solr memory leak > > Hi, > > we noticed that triggering

RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
It is, unfortunately, not committed for 6.7. -Original message- > From:Markus Jelsma > Sent: Monday 28th August 2017 17:46 > To: solr-user@lucene.apache.org > Subject: RE: Solr memory leak > > See https://issues.apache.org/jira/browse/SOLR-10506 > Fixed

RE: Facet on a Payload field type?

2017-08-23 Thread Markus Jelsma
Technically they could, facetting is possible on TextField, but it would be useless for facetting. Payloads are only used for scoring via a custom Similarity. Payloads also can only contain one byte of information (or was it 64 bits?) Payloads are not something you want to use when dealing

RE: Solr uses lots of shared memory!

2017-08-23 Thread Markus Jelsma
it reaches some sort of plateau. -Original message- > From:Shawn Heisey <apa...@elyograg.org> > Sent: Wednesday 23rd August 2017 16:37 > To: solr-user@lucene.apache.org > Subject: Re: Solr uses lots of shared memory! > > On 8/23/2017 7:32 AM, Markus Jelsma wrote: > > W

RE: EdgeNGramFilterFactory More specific ?

2017-08-24 Thread Markus Jelsma
NGramFilter! -Original message- > From:Guilleret Florian > Sent: Thursday 24th August 2017 10:30 > To: solr-user@lucene.apache.org > Subject: EdgeNGramFilterFactory More specific ? > > I use a fieldtype who use EdgeNGramFilterFactory min 1 max 15. It

Solr uses lots of shared memory!

2017-08-22 Thread Markus Jelsma
Hi, I have never seen this before, one of our collections, all nodes eating tons of shared memory! Here's one of the nodes: 10497 solr 20 0 19.439g 4.505g 3.139g S 1.0 57.8 2511:46 java RSS is roughly equal to heap size + usual off-heap space + shared memory. Virtual is equal to

RE: Solr uses lots of shared memory!

2017-08-24 Thread Markus Jelsma
've already seen this, but top and similar can be > confusing when trying to interpret MMapDirectory. Uwe has an excellent > explication: > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > Best, > Erick > > On Wed, Aug 23, 2017 at 9:10 AM,

RE: Solr uses lots of shared memory!

2017-08-24 Thread Markus Jelsma
> http://lucene.apache.org/solr/guide/6_6/datadir-and-directoryfactory-in-solrconfig.html#DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex > > On Wed, Aug 23, 2017 at 7:02 PM, Markus Jelsma > <markus.jel...@openindex.io> wrote: > > I do not think it i

RE: Search by similarity?

2017-08-25 Thread Markus Jelsma
Yes, that is roughly how MLT works as well. You can also do a full OR-search on the terms using LuceneQParser. Markus -Original message- > From:Junte Zhang > Sent: Friday 25th August 2017 18:38 > To: solr-user@lucene.apache.org > Subject: RE: Search by

RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Markus Jelsma
ses and restart your ZooKeeper > instances (not all at the same time of course). This is what I do in these > cases and helped me so far. > > > > > Von:Markus Jelsma <markus.jel...@openindex.io> > An: Solr-user <solr-user@lucene.apache.org> > Da

RE: Solr uses lots of shared memory!

2017-09-04 Thread Markus Jelsma
;https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en > >Kevin Risden > > > > > >On Thu, Aug 24, 2017 at 10:19 AM, Markus Jelsma > ><markus.jel...@openindex.io> wrote: > >

RE: Solr uses lots of shared memory!

2017-09-01 Thread Markus Jelsma
; From:Bernd Fehling <bernd.fehl...@uni-bielefeld.de> > > Sent: Thursday 24th August 2017 15:39 > > To: solr-user@lucene.apache.org > > Subject: Re: Solr uses lots of shared memory! > > > > Just an idea, how about taking a dump with jmap and using > > Mem

RE: RE: Antwort: 6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-09-01 Thread Markus Jelsma
Keeper... (0) -> (4) > > 2017-08-31 10:01:56 INFO ZkClientClusterStateProvider:134 - Cluster at > > ZKHOST1:9983,ZKHOST2:9983,ZKHOST3:9983,ZKHOST4:9983,ZKHOST5:9983 ready > > > > > > > > > > > > Von:Markus Jelsma <markus.jel...@openindex.i

6.6 Cannot talk to ZooKeeper - Updates are disabled.

2017-08-31 Thread Markus Jelsma
Hello, One node is behaving badly, at least according to the logs, but the node is green in the cluster overview although the logs claim recovery fails all the time. It is not the first time this message pops up in the logs of one of the nodes, why can it not talk to Zookeeper? I miss a
