Re: Facets based on sampling

2017-10-20 Thread John Davis
Hi Yonik, Any update on sampling-based facets? The current faceting is really slow for fields with high cardinality, even with method=uif. Or are there alternative workarounds to look at only N docs when computing facets? On Fri, Nov 4, 2016 at 4:43 PM, Yonik Seeley wrote: >

Re: Upload/update full schema and solrconfig in standalone mode

2017-10-20 Thread Erick Erickson
SCP is just a copy program that allows you to copy files to a remote system. Think "cp -r ...", but to a remote system. On Oct 20, 2017 11:47, "Alessandro Hoss" wrote: > Thanks for your comments Rick, > > Sorry, but I didn't understand what you mean with "scp". But let me

Re: Jetty maxThreads

2017-10-20 Thread Yonik Seeley
The high number of maxThreads is to avoid distributed deadlock. The fix is multiple thread pools, depending on request type: https://issues.apache.org/jira/browse/SOLR-7344 -Yonik On Wed, Oct 18, 2017 at 4:41 PM, Walter Underwood wrote: > Jetty maxThreads is set to

Re: Jetty maxThreads

2017-10-20 Thread Shawn Heisey
On 10/18/2017 2:41 PM, Walter Underwood wrote: > Jetty maxThreads is set to 10,000 which seems way too big. > > The comment suggests 5X the number of CPUs. We have 36 CPUs, which would mean > 180 threads, which seems more reasonable. I have not seen any evidence that maxThreads at 1 causes

Re: Deploy Solr to production: best practices

2017-10-20 Thread Shawn Heisey
On 10/18/2017 11:32 PM, maximka19 wrote: > *1.* Container: from Solr 5 there is no .WAR-file provided in package. I > couldn't deploy Solr 7.1 to Tomcat 9. None of existing tutorials or guides > helped. No such information for newer versions. The included Jetty is the only supported option since

Re: Certificate issue ERR_SSL_VERSION_OR_CIPHER_MISMATCH

2017-10-20 Thread Shawn Heisey
On 10/19/2017 6:30 AM, Younge, Kent A - Norman, OK - Contractor wrote: > Built a clean Solr server imported my certificates and when I go to the > SSL/HTTPS page it tells me that I have ERR_SSL_VERSION_OR_CIPHER_MISMATCH in > Chrome and in IE tells me that I need to TURN ON TLS 1.0, TLS 1.1, and

Re: Concern on solr commit

2017-10-20 Thread Shawn Heisey
On 10/18/2017 3:09 AM, Leo Prince wrote: > Is there any known negative impacts in setting up autoSoftCommit as 1 > second other than RAM usage..? For most users, setting autoSoftCommit to one second is a BAD idea.  In many indexes, commits take longer than one second to complete.  If you do heavy

Re: hi load mitigation

2017-10-20 Thread Shawn Heisey
On 10/17/2017 7:10 AM, j.s. wrote: > i run a stand alone solr instance in which usage has suddenly spiked a > bit. the load was at 8, but by adding another CPU i brought it down to > 2. much better but not where i'd like it to be. > > i guess i'm writing to see if anyone has any suggestions about

Retrieve DocIdSet from Query in lucene 5.x

2017-10-20 Thread Jamie Johnson
I am trying to migrate some old code that used to retrieve DocIdSets from filters. With Filters being deprecated in Lucene 5.x, I am trying to move away from those classes, but I'm not sure of the right way to do this now. Are there any examples of doing this?

Re: Upload/update full schema and solrconfig in standalone mode

2017-10-20 Thread Alessandro Hoss
Thanks for your comments Rick, Sorry, but I didn't understand what you mean with "scp". But let me explain our scenario: Our application is on-premises, so I can't control the infrastructure of the customer, they just tell me the Solr address and if Solr is running on Cloud mode or not. As our

Re: Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread Yonik Seeley
On Fri, Oct 20, 2017 at 2:22 PM, kenny wrote: > Thanks for the clear explanation. A couple of follow up questions > > - can we tune overrequesting in json API? > Yes, I still need to document it, but you can specify a specific number of documents to overrequest: { type :

Re: Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread kenny
Thanks for the clear explanation. A couple of follow up questions - can we tune overrequesting in json API? - we do see conflicting counts but that's when we have offsets different from 0. We have now already tested it in solr 6.6 with json api. We sometimes get the same value in different

Re: Solr nodes going into recovery mode and eventually failing

2017-10-20 Thread shamik
Zisis, thanks for chiming in. This is really interesting information and probably in line with what I'm trying to fix. In my case, the facet fields are certainly not high-cardinality ones. Most of them have a finite set of data, the max being 200 (though it has a low usage percentage). Earlier I had

LTR feature extraction performance issues

2017-10-20 Thread Brian Yee
I enabled LTR feature extraction and response times spiked. I suppose that was to be expected, but are there any tips regarding performance? I have the feature values cache set up as described in the docs: Do I simply have to wait for the cache to fill up and hope that response times go

Re: OOM during indexing with 24G heap - Solr 6.5.1

2017-10-20 Thread Shawn Heisey
On 10/16/2017 5:38 PM, Randy Fradin wrote: > Each shard has around 4.2 million documents which are around 40GB on disk. > Two nodes have 3 shard replicas each and the third has 2 shard replicas. > > The text of the exception is: java.lang.OutOfMemoryError: Java heap space > And the heap dump is a

Re: Solr nodes going into recovery mode and eventually failing

2017-10-20 Thread shamik
Thanks Eric, in my case, each replica is running on its own JVM, so even if we consider 8gb of filter cache, it still has 27gb to play with. Isn't this a decent amount of memory to handle the rest of the JVM operation? Here's an example of implicit filters that get applied to almost all the

Re: Solr nodes going into recovery mode and eventually failing

2017-10-20 Thread Zisis T.
I'll post my experience too, I believe it might be related to the low FilterCache hit ratio issue. Please let me know if you think I'm off topic here so I can create a separate thread. I've run search stress tests on 2 different Solr 6.5.1 installations sending Distributed search queries with facets

Re: Upload/update full schema and solrconfig in standalone mode

2017-10-20 Thread Rick Leir
Alessandro First, let me say that the whole idea makes me nervous. 1/ are you better off with scp? I would not want to do this via Solr API 2/ the right way to do this is with Ansible, Puppet or Docker, 3/ would you like to update a 'QA' installation, test it, then flip it into production?

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-20 Thread Walter Underwood
Setting mm to 100% means that any misspelled word in a query means zero results. That is not a good experience. Usually, 10% of queries contain a misspelling. Set mm to 1. The F-measure is not a good choice for this because recall is not very important in e-commerce. Use precision-oriented
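To make the point about mm concrete, here is a minimal Python sketch of minimum-should-match semantics. It is a toy model and not Solr's query parser: the function name `matches`, the sample document, and the query are all hypothetical, and `mm` is treated as an absolute term count.

```python
def matches(query_terms, doc_terms, mm):
    """Return True if at least `mm` of the query terms occur in the document.

    Toy model of minimum-should-match; `mm` is an absolute term count here.
    """
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= mm


# Hypothetical product title and a query containing one misspelled word.
doc = {"red", "running", "shoes"}
query = ["red", "runing", "shoes"]  # "runing" is the misspelling

# mm = 100%: every query term must match, so the misspelling rules the doc out.
strict = matches(query, doc, mm=len(query))

# mm = 1: a single matching term is enough, so the doc is still found.
lenient = matches(query, doc, mm=1)

print(strict, lenient)
```

With every term required, one misspelling is enough to drop the document; with mm at 1 the two correctly spelled terms still retrieve it, and ranking decides the order.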

Re: Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread Yonik Seeley
Facet refinement in Solr guarantees that counts for returned constraints are correct, but does not guarantee that the top N returned isn't missing a constraint. Consider the following shard counts (3 shards) for the following constraints (aka facet values): constraintA: 2 0 0 constraintB: 0 2 0
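The behaviour described above can be sketched with a small simulation. This is a simplified model and not Solr's implementation: each shard proposes its top candidates, then a refinement pass fetches exact counts for every candidate. The counts for constraintA and constraintB follow the message; constraintC, constraintD, and the `overrequest` parameter name are assumptions added for illustration.

```python
def distributed_top_n(shards, n, overrequest=0):
    """Simulate two-phase distributed faceting over per-shard count dicts."""
    # Phase 1: each shard proposes its own top (n + overrequest) constraints.
    candidates = set()
    for shard in shards:
        ranked = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))
        candidates.update(name for name, _ in ranked[: n + overrequest])

    # Phase 2 (refinement): exact totals for every proposed candidate.
    totals = {c: sum(shard.get(c, 0) for shard in shards) for c in candidates}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Three shards; constraintD is never a shard's top value but has the
# highest total count overall (hypothetical data).
shards = [
    {"constraintA": 2, "constraintD": 1},
    {"constraintB": 2, "constraintD": 1},
    {"constraintC": 2, "constraintD": 1},
]

without_overrequest = distributed_top_n(shards, n=1)
with_overrequest = distributed_top_n(shards, n=1, overrequest=1)

print(without_overrequest, with_overrequest)
```

In this model the returned count is always exact for whatever is returned, but without overrequesting the true top constraint never becomes a candidate, which is the gap the message describes.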

SynonymFilterFactory deprecated

2017-10-20 Thread Vincenzo D'Amore
Hi all, I see in Solr SynonymFilterFactory is deprecated https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilterFactory.html the documentation suggests: Use SynonymGraphFilterFactory >

Re: solr core replication

2017-10-20 Thread Erick Erickson
Does that persist even after you restart Solr on the target cluster? And that clears up one bit of confusion I had, I didn't know how you were having each shard on the target cluster use a different master URL given they all use the same solrconfig file. I was guessing some magic with system

Solr facets counts deep paged returns inconsistent counts

2017-10-20 Thread kenny
Hi all, When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and then from 500 to 1000), we see small but disturbing differences in counts between the two (for example last count on first batch 165, first count on second batch 167). We run this on solr 5.3.1 in cloud mode (3

Upload/update full schema and solrconfig in standalone mode

2017-10-20 Thread Alessandro Hoss
Hello, Is it possible to upload the entire schema and solrconfig.xml to a Solr running in standalone mode? I know about the Config API, but it only allows adding or modifying solrconfig properties, and what I want is to change the whole config

Re: Goal: reverse chronological display Methods? (1) boost, and/or (2) disable idf

2017-10-20 Thread Rick Leir
Bill, In the debug score calculations, the bf boosting does not appear at all. I would expect it to at least show up with a small value. So maybe we need to look at the query. Cheers -- Rick -- Sorry for being brief. Alternate email is rickleir at yahoo dot com

Solr json facet API contains option

2017-10-20 Thread kenny
Hi, I don't seem to find a 'contains' (with or without ignorecase) in the available descriptions of the JSON facet API. Is that because there is none? Or is it just not adequately described? For example in the official ref guide for 6.6 or 7.0 there is no mention of this feature. Is it

Re: LTR features and searching for field using multiple words

2017-10-20 Thread Dariusz Wojtas
I have found a solution based on Yonik's post in this thread: http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/95646 The answer was to surround the searched value with double quotes. In my case they had to be escaped, because there are already quotes in the SOLR feature definition.

[ANNOUNCE] Luke 7.1.0 released

2017-10-20 Thread Tomoko Uchida
Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-7.1.0 Upgraded to Lucene 7.1.0, and other changes in this release: -- Tomoko Uchida

LTR features and searching for field using multiple words

2017-10-20 Thread Dariusz Wojtas
Hi, Recently I have been working with LTR features. In some of these features I use the block join parent parser. It works as expected until I pass a multi-word value into the query. I have a parameter called 'fullAddressStreet' and it - works when I pass value 'something' - does not work if I pass value

Re: Measuring time spent in analysis and writing to index

2017-10-20 Thread Zisis T.
Another thing you can do - and which has helped me in the past quite a few times - is to just run JVisualVM, attach to Solr's Java process and enable the CPU sampler under the Sampler tab. As you run indexing, the methods in which most time is spent will appear near the top. -- Sent from:

Re: Concern on solr commit

2017-10-20 Thread Emir Arnautović
Hi Leo, If you gracefully shut down Solr, documents will be committed. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 20 Oct 2017, at 08:44, Leo Prince wrote: > >

Sort by field from another collection

2017-10-20 Thread Dmitry Gerasimov
Hi! I have one main collection of people and a few more collections with additional data. All search queries are on the main collection with joins to one or more additional collections. A simple example would be: (*:* {!join from=people_person_id to=people_person_id

Re: ClassicAnalyzer Behavior on accent character

2017-10-20 Thread Chitra
Hi, So, is it not advisable to use classicTokenizer and classicAnalyzer? On Thu, Oct 19, 2017 at 8:29 PM, Erick Erickson wrote: > Have you looked at the specification to see how it's _supposed_ to work? > > From the javadocs: > "implements Unicode text segmentation, *

Re: E-Commerce Search: tf-idf, tie-break and boolean model

2017-10-20 Thread Vincenzo D'Amore
Thanks for all the info, I really appreciate your help. I'm working on the configuration and following your suggestions. We already had a golden set of query-results pairs (~1000) used to tune and check how my application (and Solr configuration) performs. But I've to entirely double check if

Re: solr core replication

2017-10-20 Thread Hendrik Haddorp
Hi Erick, that is actually the call I'm using :-) If you invoke http://solr_target_machine:port/solr/core/replication?command=details after that you can see the replication status. But even after a Solr restart the call still shows the replication relation and I would like to remove this so
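For reference, the status URL quoted above can be assembled programmatically. This is only a sketch of the URL shape shown in the message: the helper name `replication_url` is hypothetical, the host and core are the placeholders from the message, and port 8983 is an assumption. It builds the string and does not contact a server.

```python
from urllib.parse import urlencode, urlunsplit


def replication_url(host, port, core, command):
    """Build a replication handler URL of the shape quoted in the message."""
    netloc = f"{host}:{port}"
    path = f"/solr/{core}/replication"
    query = urlencode({"command": command})
    return urlunsplit(("http", netloc, path, query, ""))


# Placeholder host and core from the message; 8983 is an assumed port.
url = replication_url("solr_target_machine", 8983, "core", "details")
print(url)
```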

Re: Concern on solr commit

2017-10-20 Thread Leo Prince
Thank you Yonik. Since we are using SoftCommits, the docs written will be in RAM until an AutoCommit writes them to disk, so I just wanted to know what happens when Solr restarts. That being said, I am using 4.10 and tomcat is handling the Solr, when we restart the tomcat service just before an