Logging in Solrcloud

2017-12-04 Thread Matzdorf, Stefan, Springer SBM DE
Hey everybody, I have a question regarding query-request logging in solr-cloud. I've set the "org.apache.solr.core.SolrCore.Request"-logger to INFO-level and it's logging all those query-requests. So far so good. BUT, as I'm running Solr in cloud mode with 3 nodes and 3 shards per collection

Re: Solr Cloud configuration

2017-12-04 Thread Shawn Heisey
On 12/4/2017 12:11 PM, Steve Pruitt wrote: Getting my Solr Cloud nodes up and running took manually setting execution permissions on the configuration files and manually creating the logs and logs/archived folders under /opt/solr/server. Even though I have my log folders set to var/solr/logs

Re: Dataimporter status

2017-12-04 Thread Shawn Heisey
On 12/3/2017 9:27 AM, Mahmoud Almokadem wrote: We're facing an issue related to the dataimporter status on the new Admin UI (7.0.1). Calling the API http://solrip/solr/collection/dataimport?_=1512314812090&command=status&indent=on&wt=json returns a different status even though the importer is running

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Walter Underwood
One more opinion on source field vs separate collections for multiple corpora. Index statistics don’t really settle down until at least 100k documents. Below that, idf is pretty noisy. With Ultraseek, we used pre-calculated frequency data for collections under 10k docs. If your corpora have sim

RE: Multiple cores versus a "source" field.

2017-12-04 Thread Phil Scadden
Thanks Eric. I have already followed the solrj indexing very closely - I have to do a lot of manipulation at indexing time. The other blog article is very interesting as I do indeed use "year" (year of publication) and it is very frequently used to filter queries. I will have a play with that no

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Erick Erickson
That's the unpleasant part of semi-structured documents (PDF, Word, whatever). You never know the relationship between raw size and indexable text. Basically anything that you don't care to contribute to _scoring_ is often better in an fq clause. You can also use {!cache=false} to bypass actually u
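A minimal Python sketch of the fq advice above. It assumes a hypothetical collection URL and hypothetical field names (`text`, `year`), and it only builds the request URL without contacting Solr. The `{!cache=false}` local parameter is the one named in the message; the helper `build_select_url` is illustrative, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, filters, cache_filters=True):
    """Build a /select URL with the scoring query in q and filters in fq.

    base_url and the field names passed in are hypothetical examples.
    When cache_filters is False, each fq is prefixed with {!cache=false}
    so Solr is asked not to store that filter in the filter cache.
    """
    params = [("q", query)]
    for clause in filters:
        fq = clause if cache_filters else "{!cache=false}" + clause
        params.append(("fq", fq))
    # urlencode percent-encodes the braces, colon, and other reserved characters
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url(
        "http://localhost:8983/solr/docs",  # hypothetical collection
        "text:solr",                        # contributes to scoring
        ["year:2015"],                      # filter only, no score contribution
        cache_filters=False,
    ))
```

Keeping the filter out of q means it restricts the result set without influencing ranking, which is the point made in the message.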

Re: merge metrics not showing up in Jconsole

2017-12-04 Thread suresh pendap
Hi, I wanted to check if it is a known issue that the merge metrics are not exposed as JMX beans. Has anyone else in the community run into this issue? Thanks, Suresh On Sun, Dec 3, 2017 at 4:24 PM, suresh pendap wrote: > I see only these metrics in my Jconsole window > > [image: Inline image 1] >

RE: Multiple cores versus a "source" field.

2017-12-04 Thread Phil Scadden
>You'll have a few economies of scale I think with a single core, but frankly I >don't know if they'd be enough to measure. You say the docs are "quite large" >though, >are you talking books? Magazine articles? is 20K large or are the 20M? Technical reports. Sometimes up to 200MB pdfs, but that

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Erick Erickson
At that scale, whatever you find administratively most convenient. You'll have a few economies of scale I think with a single core, but frankly I don't know if they'd be enough to measure. You say the docs are "quite large" though, are you talking books? Magazine articles? is 20K large or are the 2

Multiple cores versus a "source" field.

2017-12-04 Thread Phil Scadden
I have two different document stores that I want to index. Both are quite small (<50,000 documents though documents can be quite large). They are quite capable of using the same schema, but you would not want to search both simultaneously. I can see two approaches to handling this case. 1/ Create a

Re: Issue with CDCR bootstrapping in Solr 7.1

2017-12-04 Thread Tom Peters
Not sure how it's possible. But I also tried using the _default config and just adding in the source and target configuration to make sure I didn't have something wonky in my custom solrconfig that was causing this issue. I can confirm that until I restart the follower nodes, they will not recei

Re: Index Content Removing the HTML Tags.

2017-12-04 Thread Erick Erickson
Have you tried: HtmlStripCharFilterFactory? On Mon, Dec 4, 2017 at 12:37 PM, Fiz Newyorker wrote: > Hello Solr Group, > > Good Morning ! > > I am working on Solr 6.5 version and I am trying to Index from Mongo DB > 3.2.5. > > I have content collection in mongodb where there is body column which h
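For comparison, a minimal Python sketch of stripping tags on the client side before the document is sent to Solr. This is an alternative illustration of the same goal, not the implementation of HtmlStripCharFilterFactory, which runs inside Solr's analysis chain. The helper name `strip_html` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the text content of markup with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    # Hypothetical body value, standing in for the MongoDB column in the question
    print(strip_html("<p>Hello <b>Solr</b> &amp; friends</p>"))
```

Stripping before indexing changes the value Solr receives, whereas a char filter only affects the tokens produced during analysis, so the choice depends on whether the original markup still needs to be returned to users.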

Index Content Removing the HTML Tags.

2017-12-04 Thread Fiz Newyorker
Hello Solr Group, Good Morning! I am working with Solr 6.5 and I am trying to index from MongoDB 3.2.5. I have a content collection in MongoDB with a body column that has HTML tags in it. I want to index the body column without the HTML tags. *Please see the below body column data in mo

Re: check softCommit , autocommit and hard commit count

2017-12-04 Thread Shawn Heisey
On 12/4/2017 1:53 AM, Puppy Linux Distros wrote: I know it's a bad practice but due to some reasons, our application fires hard commits via code (upon most of the /update) and invokes the /update api with commit=true, and the application rarely uses softcommits. I will recommend devs to look forward

Re: Solr JVM best pratices

2017-12-04 Thread Dominique Bejean
Thank you Shawn for replying to each item. I am starting to get a better grasp of all this tricky JVM stuff. Dominique On Sun, Dec 3, 2017 at 01:30, Shawn Heisey wrote: > On 12/2/2017 8:43 AM, Dominique Bejean wrote: > > I would like to have some advices on best practices related to Heap Size, > > MMap,

Solr Cloud configuration

2017-12-04 Thread Steve Pruitt
Getting my Solr Cloud nodes up and running took manually setting execution permissions on the configuration files and manually creating the logs and logs/archived folders under /opt/solr/server. Even though I have my log folders set to var/solr/logs in the default/solr.in.sh file. After gettin

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread Yonik Seeley
On Mon, Dec 4, 2017 at 1:35 PM, Shawn Heisey wrote: > I'm pretty sure that the difference between docCount and maxDoc is deleted > documents. docCount (not the best name) here is the number of documents with the field being searched. docFreq (df) is the number of documents actually containing t
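The effect discussed in this thread can be sketched in Python, assuming the BM25-style idf formula log(1 + (N - df + 0.5) / (df + 0.5)), where N is either the per-field docCount or the index-wide maxDoc. The function name and the document counts below are made up for illustration; they are not taken from the thread.

```python
import math


def bm25_idf(doc_freq, total_docs):
    """BM25-style idf: log(1 + (N - df + 0.5) / (df + 0.5)).

    doc_freq   -- number of documents containing the term (df)
    total_docs -- N: docCount for the field, or maxDoc for the whole index
    """
    return math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    # Hypothetical multilingual index: a term occurs in 100 documents of a
    # language-specific field that exists in only 1,000 of 1,000,000 documents.
    doc_freq = 100
    idf_per_field = bm25_idf(doc_freq, 1_000)      # N = docCount for the field
    idf_global = bm25_idf(doc_freq, 1_000_000)     # N = maxDoc
    print(f"idf with docCount: {idf_per_field:.3f}")
    print(f"idf with maxDoc:   {idf_global:.3f}")
```

With maxDoc as N, a term from a sparsely populated language field looks far rarer than it is within that field, which inflates its idf relative to terms in widely populated fields.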

Re: Solr Cloud permissions

2017-12-04 Thread Shawn Heisey
On 12/4/2017 9:54 AM, Steve Pruitt wrote: I used the -u option to provide the installer with a user id. The /var/solr folder has the user set as the owner. But, the /opt/solr folder is owned by root. How did this happen? When you install the service, Solr has no need to write to the progra

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread Shawn Heisey
On 12/4/2017 7:21 AM, alessandro.benedetti wrote: the reason docCount was improving things is because it was using a docCount relative to a specific field while maxDoc is global all over the index ? Lucene/Solr doesn't actually delete documents when you delete them, it just marks them as delet

Re: [EXTERNAL] - Re: starting SolrCloud nodes

2017-12-04 Thread Shawn Heisey
On 12/4/2017 7:33 AM, Steve Pruitt wrote: I edited /etc/default/solr.in.sh to list my ZK hosts and I uncommented ZK_CLIENT_TIMEOUT leaving the default value of 15000. The default is 15 seconds, most of the example configs that Solr includes have it increased to 30 seconds. IMHO, 15 seconds i

Re: JVM GC Issue

2017-12-04 Thread S G
I think the below article explains it well: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html I was thinking that doc-Values need to be transitioned into JVM from the OS cache. Turns out that is not required as the docValues are loaded into the virtual address space by the OS

Re: Recovery Issue - Solr 6.6.1 and HDFS

2017-12-04 Thread Joe Obernberger
Hi All - this same problem happened again, and I think I partially understand what is going on. The part I don't know is what caused any of the replicas to go into full recovery in the first place, but once they do, they cause network interfaces on servers to become fully utilized in both in/out d

RE: Solr Cloud permissions

2017-12-04 Thread Steve Pruitt
I used the -u option to provide the installer with a user id. The /var/solr folder has the user set as the owner. But, the /opt/solr folder is owned by root. How did this happen? I checked the opt/solr/bin/init.d/solr and verified RUNAS is set to the user I entered. When I try to execute se

Re: check softCommit , autocommit and hard commit count

2017-12-04 Thread Erick Erickson
Neither commit does anything if no updates have been received. But you don't need to wait for the devs to STOP DOING THAT ;). In solrconfig.xml you can set IgnoreCommitOptimizeUpdateProcessorFactory; see the ref guide. Best, Erick On Mon, Dec 4, 2017 at 12:53 AM, Puppy Linux Distros wrote: >

Solr Cloud permissions

2017-12-04 Thread Steve Pruitt
The documentation states you cannot run Solr cloud as root. When I installed Solr I gave it another user. I checked the init.d script and RUNAS is set to the user I entered. This user doesn't have the permissions I need, but I am not exactly sure where to check permissions. Thanks. -S

Re: Solr score use cases

2017-12-04 Thread alessandro.benedetti
I would like to stress how important what Erick explained is. People often want to use the score to show it to users, calculate a probability, or do other odd calculations. Score is used to rank results, given a query. To give a local ordering. This is the only useful information for the end

Re: Huge Query execution time for multiple ORs

2017-12-04 Thread Faraz Fallahi
Will do, thanks. On 04.12.2017 at 9:27 PM, "Emir Arnautović" < emir.arnauto...@sematext.com> wrote: > Hi Faraz, > When you say query without sort, I assume that you mean you omit sort so > you expect it to be sorted by score. It is expected to be slower than equal > query without calculating score -

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread alessandro.benedetti
Furthermore, taking a look at the code for BM25 similarity, it seems to me it is currently working correctly: - docCount is used per field if != -1 /** * Computes a score factor for a simple term and returns an explanation * for that score factor. * * * The default implementation us

RE: [EXTERNAL] - Re: starting SolrCloud nodes

2017-12-04 Thread Steve Pruitt
Thanks. I edited /etc/default/solr.in.sh to list my ZK hosts and I uncommented ZK_CLIENT_TIMEOUT leaving the default value of 15000. I am not sure if I need to set the SOLR_HOST. This is not a production install, but I am running with three ZK machines and three Solr machines in the cluster. T

Re: Huge Query execution time for multiple ORs

2017-12-04 Thread Emir Arnautović
Hi Faraz, When you say query without sort, I assume that you mean you omit sort so you expect it to be sorted by score. It is expected to be slower than the equivalent query that does not calculate the score - e.g. run the same query as fq. What you observe can be explained with: * Solr is calculating score even not

Re: Skewed IDF in multi lingual index, again

2017-12-04 Thread alessandro.benedetti
Hi Markus, just out of interest, why did "It was solved back then by using docCount instead of maxDoc when calculating idf, it worked really well!" solve the problem? I assume you are using different fields, one per language. Each field is appearing on a different number of docs I guess. e.g. t

Re: Huge Query execution time for multiple ORs

2017-12-04 Thread Faraz Fallahi
Hi guys, Sorry to bother you again, but I am really confused: I've used the Solr admin website and created a query with lots of ORs using Solr 4.7. When I execute the query without a sort it executes in roughly 3.5 - 4 seconds. When I execute it with a sort on a field called pubdate it takes abou

Re: Java 9 and Solr 6.6

2017-12-04 Thread Sergio García Maroto
Thanks. Very clear not to go with Java 9. On 2 December 2017 at 00:37, Shawn Heisey wrote: > On 12/1/2017 12:32 PM, marotosg wrote: > > Would you recommend installing Solr 6.6.1 with Java 9 for a production > > environment? > > Solr 7.x has been tested with Java 9 and should work with no proble

Solr 7.1.0 Group Facets Error

2017-12-04 Thread Priya Rodrigues
Facing errors when using groups and facets. These queries work fine: http://localhost:8983/solr/urls/select?q=*:*&rows=0&facet=true&facet.field=city_id http://localhost:8983/solr/urls/select?q=*:*&rows=0&facet=true&facet.field=city_id&group=true&group.field=locality_id These kinds of queries (where

Re: check softCommit , autocommit and hard commit count

2017-12-04 Thread Puppy Linux Distros
Hi, Thanks Shawn for the help. I think I should have added a few more details to my previous mail. I know it's a bad practice but due to some reasons, our application fires hard commits via code (upon most of the /update) and invokes the /update api with commit=true, and the application rarely uses s