Re: SolrJ and autoscaling

2018-06-07 Thread Shalin Shekhar Mangar
Yes, we don't have Solrj support for changing autoscaling configuration today. It'd be nice to have for sure. Can you please file a Jira? Patches are welcome too! On Wed, Jun 6, 2018 at 8:33 PM, Hendrik Haddorp wrote: > Hi, > > I'm trying to read and modify the autoscaling config. The API on >

Re: Setting preferred replica for query/read

2018-06-07 Thread Shawn Heisey
On 6/7/2018 9:17 PM, Zheng Lin Edwin Yeo wrote: Thanks for your reply. As currently we are looking at having a replica to do indexing, and another replica to be used for searching, these 2 requests look like they can achieve this purpose. Will this be implemented in the Solr 7.4 release?

Re: Setting preferred replica for query/read

2018-06-07 Thread Zheng Lin Edwin Yeo
Hi Ere, Thanks for your reply. As currently we are looking at having a replica to do indexing, and another replica to be used for searching, these 2 requests look like they can achieve this purpose. Will this be implemented in the Solr 7.4 release? Regards, Edwin On 7 June 2018 at 16:00, Ere

Collections unable to load after setting up SSL

2018-06-07 Thread Zheng Lin Edwin Yeo
Hi, I am running SolrCloud on Solr 7.3.1 on External ZooKeeper 3.4.11, and I am setting up the security aspect of Solr. After setting up the SSL based on the steps from https://lucene.apache.org/solr/guide/7_3/enabling-ssl.html, the collections that are with 2 replica are no longer able to be

Re: Streaming Expression intersect() behaviour

2018-06-07 Thread Joel Bernstein
This expression works as expected: intersect( cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"), cartesianProduct(tuple(fieldA=array(a,c)), fieldA, productSort="fieldA asc"), on="fieldA" ) And when you transpose the "on" fields like this: intersect(

Re: Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread Erick Erickson
Wei: That is odd. These should be the same so I'm puzzled too. I'm assuming that you're using the exact same schema on both with each field having the exact same definitions. And since you say it's the same release of Solr, it's not like some default changed. Here's an idea (and I'm shooting

Re: Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread Wei
Thanks Erick. However, our indexes on stand-alone and cloud are both static -- we indexed them from the same source XMLs, optimized, and have had no updates since. Also, in cloud there is only a single shard (with multiple replicas). I assume distributed stats doesn't have an effect in this

RE: Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread Markus Jelsma
To add to that, remember to disable queryResultCache, or distributed stats won't work. And to add to that, I do not think distributed stats will work for a single-shard index anyway. Regards, Markus -Original message- > From:Erick Erickson > Sent: Thursday 7th June 2018 21:19

Re: Running Solr on HDFS - Disk space

2018-06-07 Thread Hendrik Haddorp
The only option should be to configure Solr to just have a replication factor of 1 or HDFS to have no replication. I would go for the middle and configure both to use a factor of 2. This way a single failure in HDFS and Solr is not a problem. While in 1/3 or 3/1 option a single server error
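The copy counts discussed in this thread follow from simple multiplication, assuming each Solr replica writes its own index files and HDFS then replicates every one of those files independently. A minimal sketch of that arithmetic (the function name is hypothetical and only for illustration):

```python
def physical_copies(solr_replication_factor: int, hdfs_replication_factor: int) -> int:
    """Return how many physical copies of the index data end up on disk.

    Assumes every Solr replica stores a full copy of its shard and HDFS
    replicates each stored file hdfs_replication_factor times.
    """
    return solr_replication_factor * hdfs_replication_factor


# Compare the combinations mentioned in the thread: 3/3, 2/2, 1/3 and 3/1.
for solr_rf, hdfs_rf in [(3, 3), (2, 2), (1, 3), (3, 1)]:
    copies = physical_copies(solr_rf, hdfs_rf)
    print(f"Solr RF={solr_rf}, HDFS RF={hdfs_rf} -> {copies} copies")
```

Under these assumptions the 2/2 setup costs four copies, one more than 1/3 or 3/1, in exchange for tolerating a single failure at both layers.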

Re: Solr for Content Management

2018-06-07 Thread David Hastings
When you are sending updates you are adjusting the segments which take them out of memory and the index becomes "cold" until it gets enough searches to cache the various aspects of the index. On Thu, Jun 7, 2018 at 2:10 PM, Moenieb Davids wrote: > Hi All, > > Background: > I am currently

Re: Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread David Hastings
Also, the score is a fluid number; you shouldn't use the score for any real reason aside from seeing that the documents are in the right order in relation to the scores from the other documents in the result set, or the occasional condition where two results switch in place from one to the other

Re: Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread Erick Erickson
Short form: As docs are updated, they're marked as deleted until the segment is merged. This affects things like term frequency and doc frequency which in turn influences the score. Due to how commits happen, i.e. autocommit will hit at slightly skewed wall-clock time, different segments are
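To illustrate why deleted-but-not-yet-merged documents can shift scores, here is a rough sketch of the BM25 inverse document frequency term, assuming the default BM25 similarity is in use. The counts are made up for illustration; the point is only that the document frequency and document count still include documents marked as deleted until a merge purges them.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index A: fully merged, so statistics only reflect live documents.
idf_merged = bm25_idf(doc_freq=100, doc_count=10_000)

# Hypothetical index B: same live documents, but 500 deleted documents
# (50 of them containing the term) have not been merged away yet.
idf_unmerged = bm25_idf(doc_freq=150, doc_count=10_500)

print(f"merged:   {idf_merged:.3f}")
print(f"unmerged: {idf_unmerged:.3f}")
```

The two values differ even though the live documents are identical, which is enough to produce slightly different scores for the same query.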

Different solr score between stand alone vs cloud mode solr

2018-06-07 Thread Wei
Hi, Recently we made an observation that really puzzled us. We have two instances of Solr, one in stand-alone mode and one a single-shard SolrCloud with a couple of replicas. Both are indexed with the same documents and run the same Solr version, 6.6.2. When issuing the same query, the solr

Re: Apache and Apache Solr together

2018-06-07 Thread Shawn Heisey
On 6/6/2018 12:57 AM, azharuddin wrote: > I've got a question: I came across Apache Solr > as requirement for a module > I'm installing and even after reading the documentation on Apache Solr's > official homepage I'm still not sure whether Apache

Solr for Content Management

2018-06-07 Thread Moenieb Davids
Hi All, Background: I am currently testing a deployment of a content management framework where I am trying to punt Solr as the tool of choice for ingestion and searching. Current status: I have deployed SolrCloud across multiple servers with multiple shards and a replication factor of 2. In

Re: Query a particular index from a multivalued field.

2018-06-07 Thread Erick Erickson
There's no such syntax OOB. You could append an index to it. So your input doc would look something like: doc 1= { "id": "1", "status": [ "b1", "a2" ] } and search appropriately. Perhaps this would be a duplicated field used only when you wanted to
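The suggestion above, appending the position to each value before indexing, could be sketched on the client side roughly as follows. The helper name and the `status_pos` field name are hypothetical; the thread only shows the resulting values ("b1", "a2").

```python
def with_positions(values):
    """Append the 1-based position to each value, e.g. ["b", "a"] -> ["b1", "a2"]."""
    return [f"{value}{position}" for position, value in enumerate(values, start=1)]


doc = {"id": "1", "status": ["b", "a"]}

# Keep the original field and add a duplicated, position-tagged field
# ("status_pos" is an assumed name, not one from the thread).
doc["status_pos"] = with_positions(doc["status"])

print(doc)
```

A search for `status_pos:a2` would then match only documents whose second status value is "a", assuming the field is indexed as a plain string field.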

Query a particular index from a multivalued field.

2018-06-07 Thread root23
Hi all, is there a way I can query a particular index of a multivalued field? E.g. let's say I have documents like this: doc 1= { "id": "1", "status": [ "b", "a" ] } doc2= { "id": "1", "status": [ "c", "b" ]

Re: Delete then re-add a core

2018-06-07 Thread Erick Erickson
Amanda: Your Solr log will record each update that comes through. It's a little opaque, by default it'll show you the first 10 IDs of each batch it receives. Guesses: - you're somehow having the same ID () assigned to multiple documents - your schemas are a bit different and the docs can't be

Re: Solr start script

2018-06-07 Thread Cassandra Targett
The reason why you pass the DirectoryFactory at startup is so every collection/core that's created is automatically stored in HDFS before solrconfig.xml is read to know that's where they should be stored. If you prefer to only store certain collections/cores in HDFS, you would only set those

Re: Delete then re-add a core

2018-06-07 Thread Amanda Shuman
Thanks, Shawn, that is a remarkably clear description. I am able to create the core and all appears fine, but when I go to index I am unfortunately running into a new problem. I am indexing from the same site content as before (it's just an Omeka install with a solr plug-in that reindexes the

Re: HDP Search - Configuration & Data Directories

2018-06-07 Thread Cassandra Targett
The documentation for HDP Search is online (and included in the package actually). This page has the descriptions for the Ambari parameters: https://doc.lucidworks.com/lucidworks-hdpsearch/3.0.0/Guide-Install-Ambari.html . HDP Search is a package developed by Lucidworks but distributed by

Re: Solr start script

2018-06-07 Thread Greenhorn Techie
Shawn, Thanks for your response. Please find my follow-up questions: 1. My understanding is that Directory Factory settings are typically at a collection / core level. If that's the case, what is the advantage of passing it along with the start script? 2. In your below response, did you mean that

"ADDREPLICA failed to create replica"

2018-06-07 Thread solrnoobie
So we have Solr 6.6.3 deployed in an AWS EC2 instance (dockerized), and during our load testing, our script for some reason removed 1 replica. So I decided to add 1 replica in the shard with only 1 replica and it returned the error message in the title. What can cause this? We have another

Re: Solr start script

2018-06-07 Thread Shawn Heisey
On 6/7/2018 7:37 AM, Greenhorn Techie wrote: When the above settings are passed as part of start script, does that mean whenever a new collection is created, Solr is going to store the indexes in HDFS? But what if I upload my solrconfig.xml to ZK which contradicts with this and contains

Re[2]: Sort hits in the order of subqueries

2018-06-07 Thread Robert K .
Hello, I had a look at the Constant Score approach suggested by Emir: (q0)^=100 OR (q1)^=90 ... As observed by Alexandre it seems to introduce stratification at the cost of the intra-query ranking which is not satisfactory. So if I imagine Constant Score as a function f(x) = C operating on a

Difference in fieldLength and avgFieldLength in Solr 6.6 vs Solr 7.1

2018-06-07 Thread rupali pol
Hi all, We are upgrading from Solr 6.6 to Solr 7.1, and we are seeing a lot of differences in the ranking and scores of Solr 6.6 and Solr 7.1 results. The major differences we observed are in the fieldLength and avgFieldLength parameters, which are calculated per field, per document, per search term.

Re: HDP Search - Configuration & Data Directories

2018-06-07 Thread Greenhorn Techie
Thanks Shawn. Will check with Hortonworks! On 7 June 2018 at 14:19:43, Shawn Heisey (apa...@elyograg.org) wrote: On 6/7/2018 6:35 AM, Greenhorn Techie wrote: > A quick question on configuring Solr with Hortonworks HDP. I have installed > HDP and then installed HDP Search using the steps

Solr start script

2018-06-07 Thread Greenhorn Techie
Hi, For our project purposes, we need to store Solr collections on HDFS. While exploring the documentation for the same, I have found lucidworks documentation ( https://doc.lucidworks.com/lucidworks-hdpsearch/3.0.0/Guide-Install-Manual.html#hdfs-specific-changes) , where it has been mentioned

Re: Running Solr on HDFS - Disk space

2018-06-07 Thread Shawn Heisey
On 6/7/2018 6:41 AM, Greenhorn Techie wrote: As HDFS has got its own replication mechanism, with a HDFS replication factor of 3, and then SolrCloud replication factor of 3, does that mean each document will probably have around 9 copies replicated underneath of HDFS? If so, is there a way to

Re: Streaming Expression intersect() behaviour

2018-06-07 Thread Christian Spitzlay
> Am 07.06.2018 um 11:34 schrieb Christian Spitzlay > : > > intersect( > cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA > asc"), > cartesianProduct(tuple(fieldB=array(a,c)), fieldB, productSort="fieldB asc"), > on="fieldA=fieldB" > ) > > I simplified it a bit,

Re: Graph traversal: Bypass cycle detection?

2018-06-07 Thread Joel Bernstein
Ah. I'll do some testing to see exactly how nodes function behaves when a node links to itself. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 7, 2018 at 5:06 AM, Christian Spitzlay < christian.spitz...@biologis.com> wrote: > Hi, > > > > Am 07.06.2018 um 03:20 schrieb Joel Bernstein :

Re: Sort hits in the order of subqueries

2018-06-07 Thread Alexandre Rafalovitch
I think this solution will destroy intra-query ranking. So all results in q0 come before q1 but would be random within q0 results. Would instead just a bunch of boost queries with different weights (additive probably) be a better way to introduce stratification? Regards, Alex On Thu, Jun 7,

Re: HDP Search - Configuration & Data Directories

2018-06-07 Thread Shawn Heisey
On 6/7/2018 6:35 AM, Greenhorn Techie wrote: A quick question on configuring Solr with Hortonworks HDP. I have installed HDP and then installed HDP Search using the steps described under the link - Within the various Solr config settings on Ambari, I am a bit confused on the role of

Re: Streaming Expression intersect() behaviour

2018-06-07 Thread Joel Bernstein
Nice example! I'll take a look at this today. I believe there was/is a bug with some of the joins where the "on" parameter is transposing the fields. It's possible that is the case here as well. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Jun 7, 2018 at 5:34 AM, Christian Spitzlay

Re: Delete then re-add a core

2018-06-07 Thread Shawn Heisey
On 6/7/2018 4:12 AM, Amanda Shuman wrote: Definitely not a permissions problem - everything is run by the solr user, which owns everything in the directories. I just can't figure out why the default working directory is in opt rather than var (which is where it should be according to a previous

Re: Dataimport performance

2018-06-07 Thread Shawn Heisey
On 6/7/2018 12:19 AM, kotekaman wrote: sorry. may i know how to code it? Code *what*? Here's the same wiki page that I gave you for your last message: https://wiki.apache.org/solr/UsingMailingLists Even if I go to the Nabble website and discover that you've replied to a topic that's SEVEN

Re: Delta Import Configuration

2018-06-07 Thread Shawn Heisey
On 6/7/2018 12:22 AM, kotekaman wrote: Is the deltaimport should use the timestamp in sql table? The text above, and the subject, are the ONLY things I can see in this message. Which makes this an extremely vague question. This wiki page may be relevant:

Running Solr on HDFS - Disk space

2018-06-07 Thread Greenhorn Techie
Hi, As HDFS has got its own replication mechanism, with a HDFS replication factor of 3, and then SolrCloud replication factor of 3, does that mean each document will probably have around 9 copies replicated underneath of HDFS? If so, is there a way to configure HDFS or Solr such that only three

HDP Search - Configuration & Data Directories

2018-06-07 Thread Greenhorn Techie
Hi, A quick question on configuring Solr with Hortonworks HDP. I have installed HDP and then installed HDP Search using the steps described under the link - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_solr-search-installation/content/hdp-search30-install-mpack.html I have used

Re: Sort hits in the order of subqueries

2018-06-07 Thread Emir Arnautović
Hi Robert, If I get your requirement right, you can solve it with the following: (q0)^=100 OR (q1)^=90…. Assuming there are no overlaps - otherwise, one matching multiple conditions can change the ordering. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch
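The constant-score pattern above can be generated for any number of subqueries. The following is a minimal sketch that only builds the query string; the function name, the starting score of 100 and the step of 10 are assumptions chosen to reproduce the example in the message, and the subqueries in the usage line are hypothetical.

```python
def stratified_query(subqueries, top_score=100, step=10):
    """Join subqueries with OR, giving each a decreasing constant score (^=)."""
    clauses = []
    for rank, subquery in enumerate(subqueries):
        score = top_score - rank * step
        if score <= 0:
            # A non-positive constant score would no longer rank below the others cleanly.
            raise ValueError("too many subqueries for the chosen top_score and step")
        clauses.append(f"({subquery})^={score}")
    return " OR ".join(clauses)


# Hypothetical subqueries, highest priority first.
print(stratified_query(["title:solr", "body:solr"]))
```

As the message cautions, a document matching several clauses receives a combined score, so the strata are only clean when the subqueries do not overlap, and every hit within a stratum gets the same score.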

Re: Delete then re-add a core

2018-06-07 Thread Amanda Shuman
Definitely not a permissions problem - everything is run by the solr user, which owns everything in the directories. I just can't figure out why the default working directory is in opt rather than var (which is where it should be according to a previous chain I was in). But at this point I'm at a

Sort hits in the order of subqueries

2018-06-07 Thread Robert K .
Hello, I am investigating the following use case. Suppose I have a list of queries q_0, q_1, ..., q_n which I combine to a boolean query using 'SHOULD'-clauses. The requirement for the hits sorting is that the results of q_0 precede the results of q_1, the results of q_1 precede the results of

Re: Dataimport performance

2018-06-07 Thread kotekaman
sorry. may i know how to code it? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Delta Import Configuration

2018-06-07 Thread kotekaman
Hi all, Is the deltaimport should use the timestamp in sql table? -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Streaming Expression intersect() behaviour

2018-06-07 Thread Christian Spitzlay
Hi, I noticed that my mail program broke the test case by replacing a double quote with a different UTF-8 character. Here is the test case again and I hope it will work this time: intersect( cartesianProduct(tuple(fieldA=array(a,b,c,c)), fieldA, productSort="fieldA asc"),

Re: Graph traversal: Bypass cycle detection?

2018-06-07 Thread Christian Spitzlay
Hi, > Am 07.06.2018 um 03:20 schrieb Joel Bernstein : > > Hi, > > At this time cycle detection is built into the nodes expression and cannot > be turned off. The nodes expression is really designed to do a traditional > breadth first search through a graph where cycle detection is needed so

Re: Setting preferred replica for query/read

2018-06-07 Thread Ere Maijala
Hi, What I did in SOLR-11982 was meant to be used with replica types. The idea is that you could have a set of NRT replicas used for indexing and a set of PULL replicas used for queries. That's the easiest way to split the work since PULL replicas never do indexing work, and then you can say

Re: Issues in Solr-7.3

2018-06-07 Thread tapan1707
Hello Shawn, Thanks for the detailed explanation. > That would depend on the specific issues that concern you. Totally agree, most of the issues I saw on the mailing lists were quite subjective and might not be affecting us. But I thought it would be better to directly ask from 7.3 users and

Re: Issues in Solr-7.3

2018-06-07 Thread Shawn Heisey
On 6/6/2018 7:38 PM, tapan1707 wrote: We are planning to upgrade our Solr-6.4 to Solr-7.x. While considering the appropriate minor version, I saw that there are many ongoing issues for Solr-7.3 users on the mailing list. Just wanted to take an expert opinion if it's *safe* to just upgrade to 7.3