Re: Searching for a term which isn't a part of an expression

2016-12-14 Thread Dean Gurvitz
Hi, The list of phrases will be relatively dynamic, so changing the indexing process isn't a very good solution for us. We also considered using a PostFilter or adding a SearchComponent to filter out the "bad" results, but obviously true query-time support would be a lot better. On Wed, Dec

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
Let's back up a bit. You say "This seems to cause two replicas to return different hits depending upon which one is queried." OK, _how_ are they different? I've been assuming different numbers of hits. If you're getting the same number of hits but different document ordering, that's a completely

Re: Nested JSON Facets (Subfacets)

2016-12-14 Thread Yonik Seeley
That should work... what version of Solr are you using? Did you change the type of the popularity field w/o completely reindexing? You can try to verify the number of documents in each bucket that have the popularity field by adding another sub-facet next to cat_pop:

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
Thanks for the quick feedback. We are not doing continuous indexing; we do a complete load once a week and then have a daily partial load for any documents that have changed since the load. These partial loads take only a few minutes every morning. The problem is we see this discrepancy long

RE: DocTransformer not always working

2016-12-14 Thread Markus Jelsma
Hello - I just looked up the DocTransformer Javadoc and spotted the getExtraRequestFields method. What you mention makes sense, so I immediately tried: solr/search/select?omitHeader=true=json=true=1=id asc=*:*=minhash,minhash:[binstr] { "response":{"numFound":97895,"start":0,"docs":[ {

Re: DocTransformer not always working

2016-12-14 Thread Chris Hostetter
Fairly certain you aren't overriding getExtraRequestFields, so when your DocTransformer is evaluated it can't find the field you want it to transform. By default, the ResponseWriters don't provide any fields that aren't explicitly requested by the user, or specified as "extra" by the

Re: High increasing slab memory solr 6

2016-12-14 Thread Shawn Heisey
On 12/14/2016 7:12 AM, moscovig wrote: > Shawn, thanks for the reply > > Please take a look at that post. It's describing the same issue with ES > > They describe the issue as "dentry cache is bloating memory" > >

DocTransformer not always working

2016-12-14 Thread Markus Jelsma
Hello - I just spotted an oddity with both custom DocTransformers we sometimes use on Solr 6.3.0. This particular transformer in the example just transforms a long (or int) into a sequence of bits. I just use it as a convenience to compare minhashes with my eyeballs. First example is very
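
For readers unfamiliar with the idea, here is a minimal sketch of what "transforming a long into a sequence of bits" can look like. It is not the poster's DocTransformer (that is custom Java code not shown in the thread); the function name, the 64-bit default width, and the two's-complement rendering of negative values are all assumptions made for illustration.

```python
def long_to_bits(value: int, width: int = 64) -> str:
    """Render an integer as a fixed-width two's-complement bit string.

    Hypothetical helper: `width` defaults to 64 to mimic a Java long.
    """
    # Masking maps negative values onto their two's-complement pattern.
    mask = (1 << width) - 1
    return format(value & mask, f"0{width}b")


if __name__ == "__main__":
    # Two values printed one above the other are easy to compare by eye.
    print(long_to_bits(5, 8))
    print(long_to_bits(-1, 8))
```

Printing two hashes this way, one per line, makes differing bit positions visible at a glance, which is the use the poster describes.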

Re: Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Erick Erickson
The commit points on different replicas will trip at different wall clock times so the leader and replica may return slightly different results depending on whether doc X was included in the commit on one replica but not on the second. After the _next_ commit interval (2 seconds in your case), doc

Solr Cloud Replica Cores Give different Results for the Same query

2016-12-14 Thread Webster Homer
We are using Solr Cloud 6.2. We have been noticing an issue where the index in a core shows as current = false. We have autocommit set for 15 seconds, and soft commit at 2 seconds. This seems to cause two replicas to return different hits depending upon which one is queried. What would lead to

RE: Traverse over response docs in SearchComponent impl.

2016-12-14 Thread Markus Jelsma
Thanks! Running the same code in cloud mode worked nicely almost right away. Getting it to work in non-cloud mode is still non-trivial. I can get the DocList in process(), but AFAIK it just provides Lucene docIds, not a nice DocumentList we could work with. The use-case is straightforward,

Re: Searching for a term which isn't a part of an expression

2016-12-14 Thread Ahmet Arslan
Hi, Do you have a common list of phrases for which you want to prohibit partial matches? You can index those phrases in a special way, for example: This is a new world hello_world hot_dog tap_water etc. ahmet On Wednesday, December 14, 2016 9:20 PM, deansg wrote: We would like to
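
The suggestion above (indexing known phrases as single underscore-joined tokens) could be sketched as a preprocessing step like the one below. This is only an illustration under stated assumptions: the function name `join_phrases`, the case-insensitive matching, and the longest-phrase-first ordering are choices made here, not something described in the thread, and a real setup might do this inside the Solr analysis chain instead.

```python
import re
from typing import Iterable


def join_phrases(text: str, phrases: Iterable[str]) -> str:
    """Rewrite each listed phrase as one underscore-joined token.

    Hypothetical helper: "hello world" becomes "hello_world", so a later
    search for the bare term "world" no longer matches inside the phrase.
    """
    # Longest phrases first, so a longer phrase wins over a shorter overlap.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        if not words:
            continue
        # Match the words separated by any whitespace, on word boundaries.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        text = re.sub(pattern, "_".join(words), text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    phrases = ["hello world", "hot dog", "tap water"]
    print(join_phrases("This is a new world hello world", phrases))
```

With this preprocessing, the standalone "world" stays searchable while the occurrence inside the phrase is folded into a single token. As the follow-up in the thread points out, this requires reprocessing documents whenever the phrase list changes.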

Solr on HDFS: increase in query time with increase in data

2016-12-14 Thread Chetas Joshi
Hi everyone, I am running Solr 5.5.0 on HDFS. It is a SolrCloud cluster of 50 nodes and I have the following config: maxShardsPerNode: 1, replicationFactor: 1. I have been ingesting data into Solr for the last 3 months. As the data grows, I am observing an increase in query time. Currently the size

Searching for a term which isn't a part of an expression

2016-12-14 Thread deansg
We would like to enable queries for a specific term that doesn't appear as a part of a given expression. Negating the expression will not help, as we still want to return items that contain the term independently, even if they contain the full expression as well. For example, we would like to search

Nested JSON Facets (Subfacets)

2016-12-14 Thread CA
Hi all, this is about using a function in nested facets, specifically the „sum()“ function inside a „terms“ facet using the json.facet api. My json.facet parameter looks like this: json.facet={shop_cat: {type:terms, field:shop_cat, facet: {cat_pop:"sum(popularity)"}}} A snippet of the
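
As an illustration of the request described above, the following sketch builds the same nested facet as a JSON string and URL-encodes it. The facet structure mirrors the one quoted in the message; the `q=*:*` and `rows=0` parameters are assumptions added here so the example forms a complete query string, and no host or collection name is implied.

```python
import json
from urllib.parse import urlencode

# Terms facet on shop_cat with a sum(popularity) sub-facet per bucket,
# mirroring the json.facet parameter quoted in the message.
facet = {
    "shop_cat": {
        "type": "terms",
        "field": "shop_cat",
        "facet": {"cat_pop": "sum(popularity)"},
    }
}

# q and rows are illustrative assumptions, not taken from the thread.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
query_string = urlencode(params)

if __name__ == "__main__":
    print(query_string)
```

Serializing the facet with `json.dumps` yields strict JSON, which avoids quoting mistakes when the facet grows more deeply nested.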

Re: "on deck" searcher vs warming searcher

2016-12-14 Thread Chris Hostetter
: In a situation where searchers A-E are queued in the states : A: Current : B: Warming : C: Ondeck : D: Ondeck : E: Being created with newSearcher : : wouldn't it make sense to discard C before it gets promoted to Warming, : as the immediate action after warming C would be to start warming D? :

Re: Reg: Is there a way to query solr leader directly using solrj?

2016-12-14 Thread Erick Erickson
First off I'm a bit confused. You say you're working with an UpdateProcessorFactory but then want to use SolrJ to get a leader. Why do this? Why not just work entirely locally and reach into the _local_ index (note, you have to do this after the doc has been routed to the correct shard)? Once

Reg: Is there a way to query solr leader directly using solrj?

2016-12-14 Thread indhu priya
Hi, In my project I have a one-leader, one-replica architecture. I am using custom code (using DocumentUpdateProcessorFactory) for merging old documents with incoming new documents. eg. 1. if 1st document has 10 fields, all 10 fields will be indexed. 2. if 2nd document has 8 fields, 5

Re: High increasing slab memory solr 6

2016-12-14 Thread moscovig
In the meantime I am removing all the explicit commits we have in the code. Will update if it gets better -- View this message in context: http://lucene.472066.n3.nabble.com/High-increasing-slab-memory-solr-6-tp4309708p4309718.html Sent from the Solr - User mailing list archive at

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-14 Thread GW
Thanks, I understand accessing solr directly. I'm doing REST calls to a single machine. If I have a cluster of five servers and say three Apache servers, I can round robin the REST calls to all five in the cluster? I guess I'm going to find out. :-) If so I might be better off just running

Re: High increasing slab memory solr 6

2016-12-14 Thread moscovig
Shawn, thanks for the reply. Please take a look at that post. It's describing the same issue with ES. They describe the issue as "dentry cache is bloating memory": https://discuss.elastic.co/t/memory-usage-of-the-machine-with-es-is-continuously-increasing/23537/5 Thanks, Gilad -- View this

Re: Solr has a CPU% spike when indexing a batch of data

2016-12-14 Thread Shawn Heisey
On 12/14/2016 1:28 AM, forest_soup wrote: > We are indexing through the same HTTP endpoint. But as we have shardnum=1 and > replicafactor=1, each collection only has one core. So there should be no > distributed update/query, as we are using solrj's CloudSolrClient which will > get the target URL of

Re: Collection API CREATE creates name like '_shard1_replica1'

2016-12-14 Thread Shawn Heisey
On 12/14/2016 1:36 AM, Sandeep Khanzode wrote: > I uploaded (upconfig) config (schema and solrconfig XMLs) to Zookeeper > and then linked (linkconfig) the confname to a collection name. When I > attempt to create a collection using the API like this >

Re: Solr - Amazon like search

2016-12-14 Thread Shawn Heisey
On 12/13/2016 10:55 PM, vasanth vijayaraj wrote: > We are building an e-commerce mobile app. I have implemented Solr search and > autocomplete. > But we like the Amazon search and are trying to implement something like > that. Attached a screenshot > of what has been implemented so far > > The

Re: High increasing slab memory solr 6

2016-12-14 Thread Shawn Heisey
On 12/14/2016 5:55 AM, moscovig wrote: > We have solr 6.2.1. > One of the collections is causing lots of updates. > We see the following logs: > > /INFO org.apache.solr.core.SolrDeletionPolicy : > SolrDeletionPolicy.onCommit: commits: num=2 > >

High increasing slab memory solr 6

2016-12-14 Thread moscovig
Hi, We have solr 6.2.1. One of the collections is causing lots of updates. We see the following logs: /INFO org.apache.solr.core.SolrDeletionPolicy : SolrDeletionPolicy.onCommit: commits: num=2

Re: Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-14 Thread Dorian Hoxha
See replies inline: On Wed, Dec 14, 2016 at 11:16 AM, GW wrote: > Hello folks, > > I'm about to set up a Web service I created with PHP/Apache <--> Solr Cloud > > I'm hoping to index a bazillion documents. > ok , how many inserts/second ? > > I'm thinking about using

Has anyone used linode.com to run Solr | ??Best way to deliver PHP/Apache clients with Solr question

2016-12-14 Thread GW
Hello folks, I'm about to set up a Web service I created with PHP/Apache <--> Solr Cloud. I'm hoping to index a bazillion documents. I'm thinking about using Linode.com because the pricing looks great. Any opinions? I envision using an Apache/PHP round robin in front of a Solr cloud. My

Re: "on deck" searcher vs warming searcher

2016-12-14 Thread Toke Eskildsen
On Tue, 2016-12-13 at 16:07 -0700, Chris Hostetter wrote: > ** "warming" happens in a single threaded executor -- so if there > are multiple ondeck searchers, only one of them at a time is ever a > "warming" searcher > ** multiple ondeck searchers can be a sign of a potential performance > problem

Collection API CREATE creates name like '_shard1_replica1'

2016-12-14 Thread Sandeep Khanzode
Hi, I uploaded (upconfig) config (schema and solrconfig XMLs) to Zookeeper and then linked (linkconfig) the confname to a collection name. When I attempt to create a collection using the API like this .../solr/admin/collections?action=CREATE=abc=1=abc   ... it creates a collection core named

Re: Solr has a CPU% spike when indexing a batch of data

2016-12-14 Thread forest_soup
Thanks, Shawn! We are indexing through the same HTTP endpoint. But as we have shardnum=1 and replicafactor=1, each collection only has one core. So there should be no distributed update/query, as we are using solrj's CloudSolrClient which will get the target URL of the solrnode when requesting to