Re: Solr suggestions, best practices

2018-11-06 Thread Zheng Lin Edwin Yeo
Maybe you can look into this: https://lucidworks.com/2015/03/04/solr-suggester/ Which version of Solr are you using? Regards, Edwin On Tue, 6 Nov 2018 at 17:00, Clemens Wyss DEV wrote: > At the moment we are using spellchecking-component for suggestions which > is suboptimal, to say the least.

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Gus Heck
Tomáš, One thing that causes a clusterstatus call is alias resolution if the HttpClusterStateProvider is in use instead of the ZkClusterStateProvider. I've just been fixing spurious error messages generated by this in SOLR-12938. -Gus On Tue, Nov 6, 2018 at 1:08 PM Zimmermann, Thomas < tzimmerm.

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
You should until this is resolved. The original purpose of that JIRA doesn't count any longer, i.e. the speedup aspects since that's been taken care of though. On Tue, Nov 6, 2018 at 3:50 PM Wei wrote: > > Also I notice this issue is still open: > https://issues.apache.org/jira/browse/SOLR-10816 >

Re: Retrieve field from docValues

2018-11-06 Thread Wei
Also I notice this issue is still open: https://issues.apache.org/jira/browse/SOLR-10816 Does that mean we still need to have stored=true for uniqueKey? On Tue, Nov 6, 2018 at 2:14 PM Wei wrote: > I see there is also a docValuesFormat option, what's the default for this > setting? Performance wi

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
docValuesFormat="Memory" has been deprecated, so you shouldn't use it. On Tue, Nov 6, 2018 at 2:14 PM Wei wrote: > > I see there is also a docValuesFormat option, what's the default for this > setting? Performance wise is it good to set docValuesFormat="Memory" ? > > Best, > Wei > > > On Tue, Nov

Re: Retrieve field from docValues

2018-11-06 Thread Wei
I see there is also a docValuesFormat option, what's the default for this setting? Performance wise is it good to set docValuesFormat="Memory" ? Best, Wei On Tue, Nov 6, 2018 at 11:55 AM Erick Erickson wrote: > Yes, "the most efficient possible" is associated with that JIRA, so only > in 7x. >

Re: SolrCloud Replication Failure

2018-11-06 Thread Erick Erickson
Hmmm, ok. The replication failure could lead to the scenario I outlined, but that's a secondary issue to the update not getting to the follower in the first place as you say. On Tue, Nov 6, 2018 at 12:19 PM Jeremy Smith wrote: > > Thanks everyone. I added SOLR-12969. > > > Erick - those sound lik

Re: SolrCloud Replication Failure

2018-11-06 Thread Jeremy Smith
Thanks everyone. I added SOLR-12969. Erick - those sound like important questions, but I think this issue is slightly different. In this case, replication is failing even if the leader never goes down. From: Erick Erickson Sent: Tuesday, November 6, 2018 2:5

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
Yes, "the most efficient possible" is associated with that JIRA, so only in 7x. "Does this still hold if whole index is loaded into memory?" The decompression part yes, the disk seek part no. And it's also sensitive to whether the documentCache already has the document. I'd also make uniqueKey an

Re: SolrCloud Replication Failure

2018-11-06 Thread Erick Erickson
Kevin: Well, let's certainly raise it as a JIRA, blocker or not I'm not sure. I _think_ the new LIR work done in Solr 7.3 might make it possible to detect this condition but I'm not totally sure what to do about it. So let's say the leader gets an update while a follower is down. (one leader and

RE: Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
I'm sorry I should have included that. We are running Solr 7.2. We use CDCR for almost all of our collections. We have experienced several intermittent problems with CDCR, this one seems to be new, at least I hadn't seen it before -Original Message- From: Erick Erickson [mailto:erickeric

Re: SolrCloud Replication Failure

2018-11-06 Thread Kevin Risden
Erick Erickson - I don't have much time to chase this down. Do you think this is a blocker for 7.6? It seems pretty serious. Jeremy - This would be a good JIRA to create - we can move the conversation there to try to get the right people involved. Kevin Risden On Fri, Nov 2, 2018 at 7:57 AM Jeremy

Re: Retrieve field from docValues

2018-11-06 Thread Wei
Thanks Yasufumi and Erick. ---. 2. "it depends". Solr will try to do the most efficient thing possible. If _all_ the fields are docValues, it will return the stored values from the docValues structure. I find this jira: https://issues.apache.org/jira/browse/SOLR-8344 Does this mean "Solr

Re: Negative CDCR Queue Size?

2018-11-06 Thread Erick Erickson
What version of Solr? CDCR has changed quite a bit in the 7x code line so it's important to know the version. On Tue, Nov 6, 2018 at 10:32 AM Webster Homer wrote: > > Several times I have noticed that the CDCR action=QUEUES will return a > negative queueSize. When this happens we seem to be mis

Negative CDCR Queue Size?

2018-11-06 Thread Webster Homer
Several times I have noticed that the CDCR action=QUEUES will return a negative queueSize. When this happens we seem to be missing data in the target collection. How can this happen? What does a negative Queue size mean? The timestamp is an empty string. We have two targets for a source. One lo

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Hi Shawn, We're equally impressed by how well the server is handling it. We're using Sematext for monitoring and the load on the box has been steady under 1 and not entering a swap state memory wise. We are 100% certain the traffic is coming from the 3 web hosts running this code. We have put som

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Tomáš Hampl
This error comes every request, in solr client or if i call url in chrome browser or curl from console. I have no replicas actually for this test but it is NRT type. There is no writes or another reads on this server (solr cloud) completely isolated. (version 7.5 single docker container) I have 6

Re: is SearchComponent the correct way?

2018-11-06 Thread Mikhail Khludnev
Not really. It expects to work segment by segment. So it can buffer all docs from one segment, hit redis and push all results into delegating collector. On Tue, Nov 6, 2018 at 8:29 PM John Thorhauer wrote: > Mikhail, > > Thanks for the suggestion. After looking over the PostFilter interface and >
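
The per-segment buffering described in this reply can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Lucene's or Solr's actual PostFilter/DelegatingCollector API; the class name, `is_allowed_batch`, and `delegate` are hypothetical stand-ins for a batched redis lookup and the downstream collector.

```python
class BufferingSegmentFilter:
    """Buffers doc ids for one segment, then filters them with a single lookup."""

    def __init__(self, is_allowed_batch, delegate):
        # is_allowed_batch: hypothetical callable taking a list of doc ids and
        # returning the set of ids that may be shown (one round trip per call).
        self._is_allowed_batch = is_allowed_batch
        # delegate: hypothetical callable that receives each permitted doc id.
        self._delegate = delegate
        self._buffer = []

    def collect(self, doc_id):
        # Only remember the document; no external call per document.
        self._buffer.append(doc_id)

    def finish_segment(self):
        # One external lookup for the whole segment, then forward survivors
        # to the delegate in their original order.
        if self._buffer:
            allowed = self._is_allowed_batch(self._buffer)
            for doc_id in self._buffer:
                if doc_id in allowed:
                    self._delegate(doc_id)
        self._buffer = []
```

Under these assumptions the number of external lookups grows with the number of segments rather than the number of matching documents.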

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Shawn Heisey
On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote: Shawn - Server performance is fine and request time are great. We are tolerating the level of traffic, but the server that is taking all the hits is obviously performing a bit slower than the others. Response times are under 5MS avg for queries on

Re: is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
Mikhail, Thanks for the suggestion. After looking over the PostFilter interface and the DelegatingCollector, it appears that this would require me to query my outside datastore (redis) for security information once for each document. This would be a big performance issue. I would like to be able

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
I should mention I'm also hanging out in the Solr IRC Channel today under the nick "apatheticnow" if anyone wants to follow up in real time during business hours EST. On 11/6/18, 11:39 AM, "Shawn Heisey" wrote: >On 11/6/2018 9:06 AM, Zimmermann, Thomas wrote: >> For example - 75k request per min

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Erick - This box did have all the leaders for the dozen or so collections we have when the cloud spun up. We were able to force the leaders for other cores onto other nodes using the apis, but did not see this traffic load migrate to the new hosts when leadership changed. All nodes are NRT. The re

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Shawn Heisey
On 11/6/2018 9:06 AM, Zimmermann, Thomas wrote: For example - 75k request per minute going to this one box, and 3.5k RPM to all other nodes in the cloud. All of those extra requests on the one box are "/solr/admin/collections?collection=collectionName&action=CLUSTERSTATUS&wt=javabin&version=2"

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Jason Gerlowski
My understanding was that we always tried to use the cached version of this information until either (a) Solr responds in a way that indicates our cache is out of date, or (b) the TTL on the cache entry expires. Though there might very well be a code path that behaves differently as Erick suggests
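
The caching behaviour described in this reply (serve the cached value until it is either flagged stale or its TTL expires) can be sketched as below. This is only an illustration of the idea in Python, not SolrJ's actual implementation; `TtlCache`, `fetch`, and `invalidate` are hypothetical names.

```python
import time


class TtlCache:
    """Serves cached values until the TTL expires or the entry is invalidated."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # hypothetical loader, e.g. a cluster state request
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the behaviour can be tested
        self._entries = {}         # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]        # still fresh: no remote call
        value = self._fetch(key)   # missing or expired: reload (case b)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Case (a): a response indicated that the cached state is out of date.
        self._entries.pop(key, None)
```

If a client behaved this way, one state request per collection per TTL window would be expected, so a request on every query would suggest a code path that bypasses the cache.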

Re: Retrieve field from docValues

2018-11-06 Thread Erick Erickson
2. "it depends". Solr will try to do the most efficient thing possible. If _all_ the fields are docValues, it will return the stored values from the docValues structure. This prevents a disk seek and decompress cycle. However, if even one field is docValues=false Solr will by default return the

Re: distributed grouping by date

2018-11-06 Thread Erick Erickson
Looks like: https://issues.apache.org/jira/browse/SOLR-11086 On Tue, Nov 6, 2018 at 8:19 AM Tomáš Hampl wrote: > > Hi, > > i have error while running grouping query by date in collection with 5 > shards. When i try same query on collection with only one shard everything > works. > > *query:* > > /

Re: CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Erick Erickson
Is the box you're seeing this on the Overseer? Or is it in any other way "special", like has all the leaders? And I'm assuming all these are NRT replicas, not TLOG or PULL. What are you doing when these occur? Queries? Updates? If you're doing updates, are these coincident with each request? Each

distributed grouping by date

2018-11-06 Thread Tomáš Hampl
Hi, i have error while running grouping query by date in collection with 5 shards. When i try same query on collection with only one shard everything works. *query:* /solr/search_cz/select?q=*:*&group=true&group.field=odjezd *part of schema.xml* ... *collection create * /solr/admin/colle
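
For reproducing the report against collections with different shard counts, the grouping request quoted above can be built programmatically. A minimal sketch in Python using only the standard library; the helper name `grouping_query_url` is hypothetical, and the collection and field names are taken from the message.

```python
from urllib.parse import urlencode


def grouping_query_url(collection, group_field, q="*:*"):
    """Builds the select path for a field-grouping query (path only, no host)."""
    params = [("q", q), ("group", "true"), ("group.field", group_field)]
    # Keep '*' and ':' unescaped so the match-all query stays readable.
    return "/solr/%s/select?%s" % (collection, urlencode(params, safe="*:"))


print(grouping_query_url("search_cz", "odjezd"))
```

The same helper can then be pointed at the one-shard collection to compare responses.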

Re: How to handle List in Solr 6.6

2018-11-06 Thread Shawn Heisey
On 11/6/2018 12:52 AM, waseem-farooqui wrote: { "document": "Fuzzy based semantic search.pdf", "md5Hash": "md5", "rated": [ { "user": "John", "comments": "Not Very useful", "rating": 2, "date":

CloudSolrClient produces tons of CLUSTERSTATUS commands against single server in Cloud

2018-11-06 Thread Zimmermann, Thomas
Question about CloudSolrClient and CLUSTERSTATUS. We just deployed a 3 server ZK cluster and a 5 node solr cluster using the CloudSolrClient in Solr 7.4. We're seeing a TON of traffic going to one server with just cluster status commands. Every single query seems to be hitting this box for statu

Re: How to handle List in Solr 6.6

2018-11-06 Thread Tim Underwood
Hi, It sounds like you are looking for the "Nested Child Documents"[1] and "Block Join Query Parsers"[2] features in Solr. The terminology is weird (block join, child/of, parent/which) but it should do what you want. Do take note of the warning in the docs: One limitation of indexing nested doc
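
As a rough illustration of the nested-document approach suggested here, the rated file from the original question could be reshaped into a parent document with child documents before indexing. This is a sketch under assumptions: the `doc_type` discriminator field, the child id scheme, and the use of `md5Hash` as the unique key are hypothetical choices, not something stated in the thread.

```python
import json


def to_nested_doc(file_doc):
    """Turns a file with a list of ratings into a Solr parent/child document."""
    parent_id = file_doc["md5Hash"]  # assumption: the hash serves as uniqueKey
    return {
        "id": parent_id,
        "doc_type": "file",          # hypothetical field marking parent docs
        "document": file_doc["document"],
        "_childDocuments_": [
            {
                "id": "%s-rating-%d" % (parent_id, i),  # hypothetical id scheme
                "doc_type": "rating",
                "user": rating["user"],
                "comments": rating["comments"],
                "rating": rating["rating"],
            }
            for i, rating in enumerate(file_doc["rated"])
        ],
    }


# Block join query returning files that have at least one rating of 4 or more;
# the "which" filter must match all parent documents and only parents.
PARENT_QUERY = '{!parent which="doc_type:file"}rating:[4 TO *]'

sample = {
    "document": "Fuzzy based semantic search.pdf",
    "md5Hash": "md5",
    "rated": [{"user": "John", "comments": "Not Very useful", "rating": 2}],
}
print(json.dumps([to_nested_doc(sample)], indent=2))
```

The printed JSON array has the shape expected by Solr's JSON update handler for nested documents; the parent and its children need to be indexed together as one block.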

Re: SolrCloud scaling/optimization for high request rate

2018-11-06 Thread Sofiya Strochyk
Hi Toke, sorry for the late reply. The query i wrote here is edited to hide production details, but I can post additional info if this helps. I have tested all of the suggested changes none of these seem to make a noticeable difference (usually response time and other metrics fluctuate over

Re: is SearchComponent the correct way?

2018-11-06 Thread Mikhail Khludnev
It should be postfilter https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/, I believe. On Tue, Nov 6, 2018 at 2:24 PM John Thorhauer wrote: > We have a need to check the results of a search against a set of security > lists that are maintained in a redis cache. I need to b

is SearchComponent the correct way?

2018-11-06 Thread John Thorhauer
We have a need to check the results of a search against a set of security lists that are maintained in a redis cache. I need to be able to take each document that is returned for a search and check the redis cache to see if the document should be displayed or not. I am attempting to do this by cr

Re: Retrieve field from docValues

2018-11-06 Thread Yasufumi Mizoguchi
Hi, > 1. For schema version 1.6, useDocValuesAsStored=true is default, so there > is no need to explicitly set it in schema.xml? Yes. > 2. With useDocValuesAsStored=true and the following definition, will Solr > retrieve id from docValues instead of stored field? No. AFAIK, if you define both

AW: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

2018-11-06 Thread Clemens Wyss DEV
Hi Shalin, > You can expect as many connection evictor threads I have (whysoever (*)) 27 SolrClient instances instantiated but I see ~95 "Connection Evictor" threads ... >It turns out that I made a mistake in the patch I committed in...which names >threads like pool-123-thread-1282. >So if you

Solr suggestions, best practices

2018-11-06 Thread Clemens Wyss DEV
At the moment we are using spellchecking-component for suggestions which is suboptimal, to say the least. What are best practices for suggestions using Solr? googling (with excellent suggestions 😉) I came along https://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-sol

Re: Java Advanced Imaging (JAI) Image I/O Tools are not installed

2018-11-06 Thread Yasufumi Mizoguchi
Hi, It seems a PDFBox issue, I think. ( https://pdfbox.apache.org/2.0/dependencies.html ) Thanks, Yasufumi On Tue, Nov 6, 2018 at 16:10 Furkan KAMACI wrote: > Hi All, > > I use Solr 6.5.0 and test OCR capabilities. It OCRs pdf files even it is so > slow. However, I see that error when I check logs: > > o.a.

How to handle List in Solr 6.6

2018-11-06 Thread waseem-farooqui
I am new with Solr and using Spring-data-solr to store my complete **pdf** files with its contents now there raise a situation in which I want to store the file rating, that can be rate by list of users means I would have object something like this in my **DataModel** `List` in which `FileRating` w