Re: How to handle nested documents in solr (SolrJ)

2017-05-24 Thread David Lee
Hi Rick, Adding to this subject, I do appreciate you pointing us to these articles, but I'm curious how many of them take into account the latest versions of Solr (i.e., 6.5+ and 7) given the JSON split capabilities, etc. I know that is just on the indexing side so the searches may be

Re: High CPU when use grouping group.ngroups=true

2017-05-24 Thread Nguyen Manh Tien
Without using ngroups=true, is there any way to handle pagination correctly when we collapse results using grouping? Regards, Tien On Tue, May 23, 2017 at 9:55 PM, Nguyen Manh Tien wrote: > The collapse field is a high-cardinality field. I haven't profiled yet but >

Re: Spread SolrCloud across two locations

2017-05-24 Thread Shawn Heisey
On 5/24/2017 4:14 PM, Jan Høydahl wrote: > Sure, ZK by design does not support a two-node/two-location setup. But still, > users may want/need to deploy that, > and my question was if there are smart ways to make such a setup as > painless as possible in case of failure. > > Take the

Re: Spread SolrCloud across two locations

2017-05-24 Thread Anirudha Jadhav
The latest ZK supports automatic reconfiguration. Keep one DC as the quorum and the other as observers. When a DC goes down, initiate a ZK reconfigure action to flip quorum and observers. When I tested this, Solr survived just fine, but it's been a while. Ani On Wed, May 24, 2017 at 6:35 PM Pushkar Raste

Re: Spread SolrCloud across two locations

2017-05-24 Thread Pushkar Raste
A setup I have used in the past was to have an observer in DC2. If DC1 goes boom, you need manual intervention to change the observer's role to make it a follower. When DC1 comes back up, change one instance in DC2 to make it an observer again. On May 24, 2017 6:15 PM, "Jan Høydahl"

RE: How to avoid unnecessary query parsing on distributed search in QueryComponent.prepare()?

2017-05-24 Thread Markus Jelsma
I've asked myself this question too sometimes, in this case while extending the MLT QParser. So far, I've not found a simple means to propagate a parsed top-level Lucene query object over the wire. But, since there is a clear toString for that Query object, if we could retranslate that String to a

RE: Spread SolrCloud across two locations

2017-05-24 Thread Markus Jelsma
Hi - Again, hiring a simple VM at a third location without a Solr cloud sounds like the simplest solution. It keeps the quorum tight and sound. This simple solution is the one I would try first. Or am I completely missing something and sound like an idiot? Could be, of course. Regards, Markus

Re: Spread SolrCloud across two locations

2017-05-24 Thread Jan Høydahl
Sure, ZK by design does not support a two-node/two-location setup. But still, users may want/need to deploy that, and my question was if there are smart ways to make such a setup as painless as possible in case of failure. Take the example of DC1: 3xZK and DC2: 2xZK again. And then DC1
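
The arithmetic behind the DC1: 3xZK / DC2: 2xZK example can be sketched as follows. This is only an illustration, assuming ZooKeeper's usual rule that a strict majority of voting members must be reachable; the function name has_quorum and the constants are hypothetical and not part of any ZooKeeper API.

```python
def has_quorum(voters_alive: int, voters_total: int) -> bool:
    """Return True if the surviving voting members form a strict majority."""
    return voters_alive > voters_total // 2


# Hypothetical layout taken from the thread: 3 voters in DC1, 2 voters in DC2.
DC1_VOTERS, DC2_VOTERS = 3, 2
TOTAL_VOTERS = DC1_VOTERS + DC2_VOTERS

# DC2 is lost: 3 of 5 voters remain, so the ensemble keeps its quorum.
dc1_survives = has_quorum(DC1_VOTERS, TOTAL_VOTERS)

# DC1 is lost: 2 of 5 voters remain, which is not a majority.
dc2_survives = has_quorum(DC2_VOTERS, TOTAL_VOTERS)
```

With any two-location split, one side always holds fewer than half of the voters, which is why the replies in this thread suggest a third location or a manual role change.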

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
Nawab Zada Asad Iqbal wrote: > @Toke, I stumbled upon your page last week but it seems that your huge > index doesn't receive a lot of query traffic. It switches between two kinds of usage: Everyday use is very low traffic by researchers using it interactively: 1-2

Re: Securing Solr with BasicAuth

2017-05-24 Thread Shawn Heisey
On 5/24/2017 2:08 PM, Warden, Jesse wrote: > We don’t want people modifying Solr on our website. We found this plugin: > https://home.apache.org/~ctargett/RefGuidePOC/jekyll-full/basic-authentication-plugin.html#BasicAuthenticationPlugin-EnableBasicAuthentication > > However, if someone goes to

Re: [Simplified my question] How to enhance solr.StandardTokenizerFactory? (was: Why is Standard Tokenizer not separating at this comma?)

2017-05-24 Thread Steve Rowe
Hi Robert, Two possibilities come to mind: 1. Use a char filter factory (runs before the tokenizer) to convert commas between digits to spaces, e.g. PatternReplaceCharFilterFactory
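
The first suggestion, converting commas between digits to spaces before tokenizing, can be tried out with a small regex. This is a rough sketch in Python rather than Solr configuration; the exact pattern Steve had in mind is not shown in the snippet, so the lookbehind/lookahead pattern and the helper name split_numeric_commas are assumptions made for illustration.

```python
import re

# Assumed pattern: match a comma only when it sits directly between two digits.
COMMA_BETWEEN_DIGITS = re.compile(r"(?<=\d),(?=\d)")


def split_numeric_commas(text: str) -> str:
    """Replace digit-comma-digit separators with a space; leave other commas alone."""
    return COMMA_BETWEEN_DIGITS.sub(" ", text)


sample = split_numeric_commas("ids 5,6,7 and foo, bar")
```

Because the lookarounds do not consume the digits, consecutive separators such as 5,6,7 are all replaced in one pass, while a comma followed by a space is left for the tokenizer to discard as usual.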

Re: Why is Standard Tokenizer not separating at this comma?

2017-05-24 Thread Steve Rowe
Hi Robert, The StandardTokenizer implements the word boundaries rules from UAX#29, discarding anything between boundaries that is exclusively non-alphanumeric (e.g. punctuation). -- Steve www.lucidworks.com > On May 24, 2017, at 3:05 PM,

[Simplified my question] How to enhance solr.StandardTokenizerFactory? (was: Why is Standard Tokenizer not separating at this comma?)

2017-05-24 Thread Robert Hume
Hi, Following up on my last email question ... I've learned more and I simplified my question ... I have a Solr 3.6 deployment. Currently I'm using solr.StandardTokenizerFactory to parse tokens during indexing. Here are two example streams that demonstrate my issue: Example 1:

Securing Solr with BasicAuth

2017-05-24 Thread Warden, Jesse
We don’t want people modifying Solr on our website. We found this plugin: https://home.apache.org/~ctargett/RefGuidePOC/jekyll-full/basic-authentication-plugin.html#BasicAuthenticationPlugin-EnableBasicAuthentication However, if someone goes to search on our website, they’re presented with an

Re: solrcloud replicas not in sync

2017-05-24 Thread Walter Underwood
Funny, I took a different approach to the same monitoring problem. Each document has a published_timestamp field set when it is generated. The schema has an indexed_timestamp field with a default of NOW. I wrote some Python to get the set of nodes in the collection, query each one, then report

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
Actually, I wrote a service that calls the Collections API Cluster Status, but it adds data for each replica by calling the Core Admin STATUS https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-STATUS. My service fills in the index information for more data. This returns the

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
oh, those logs probably reflect the update job that runs every 15 minutes if there are updates, typically 1 or 2 changes. thanks for the info On Wed, May 24, 2017 at 10:37 AM, Erick Erickson wrote: > By default, enough closed log files will be kept to hold the last 100

Re: solr 6 at scale

2017-05-24 Thread Walter Underwood
I remembered why we waited for 6.5.1. It is the object leak in the Zookeeper client code. A very slow leak, but worth getting a fix. I tested our cluster at 6000 requests/minute. It is 18 million documents, four shards by four replicas on big AWS instances (c4.8xlarge). We have very long free

Re: solr 6 at scale

2017-05-24 Thread Nawab Zada Asad Iqbal
Thanks everyone for the responses, I will go with the latest bits for now; and will share how it goes. @Toke, I stumbled upon your page last week but it seems that your huge index doesn't receive a lot of query traffic. Mine is around 60TB and receives around 120 queries per second; ~90 shards on

Why is Standard Tokenizer not separating at this comma?

2017-05-24 Thread Robert Hume
I have a Solr 3.6 deployment I inherited. The schema.xml specifies the use of StandardTokenizerFactory like so ... ... ... According to this reference guide ( https://home.apache.org/~ctargett/RefGuidePOC/jekyll/Tokenizers.html) ... the StandardTokenizer will treat

Re: Indexing word with plus sign

2017-05-24 Thread Fundera Developer
Thank you very much Erick! You're right! The "Char" part in PatternReplaceCharFilterFactory misguided me and I thought it was just for Char replacements. Once I had gone through the documentation of CharFilters (my fault...) I realized that I could use the very same regex I was using with the

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
By default, enough closed log files will be kept to hold the last 100 documents indexed. This is for "peer sync" purposes. Say replica1 goes offline for a bit. When it comes back online, if it's fallen behind by no more than 100 docs, the docs are replayed from another replica's tlog. Having such
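
The rule Erick describes can be written down as a small decision function. This is a hedged sketch only: the 100-document window comes from the message above, while the function name, the return labels, and the fallback to a full index copy when a replica is further behind are assumptions for illustration, not Solr's actual code.

```python
PEER_SYNC_WINDOW = 100  # default number of recent documents covered by retained tlogs


def recovery_strategy(docs_behind: int, window: int = PEER_SYNC_WINDOW) -> str:
    """Pick how a returning replica catches up, given how many docs it missed."""
    if docs_behind < 0:
        raise ValueError("docs_behind must be non-negative")
    if docs_behind <= window:
        # Close enough: missed docs can be replayed from another replica's tlog.
        return "peer_sync"
    # Assumption: beyond the window, the replica needs a full index copy.
    return "full_replication"
```

For example, recovery_strategy(42) selects peer sync, while recovery_strategy(5000) falls through to the assumed full-copy path.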

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
The tlog sizes are strange. In the case of the collection where we had issues with the replicas, the tlog sizes are 740 bytes and 938 bytes on the target side and the same on the source side. There are a lot of them on the source side; when do tlog files get deleted? On Tue, May 23, 2017 at 12:52

Re: solrcloud replicas not in sync

2017-05-24 Thread Erick Erickson
I wouldn't rely on the "current" flag in the admin UI as an indicator. As long as your numDocs and the like match I'd say it's a UI issue. Best, Erick On Wed, May 24, 2017 at 8:15 AM, Webster Homer wrote: > We see data in the target clusters. CDCR replication is working.

Re: solrcloud replicas not in sync

2017-05-24 Thread Webster Homer
We see data in the target clusters. CDCR replication is working. We first noticed the current=false flag on the target replicas, but since I started looking I see it on the source too. I have removed the IgnoreCommitOptimizeUpdateProcessorFactory from our update processor chain, I did two data

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
Shawn Heisey wrote: > On 5/24/2017 3:44 AM, Toke Eskildsen wrote: >> It is relatively easy to downgrade to an earlier release within the >> same major version. We have not switched to 6.5.1 simply because we >> have no pressing need for it - Solr 6.3 works well for us. >

Re: How to handle nested documents in solr (SolrJ)

2017-05-24 Thread Erick Erickson
I would ask if you need nested documents at all. If you can denormalize the docs it's often much easier. In your case I can think of several options: 1> just index a separate field for each subject. Solr handles a couple of hundred fields with ease. student id : 123 student name : john maths: 90
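
Option 1 above, one field per subject, amounts to flattening each student record before it is sent to Solr. The sketch below shows only that flattening step on plain dictionaries; the input shape, the field names, and the helper flatten_student are hypothetical, and no SolrJ or Solr client call is shown.

```python
def flatten_student(student: dict) -> dict:
    """Turn a nested student record into one flat document with a field per subject."""
    doc = {"id": student["id"], "name": student["name"]}
    for subject, score in student["marks"].items():
        # Lower-case the subject so "English" and "english" map to the same field.
        doc[subject.lower()] = score
    return doc


# Hypothetical record modelled on the example in the original question.
nested = {"id": 123, "name": "john", "marks": {"maths": 90, "English": 95}}
flat = flatten_student(nested)
```

A real schema would need to make sure subject names cannot collide with fixed fields such as id or name, for instance by using a prefix or a dynamic-field suffix.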

Re: Too many logs recorded in zookeeper.out

2017-05-24 Thread Noriyuki TAKEI
Hi, Thanks for your reply. I'll try to join the ZooKeeper mailing list! -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-logs-recorded-in-zookeeper-out-tp4335238p4336914.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: solr 6 at scale

2017-05-24 Thread Shawn Heisey
On 5/24/2017 3:44 AM, Toke Eskildsen wrote: > It is relatively easy to downgrade to an earlier release within the > same major version. We have not switched to 6.5.1 simply because we > have no pressing need for it - Solr 6.3 works well for us. That strikes me as a little bit dangerous, unless

Re: JSON facet performance for aggregations

2017-05-24 Thread Yonik Seeley
On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley wrote: > I opened https://issues.apache.org/jira/browse/SOLR-10634 to address > this performance issue. OK, this has been committed. A quick test shows about a 30x speedup when faceting on a string/numeric docvalues field with 100K

Re: solr 6 at scale

2017-05-24 Thread Toke Eskildsen
On Tue, 2017-05-23 at 17:27 -0700, Nawab Zada Asad Iqbal wrote: > Anyone using solr.6.x for multi-terabytes index size: how did you > decide which version to upgrade to? We are still stuck with 4.10 for our 70TB+ (split in 83 shards) index, due to some custom hacks that have not yet been ported.

Re: How to handle nested documents in solr (SolrJ)

2017-05-24 Thread Rick Leir
Prasad, Gee, you get confusion from a google search for: nested documents site:mail-archives.apache.org/mod_mbox/lucene-solr-user/

How to handle nested documents in solr (SolrJ)

2017-05-24 Thread prasad chowdary
Dear All, I have a requirement to index documents in Solr using Java code. Each document contains sub-documents like below (it's just for understanding my question). student id : 123 student name : john marks : maths: 90 English :95 student id : 124 student