Re: Determining replication status

2018-04-01 Thread Jeff Wartes
There're some edge cases around the response based on the timing. In case it's useful: Here's the bit from solrcloud-haft: (java)

Re: Routing a subquery directly to the shard a document came from

2018-03-29 Thread Jeff Wartes
't a query so it isn't parsed. So I have no way to dereference the "$row.[shard]". On 3/27/18, 4:00 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote: I have a large 7.2 index with nested documents and many shards. For each result (parent doc) in a query,

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
ere is a shared filesystem requirement. It would be nice if this > Solr feature could be enhanced to have more options like backing up > directly to another SolrCloud using replication/fetchIndex like your cool > solrcloud_manager thing. > > On Wed, Mar 28, 2018 at

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
for the duration of the restore But the former isn't tenable if you're sharding due to space constraints, and the latter can't be easily predicted. On 3/28/18, 11:30 AM, "Shawn Heisey" <apa...@elyograg.org> wrote: On 3/28/2018 10:34 AM, Jeff Wartes wrote: > The backup/res

Re: Copying a SolrCloud collection to other hosts

2018-03-28 Thread Jeff Wartes
The backup/restore still requires setting up a shared filesystem on all your nodes though right? I've been using the fetchindex trick in my solrcloud_manager tool for ages now: https://github.com/whitepages/solrcloud_manager#cluster-commands Some of the original features in that tool have been

Routing a subquery directly to the shard a document came from

2018-03-27 Thread Jeff Wartes
I have a large 7.2 index with nested documents and many shards. For each result (parent doc) in a query, I want to gather a relevance-ranked subset of the child documents. It seemed like the subquery transformer would be ideal:

Re: Solr Autoscaling multi-AZ rules

2018-02-22 Thread Jeff Wartes
lica": "<7", "node":"#ANY"} , means don't put more than 7 replicas of the collection (irrespective of the shards) in a given node what do you mean by distinct 'RF' ? I think we are screwing up the terminologies a bit here On Wed, Feb 7, 2018

Solr Autoscaling multi-AZ rules

2018-02-07 Thread Jeff Wartes
I’ve been messing around with the Solr 7.2 autoscaling framework this week. Some things seem trivial, but I’m also running into questions and issues. If anyone else has experience with this stuff, I’d be glad to hear it. Specifically: Context: -One collection, consisting of 42 shards, where

Re: Solr performance on EC2 linux

2017-05-03 Thread Jeff Wartes
It’s presumably not a small degradation - this guy very recently suggested it’s 77% slower: https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ The other reason that blog post is interesting to me is that his benchmark utility showed the work of entering the kernel

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS. On 5/1/17, 7:22 PM, "Will Martin" <wmartin...@outlook.com> wrote: Ubuntu 16.04 LTS - Xenial (HVM) Is this your Xenial version? On 5/1/2017 6:37 PM, Jeff Wartes wrote: > I tri

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
I started with the same three-node 15-shard configuration I’d been used to, in an RF1 cluster. (the index is almost 700G so this takes three r4.8xlarge’s if I want to be entirely memory-resident) I eventually dropped down to a 1/3rd size index on a single node (so 5 shards, 100M docs each) so I

Re: Solr performance on EC2 linux

2017-05-01 Thread Jeff Wartes
We settled on the R4.2XL... The R series is labeled "High-Memory" Which instance type did you end up using? On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 4/28/2017 10:09 AM, Jeff Wartes wrote: > > tldr: Recen

Re: Solr performance on EC2 linux

2017-04-30 Thread Jeff Wartes
with you having such different performance between local and EC2 But thanks for telling us about this! It's totally baffling Erick On Fri, Apr 28, 2017 at 9:09 AM, Jeff Wartes <jwar...@whitepages.com> wrote: > > tldr: Recently, I tried moving an existing

Solr performance on EC2 linux

2017-04-28 Thread Jeff Wartes
tldr: Recently, I tried moving an existing solrcloud configuration from a local datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I applied a bunch of linux tweaks. This should’ve been a straight port: one datacenter server -> one EC2 node. Solr 5.4, Solrcloud, Ubuntu

Re: Collection will not replicate

2017-02-01 Thread Jeff Wartes
Sounds similar to a thread last year: http://lucene.472066.n3.nabble.com/Node-not-recovering-leader-elections-not-occuring-tp4287819p4287866.html On 2/1/17, 7:49 AM, "tedsolr" wrote: I have version 5.2.1. Short of an upgrade, are there any remedies?

Re: Latest advice on G1 collector?

2017-01-26 Thread Jeff Wartes
Adding my anecdotes: I’m using heavily tuned ParNew/CMS. This is a SolrCloud collection, but per-node I’ve got a 28G heap and a 200G index. The large heap turned out to be necessary because certain operations in Lucene allocate memory based on things other than result size, (index size

Re: Latest advice on G1 collector?

2017-01-25 Thread Jeff Wartes
Hah, interesting. The fact that the CMS collector falls back to a *single-threaded* collection on concurrent-mode-failure had me seriously considering trying the Parallel collector a year or two ago. I figured out (and stopped) the queries that were doing the sudden massive allocations that

Re: CREATEALIAS to non-existing collections

2016-12-09 Thread Jeff Wartes
I’d prefer it if the alias was required to be removed, or pointed elsewhere, before the collection could be deleted. As a best practice, I encourage all SolrCloud users to configure an alias to each collection, and use only the alias in their clients. This allows atomic switching between

Re: Memory leak in Solr

2016-12-04 Thread Jeff Wartes
Here’s an earlier post where I mentioned some GC investigation tools: https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E In my experience, there are many aspects of the Solr/Lucene memory allocation model that scale

Re: Queries regarding solr cache

2016-12-01 Thread Jeff Wartes
I found this, which intends to explore the usage of RoaringDocIdSet for solr: https://issues.apache.org/jira/browse/SOLR-9008 This suggests Lucene’s filter cache already uses it, or did at one point: https://issues.apache.org/jira/browse/LUCENE-6077 I was playing with id set implementations

Re: CodaHale metrics for Solr 6?

2016-11-04 Thread Jeff Wartes
Expanding on my comment on the ticket, I’m really quite happy with using codahale/dropwizard metrics with Solr. I don’t know if I’m comfortable just sharing a screenshot of the resulting grafana dashboard, but I’ve got, per-host: - Percentile latencies and rates for GET vs POST (which in

Re: Facets based on sampling

2016-11-04 Thread Jeff Wartes
https://issues.apache.org/jira/browse/SOLR-5894 had some pretty interesting looking work on heuristic counts for facets, among other things. Unfortunately, it didn’t get picked up, but if you don’t mind using Solr 4.10, there’s a jar. On 11/4/16, 12:02 PM, "John Davis"

Re: Result Grouping vs. Collapsing Query Parser -- Can one be deprecated?

2016-10-20 Thread Jeff Wartes
I’ll also mention the choice to improve processing speed by allocating more memory, which increases the importance of GC tuning. This bit me when I tried using it on a larger index. https://issues.apache.org/jira/browse/SOLR-9125 I don’t know if the result grouping feature shares the same

Re: Effects of insert order on query performance

2016-08-12 Thread Jeff Wartes
h routing: https://sematext.com/blog/2015/09/29/solrcloud-large-tenants-and-routing/ Regards, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 11.08.2016 19:39, Je

Effects of insert order on query performance

2016-08-11 Thread Jeff Wartes
This isn’t really a question, although some validation would be nice. It’s more of a warning. Tldr is that the insert order of documents in my collection appears to have had a huge effect on my query speed. I have a very large (sharded) SolrCloud 5.4 index. One aspect of this index is a

Re: Node not recovering, leader elections not occuring

2016-07-19 Thread Jeff Wartes
It sounds like the node-local version of the ZK clusterstate has diverged from the ZK cluster state. You should check the contents of zookeeper and verify the state there looks sane. I’ve had issues (v5.4) on a few occasions where leader election got screwed up to the point where I had to

Re: solrcloud consumes more time than solr when write index

2016-07-13 Thread Jeff Wartes
data? > >Thanks! >Kent > >2016-07-12 23:02 GMT+08:00 Jeff Wartes <jwar...@whitepages.com>: > >> Well, two thoughts: >> >> >> 1. If you’re not using solrcloud, presumably you don’t have any replicas. >> If you are, presumably you do. This makes fo

Re: solrcloud consumes more time than solr when write index

2016-07-12 Thread Jeff Wartes
Well, two thoughts: 1. If you’re not using solrcloud, presumably you don’t have any replicas. If you are, presumably you do. This makes for a biased comparison, because SolrCloud won’t acknowledge a write until it’s been safely written to all replicas. In short, solrcloud write time is

Re: Full re-index without downtime

2016-07-06 Thread Jeff Wartes
A variation on #1 here - Use the same cluster, create a new collection, but use the createNodeSet option to logically partition your cluster so no node has both the old and new collection. If your clients all reference a collection alias, instead of a collection name, then all you need to do

Re: Help with recovering shard range after zookeeper disaster

2016-06-28 Thread Jeff Wartes
This might come a little late to be helpful, but I had a similar situation with Solr 5.4 once. We ended up finding a ZK snapshot we could restore, but we did also get the cluster back up for most of the interim by taking the now-empty ZK cluster, re-uploading the configs that the collections

Re: SolrCloud: Adding a very large collection to a pre-existing cluster

2016-06-21 Thread Jeff Wartes
There’s no official way of doing #1, but there are some less official ways: 1. The Backup/Restore API provides some hooks into loading pre-existing data dirs into an existing collection. Lots of caveats. 2. If you don’t have many shards, there’s always rsync/reload. 3. There are some third-party

Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Jeff Wartes
to promotion failures. I suspect there's a lot of garbage building up. >We're going to run tests with field collapsing disabled and see if that >makes a difference. > >Cas > > >On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes <jwar...@whitepages.com> wrote: > >> Check y

Re: Long STW GCs with Solr Cloud

2016-06-16 Thread Jeff Wartes
Check your gc log for CMS “concurrent mode failure” messages. If a concurrent CMS collection fails, it does a stop-the-world pause while it cleans up using a *single thread*. This means the stop-the-world CMS collection in the failure case is typically several times slower than a concurrent
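
A minimal sketch of the log check described above, assuming the GC log is plain text and that a failure shows up as the literal phrase "concurrent mode failure" somewhere on a line. The function name and the sample lines are made up for illustration; they are shorthand, not real JVM output.

```python
def count_concurrent_mode_failures(log_lines):
    """Count lines that mention a CMS concurrent mode failure."""
    marker = "concurrent mode failure"
    return sum(1 for line in log_lines if marker in line)


# Illustrative stand-ins for GC log lines (not real JVM output).
sample_lines = [
    "... ordinary young-generation collection ...",
    "... (concurrent mode failure) ...",
    "... ordinary CMS phase ...",
]

print(count_concurrent_mode_failures(sample_lines))
```

In practice you would pass an open file handle for the GC log instead of the sample list. Any non-zero count is a hint that the single-threaded stop-the-world path is being hit.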

Re: Multiple calls across the distributed nodes for a query

2016-06-15 Thread Jeff Wartes
Any distributed query falls into the two-phase process. Actually, I think some components may require a third phase. (faceting?) However, there are also cases where only a single pass is required. A fl=id,score will only be a single pass, for example, since it doesn’t need to get the field

Re: Solr off-heap FieldCache & HelioSearch

2016-06-03 Thread Jeff Wartes
For what it’s worth, I’d suggest you go into a conversation with Azul with a more explicit “I’m looking to buy” approach. I reached out to them with a more “I’m exploring my options” attitude, and never even got a trial. I get the impression their business model involves a fairly expensive (to

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-01 Thread Jeff Wartes
r on the linux command line I get: > >/opt/solr-5.4.0/server/solr-webapp/webapp/WEB-INF/lib/hon-lucene-synonyms-2.0.0.jar > >But the log file is still carrying class not found exceptions when I >restart... > >Are you in "Cloud" mode? What version of Solr are you using?

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-31 Thread Jeff Wartes
> > >> https://github.com/LucidWorks/auto-phrase-tokenfilter >> > > > >> > >> > > > >> > Is there anything else out there that you would recommend I look >> > at? >> > > > >> > >> > > > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-05-26 Thread Jeff Wartes
Oh, interesting. I’ve certainly encountered issues with multi-word synonyms, but I hadn’t come across this. If you end up using it with a recent solr version, I’d be glad to hear your experience. I haven’t used it, but I am aware of one other project in this vein that you might be interested

Re: What if adding 3rd node exceeds replication Factor? [scottchu]

2016-05-25 Thread Jeff Wartes
SolrCloud never creates replicas automatically, unless perhaps you’re using the HDFS-only autoAddReplicas option. Start the new node using the same ZK, and then use the Collections API (https://cwiki.apache.org/confluence/display/solr/Collections+API) to ADDREPLICA. The replicationFactor you
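
As a rough sketch of the ADDREPLICA step, the request is a plain HTTP call against the Collections API. The helper name, host, collection, and shard names below are placeholders I made up; only the `action`, `collection`, `shard`, and optional `node` parameters come from the Collections API itself.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API URL that adds one replica of a shard."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Optional: pin the new replica to a specific node.
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and names, for illustration only.
print(add_replica_url("http://localhost:8983/solr", "collection1", "shard1"))
```

Issuing a GET against the resulting URL (with curl or any HTTP client) once per shard is one way to bring the new node into service.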

Re: Solr cloud with Grouping query gives inconsistent results

2016-05-23 Thread Jeff Wartes
My first thought is that you haven’t indexed such that all values of the field you’re grouping on are found in the same cores. See the end of the article here: (Distributed Result Grouping Caveats) https://cwiki.apache.org/confluence/display/solr/Result+Grouping And the “Document Routing”

Re: SolrCloud increase replication factor

2016-05-23 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager was designed to provide some easier operations for common kinds of cluster operation. It hasn’t been tested with 6.0 though, so if you try it, please let me know your experience. On 5/23/16, 6:28 AM, "Tom Evans"

Re: How to stop searches to solr while full data import is going in SOLR

2016-05-23 Thread Jeff Wartes
The PingRequestHandler contains support for a file check, which allows you to control whether the ping request succeeds based on the presence/absence of a file on disk on the node. http://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/handler/PingRequestHandler.html I suppose you could

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Jeff Wartes
That case related to consistency after a ZK outage or network connectivity issue. Your case is standard operation, so I’m not sure that’s really the same thing. I’m aware of a few issues that can happen if ZK connectivity goes wonky, that I hope are fixed in SOLR-8697. This one might be a

Re: state.json being downloaded every 10 seconds

2016-05-16 Thread Jeff Wartes
have replicas B and C. > >What the "something" is that sends requests I'm not quite sure, but >that's a place >to start. > >Best, >Erick > >On Mon, May 16, 2016 at 11:08 AM, Jeff Wartes <jwar...@whitepages.com> wrote: >> >> I have a solr 5.4 clus

state.json being downloaded every 10 seconds

2016-05-16 Thread Jeff Wartes
I have a solr 5.4 cluster with three collections, A, B, C. Nodes either host replicas for collection A, or B and C. Collections B and C are not currently used - no inserts or queries. Collection A is getting significant query traffic, but no insert traffic, and queries are only directed to

Re: Passing Ids in query takes more time

2016-05-05 Thread Jeff Wartes
An ID lookup is a very simple and fast query, for one ID. Or’ing a lookup for 80k ids though is basically 80k searches as far as Solr is concerned, so it’s not altogether surprising that it takes a while. Your complaint seems to be that the query planner doesn’t know in advance that should be

Re: Solr 5.2.1 on Java 8 GC

2016-04-28 Thread Jeff Wartes
Shawn Heisey’s page is the usual reference guide for GC settings: https://wiki.apache.org/solr/ShawnHeisey Most of the learnings from that are in the Solr 5.x startup scripts already, but your heap is bigger, so your mileage may vary. Some tools I’ve used while doing GC tuning: * VisualVM -

Re: Replicas for same shard not in sync

2016-04-27 Thread Jeff Wartes
some retry logic in the code that distributes the updates from >the leader as well. > >Best, >Erick > >On Tue, Apr 26, 2016 at 12:51 PM, Jeff Wartes <jwar...@whitepages.com> wrote: >> >> At the risk of thread hijacking, this is an area where I don’t know I full

Re: Replicas for same shard not in sync

2016-04-26 Thread Jeff Wartes
At the risk of thread hijacking, this is an area where I don’t know I fully understand, so I want to make sure. I understand the case where a node is marked “down” in the clusterstate, but what if it’s down for less than the ZK heartbeat? That’s not unreasonable, I’ve seen some

Re: Indexing 700 docs per second

2016-04-19 Thread Jeff Wartes
I have no numbers to back this up, but I’d expect Atomic Updates to be slightly slower than a full update, since the atomic approach has to retrieve the fields you didn't specify before it can write the new (updated) document. On 4/19/16, 11:54 AM, "Tim Robertson"

Re: Adding replica on solr - 5.50

2016-04-14 Thread Jeff Wartes
I’m all for finding another way to make something work, but I feel like this is the wrong advice. There are two options: 1) You are doing something wrong. In which case, you should probably invest in figuring out what. 2) Solr is doing something wrong. In which case, you should probably invest

Re: HTTP Client Only

2016-04-14 Thread Jeff Wartes
If you’re already using java, just use the CloudSolrClient. If you’re using the default router, (CompositeId) it’ll figure out the leaders and send documents to the right place for you. If you’re not using java, then I’d still look there for hints on how to duplicate the functionality. On

Re: SolrCloud backup/restore

2016-04-05 Thread Jeff Wartes
There is some automation around this process in the backup commands here: https://github.com/whitepages/solrcloud_manager It’s been tested with 5.4, and will restore arbitrary replication factors. Ever assuming the shared filesystem for backups, of course. On 4/5/16, 3:18 AM, "Reth RM"

Re: SolrCloud no leader for collection

2016-04-05 Thread Jeff Wartes
I recall I had some luck fixing a leader-less shard (after a ZK quorum failure) by forcibly removing the records for the down-state replicas from the leader election list, and then forcing an election. The ZK path looks like collections/<collection>/leader_elect/shardX/election. Usually you’ll find the

Re: Separating cores from Solr home

2016-03-03 Thread Jeff Wartes
It’s a bit backwards feeling, but I’ve had luck setting the install dir and solr home, instead of the data dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr So all of the Solr files are in /opt/solr and all of the index/core related files end up in /data/solr.

Re: XX:ParGCCardsPerStrideChunk

2016-03-03 Thread Jeff Wartes
I've experimented with that a bit, and Shawn added my comments in IRC to his Solr/GC page here: https://wiki.apache.org/solr/ShawnHeisey The relevant bit: "With values of 4096 and 32768, the IRC user was able to achieve 15% and 19% reductions in average pause time, respectively, with the

Re: SolrCloud - Strategy for recovering cluster states

2016-03-02 Thread Jeff Wartes
n zookeeper? > > > >Your tool is very interesting, I just thought about writing such a tool >myself. >From the sources I understand that you represent each node as a path in the >git repository. >So, I guess that for restore purposes I will have to do >the opposite direction a

Re: SolrCloud - Strategy for recovering cluster states

2016-03-01 Thread Jeff Wartes
I’ve been running SolrCloud clusters in various versions for a few years here, and I can only think of two or three cases that the ZK-stored cluster state was broken in a way that I had to manually intervene by hand-editing the contents of ZK. I think I’ve seen Solr fixes go by for those

Re: Shard State vs Replica State

2016-02-26 Thread Jeff Wartes
I believe the shard state is a reflection of whether that shard is still in use by the collection, and has nothing to do with the state of the replicas. I think doing a split-shard operation would create two new shards, and mark the old one as inactive, for example. On 2/26/16, 8:50 AM,

Re: very slow frequent updates

2016-02-24 Thread Jeff Wartes
>> of >> > SOLR as the field which is the basis of the sort is not included in the >> > schema for example the price. The customer wants the list in descending >> > order of the price. >> > >> > So I have to get all the 1000 docids from solr an

Re: very slow frequent updates

2016-02-23 Thread Jeff Wartes
My suggestion would be to split your problem domain. Use Solr exclusively for search - index the id and only those fields you need to search on. Then use some other data store for retrieval. Get the id’s from the solr results, and look them up in the data store to get the rest of your fields.
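
The split described above could look something like this sketch. It assumes the Solr results have already been parsed into a list of dicts with an `id` field, and it uses an in-memory dict as a stand-in for whatever data store holds the full records; the function name and the sample data are hypothetical.

```python
def hydrate(solr_docs, store):
    """Look up full records for the ids Solr returned, keeping Solr's ranking order."""
    return [store[doc["id"]] for doc in solr_docs if doc["id"] in store]


# Stand-in for the external data store (hypothetical records).
store = {
    "a1": {"id": "a1", "name": "widget", "price": 10},
    "b2": {"id": "b2", "name": "gadget", "price": 25},
}

# Stand-in for a parsed Solr response that only carries ids.
solr_docs = [{"id": "b2"}, {"id": "a1"}, {"id": "missing"}]

print(hydrate(solr_docs, store))
```

Ids that are missing from the store are skipped here; whether to skip, log, or fail on those is a design choice that depends on how the two systems are kept in sync.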

Re: Adding nodes

2016-02-17 Thread Jeff Wartes
Solrcloud does not come with any autoscaling functionality. If you want such a thing, you’ll need to write it yourself. https://github.com/whitepages/solrcloud_manager might be a useful head start though, particularly the “fill” and “cleancollection” commands. I don’t do *auto* scaling, but I

Re: Shard allocation across nodes

2016-02-01 Thread Jeff Wartes
You could write your own snitch: https://cwiki.apache.org/confluence/display/solr/Rule-based+Replica+Placement Or, it would be more annoying, but you can always add/remove replicas manually and juggle things yourself after you create the initial collection. On 2/1/16, 8:42 AM, "Tom Evans"

Re: Restoring backups of solrcores

2016-02-01 Thread Jeff Wartes
Aliases work when indexing too. Create collection: collection1 Create alias: this_week -> collection1 Index to: this_week Next week... Create collection: collection2 Create (Move) alias: this_week -> collection2 Index to: this_week On 2/1/16, 2:14 AM, "vidya" wrote:
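
The weekly alias move above maps onto the Collections API CREATEALIAS action, which also re-points an alias that already exists. This is only a sketch: the helper name and the host are made up, and the alias and collection names are the ones from the example.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collection):
    """Build a Collections API URL that creates (or moves) an alias."""
    query = urlencode(
        {"action": "CREATEALIAS", "name": alias, "collections": collection}
    )
    return f"{base_url}/admin/collections?{query}"


base = "http://localhost:8983/solr"  # placeholder host
# This week: this_week -> collection1
print(create_alias_url(base, "this_week", "collection1"))
# Next week: move the same alias to collection2
print(create_alias_url(base, "this_week", "collection2"))
```

Clients keep indexing to and querying `this_week` throughout; only the alias target changes.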

Re: collection aliasing

2016-01-28 Thread Jeff Wartes
I enjoy using collection aliases in all client references, because that allows me to change the collection all clients use without updating the clients. I just move the alias. This is particularly useful if I’m doing a full index rebuild and want an atomic, zero-downtime switchover. On

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
On 1/27/16, 8:28 AM, "Shawn Heisey" wrote: > >I don't think any documentation states this, but it seems like a good >idea to me use an alias from day one, so that you always have the option >of swapping the "real" collection that you are using without needing to >change

Re: SolrCloud replicas out of sync

2016-01-27 Thread Jeff Wartes
If you can identify the problem documents, you can just re-index those after forcing a sync. Might save a full rebuild and downtime. You might describe your cluster setup, including ZK. it sounds like you’ve done your research, but improper ZK node distribution could certainly invalidate some

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
My understanding is that the "version" represents the timestamp the searcher was opened, so it doesn’t really offer any assurances about your data. Although you could probably bounce a node and get your document counts back in sync (by provoking a check), it’s interesting that you’re in this

Re: SolrCloud replicas out of sync

2016-01-26 Thread Jeff Wartes
> >>> >>> You might watch the achieved replication factor of your updates and see if >>> it ever changes >>> > >This is a good tip. I’m not sure I like the implication that any failure to >write all 3 of our replicas must be retried at the app layer. Is t

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
be... > >=xxx > >btw, for your app, isn't "slice" old notation? > > > > >On 08/01/16 22:05, Jeff Wartes wrote: >> >> I’m pretty sure you could change the name when you ADDREPLICA using a >> core.name property. I don’t know if you can when you

Re: SolrCloud: Setting/finding node names for deleting replicas

2016-01-08 Thread Jeff Wartes
I’m pretty sure you could change the name when you ADDREPLICA using a core.name property. I don’t know if you can when you initially create the collection though. The CLUSTERSTATUS command will tell you the core names:

Re: How to check when a search exceeds the threshold of timeAllowed parameter

2015-12-23 Thread Jeff Wartes
Looks like it’ll set partialResults=true on your results if you hit the timeout. https://issues.apache.org/jira/browse/SOLR-502 https://issues.apache.org/jira/browse/SOLR-5986 On 12/22/15, 5:43 PM, "Vincenzo D'Amore" wrote: >Well... I can write everything, but
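
A small sketch of checking for that flag on the client side, assuming the response was requested as JSON and parsed into a dict, and that the flag appears under `responseHeader`. The function name and the sample responses are made up for illustration.

```python
def hit_time_allowed(response):
    """Return True if the parsed Solr response is flagged as partial."""
    header = response.get("responseHeader", {})
    return bool(header.get("partialResults", False))


# Hypothetical parsed responses, trimmed to the relevant part.
timed_out = {"responseHeader": {"status": 0, "partialResults": True}}
complete = {"responseHeader": {"status": 0}}

print(hit_time_allowed(timed_out), hit_time_allowed(complete))
```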

Re: Moving to SolrCloud, specifying dataDir correctly

2015-12-14 Thread Jeff Wartes
Don’t set solr.data.dir. Instead, set the install dir. Something like: -Dsolr.solr.home=/data/solr -Dsolr.install.dir=/opt/solr I have many solrcloud collections, and separate data/install dirs, and I’ve never had to do anything with manual per-collection or per-replica data dirs. That said,

Re: Fully automated replica creation in AWS

2015-12-09 Thread Jeff Wartes
It’s a pretty common misperception that since solr scales, you can just spin up new nodes and be done. Amazon ElasticSearch and older solrcloud getting-started docs encourage this misperception, as does the HDFS-only autoAddReplicas flag. I agree that auto-scaling should be approached carefully,

Re: Solrcloud: 1 server, 1 configset, multiple collections, multiple schemas

2015-12-04 Thread Jeff Wartes
If you want two different collections to have two different schemas, those collections need to reference two different configsets. So you need another copy of your config available using a different name, and to reference that other name when you create the second collection. On 12/4/15, 6:26

Re: Solr 5: Schema.xml vs. Managed Schema - which is advisable?

2015-12-03 Thread Jeff Wartes
I’ve never used the managed schema, so I’m probably biased, but I’ve never seen much of a point to the Schema API. I need to make changes sometimes to solrconfig.xml, in addition to schema.xml and other config files, and there’s no API for those, so my process has been like: 1. Put the entire

Re: How to list all collections in solr-4.7.2

2015-12-03 Thread Jeff Wartes
Looks like LIST was added in 4.8, so I guess you’re stuck looking at ZK, or finding some tool that looks in ZK for you. The zkCli.sh that ships with zookeeper would probably suffice for a one-off manual inspection: https://zookeeper.apache.org/doc/trunk/zookeeperStarted.html#sc_ConnectingT

Re: Data Import Handler / Backup indexes

2015-11-23 Thread Jeff Wartes
dentally and the DIH cannot be run >because the database is unavailable. > >Our collection is simple: 2 nodes - 1 collection - 2 shards with 2 >replicas >each > >So a simple copy (cp command) for both the nodes/shards might work for us? >How do I restore the data back?

Re: replica recovery

2015-11-19 Thread Jeff Wartes
he >limit on each server but it isn't clear to me how high it should be or if >raising the limit will cause new problems. > >Any advice you could provide in this situation would be awesome! > >Cheers, >Brian > > > >> On Oct 27, 2015, at 20:50, Jeff Wartes <jwar

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager supports 5.x, and I added some backup/restore functionality similar to SOLR-5750 in the last release. Like SOLR-5750, this backup strategy requires a shared filesystem, but note that unlike SOLR-5750, I haven’t yet added any backup functionality

Re: Facet queries blow out the filterCache

2015-10-28 Thread Jeff Wartes
FWIW, since it seemed like there was at least one bug here (and possibly more), I filed https://issues.apache.org/jira/browse/SOLR-8171 On 10/6/15, 3:58 PM, "Jeff Wartes" <jwar...@whitepages.com> wrote: > >I dug far enough yesterday to find the GET_DOCSET, but not f

Re: replica recovery

2015-10-27 Thread Jeff Wartes
On the face of it, your scenario seems plausible. I can offer two pieces of info that may or may not help you: 1. A write request to Solr will not be acknowledged until an attempt has been made to write to all relevant replicas. So, B won’t ever be missing updates that were applied to A, unless

Re: copy data between collection

2015-10-26 Thread Jeff Wartes
The “copy” command in this tool automatically does what Upayavira describes, including bringing the replicas up to date. (if any) https://github.com/whitepages/solrcloud_manager I’ve been using it as a mechanism for copying a collection into a new cluster (different ZK), but it should work

Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-10-20 Thread Jeff Wartes
If you’re using AWS, there’s this: https://github.com/LucidWorks/solr-scale-tk If you’re using chef, there’s this: https://github.com/vkhatri/chef-solrcloud (There are several other chef cookbooks for Solr out there, but this is the only one I’m aware of that supports Solr 5.3.) For ZK, I’m

Re: are there any SolrCloud supervisors?

2015-10-14 Thread Jeff Wartes
I’m aware of two public administration tools: This was announced to the list just recently: https://github.com/bloomreach/solrcloud-haft And I’ve been working in this: https://github.com/whitepages/solrcloud_manager Both of these hook the Solrcloud client’s ZK access to inspect the cluster state

Re: Facet queries blow out the filterCache

2015-10-06 Thread Jeff Wartes
I dug far enough yesterday to find the GET_DOCSET, but not far enough to find why. Thanks, a little context is really helpful sometimes. So, starting with an empty filterCache... http://localhost:8983/solr/techproducts/select?q=name:foo=1=true =popularity New values: lookups: 0,

Re: Facet queries blow out the filterCache

2015-10-02 Thread Jeff Wartes
ert, but not a lookup, so the cache hit ratio is always exactly 1. On 10/2/15, 4:18 AM, "Toke Eskildsen" <t...@statsbiblioteket.dk> wrote: >On Thu, 2015-10-01 at 22:31 +, Jeff Wartes wrote: >> It still inserts if I address the core directly and use distrib=f

Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud index on fields like this:

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
; wrote: >what if you set f.city.facet.limit=-1 ? > >On Thu, Oct 1, 2015 at 7:43 PM, Jeff Wartes <jwar...@whitepages.com> >wrote: > >> >> I’m doing some fairly simple facet queries in a two-shard 5.3 SolrCloud >> index on fields like this: >> >> > docValue

Re: Facet queries blow out the filterCache

2015-10-01 Thread Jeff Wartes
stributed requests, it's explained here >https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Over-RequestParameters >eg does it happen if you run with distrib=false? > >On Fri, Oct 2, 2015 at 12:27 AM, Jeff Wartes <jwar...@whitepages.com> >wrote: > &

Re: Cost of having multiple search handlers?

2015-09-29 Thread Jeff Wartes
ibute it. We’ve been running it in production for a year, >but the config is pretty manual. > >wunder >Walter Underwood >wun...@wunderwood.org >http://observer.wunderwood.org/ (my blog) > > >> On Sep 28, 2015, at 4:41 PM, Jeff Wartes <jwar...@whitepages.com> wrote: >

Re: Cost of having multiple search handlers?

2015-09-28 Thread Jeff Wartes
One would hope that https://issues.apache.org/jira/browse/SOLR-4735 will be done by then. On 9/28/15, 11:39 AM, "Walter Underwood" wrote: >We did the same thing, but reporting performance metrics to Graphite. > >But we won’t be able to add servlet filters in 6.x,

Re: How to know index file in OS Cache

2015-09-25 Thread Jeff Wartes
I’ve been relying on this: https://code.google.com/archive/p/linux-ftools/ fincore will tell you what percentage of a given file is in cache, and fadvise can suggest to the OS that a file be cached. All of the solr start scripts at my company first call fadvise (FADV_WILLNEED) on all the

Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
If I configure my filterCache like this: and I have <= 10 distinct filter queries I ever use, does that mean I’ve effectively disabled cache invalidation? So my cached filter query results will never change? (short of JVM restart) I’m unclear on whether autowarm simply copies the value into

Re: Autowarm and filtercache invalidation

2015-09-24 Thread Jeff Wartes
of whether it was populated via autowarm. On 9/24/15, 11:28 AM, "Jeff Wartes" <jwar...@whitepages.com> wrote: > >If I configure my filterCache like this: >autowarmCount="10"/> > >and I have <= 10 distinct filter queries I ever use, does that mean I’ve

Re: Cached fq decreases performance

2015-09-04 Thread Jeff Wartes
On 9/4/15, 7:06 AM, "Yonik Seeley" wrote: > >Lucene seems to always be changing its execution model, so it can be >difficult to keep up. What version of Solr are you using? >Lucene also changed how filters work, so now, a filter is >incorporated with the query like so: >

Re: Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
Tokenizers, Filters, URPs and even a newsletter: >http://www.solr-start.com/ > > >On 3 September 2015 at 16:45, Jeff Wartes <jwar...@whitepages.com> wrote: >> >> I have a query like: >> >> q==enabled:true >> >> For purposes of this conversation

Cached fq decreases performance

2015-09-03 Thread Jeff Wartes
I have a query like: q==enabled:true For purposes of this conversation, "fq=enabled:true" is set for every query, I never open a new searcher, and this is the only fq I ever use, so the filter cache size is 1, and the hit ratio is 1. The fq=enabled:true clause matches about 15% of my

Re: Solr API for getting shard's leader/replica status

2014-09-08 Thread Jeff Wartes
I had a similar need. The resulting tool is in scala, but it still might be useful to look at. I had to work through some of those same issues: https://github.com/whitepages/solrcloud_manager From a clusterstate perspective, I mostly cared about active vs non-active, so here’s a sample output

Re: Solr Sharding Help

2014-09-08 Thread Jeff Wartes
You need to specify a replication factor of 2 if you want two copies of each shard. Solr doesn’t “auto fill” available capacity, contrary to the misleading examples on the http://wiki.apache.org/solr/SolrCloud page. Those examples only have that behavior because they ask you to copy the examples
