Solr 4.10.x regression in map-reduce contrib

2015-04-21 Thread ralph tice
Hello list, I'm using map-reduce from contrib and I get this stack trace: https://gist.github.com/ralph-tice/b1e84bdeb64532c7ecab whenever I specify <luceneMatchVersion>4.10</luceneMatchVersion> in my solrconfig.xml. 4.9 works fine. I'm using 4.10.4 artifacts for both map-reduce runs. I tried

Re: change maxShardsPerNode for existing collection?

2015-04-08 Thread ralph tice
It looks like there's a patch available: https://issues.apache.org/jira/browse/SOLR-5132 Currently the only way without that patch is to hand-edit clusterstate.json, which is very ill-advised. If you absolutely must, it's best to stop all your Solr nodes, back up the current clusterstate in ZK,
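A rough sketch of that backup (and eventual restore) step with Solr's zkcli tool; the ZooKeeper address and file paths are placeholders, and the putfile should only happen with every Solr node stopped:

  # copy the current clusterstate out of ZooKeeper for safekeeping
  zkcli.sh -zkhost zk1:2181 -cmd getfile /clusterstate.json /tmp/clusterstate.json.bak
  # after hand-editing a copy, push it back (all Solr nodes stopped)
  zkcli.sh -zkhost zk1:2181 -cmd putfile /clusterstate.json /tmp/clusterstate.json.edited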

Too many merges, stalling...

2015-02-16 Thread ralph tice
We index lots of relatively small documents, minimum of around 6k/second, but up to 20k/second. At the same time we are deleting 400-900 documents a second. We have our shards organized by time, so the bulk of our indexing happens in one 'hot' shard, but deletes can go back in time to our epoch.

Re: [MASSMAIL]Re: Trending functionality in Solr

2015-02-08 Thread ralph tice
You might want to also look at Trulia's Thoth project https://github.com/trulia/thoth-ml/ -- it doesn't supply this feature out of the box, but it gives you a nice framework for implementing it. On Sun, Feb 8, 2015 at 5:07 PM, Jorge Luis Betancourt González jlbetanco...@uci.cu wrote: For a

Re: How large is your solr index?

2014-12-29 Thread ralph tice
Like all things it really depends on your use case. We have 160B documents in our largest SolrCloud and doing a *:* to get that count takes ~13-14 seconds. Doing a text:happy query only takes ~3.5-3.6 seconds cold; subsequent queries for the same terms take 500ms. We have a little over 3TB of
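For reference, a count query of the kind described above can be issued with rows=0 so only numFound comes back; host and collection name here are placeholders:

  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&wt=json'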

Re: Backuping SolrCloud

2014-11-24 Thread ralph tice
I have a write-up of how to perform safe backups here: https://gist.github.com/ralph-tice/887414a7f8082a0cb828 There are some open tickets related to this work that aim to make backups easier, especially https://issues.apache.org/jira/browse/SOLR-5750 On Mon, Nov 24, 2014 at 9:45 AM, Vivek Pathak vpat
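The gist covers the full procedure; the per-core step goes through the replication handler's backup command, roughly like this (host, core name, snapshot name, and location are placeholders):

  curl 'http://localhost:8983/solr/mycollection_shard1_replica1/replication?command=backup&location=/backups&name=shard1-snapshot'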

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-21 Thread ralph tice
bq. We ran into one of the failure modes that only AWS can dream up recently, where for an extended amount of time, two nodes in the same placement group couldn't talk to one another, but they could both see ZooKeeper, so nothing was marked as down. I had something similar happen with one of my

Re: Migrating shards

2014-11-07 Thread ralph tice
I think the ADD/DELETE replica APIs are best for moves within a SolrCloud; however, if you need to move data across SolrClouds you will have to resort to older APIs, for which I found little good documentation but many references. So I wrote up the instructions to do so here: https://gist.github.com/ralph
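Within a single SolrCloud, the ADDREPLICA/DELETEREPLICA move looks roughly like this; hosts, collection, shard, and core_node names are placeholders, and it assumes a Solr version whose ADDREPLICA accepts the node parameter:

  # create a new copy of the shard on the target node
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newhost:8983_solr'
  # once the new replica reports active, drop the old one
  curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node3'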

Re: SolrCloud shard distribution with Collections API

2014-11-06 Thread ralph tice
I've had a bad enough experience with the default shard placement that I create a collection with one shard, add the shards where I want them, then use add/delete replica to move the first one to the right machine/port. Typically this is in a SolrCloud of dozens or hundreds of shards. Our shards

Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread ralph tice
FWIW, I do a lot of moving Lucene indexes around and as long as the core is unloaded it's never been an issue for Solr to be running at the same time. If you move a core into the correct hierarchy for a replica, you can call the Collections API's CREATESHARD action with the appropriate params
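A sketch of such a CREATESHARD call, assuming the collection was created with the implicit router (which CREATESHARD requires); host and names are placeholders:

  curl 'http://localhost:8983/solr/admin/collections?action=CREATESHARD&collection=mycollection&shard=shard_2014_10&createNodeSet=host1:8983_solr'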

Re: Loading an index (generated by map reduce) in SolrCloud

2014-09-17 Thread ralph tice
not reload the collection or core. Have not tried re-starting the solr cloud. Can someone point out the best way to achieve the goal? I prefer not to re-start solr cloud. Shushuai From: ralph tice ralph.t...@gmail.com To: solr-user@lucene.apache.org Sent

ADDREPLICA doesn't respect requested solr_port assignment, replicas can report green w/o replicating

2014-08-24 Thread ralph tice
Hi all, Two issues, first, when I issue an ADDREPLICA call like so: http://localhost:8983/solr/admin/collections?action=ADDREPLICA&shard=myshard&collection=mycollection&createNodeSet=solr18.mycorp.com:8983_solr It does not seem to respect the 8983_solr designation in the createNodeSet parameter

Re: ADDREPLICA doesn't respect requested solr_port assignment, replicas can report green w/o replicating

2014-08-24 Thread ralph tice
/markrmiller On August 24, 2014 at 12:35:13 PM, ralph tice (ralph.t...@gmail.com) wrote: Hi all, Two issues, first, when I issue an ADDREPLICA call like so: http://localhost:8983/solr/admin/collections?action=ADDREPLICA&shard=myshard&collection=mycollection&createNodeSet=solr18.mycorp.com

Re: Announcing Splainer -- Open Source Solr Sandbox

2014-08-22 Thread ralph tice
What are the dependencies here in terms of Solr config? Looks like it's dependent on highlighting at a minimum? I tried the example URL and got a 500 with this stack trace once I inspected the response of the generated URI: java.lang.NullPointerException at

Creating new replicas, replication reports false positive success

2014-06-17 Thread ralph tice
, as well as clusterstate for the shard in question, which describe what I see via the UI also -- the newly created replica shard erroneously thinks it has fully replicated. https://gist.github.com/ralph-tice/18796de6393f48fb0192 The logs are after issuing a REQUESTRECOVERY call. The only message
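For context, a REQUESTRECOVERY call goes through the CoreAdmin API, roughly like this; host and core name are placeholders:

  curl 'http://localhost:8983/solr/admin/cores?action=REQUESTRECOVERY&core=mycollection_shard1_replica2'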

Re: Help me understand these newrelic graphs

2014-03-13 Thread ralph tice
I think your response time includes the average response time for add operations, which generally return very quickly and, due to their sheer number, are averaging out the response time of your queries. New Relic should break out requests based on which handler they're hitting, but they don't seem to.

UpdateHandler issues with SolrCloud 4.7

2014-03-10 Thread ralph tice
We have a cluster running SolrCloud 4.7 built 2/25. 10 shards with 2 replicas each (20 cores total) at about ~20GB/shard. We index around 1k-1.5k documents/second into this cluster constantly. To manage growth we have a scheduled job that runs every 3 hours to prune documents based on business
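The pruning described here is presumably a delete-by-query; a minimal sketch against the update handler, where host, collection, field name, and retention window are all assumptions:

  curl 'http://localhost:8983/solr/mycollection/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>timestamp:[* TO NOW-90DAYS]</query></delete>'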