Re: SolrCloud 5.1 startup looking for standalone config

2015-06-05 Thread tuxedomoon
>> I would need to look at the code to figure out how it works, but I would >> imagine that the shards are shuffled randomly among the hosts so that >> multiple collections will be evenly distributed across the cluster. It >> would take me quite a while to familiarize myself with the code before I

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread tuxedomoon
Yes adding _solr worked, thx. But I also had to populate the SOLR_HOST param for each of the 4 hosts, as in SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com. I'm in an EC2 VPN environment which might be the problem. This command now works (leaving off port) http://s1/solr/admin/collections?a
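
For reference, a minimal sketch of how the corrected CREATE request could be assembled. It assumes the node names in createNodeSet take the host:port_solr form that fixed the call above (that is, the default "/solr" context). The class and method names are hypothetical; the collection and configset names are the ones used elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CreateCollectionUrl {
    // Turns "s1:8983" into "s1:8983_solr" (assumption: default "/solr" context).
    public static String nodeName(String hostAndPort) {
        return hostAndPort.endsWith("_solr") ? hostAndPort : hostAndPort + "_solr";
    }

    // Builds the Collections API CREATE URL with an explicit createNodeSet.
    public static String build(String baseUrl, String name, int numShards,
                               String configName, List<String> hosts) {
        List<String> nodes = new ArrayList<>();
        for (String host : hosts) {
            nodes.add(nodeName(host));
        }
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=" + numShards
                + "&collection.configName=" + configName
                + "&createNodeSet=" + String.join(",", nodes);
    }

    public static void main(String[] args) {
        System.out.println(build("http://s1:8983/solr", "mycollection", 2,
                "mycollection_cloud_conf", Arrays.asList("s1:8983", "s2:8983")));
    }
}
```

The values are concatenated without URL encoding, which is only safe for simple names like the ones shown here.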

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
I ran this command with Solr hosts s1 & s2 running. http://s1:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&collection.configName=mycollection_cloud_conf&createNodeSet=s1:8983,s2:8983 I referred to this link

Re: SolrCloud 5.1 startup looking for standalone config

2015-06-02 Thread tuxedomoon
ok thanks, continuing... >> numShards in SOLR_OPTS isn't a good idea, what happens if you want to >> create a collection with 5 shards?) yes I was following my old pattern CATALINA_OPTS="${CATALINA_OPTS} -DnumShards=n >> down the nodes and nuke the directories you created by hand and bring the >>

SolrCloud 5.1 startup looking for standalone config

2015-06-01 Thread tuxedomoon
I followed these steps and I am unable to launch in cloud mode. 1. created / started 3 external Zookeeper hosts: zk1, zk2, zk3 2. installed Solr 5.1 as a service called solrsvc on two hosts: s1, s2 3. uploaded a configset to zk1 (solr home is /volume/solr/data) -

Re: Reindex of document leaves old fields behind

2015-05-22 Thread tuxedomoon
This is fixed. My SolrJ client was putting a JSON object into a multivalued field in the SolrInputDocument. Solr returned a 0 status code but did not add the bad object, instead it performed what looks like an atomic index as described above. Once I removed the illegal JSON object from the SolrI
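
A minimal sketch of a client-side guard for this kind of problem, assuming the document fields are available as a plain Map before they are copied into the SolrInputDocument. It only flags fields whose value is a Map, or a collection containing a Map, since that is the shape reported in this thread as being interpreted as an atomic update. The class name, method name, and sample field names are hypothetical, and no SolrJ classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapValueGuard {
    // Returns the names of fields whose value is a Map, or a collection containing a Map.
    public static List<String> fieldsWithMapValues(Map<String, Object> fields) {
        List<String> suspicious = new ArrayList<>();
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                suspicious.add(entry.getKey());
            } else if (value instanceof Collection) {
                for (Object item : (Collection<?>) value) {
                    if (item instanceof Map) {
                        suspicious.add(entry.getKey());
                        break; // one nested Map is enough to flag the field
                    }
                }
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("role", "writer");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1234");
        doc.put("tags", Arrays.asList("a", "b"));
        doc.put("credits", Arrays.<Object>asList("x", nested)); // multivalued field holding a Map

        System.out.println(fieldsWithMapValues(doc));
    }
}
```

Logging or rejecting the flagged fields before the add request is sent would surface the bad value on the client, rather than relying on the 0 status code returned by Solr.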

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm posting the fields from one of my problem documents, based on this comment I found from Shawn on Grokbase. >> If you are trying to use a Map object as the value of a field, that is >> probably why it is interpreting your add request as an atomic update. >> If this is the case, and you're doin

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
a few further clues to this unresolved problem 1. I found one of my 5 zookeeper instances was down 2. I tried another reindex of a bad document but no change on the SOLR side 3. I deleted and reindexed the same doc, that worked (obviously, but at this point I don't know what to expect) -- View

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm relying on an autocommit of 60 secs. I just ran the same test via my SolrJ client and result was the same, SolrCloud query always returns correct number of fields. Is there a way to find out which shard and replica a particular document lives on?
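
One possible approach, sketched under assumptions: Solr provides a [shard] document transformer that can be requested in the fl parameter so that each returned document is annotated with the shard it came from. The helper below only builds such a query URL. The class name, the uniqueKey field name "id", and the collection URL are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class ShardLookupUrl {
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds a select URL that asks for the document plus the [shard] transformer.
    // Assumes the id value contains no double quotes.
    public static String build(String collectionUrl, String keyField, String id) {
        String query = keyField + ":\"" + id + "\"";
        return collectionUrl + "/select?q=" + enc(query)
                + "&fl=" + enc(keyField + ",[shard]")
                + "&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(build("http://s1:8983/solr/mycollection", "id", "1234"));
    }
}
```

Sending the same query to each core directly with distrib=false added is another way to see which replicas hold a given document.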

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
OK, it is composite. I've just used post.sh to index a test doc with 3 fields to leader 1 of my SolrCloud. I then reindexed it with 1 field removed and the query on it shows 2 fields. I repeated this a few times and always get the correct field count from Solr. I'm now wondering if SolrJ is so

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
I'm doing all my index to leader 1 and have not specified any router configuration. But there is an equal distribution of 240M docs across 5 shards. I think I've been stating I have 3 shards in these posts, I have 5, sorry. How do I know what kind of routing I am using?

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
>> If it is "implicit" then >> you may have indexed the new document to a different shard, which means >> that it is now in your index more than once, and which one gets returned >> may not be predictable. If a document with uniqueKey "1234" is assigned to a shard by SolrCloud, implicit routing w

Re: Reindex of document leaves old fields behind

2015-05-21 Thread tuxedomoon
>> let's see the code. simplified code and some comments 1. solrUrl points at leader 1 of 3 leaders, each with a replica 2. createSolrDoc takes a full Mongo doc and returns a valid SolrInputDocument 3. I have done dumps of the returned solrDoc and verified it does not have the unwanted fiel

Re: Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
The uniqueKey value is the same. The new documents contain fewer fields than the already indexed ones. Could this cause the updates to be treated as atomic? With the persisting fields treated as un-updated? Routing should be implicit since the collection was created using numShards. Many req

Reindex of document leaves old fields behind

2015-05-20 Thread tuxedomoon
I'm reindexing Mongo docs into SolrCloud. The new docs have had a few fields removed so upon reindexing those fields should be gone in Solr. They are not. So the result is a new doc merged with an old doc rather than a replacement which is what I need. I do not know whether the issue is with my

Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn I'm getting the Bad Request again, with the original code snippet I posted, it appears to be an 'illegal' string field. SOLR log - INFO: {add=[mgid:arc:content:jokers.com:694d5bf8-ecfd-11e0-aca6-0026b9414f30]} 0 7

Re: Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
@Shawn, I can definitely upgrade to SolrJ 4.x and would prefer that so as to target 4.x cores as well. I'm already on Java 7. One attempt I made was this UpdateRequest updateRequest = new UpdateRequest(); updateRequest.setParam("collection", collectionName); updateRequest.setMethod

Can a single SolrServer instance update multiple collections?

2015-03-11 Thread tuxedomoon
I have a SolrJ application that reads from a Redis queue and updates different collections based on the message content. New collections are added without my knowledge, so I am creating SolrServer objects on the fly as follows: def solrHost = "http://myhost/solr/"; (defined at startup) d
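
A minimal sketch of one way to avoid creating a new client for every message, assuming one client per collection is acceptable: cache the clients by collection name and create each one only on first use. The cache is generic so the example stays free of SolrJ. CollectionClientCache, Factory, and clientFor are hypothetical names, and in the real application the factory would construct whatever SolrServer implementation is in use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CollectionClientCache<C> {
    // Creates a client for a full collection URL, e.g. "http://myhost/solr/products".
    public interface Factory<C> {
        C create(String collectionUrl);
    }

    private final String solrHost;
    private final Factory<C> factory;
    private final ConcurrentMap<String, C> clients = new ConcurrentHashMap<>();

    public CollectionClientCache(String solrHost, Factory<C> factory) {
        this.solrHost = solrHost;
        this.factory = factory;
    }

    // Returns the cached client for a collection, creating it on first use.
    public C clientFor(String collection) {
        C existing = clients.get(collection);
        if (existing != null) {
            return existing;
        }
        C created = factory.create(solrHost + collection);
        C raced = clients.putIfAbsent(collection, created);
        return raced != null ? raced : created; // keep whichever instance was stored first
    }

    public int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        CollectionClientCache<String> cache = new CollectionClientCache<String>(
                "http://myhost/solr/",
                new Factory<String>() {
                    @Override
                    public String create(String collectionUrl) {
                        return "client for " + collectionUrl; // stand-in for a real client
                    }
                });
        System.out.println(cache.clientFor("products"));
        System.out.println(cache.size());
    }
}
```

If two threads race on a new collection, one extra client may be created and discarded; a real client would need to be shut down in that case.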

How to direct SOLR 4.9 log output to regular Tomcat logs

2015-03-06 Thread tuxedomoon
I want SOLR 4.9 to log to my rolling tomcat logs like catalina.2015-03-06.log. Instead I'm just getting a solr.log with no timestamp. Maybe this is just the way it has to be now? I'm also not sure if I need to copy more SOLR jars into my tomcat lib. This is my setup. tomcat6/conf/log4j

Re: Does shard splitting double host count

2015-03-02 Thread tuxedomoon
Shawn, in light of Garth's response below "You can't just add a new core to an existing collection. You can add the new node to the cloud, but it won't be part of any collection. You're not going to be able to just slide it in as a 4th shard to an established collection of 3 shards." how is it

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I'd forgotten that DzkHost refers to the Zookeeper hosts not SOLR hosts. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189703.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Does shard splitting double host count

2015-02-27 Thread tuxedomoon
What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e) restarting SOLR hosts in sequence needed for co

Does shard splitting double host count

2015-02-27 Thread tuxedomoon
I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards result

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
I applied the OPTS you pointed me to, here's the full string: CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms12288m -Xmx12288m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+CMSScavengeBeforeRemark -

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Have you used a queue to intercept queries and if so what was your implementation? We are indexing huge amounts of data from 7 SolrJ instances which run independently, so there's a lot of concurrent indexing.

Re: SolrCloud OOM Problem

2014-08-13 Thread tuxedomoon
Great info. Can I ask how much data you are handling with that 6G or 7G heap? -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Problem-tp4152389p4152712.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud OOM Problem

2014-08-12 Thread tuxedomoon
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC