Segment information gets deleted

2017-06-08 Thread Chetas Joshi
Hi, I am trying to understand what the possible root causes for the following exception could be. java.io.FileNotFoundException: File does not exist: hdfs://*/*/*/*/data/index/_2h.si I had some long GC pauses while executing some queries which took some of the replicas down. But how can that

Re: Solr coreContainer shut down

2017-05-23 Thread Chetas Joshi
On 5/19/2017 5:05 PM, Chetas Joshi wrote: > > If I don't wanna upgrade and there is an already installed service, why > > should it be exit 1 and not exit 0? Shouldn't it be like > > > > if [ ! "$SOLR_UPGRADE" = "YES" ]; then > > > > if [

Re: Solr coreContainer shut down

2017-05-19 Thread Chetas Joshi
lr is already setup as a service on this host? To upgrade Solr use the -f option." *exit 0* fi Thanks! On Fri, May 19, 2017 at 1:59 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hello, > > I am trying to set up a solrCloud (6.5.0/6.5.1). I have installed Solr as

Solr coreContainer shut down

2017-05-19 Thread Chetas Joshi
Hello, I am trying to set up a SolrCloud (6.5.0/6.5.1). I have installed Solr as a service. Every time I start the Solr servers, they come up, but one by one the coreContainers start shutting down on their own within 1-2 minutes of coming up. Here are the solr logs: 2017-05-19 20:45:30.926 INFO

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-13 Thread Chetas Joshi
017 at 7:36 PM, Shawn Heisey <apa...@elyograg.org> wrote: > > On 4/12/2017 5:19 PM, Chetas Joshi wrote: > >> I am getting back 100K results per page. > >> The fields have docValues enabled and I am getting sorted results based > on "id" and 2 more fi

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
g options. > > Best, > Erick > > On Wed, Apr 12, 2017 at 12:59 PM, Chetas Joshi <chetas.jo...@gmail.com> > wrote: > > I am running a query that returns 10 MM docs in total and the number of > > rows per page is 100K. > > > > On Wed, Apr 12, 2017 at

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
I am running a query that returns 10 MM docs in total and the number of rows per page is 100K. On Wed, Apr 12, 2017 at 12:53 PM, Mikhail Khludnev <gge...@gmail.com> wrote: > And what is the rows parameter? > > On Apr 12, 2017 at 21:32, "Chetas Joshi" <cheta

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-12 Thread Chetas Joshi
’t even get > that fussy. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > > > On Apr 11, 2017, at 8:22 PM, Shawn Heisey <apa...@elyograg.org> wrote: > > > > On 4/11/2017 2:56 PM, Chetas Joshi wrote:

Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Chetas Joshi
Hello, I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection with number of shards = 80 and replicationFactor=2. Solr JVM heap size = 20 GB, solr.hdfs.blockcache.enabled = true, solr.hdfs.blockcache.direct.memory.allocation = true, MaxDirectMemorySize = 25 GB. I am querying a solr

Re: CloudSolrClient stuck in a loop with a recurring exception

2017-02-22 Thread Chetas Joshi
Yes, it is scala. And yes, I just wanted to confirm that I had to add exception handling and break out of the loop. Chetas. On Wed, Feb 22, 2017 at 4:25 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 2/22/2017 4:59 PM, Chetas Joshi wrote: > > 2017-02-22 15:
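The resolution of this thread — catch the exception and break out of the paging loop instead of retrying the same request forever — can be sketched as follows. This is an illustrative Python sketch, not the original Scala client; `fetch_page` is a hypothetical stand-in for the CloudSolrClient query call.

```python
def read_all_pages(fetch_page, max_retries=3):
    """Page through results, but stop instead of spinning forever
    when the cluster becomes unreachable."""
    cursor = "*"          # cursorMark start value
    docs, retries = [], 0
    while True:
        try:
            page, next_cursor = fetch_page(cursor)
        except ConnectionError:
            retries += 1
            if retries >= max_retries:
                break     # give up instead of looping on the same exception
            continue
        retries = 0
        docs.extend(page)
        if next_cursor == cursor:  # cursor stopped advancing: last page
            break
        cursor = next_cursor
    return docs
```

With a client that always raises, the loop gives up after `max_retries` attempts instead of looping on the recurring exception as described above.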

CloudSolrClient stuck in a loop with a recurring exception

2017-02-22 Thread Chetas Joshi
Hello, I am using Solr 5.5.1. SolrCloud of 80 nodes deployed on HDFS. To get back results from Solr, I use the cursor approach and the CloudSolrClient object. While a query was running, I took the SolrCloud down. The client got stuck in a loop with the following exception: 2017-02-22

Re: A collection gone missing: uninteresting collection

2017-01-21 Thread Chetas Joshi
Is this visible in the logs? I mean, how do I find out that a "DELETE collection" API call was made? Is the following indicative of the fact that the API call was made? 2017-01-20 20:42:39,822 INFO org.apache.solr.cloud.ShardLeaderElectionContextBase: Removing leader registration node on

A collection gone missing: uninteresting collection

2017-01-20 Thread Chetas Joshi
Hello, I have been running Solr (5.5.0) on HDFS. Recently a collection just went missing, with all the instanceDirs and dataDirs getting deleted. The following logs appeared in the SolrCloud overseer. 2017-01-20 20:42:39,515 INFO org.apache.solr.core.SolrCore: [3044_01_17_shard4_replica1] CLOSING

Re: Solr on HDFS: AutoAddReplica does not add a replica

2017-01-13 Thread Chetas Joshi
Erick > > On Thu, Jan 12, 2017 at 8:42 AM, Shawn Heisey <apa...@elyograg.org> wrote: > > On 1/11/2017 7:14 PM, Chetas Joshi wrote: > >> This is what I understand about how Solr works on HDFS. Please correct > me > >> if I am wrong. > >> > >>

Solr on HDFS: AutoAddReplica does not add a replica

2017-01-11 Thread Chetas Joshi
Hello, I have deployed a SolrCloud (Solr 5.5.0) on HDFS using Cloudera 5.4.7. The cloud has 86 nodes. This is my config for the collection: numShards=80, replicationFactor=1, maxShardsPerNode=1, autoAddReplicas=true. I recently decommissioned a node to resolve some disk issues. The shard that was

Re: changing state.json using ZKCLI

2017-01-11 Thread Chetas Joshi
You can get the same functionality out of either, it's a matter of > which one you're more comfortable with. > > Erick > > > > On Tue, Jan 10, 2017 at 11:12 PM, Shawn Heisey <apa...@elyograg.org> > wrote: > > On 1/10/2017 5:28 PM, Chetas Joshi wrote: > >> I

changing state.json using ZKCLI

2017-01-10 Thread Chetas Joshi
Hello, I have 2 shards whose hash range is set to null due to some index corruption. I am trying to manually get, edit, and put the state.json file. ./zkcli.sh -zkhost ${zkhost} -cmd getfile /collections/colName/state.json ~/colName_state.json ./zkcli.sh -zkhost ${zkhost} -cmd clear
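For context, the repair being attempted here is to fill in the null `range` values in state.json before putting the file back into ZooKeeper. A rough sketch (Python) of computing the even hash-range split that Solr's compositeId router assigns at collection creation — note that hand-editing state.json is unsupported and risky, and the shard-to-range assignment must match however the surviving shards are already laid out:

```python
def even_hash_ranges(num_shards):
    """Evenly split the signed 32-bit hash space (0x80000000..0x7fffffff,
    as the compositeId router does at collection creation) into the hex
    range strings that appear in state.json."""
    full = 1 << 32
    step = full // num_shards
    ranges, start = [], -(1 << 31)
    for i in range(num_shards):
        # the last shard absorbs any remainder from integer division
        end = start + step - 1 if i < num_shards - 1 else (1 << 31) - 1
        ranges.append("{:08x}-{:08x}".format(start & 0xffffffff,
                                             end & 0xffffffff))
        start = end + 1
    return ranges
```

The parsed state.json dict could then be patched before `putfile`, e.g. `state["colName"]["shards"]["shard3"]["range"] = even_hash_ranges(20)[2]` — where `colName` and the shard index are placeholders that must be checked against the 18 healthy shards.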

Re: Missing shards/hash range

2017-01-10 Thread Chetas Joshi
Want to add a couple of things 1) Shards were not deleted using the delete replica collection API endpoint. 2) instanceDir and dataDir exist for all 20 shards. On Tue, Jan 10, 2017 at 11:34 AM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hello, > > The following is my config

Missing shards/hash range

2017-01-10 Thread Chetas Joshi
Hello, The following is my config: Solr 5.5.0 on HDFS (SolrCloud of 25 nodes), collection with shards=20, maxShardsPerNode=1, replicationFactor=1, autoAddReplicas=true. The ingestion process had been working fine for the last 3 months. Yesterday, the ingestion process started throwing the

Re: Solr Initialization failure

2017-01-04 Thread Chetas Joshi
? Thanks! On Wed, Jan 4, 2017 at 4:11 PM, Shawn Heisey <apa...@elyograg.org> wrote: > On 1/4/2017 1:43 PM, Chetas Joshi wrote: > > while creating a new collection, it fails to spin up solr cores on some > > nodes due to "insufficient direct memory"

Solr Initialization failure

2017-01-04 Thread Chetas Joshi
Hello, while creating a new collection, Solr fails to spin up cores on some nodes due to "insufficient direct memory". Here is the error: - *3044_01_17_shard42_replica1:* org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: The max direct memory is likely too low.
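The "max direct memory is likely too low" failure is about the off-heap HDFS block cache, which is allocated from direct memory rather than the JVM heap. A back-of-the-envelope sizing sketch (Python) — the 128 MB slab size and the global-vs-per-core behavior are assumptions based on the commonly documented defaults for `solr.hdfs.blockcache.slab.count` and `solr.hdfs.blockcache.global`, so verify them against your Solr version:

```python
def required_direct_memory_mb(slab_count, slab_size_mb=128,
                              cores_per_node=1, global_cache=True):
    """Rough direct-memory requirement of the HDFS block cache.
    Each slab reserves slab_size_mb of off-heap memory; if the cache
    is not global, every core on the node allocates its own slabs."""
    caches = 1 if global_cache else cores_per_node
    return slab_count * slab_size_mb * caches
```

MaxDirectMemorySize then needs headroom above this figure; if the computed value approaches or exceeds the configured limit, core creation fails with exactly the error quoted above.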

Re: Solr on HDFS: Streaming API performance tuning

2016-12-19 Thread Chetas Joshi
ch in the 5x branch could produce null pointers if a segment had > no values for a sort field. This is also fixed in the Solr 6x branch. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Sat, Dec 17, 2016 at 2:44 PM, Chetas Joshi <chetas.jo...@gmail.com>

Re: Solr on HDFS: Streaming API performance tuning

2016-12-17 Thread Chetas Joshi
rStream.read(CloudSolrStream.java:353) Thanks! On Fri, Dec 16, 2016 at 11:45 PM, Reth RM <reth.ik...@gmail.com> wrote: > If you could provide the json parse exception stack trace, it might help to > predict issue there. > > > On Fri, Dec 16, 2016 at 5:52 PM, Chetas Joshi <chetas.j

Re: Solr on HDFS: Streaming API performance tuning

2016-12-16 Thread Chetas Joshi
have been throwing exceptions because the JSON > special characters were not escaped. This was fixed in Solr 6.0. > > > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com> >

Solr on HDFS: Streaming API performance tuning

2016-12-16 Thread Chetas Joshi
Hello, I am running Solr 5.5.0. It is a SolrCloud of 50 nodes and I have the following config for all the collections: maxShardsPerNode: 1, replicationFactor: 1. I was using the Streaming API to get back results from Solr. It worked fine for a while until the index data size reached beyond 40 GB per

Re: Solr on HDFS: increase in query time with increase in data

2016-12-16 Thread Chetas Joshi
? Thanks! On Fri, Dec 16, 2016 at 6:52 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 12/14/2016 11:58 AM, Chetas Joshi wrote: > > I am running Solr 5.5.0 on HDFS. It is a solrCloud of 50 nodes and I have > > the following config. > > maxShardsperNode: 1 > > re

Solr on HDFS: increase in query time with increase in data

2016-12-14 Thread Chetas Joshi
Hi everyone, I am running Solr 5.5.0 on HDFS. It is a SolrCloud of 50 nodes and I have the following config: maxShardsPerNode: 1, replicationFactor: 1. I have been ingesting data into Solr for the last 3 months. With the increase in data, I am observing an increase in query time. Currently the size

Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-21 Thread Chetas Joshi
cs is assembled and > returned to the client. > - this sucks up bandwidth and resources > - that's bad enough, but especially if your ZK nodes are on the same > box as your Solr nodes they're even more like to have a timeout issue. > > > Best, > Erick > > On Fri, Nov 18, 2016

Re: CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Chetas Joshi
is that you have too much going on > somehow and you're overloading your system and > getting a timeout. So increasing the timeout > is definitely a possibility, or reducing the ingestion load > as a test. > > Best, > Erick > > On Fri, Nov 18, 2016 at 4:51 PM, Chetas Joshi

CloudSolrClient$RouteException: Cannot talk to ZooKeeper - Updates are disabled.

2016-11-18 Thread Chetas Joshi
Hi, I have a SolrCloud (on HDFS) of 50 nodes and a ZK quorum of 5 nodes. The SolrCloud is having difficulties talking to ZK when I am ingesting data into the collections. At that time I am also running queries (that return millions of docs). The ingest job is failing with the following

Re: index dir of core xxx is already locked.

2016-11-16 Thread Chetas Joshi
res are pointing at the > same data directory. Whichever one gets there first will block any > later cores with the > message you see. So check your core.properties files and your HDFS magic > to see > how this is occurring would be my first guess. > > Best, > Erick > > On

index dir of core xxx is already locked.

2016-11-16 Thread Chetas Joshi
Hi, I have a SolrCloud (on HDFS) of 52 nodes. I have 3 collections, each with 50 shards, and maxShardsPerNode for every collection is 1. I am having a problem restarting a Solr shard for a collection. When I restart, there is always a particular shard of a particular collection that remains down.

Re: Solr shards: very sensitive to swap space usage !?

2016-11-14 Thread Chetas Joshi
Thanks everyone! The discussion is really helpful. Hi Toke, can you explain exactly what you mean by "the aggressive IO for the memory mapping caused the kernel to start swapping parts of the JVM heap to get better caching of storage data"? Which JVM are you talking about? Solr shard? I have

Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
bypass that entirely, just form N queries that were > restricted to N disjoint subsets of the data and process them all in > parallel, either with /export or /select. > > Best, > Erick > > On Mon, Nov 14, 2016 at 3:53 PM, Chetas Joshi <chetas.jo...@gmail.com> > wrote:
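Erick's suggestion above — N queries restricted to N disjoint subsets of the data, processed in parallel — might be set up like this. Illustrative Python; the field name and boundary values are hypothetical, and `[lo TO hi}` is Solr's inclusive-exclusive range syntax, which keeps adjacent subsets from overlapping:

```python
def disjoint_range_filters(field, boundaries):
    """Build N non-overlapping Solr range filter queries from N+1 sorted
    boundary values, e.g. for fanning out parallel /export or /select
    requests over disjoint slices of the collection."""
    filters = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        # [lo TO hi} is inclusive at lo, exclusive at hi
        filters.append(f"{field}:[{lo} TO {hi}}}")
    return filters
```

Each resulting filter can be attached to its own query and the N result streams consumed concurrently, sidestepping the serial cursor entirely.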

Re: Parallelize Cursor approach

2016-11-14 Thread Chetas Joshi
exceptions. This was fixed in Solr 6.0. > > > > Joel Bernstein > > http://joelsolr.blogspot.com/ > > > > On Tue, Nov 8, 2016 at 6:17 PM, Erick Erickson <erickerick...@gmail.com> > > wrote: > > > >> Hmm, that should work fine. Let us know what t

Solr shards: very sensitive to swap space usage !?

2016-11-10 Thread Chetas Joshi
Hi, I have a SolrCloud (Solr 5.5.0) of 50 nodes. The JVM heap memory usage of my Solr shards is never more than 50% of the total heap. However, the hosts on which my Solr shards are deployed often run into a 99% swap space usage issue. This causes the Solr shards to go down. Why are Solr shards so sensitive

Re: Parallelize Cursor approach

2016-11-08 Thread Chetas Joshi
hich are held in MMapDirectory space > so will be much, much faster. As of Solr 5.5. You can override the > decompression stuff, see: > https://issues.apache.org/jira/browse/SOLR-8220 for fields that are > both stored and docvalues... > > Best, > Erick > > On Sat, Nov 5, 2016 at 6:41 PM,

Re: Re-register a deleted Collection SolrCloud

2016-11-08 Thread Chetas Joshi
ection1_shard1_replica1 as long as > the collection1_shard# parts match you should be fine. If this isn't > done correctly, the symptom will be that when you update an existing > document, you may have two copies returned eventually. > > Best, > Erick > > On Mon, Nov 7, 2016 at

Re: Re-register a deleted Collection SolrCloud

2016-11-07 Thread Chetas Joshi
se ADDREPLICA to expand your collection, that'll handle the > copying from the leader correctly. > > Best, > Erick > > On Mon, Nov 7, 2016 at 12:49 PM, Chetas Joshi <chetas.jo...@gmail.com> > wrote: > > I have a Solr Cloud deployed on top of HDFS. > > > > I

Re-register a deleted Collection SolrCloud

2016-11-07 Thread Chetas Joshi
I have a SolrCloud deployed on top of HDFS. I accidentally deleted a collection using the collection API. So, the ZooKeeper cluster has lost all the info related to that collection. I don't have a backup that I can restore from. However, I have indices and transaction logs on HDFS. If I create a

Re: Parallelize Cursor approach

2016-11-05 Thread Chetas Joshi
wrote: > > No, you can't get cursor-marks ahead of time. > > They are the serialized representation of the last sort values > > encountered (hence not known ahead of time). > > > > -Yonik > > > > > > On Fri, Nov 4, 2016 at 8:48 PM, Chetas Joshi <chetas.jo

Parallelize Cursor approach

2016-11-04 Thread Chetas Joshi
Hi, I am using the cursor approach to fetch results from Solr (5.5.0). Most of my queries return millions of results. Is there a way I can read the pages in parallel? Is there a way I can get all the cursors well in advance? Let's say my query returns 2M documents and I have set rows=100,000.
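As the replies in this thread explain, cursorMarks cannot be obtained ahead of time because each one is the serialized representation of the last sort value of the previous page. A small in-memory simulation (Python) of that contract — this is not the Solr API, just the pagination logic it implies:

```python
def cursor_page(docs_sorted_by_id, cursor, rows):
    """Simulate the cursorMark contract over an in-memory sorted list:
    the cursor is just the last sort value already seen, so page N+1
    cannot be requested until page N has been read."""
    start = 0 if cursor == "*" else next(
        i + 1 for i, d in enumerate(docs_sorted_by_id) if d == cursor)
    page = docs_sorted_by_id[start:start + rows]
    next_cursor = page[-1] if page else cursor  # unchanged cursor == done
    return page, next_cursor
```

Because each call needs the previous call's output, the pages of a single cursor walk are inherently serial; parallelism has to come from splitting the query itself, as suggested elsewhere in this thread.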

autoAddReplicas:true not working

2016-10-24 Thread Chetas Joshi
Hello, I have the following configuration for the SolrCloud and a Solr collection. This is Solr on HDFS, and the Solr version I am using is 5.5.0. No. of hosts: 52 (SolrCloud), shard count: 50, replicationFactor: 1, maxShardsPerNode: 1, autoAddReplicas: true. Now, one of my

Re: /export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-21 Thread Chetas Joshi
Just to the add to my previous question: I used dynamic shard splitting while consuming data from the Solr collection using /export handler. On Fri, Oct 21, 2016 at 2:27 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Thanks Joel. > > I will migrate to Solr 6.0.0. > > How

Re: /export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-21 Thread Chetas Joshi
r handling in Solr 6 Streaming Expressions. > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Thu, Oct 20, 2016 at 5:49 PM, Chetas Joshi <chetas.jo...@gmail.com> > wrote: > > > Hello, > > > > I am using /export handler to stream data using Cloud

Re: For TTL, does expirationFieldName need to be indexed?

2016-10-20 Thread Chetas Joshi
You just need to have indexed=true. It will use the inverted index to delete the expired documents. You don't need stored=true as all the info required by the DocExpirationUpdateProcessorFactory to delete a document is there in the inverted index. On Thu, Oct 20, 2016 at 4:26 PM, Brent
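For reference, the TTL feature under discussion is wired up through an update processor chain in solrconfig.xml. A typical configuration looks roughly like the following; the chain name, field name, and delete period are illustrative placeholders:

```xml
<updateRequestProcessorChain name="add-ttl" default="true">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <!-- field holding the computed expiration timestamp; per the answer
         above it must be indexed so expired docs can be found and deleted,
         but it does not need to be stored -->
    <str name="expirationFieldName">expire_at</str>
    <!-- how often the background delete of expired docs runs -->
    <long name="autoDeletePeriodSeconds">300</long>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The periodic delete issues a query against `expirationFieldName`, which is why the inverted index (indexed=true) is the only requirement.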

/export handler to stream data using CloudSolrStream: JSONParse Exception

2016-10-20 Thread Chetas Joshi
Hello, I am using /export handler to stream data using CloudSolrStream. I am using fl=uuid,space,timestamp where uuid and space are Strings and timestamp is long. My query (q=...) is not on these fields. While reading the results from the Solr cloud, I get the following errors
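A sketch of how such an /export request might be assembled (Python; the host and collection in the usage below are placeholders). The /export handler requires an explicit sort and can only return fields with docValues enabled, which fits the fl=uuid,space,timestamp usage described above:

```python
from urllib.parse import urlencode

def export_url(base, q, fl, sort):
    """Build an /export request URL. The handler streams the *entire*
    result set sorted by `sort`; every fl field must have docValues."""
    params = {"q": q, "fl": ",".join(fl), "sort": sort}
    return f"{base}/export?{urlencode(params)}"
```

For example, `export_url("http://host:8983/solr/col", "*:*", ["uuid", "space", "timestamp"], "timestamp asc")` yields the request a CloudSolrStream would issue per shard.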

Re: Solr on HDFS: adding a shard replica

2016-09-13 Thread Chetas Joshi
Is this happening because I have set replicationFactor=1? So even if I manually add a replica for the shard that's down, it will just create a dataDir but would not copy any of the data into the dataDir? On Tue, Sep 13, 2016 at 6:07 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote: > Hi,

Solr on HDFS: adding a shard replica

2016-09-13 Thread Chetas Joshi
Hi, I just started experimenting with SolrCloud. I have a SolrCloud of 20 nodes. I have one collection with 18 shards running on 18 different nodes with replicationFactor=1. When one of my shards goes down, I create a replica using the Solr UI. On HDFS I see a core getting added. But the