Well, it seems that q="network se*" is working, but not in the way you expect. A query like q="network se*" would not trigger a prefix query: inside the quotes the "*" character is treated like any other character. I suspect that your query is in fact "network se" (assuming you're using a StandardTokenizer, which would drop the "*") and that the word "se" is very common in your documents. That would explain the slow response time. Bottom line: "network se*" will not trigger a prefix query at all (I may be wrong, but this is the expected behaviour for Solr up to 4.3).
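To illustrate what I mean, here is a minimal Python sketch (not Solr code). `approx_tokenize` is a hypothetical helper that only approximates a word tokenizer by keeping runs of letters and digits; the real StandardTokenizer follows Unicode segmentation rules and may differ in details. Under that assumption, the "*" inside the phrase is simply lost:

```python
import re


def approx_tokenize(text):
    """Rough stand-in for a word tokenizer: lowercase, keep runs of letters/digits.

    This is NOT Lucene's StandardTokenizer, only an approximation for illustration.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


# The phrase as typed inside the quotes of q="network se*"
phrase = "network se*"
terms = approx_tokenize(phrase)

# The "*" is punctuation to this tokenizer, so no wildcard survives analysis
print(terms)
```

If the analysis chain behaves like this, the phrase query is built from the terms "network" and "se", with no prefix expansion involved.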
2015-11-02 13:47 GMT+01:00 Modassar Ather <modather1...@gmail.com>:

> The problem is with the same query as phrase. q="network se*".
>
> The last . is the full stop for the sentence and the query is q=field:"network se*"
>
> Best,
> Modassar
>
> On Mon, Nov 2, 2015 at 6:10 PM, jim ferenczi <jim.feren...@gmail.com> wrote:
>
> > Oops, I did not read the thread carefully.
> > *The problem is with the same query as phrase. q="network se*".*
> > I was not aware that you could do that with Solr ;). I would say this is expected because in such a case, if the number of expansions for "se*" is big, then you would have to check the positions for a significant number of words. I don't know if there is a limitation in the number of expansions for a prefix query contained in a phrase query but I would look at this parameter first (limit the number of expansions per prefix search, let's say the N most significant words based on the frequency of the words for instance).
> >
> > 2015-11-02 13:36 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
> >
> > > *I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g. And the remaining heap will be used for activities other than Solr. Please help me understand.*
> > >
> > > Well those 28GB of heap are the memory "reserved" for your Solr application, though some parts of the index (not to say all) are retrieved via MMap (if you use the default MMapDirectory) which do not use the heap at all. This is a very important part of Lucene/Solr, the heap should be sized in a way that leaves a significant amount of RAM available for the index. If not then you rely on the speed of your disk, if you have SSDs it's better but reads are still significantly slower with SSDs than with direct RAM access.
> > > Another thing to keep in mind is that mmap will always try to put things in RAM, this is why I suspect that your swap activity is killing your performance.
> > >
> > > 2015-11-02 11:55 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
> > >
> > >> Thanks Jim for your response.
> > >>
> > >> The remaining size after you removed the heap usage should be reserved for the index (not only the other system activities).
> > >> I am not able to get the above point. So when I start Solr with 28g RAM, for all the activities related to Solr it should not go beyond 28g. And the remaining heap will be used for activities other than Solr. Please help me understand.
> > >>
> > >> *Also the CPU utilization goes up to 400% in few of the nodes:*
> > >> You said that only one machine is used so I assumed that 400% cpu is for a single process (one solr node), right ?
> > >> Yes you are right that 400% is for single process.
> > >> The disks are SSDs.
> > >>
> > >> Regards,
> > >> Modassar
> > >>
> > >> On Mon, Nov 2, 2015 at 4:09 PM, jim ferenczi <jim.feren...@gmail.com> wrote:
> > >>
> > >> > *if it correlates with the bad performance you're seeing. One important thing to notice is that a significant part of your index needs to be in RAM (especially if you're using SSDs) in order to achieve good performance.*
> > >> >
> > >> > Especially if you're not using SSDs, sorry ;)
> > >> >
> > >> > 2015-11-02 11:38 GMT+01:00 jim ferenczi <jim.feren...@gmail.com>:
> > >> >
> > >> > > 12 shards with 28GB for the heap and 90GB for each index means that you need at least 336GB for the heap (assuming you're using all of it which may be easily the case considering the way the GC is handling memory) and ~= 1TB for the index.
> > >> > > Let's say that you don't need your entire index in RAM, the problem as I see it is that you don't have enough RAM for your index + heap. Assuming your machine has 370GB of RAM there are only 34GB left for your index, 1TB/34GB means that you can only have 1/30 of your entire index in RAM. I would advise you to check the swap activity on the machine and see if it correlates with the bad performance you're seeing. One important thing to notice is that a significant part of your index needs to be in RAM (especially if you're using SSDs) in order to achieve good performance:
> > >> > >
> > >> > > *As mentioned above this is a big machine with 370+ gb of RAM and Solr (12 nodes total) is assigned 336 GB. The rest is still good for other system activities.*
> > >> > > The remaining size after you removed the heap usage should be reserved for the index (not only the other system activities).
> > >> > >
> > >> > > *Also the CPU utilization goes up to 400% in few of the nodes:*
> > >> > > You said that only one machine is used so I assumed that 400% cpu is for a single process (one solr node), right ?
> > >> > > This seems impossible if you are sure that only one query is run at a time and no indexing is performed. Best thing to do is to dump a stack trace of the solr nodes during the query and check what the threads are doing.
> > >> > >
> > >> > > Jim
> > >> > >
> > >> > > 2015-11-02 10:38 GMT+01:00 Modassar Ather <modather1...@gmail.com>:
> > >> > >
> > >> > >> Just to add one more point that one external Zookeeper instance is also running on this particular machine.
> > >> > >>
> > >> > >> Regards,
> > >> > >> Modassar
> > >> > >>
> > >> > >> On Mon, Nov 2, 2015 at 2:34 PM, Modassar Ather <modather1...@gmail.com> wrote:
> > >> > >>
> > >> > >> > Hi Toke,
> > >> > >> > Thanks for your response. My comments in-line.
> > >> > >> >
> > >> > >> > That is 12 machines, running a shard each?
> > >> > >> > No! This is a single big machine with 12 shards on it.
> > >> > >> >
> > >> > >> > What is the total amount of physical memory on each machine?
> > >> > >> > Around 370 gb on the single machine.
> > >> > >> >
> > >> > >> > Well, se* probably expands to a great deal of documents, but a huge bump in memory utilization and 3 minutes+ sounds strange.
> > >> > >> >
> > >> > >> > - What are your normal query times?
> > >> > >> > Few simple queries are returned within a couple of seconds. But the more complex queries with proximity and wild cards have taken more than 3-4 minutes and sometimes some queries have timed out too where time out is set to 5 minutes.
> > >> > >> > - How many hits do you get from 'network se*'?
> > >> > >> > More than a million records.
> > >> > >> > - How many results do you return (the rows-parameter)?
> > >> > >> > It is the default one 10. Grouping is enabled on a field.
> > >> > >> > - If you issue a query without wildcards, but with approximately the same amount of hits as 'network se*', how long does it take?
> > >> > >> > A query resulting in around half a million records returns within a couple of seconds.
> > >> > >> >
> > >> > >> > That is strange, yes. Have you checked the logs to see if something unexpected is going on while you test?
> > >> > >> > Have not seen anything in particular. Will try to check again.
> > >> > >> >
> > >> > >> > If you are using spinning drives and only have 32GB of RAM in total in each machine, you are probably struggling just to keep things running.
> > >> > >> > As mentioned above this is a big machine with 370+ gb of RAM and Solr (12 nodes total) is assigned 336 GB. The rest is still good for other system activities.
> > >> > >> >
> > >> > >> > Thanks,
> > >> > >> > Modassar
> > >> > >> >
> > >> > >> > On Mon, Nov 2, 2015 at 1:30 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> > >> > >> >
> > >> > >> >> On Mon, 2015-11-02 at 12:00 +0530, Modassar Ather wrote:
> > >> > >> >> > I have a setup of 12 shard cluster started with 28gb memory each on a single server. There are no replica. The size of index is around 90gb on each shard. The Solr version is 5.2.1.
> > >> > >> >>
> > >> > >> >> That is 12 machines, running a shard each?
> > >> > >> >>
> > >> > >> >> What is the total amount of physical memory on each machine?
> > >> > >> >>
> > >> > >> >> > When I query "network se*", the memory utilization goes up to 24-26 gb and the query takes around 3+ minutes to execute. Also the CPU utilization goes up to 400% in few of the nodes.
> > >> > >> >>
> > >> > >> >> Well, se* probably expands to a great deal of documents, but a huge bump in memory utilization and 3 minutes+ sounds strange.
> > >> > >> >>
> > >> > >> >> - What are your normal query times?
> > >> > >> >> - How many hits do you get from 'network se*'?
> > >> > >> >> - How many results do you return (the rows-parameter)?
> > >> > >> >> - If you issue a query without wildcards, but with approximately the same amount of hits as 'network se*', how long does it take?
> > >> > >> >>
> > >> > >> >> > Why the CPU utilization is so high and more than one core is used.
> > >> > >> >> > As far as I understand querying is single threaded.
> > >> > >> >>
> > >> > >> >> That is strange, yes. Have you checked the logs to see if something unexpected is going on while you test?
> > >> > >> >>
> > >> > >> >> > How can I disable replication(as it is implicitly enabled) permanently as in our case we are not using it but can see warnings related to leader election?
> > >> > >> >>
> > >> > >> >> If you are using spinning drives and only have 32GB of RAM in total in each machine, you are probably struggling just to keep things running.
> > >> > >> >>
> > >> > >> >> - Toke Eskildsen, State and University Library, Denmark