Re: Query time out. Solr node goes down.

2015-08-18 Thread Modassar Ather
So Toke/Daniel, is the node showing *gone* on the SolrCloud dashboard because of the GC pause, and it is actually not gone but ZK is not able to get the correct state? The issue is caused by a huge query with many wildcards and phrases in it. If you see, I have mentioned (*The request took too

Re: Query time out. Solr node goes down.

2015-08-18 Thread Modassar Ather
I tried to profile the memory of each Solr node. I can see GC activity going as high as 98%, and there are many instances where it has gone above 10%. On one of the Solr nodes I can see it going to 45%. Memory is fully used and has reached the maximum heap size, which is set to 24g.

Re: Solr Caching (documentCache) not working

2015-08-18 Thread Daniel Collins
I think this is expected. As Shawn mentioned, your hard commits have openSearcher=false, so they flush changes to disk but don't force a re-open of the active searcher. By contrast, softCommit sets openSearcher=true; the point of softCommit is to make the changes visible, so to do that you have
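The following Python sketch illustrates the two kinds of commit by building the update URLs only. It assumes the standard `/update` handler accepts `commit`, `softCommit` and `openSearcher` as request parameters. The host and the collection name `collection1` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host and collection for a real cluster.
UPDATE_URL = "http://localhost:8983/solr/collection1/update"

# Hard commit: flushes changes to disk but keeps the current searcher,
# so newly indexed documents are not yet visible to queries.
hard_commit_url = UPDATE_URL + "?" + urlencode({"commit": "true", "openSearcher": "false"})

# Soft commit: opens a new searcher so recent changes become visible.
soft_commit_url = UPDATE_URL + "?" + urlencode({"softCommit": "true"})

print(hard_commit_url)
print(soft_commit_url)
```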

Re: Query time out. Solr node goes down.

2015-08-18 Thread Toke Eskildsen
On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote: Kindly help me understand, even if there is a GC pause, why the Solr node will go down. If a stop-the-world GC is in progress, it is not possible for an external service to know if this is because a GC is in progress or the node is dead.
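To see why, consider a purely timeout-based liveness check. The Python sketch below is hypothetical and is not ZooKeeper's actual session logic. It shows that an observer who knows only the time of the last heartbeat must treat a long GC pause and a crash the same way.

```python
def is_considered_dead(last_heartbeat_s: float, now_s: float, session_timeout_s: float) -> bool:
    """Timeout-based liveness: silence longer than the timeout means 'dead'.

    The observer sees only heartbeat timestamps, so a stop-the-world GC pause
    that exceeds the timeout is indistinguishable from a crashed JVM.
    """
    return (now_s - last_heartbeat_s) > session_timeout_s


TIMEOUT_S = 15.0  # hypothetical session timeout

# A 20-second GC pause and a node that died 20 seconds ago look the same.
print(is_considered_dead(100.0, 120.0, TIMEOUT_S))  # True
# A 10-second pause stays under the timeout, so the node still counts as alive.
print(is_considered_dead(100.0, 110.0, TIMEOUT_S))  # False
```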

Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Modassar Ather
Any suggestions, please? Regards, Modassar On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather modather1...@gmail.com wrote: Hi, I am getting the following exception for the query: *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The exception is not seen once the cardinality is set
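The query in this message consists of three request parameters. This small Python sketch uses only the standard library to build them with explicit `&` separators. `field` and `query` are the placeholders from the message, and the `{!cardinality=1.0}` local param is percent-encoded like any other value.

```python
from urllib.parse import parse_qs, urlencode

params = [
    ("q", "field:query"),
    ("stats", "true"),
    # Local params sit at the start of the value; 1.0 is the value that
    # triggered the exception in this report.
    ("stats.field", "{!cardinality=1.0}field"),
]
query_string = urlencode(params)
print(query_string)

# Decoding recovers the original values, including the local params.
decoded = parse_qs(query_string)
print(decoded["stats.field"])  # ['{!cardinality=1.0}field']
```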

Re: Query time out. Solr node goes down.

2015-08-18 Thread Daniel Collins
Ah OK, it's a ZK timeout then (org.apache.zookeeper.KeeperException$SessionExpiredException), which is caused by your GC pause. The page Shawn mentioned earlier has several links on how to investigate GC issues and some common GC settings; it sounds like you need to tweak those. Generally speaking, I

Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Ahmet Arslan
Hi Modassar, What is this net.agkn.hll.serialization? A custom plugin or something? Ahmet On Tuesday, August 18, 2015 9:23 AM, Modassar Ather modather1...@gmail.com wrote: Any suggestions, please? Regards, Modassar On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather modather1...@gmail.com wrote:

Re: SOLR to pivot on date range query

2015-08-18 Thread Upayavira
This arrived with the latest 5.1/5.2 Solr, so no, it won't work on 4.4, which is quite old by now. As to how to do it on an older Solr: if you have the ability to do additional work at index time, create an entryDate_month field, which is truncated to the beginning of the month, then do a normal
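Upayavira's index-time suggestion could look something like this Python sketch, run by the indexing client before documents are sent. The source field name `entryDate` and the sample document are assumptions. `entryDate_month` is the field name from the message. The output uses the UTC ISO 8601 form ending in `Z` that Solr date fields expect.

```python
from datetime import datetime, timezone


def month_start(ts: datetime) -> str:
    """Truncate an aware timestamp to the first instant of its month (UTC)."""
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical document as it would be sent to Solr.
doc = {"id": "1", "entryDate": "2015-08-18T14:32:05Z"}
parsed = datetime.strptime(doc["entryDate"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
doc["entryDate_month"] = month_start(parsed)
print(doc["entryDate_month"])  # 2015-08-01T00:00:00Z
```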

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Shawn Heisey
On 8/17/2015 10:53 PM, Rallavagu wrote: Also, I have noticed that the memory consumption goes very high. For instance, each node is configured with 48G memory while the Java heap is configured with 12G. Almost 46G of the available physical memory is consumed, and the heap size is well within the

Re: Query term matches

2015-08-18 Thread Chris Hostetter
https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which

Re: Query time out. Solr node goes down.

2015-08-18 Thread Erick Erickson
bq: The issue is caused by a huge query with many wildcards and phrases in it. Well, the very first thing I'd do is look at whether this is necessary. For instance, leading and trailing wildcards are an anti-pattern. You should investigate using ngrams instead. Trailing wildcards usually
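This Python sketch illustrates the ngram idea by showing what edge n-gram and plain n-gram tokenization produce at index time. In Solr this would be done in the field's analyzer (for example with `EdgeNGramFilterFactory` or `NGramFilterFactory`) rather than in client code. The gram sizes and sample tokens are arbitrary choices for illustration. A trailing wildcard such as `solr*` becomes an exact term lookup against the edge n-grams. An infix pattern such as `*clou*` becomes a check that all of the pattern's grams are indexed.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list:
    """Prefixes of `token` with lengths min_gram..max_gram (capped at len(token))."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngrams(token: str, n: int = 3) -> list:
    """All contiguous substrings of length n."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


indexed_prefixes = set(edge_ngrams("solrcloud"))
indexed_trigrams = set(ngrams("solrcloud"))

# Trailing wildcard "solr*" becomes a single exact term lookup.
print("solr" in indexed_prefixes)  # True

# Infix "*clou*" becomes: every trigram of "clou" must be indexed.
print(all(g in indexed_trigrams for g in ngrams("clou")))  # True
```

The trigram check ignores gram positions, so on its own it can over-match.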

Re: SOLR cloud (5.2.1) recovery

2015-08-18 Thread Erick Erickson
First, do not think in terms of cores, think replicas ;). And do not use the core admin bits of the admin UI to do any SolrCloud-related operations. It's possible, but far too easy to get wrong. Use the Collections API instead. Second, 600 collections, assuming they are all on a single cluster, is
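In the recovery case in this thread, the Collections API action that puts a replica on a replacement node is ADDREPLICA. The Python sketch below only builds the request URL. The base URL, collection name, shard name and node name (in the `host:port_solr` form SolrCloud uses for live nodes) are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA request URL.

    `node` is optional; when it is left out, Solr chooses a node itself.
    """
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = add_replica_url(
    "http://localhost:8983/solr", "collection1", "shard1", node="10.0.0.4:8983_solr"
)
print(url)
```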

Re: Is it a good query performance with this data size ?

2015-08-18 Thread Erick Erickson
Lots of stuff here; let me reply to a few things: If you're faceting on high-cardinality fields, this is expensive. How many unique values are there in the fields you facet on? Note, I am _not_ asking about how many values are in the fields of the selected set, but rather how many values

Re: SOLR to pivot on date range query

2015-08-18 Thread Erick Erickson
Cloudera has back-ported a _bunch_ of Solr JIRAs to their release, so depending on which CDH version you have, the functionality may or may not be there. I suggest you contact Cloudera support to see what's been backported to the version of CDH you're using because it may not be just Solr 4.4.

Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick, Two facets are probably demanding: departure_date has 365 distinct values and hotel_code can have 800 distinct values. The docValues setting definitely helped me a lot, even when all the queries had the above two facets. I will test a list of queries with and without the two facets

SOLR cloud (5.2.1) recovery

2015-08-18 Thread Olivier Damiot
Hello, I'm a bit confused about how SolrCloud recovery is supposed to work exactly in the case of losing a single node completely. My 600 collections are created with numShards=3&replicationFactor=3&maxShardsPerNode=3. However, how do I configure a new node to take the place of the dead node,

Re: Is it a good query performance with this data size ?

2015-08-18 Thread Erick Erickson
Those are not that high. I was thinking of facets with thousands to tens of thousands of unique values. I really wouldn't expect this to be a huge hit unless you're querying all docs. Let us know what you find. Best, Erick On Tue, Aug 18, 2015 at 11:31 AM, wwang525 wwang...@gmail.com wrote: Hi

Stem Words Highlighted - Keyword Not Highlighted

2015-08-18 Thread Ann B
Question: Can I configure Solr to highlight the keyword as well? The search results are correct, but the highlighting is not complete. Example: Keyword: stocks. Request (I only provided the URL parameters below): hl=true hl.fl=spell hl.simple.pre=%5BHIGHLIGHT%5D

Re: SOLR does not respond to queries when cores are coming online

2015-08-18 Thread Upayavira
Where is Zookeeper running? Is it running as an independent service on a separate box? Also, 4.0 is very old now - the code has matured a LOT since then. Upayavira On Tue, Aug 18, 2015, at 09:54 PM, Erick Erickson wrote: You might be hitting: https://issues.apache.org/jira/browse/SOLR-7361

SOLR does not respond to queries when cores are coming online

2015-08-18 Thread Gilles Comeau
Hi all, Sorry if this has been asked before; my online searching is not bringing up any answers. If I have two shards, Core1 and Core2, on different servers with ZooKeeper, in a collection, and they are identical to each other, why won't Core1 return any results while Core2 is starting up? If

Re: Is it a good query performance with this data size ?

2015-08-18 Thread Erick Erickson
bq: can I turn off the three caches and send a lot of queries to Solr? I really think you're missing the easiest way to do that. To not put anything in the filter cache, just don't send any fq clauses. As far as the doc cache is concerned, by and large I just wouldn't worry about it. With

Re: Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi Erick, I just tested 10 different queries with and without faceting on the two properties: departure_date and hotel_code. Under a cold cache scenario, they have pretty much the same response time, and the faceting took much less time than the query time. Under a cold cache scenario,

Re: SOLR does not respond to queries when cores are coming online

2015-08-18 Thread Erick Erickson
You might be hitting: https://issues.apache.org/jira/browse/SOLR-7361 Note that the fix is in the (currently releasing) 5.3 and trunk code, with virtually no possibility of back-porting to 4.0, unfortunately. Best, Erick On Tue, Aug 18, 2015 at 1:19 PM, Gilles Comeau gilles.com...@polecat.com

pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Hello Solr experts, I'm writing a query expansion QueryComponent which takes web-app parameters (e.g. profile information) and turns them into a Solr query. Thus far I've used Lucene TermQuery instances with success. Now I would like to use something a bit more elaborate. Either I write it with

Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Modassar Ather
Ahmet/Chris! Thanks for your replies. Ahmet, I think net.agkn.hll.serialization is used by Solr's hll() function implementation. Chris, I will try to create sample data and create a JIRA ticket with details. Regards, Modassar On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter

Re: pre-loaded function-query?

2015-08-18 Thread Chris Hostetter
: My current expansion expands from the user-query to the +user-query favouring-query-depending-other-params overall-favoring-query (where the overall-favoring-query could be computed as a function). With the boost parameter, I'd do: (+user-query

Re: Disable caching

2015-08-18 Thread Jamie Johnson
Hmm... so I think I have things set up correctly. I have a custom QParserPlugin building a custom query that wraps the query built from the base parser and stores the user who is executing the query. I've added the username to the hashCode and equals checks, so I think everything is set up properly.
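Solr's query-level caches look up entries by query equality, which is why the username has to be folded into `hashCode` and `equals`. The Python analogue below uses `__eq__`/`__hash__` in place of Java's `equals`/`hashCode`. The `SecuredQuery` wrapper is hypothetical, and a plain dict stands in for the cache.

```python
class SecuredQuery:
    """Hypothetical wrapper: the parsed query plus the user executing it."""

    def __init__(self, inner: str, user: str):
        self.inner = inner
        self.user = user

    def __eq__(self, other):
        # Equal only if both the wrapped query and the user match.
        return isinstance(other, SecuredQuery) and (self.inner, self.user) == (other.inner, other.user)

    def __hash__(self):
        # Must agree with __eq__, so the user is part of the hash too.
        return hash((self.inner, self.user))


cache = {}
cache[SecuredQuery("text:report", "alice")] = ["doc1", "doc2"]
cache[SecuredQuery("text:report", "bob")] = ["doc2"]

print(len(cache))  # 2: same query text, different users, separate entries
print(cache[SecuredQuery("text:report", "alice")])  # ['doc1', 'doc2']
```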

Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 9:51 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure}, I think. Ultimately I would want the Solr qparser to actually do the work of parsing and I'd just wrap

Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 7:11 PM, Jamie Johnson jej2...@gmail.com wrote: Yes, my use case is security. Basically I am executing queries with certain auths, and when they are executed multiple times with differing auths I'm getting cached results. If it's just simple stuff like top N docs

Re: Disable caching

2015-08-18 Thread Jamie Johnson
When you say a security filter, are you asking if I can express my security constraint as a query? If that is the case, then the answer is no. At this point I have a requirement to secure Terms (a nightmare, I know). Our fallback is to aggregate the authorizations to a document level and secure

Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote: When you say a security filter, are you asking if I can express my security constraint as a query? If that is the case, then the answer is no. At this point I have a requirement to secure Terms (a nightmare, I know). Heh -

Re: Disable caching

2015-08-18 Thread Jamie Johnson
Thanks, I'll try to delve into this. We are currently using the parent query parser, within which we could use {!secure}, I think. Ultimately I would want the Solr qparser to actually do the work of parsing and I'd just wrap that. Are there any examples that I could look at for this? It's not clear

Re: Disable caching

2015-08-18 Thread Yonik Seeley
You can comment out some of the caches. There are some caches, like field caches, that are more at the Lucene level and can't be disabled. Can I ask what you are trying to prevent from being cached and why? Different caches are for different things, so it would seem to be an odd use case to

Re: pre-loaded function-query?

2015-08-18 Thread Paul Libbrecht
Doug Turnbull wrote: I'm not sure if you mean organizing function queries under the hood in a query component or externally. Externally, I've always followed John Berryman's great advice for working with Solr when dealing with complex/reusable function queries and boosts

Re: pre-loaded function-query?

2015-08-18 Thread Doug Turnbull
The boost parameter is part of the edismax query parser. If you have your own query parser, you could introduce your own boost argument and interpret it as a value source. Here's the code that parses the external function query in edismax
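For comparison with the built-in approach, this Python sketch builds request parameters that use edismax's `boost` parameter, which multiplies the relevance score by a function query. The field `popularity` is hypothetical. `sqrt` and `sum` are standard Solr function queries.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "defType": "edismax",
    "q": "user query",
    # Multiplicative boost; sum(popularity,1) keeps the multiplier at 1 or
    # more for non-negative popularity values.
    "boost": "sqrt(sum(popularity,1))",
}
query_string = urlencode(params)
print(query_string)
```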

Re: Disable caching

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote: I really like this idea in concept. My query would literally be just a wrapper at that point; what would be the appropriate place to do this? It depends on how much you are trying to make everything transparent (that there

Disable caching

2015-08-18 Thread Jamie Johnson
I see that if Solr is in realtime mode, caching is disabled within the SolrIndexSearcher that is created in SolrCore, but is there any way to disable caching without being in realtime mode? Currently I'm implementing a NoOp cache that implements SolrCache but returns null for everything and

Re: Disable caching

2015-08-18 Thread Jamie Johnson
Yes, my use case is security. Basically I am executing queries with certain auths, and when they are executed multiple times with differing auths I'm getting cached results. One option is to have another implementation that has a number of caches based on the auths, something that I suspect we

Re: pre-loaded function-query?

2015-08-18 Thread Doug Turnbull
I'm not sure if you mean organizing function queries under the hood in a query component or externally. Externally, I've always followed John Berryman's great advice for working with Solr when dealing with complex/reusable function queries and boosts

Re: Disable caching

2015-08-18 Thread Jamie Johnson
I really like this idea in concept. My query would literally be just a wrapper at that point; what would be the appropriate place to do this? What would I need to do to the query to make it behave with the cache? Again, thanks for the idea; I think this could be a simple way to use the caches.

jetty.xml

2015-08-18 Thread William Bell
We sometimes get a spike in Solr, and we get something like 3K threads and then timeouts... In Solr 5.2.1 the default Jetty setting for threads is kind of crazy, since the value is HIGH! What do others recommend? Fusion Jetty settings for threads: Get name=ThreadPool Set name=minThreads

Performance issue with FILTER QUERY

2015-08-18 Thread Maulin Rathod
Hi, http://stackoverflow.com/questions/11627427/solr-query-q-or-filter-query-fq The above link suggests using a filter query, but we observed that the filter query is slower than a normal query in our case. Are we doing something wrong? SLOW WITH FILTER QUERY (takes more than 1 second)

Re: Solr Matched Terms

2015-08-18 Thread Mikhail Khludnev
Hello, I just wonder what's wrong with highlighting? On Tue, Aug 18, 2015 at 4:19 PM, Basheer Shaik shaikb...@hotmail.com wrote: Hi, I am new to Solr. We have a requirement to carry out fuzzy search. I am able to do this and figure out the documents that meet the fuzzy search criteria. Is

Solr cache for specific field

2015-08-18 Thread Norgorn
SOLR version: 4.10.3. We have a SOLR Cloud cluster; each node has documents for only several categories. Queries look like ...fq=cat(1 3 89 ...)... So only some nodes need to do the processing; the others can answer with zero as soon as they check cat. The problem is to keep a separate cache for cat values on

Solr Matched Terms

2015-08-18 Thread Basheer Shaik
Hi, I am new to Solr. We have a requirement to carry out fuzzy search. I am able to do this and figure out the documents that meet the fuzzy search criteria. Is there a way to find out the list of terms from each selected document that matched this search criteria? Appreciate any help on this.

Re: Solr cache for specific field

2015-08-18 Thread Mikhail Khludnev
Solr Cloud Document Routing, described at https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud, allows you to avoid hitting certain shards, but the documents need to be assigned different prefixes beforehand. Do I get your point right? On Tue, Aug 18, 2015 at 4:57
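With the default compositeId router, the prefix is the part of the document id before `!`, and documents that share a prefix are placed on the same shard. The Python sketch below assumes category values serve as hypothetical shard keys. It also assumes the `_route_` request parameter is used at query time to restrict the request to that key's shard. Only one route key is shown.

```python
from urllib.parse import parse_qs, urlencode


def routed_id(shard_key: str, doc_id: str) -> str:
    """compositeId form: documents with the same shard_key go to the same shard."""
    return f"{shard_key}!{doc_id}"


# Hypothetical document tagged with category 1.
doc = {"id": routed_id("cat1", "doc42"), "cat": "1"}

# At query time, _route_ names the shard key so only its shard is searched.
query_params = urlencode({"q": "*:*", "fq": "cat:1", "_route_": "cat1!"})
print(doc["id"])
print(query_params)
```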

Re: Solr cache for specific field

2015-08-18 Thread Norgorn
I'm sorry for being so unclear. The problem is speed: since a node holds only several cats, it can answer with numFound=0 if those cats are missing from the query. It looks like: node 1 - cats 1,2,3; node 2 - cats 3,4,5; node 3 - cats 50,70; ... Query q=cat:(1 4) QTime per node now is like node1 -

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu
Thanks for the response. Does this cache behavior influence the delay in catching up with the cloud? How does SolrCloud replication work, and what are the options to monitor it and take proactive action (such as initializing, pausing, etc.) if needed? On 8/18/15 5:57 AM, Shawn Heisey wrote: On

Re: Solr Caching (documentCache) not working

2015-08-18 Thread Shawn Heisey
On 8/18/2015 2:30 AM, Daniel Collins wrote: I think this is expected. As Shawn mentioned, your hard commits have openSearcher=false, so they flush changes to disk but don't force a re-open of the active searcher. By contrast, softCommit sets openSearcher=true; the point of softCommit is to

Re: Solr Matched Terms

2015-08-18 Thread Scott Derrick
I second that question! Inquiring minds want to know! On 8/18/2015 7:19 AM, Basheer Shaik wrote: Hi, I am new to Solr. We have a requirement to carry out fuzzy search. I am able to do this and figure out the documents that meet the fuzzy search criteria. Is there a way to find out the list of

Re: Solr cache for specific field

2015-08-18 Thread Alexandre Rafalovitch
I am not sure I understand the problem statement. Is it speed? Memory usage? Something very specific about SolrCloud? To me it seems the problem is that your 'fq's _are_ getting cached when you may not want them to be, as the list is different every time. You could disable that cache. Or you could try

Re: Solr cache for specific field

2015-08-18 Thread Alexandre Rafalovitch
Have you tried this with cache=false? https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters Because the internal representation of the field value may already be doing what you want, and the caching of non-repeating filters is what is slowing it down. I would just do that as a

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu
Thanks Shawn. All participating cloud nodes are running Tomcat and, as you suggested, I will review the number of threads and increase them as needed. Essentially, what I have noticed was that two of four nodes caught up with bulk updates instantly while the other two nodes took almost 3 hours to

Re: Cache

2015-08-18 Thread Yonik Seeley
On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati sharathrayap...@gmail.com wrote: Is it possible to clear the cache through a query? I need this for performance evaluation. No, but you can prevent a query from being cached: q={!cache=false}my query What are you trying to test the
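The `{!cache=false}` local param can be prepended to the main query or to filter queries. The tiny Python helper below is hypothetical and only builds parameter values, so you can see the resulting strings. `my query` and the `cat` filter are placeholders taken from this thread.

```python
def uncached(clause: str) -> str:
    """Prefix a query or filter with the cache=false local param."""
    return "{!cache=false}" + clause


params = {
    "q": uncached("my query"),
    "fq": uncached("cat:(1 3 89)"),
}
print(params["q"])   # {!cache=false}my query
print(params["fq"])  # {!cache=false}cat:(1 3 89)
```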

Cache

2015-08-18 Thread naga sharathrayapati
Is it possible to clear the cache through a query? I need this for performance evaluation.

Re: Exception while using {!cardinality=1.0}.

2015-08-18 Thread Chris Hostetter
: I am getting the following exception for the query: *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The exception is not seen once the cardinality is set to 0.9 or less. The field is *docValues enabled* and *indexed=false*. The same exception I tried to reproduce on non

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Erick Erickson
A couple of things: 1) Here's an excellent backgrounder for MMapDirectory, which is what makes it appear that Solr is consuming all the physical memory: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html 2) It's possible that your transaction log was huge. Perhaps not

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu
Thanks for the response. I will look into using cloud solr server for updates and review the tlog mechanism. On 8/18/15 9:29 AM, Erick Erickson wrote: A couple of things: 1) Here's an excellent backgrounder for MMapDirectory, which is what makes it appear that Solr is consuming all the physical

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Shawn Heisey
On 8/18/2015 8:18 AM, Rallavagu wrote: Thanks for the response. Does this cache behavior influence the delay in catching up with the cloud? How does SolrCloud replication work, and what are the options to monitor it and take proactive action (such as initializing, pausing, etc.) if needed? I don't

Re: Solr Matched Terms

2015-08-18 Thread Jack Krupansky
Maybe a specialized highlighter could be produced that simply lists the matched terms in a form that apps can easily consume. -- Jack Krupansky On Tue, Aug 18, 2015 at 11:11 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, I just wonder what's wrong with highlighting? On Tue,

Re: Solr Matched Terms

2015-08-18 Thread Basheer Shaik
I did try highlighting, but it is highlighting only those words which are part of the query, not the matching phrase.

Re: Solr cache for specific field

2015-08-18 Thread Shawn Heisey
On 8/18/2015 7:21 AM, Norgorn wrote: SOLR version: 4.10.3. We have a SOLR Cloud cluster; each node has documents for only several categories. Queries look like ...fq=cat(1 3 89 ...)... So only some nodes need to do the processing; the others can answer with zero as soon as they check cat. The problem is

Re: Solr Matched Terms

2015-08-18 Thread simon
Check out https://issues.apache.org/jira/browse/SOLR-4722, which will return matching terms (and their offsets). The patch can be applied cleanly to Solr 4; it doesn't appear to have been tried with Solr 5. -Simon On Tue, Aug 18, 2015 at 11:30 AM, Jack Krupansky jack.krupan...@gmail.com wrote: Maybe a

Is it a good query performance with this data size ?

2015-08-18 Thread wwang525
Hi All, I am working on a search service based on Solr (v5.1.0). The data size is 15M records, and the index files are 860MB. The test was performed on a local machine with 8 cores, 32 GB of memory, and a 3.4 GHz CPU (Intel Core i7-3770). I found out that setting docValues=true for