So, Toke/Daniel, the node showing *gone* on the SolrCloud dashboard is
because of a GC pause, and it is not actually gone, but ZK is unable to
get its correct state?
The issue is caused by a huge query with many wildcards and phrases in it.
If you see I have mentioned about (*The request took too
I tried to profile the memory of each Solr node. I can see the GC activity
going as high as 98%, and there are many instances where it has gone
above 10%. On one of the Solr nodes I can see it going to 45%.
Memory is fully used and has reached the maximum heap size, which is
set to 24g.
I think this is expected. As Shawn mentioned, your hard commits have
openSearcher=false, so they flush changes to disk, but don't force a
re-open of the active searcher.
By contrast, softCommit sets openSearcher=true; the point of softCommit is
to make the changes visible, so to do that you have
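A minimal solrconfig.xml sketch of that commit split (the intervals are illustrative examples, not recommendations):

```xml
<!-- Illustrative fragment only; tune maxTime values to your workload -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every 60s: flush to disk -->
    <openSearcher>false</openSearcher> <!-- but do not open a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>            <!-- soft commit every 5s: changes become visible -->
  </autoSoftCommit>
</updateHandler>
```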
On Tue, 2015-08-18 at 10:38 +0530, Modassar Ather wrote:
Kindly help me understand why the Solr node will go down even if there is
a GC pause.
If a stop-the-world GC is in progress, it is not possible for an
external service to know whether the node is unresponsive because of the
GC or because it is dead.
Any suggestions please.
Regards,
Modassar
On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather modather1...@gmail.com
wrote:
Hi,
I am getting following exception for the query :
*q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
exception is not seen once the cardinality is set
Ah OK, it's a ZK timeout then
(org.apache.zookeeper.KeeperException$SessionExpiredException),
which is caused by your GC pause.
The page Shawn mentioned earlier has several links on how to investigate GC
issues and some common GC settings; it sounds like you need to tweak those.
Generally speaking, I
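For context, on Solr 5.x GC settings are typically adjusted through the bin/solr startup script's solr.in.sh. The values below are a hedged sketch only, not a recommendation; the right numbers depend on the workload:

```
# Illustrative solr.in.sh fragment (values are placeholders to tune):
SOLR_JAVA_MEM="-Xms24g -Xmx24g"
GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+ParallelRefProcEnabled"
# Raising zkClientTimeout in solr.xml also gives ZK more slack during long pauses.
```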
Hi Modassar,
What is this net.agkn.hll.serialization ? Custom plugin or something?
Ahmet
On Tuesday, August 18, 2015 9:23 AM, Modassar Ather modather1...@gmail.com
wrote:
Any suggestions please.
Regards,
Modassar
On Thu, Aug 13, 2015 at 4:25 PM, Modassar Ather modather1...@gmail.com
wrote:
This arrived with the latest 5.1/5.2 Solr, so no, it won't work on 4.4,
which is quite old by now.
As to how to do it on an older Solr, if you have the ability to do
additional work at index time, create an entryDate_month field, which
is truncated to the beginning of the month, then do a normal
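That index-time truncation can be sketched as follows (the entryDate_month field name comes from the thread; the helper function itself is hypothetical):

```python
from datetime import datetime

def month_floor(entry_date: datetime) -> datetime:
    """Truncate a timestamp to the first instant of its month."""
    return entry_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

# Extra field computed at index time, so month-level queries are plain term matches:
doc = {"entryDate": datetime(2015, 8, 13, 16, 25, 0)}
doc["entryDate_month"] = month_floor(doc["entryDate"])
print(doc["entryDate_month"])  # 2015-08-01 00:00:00
```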
On 8/17/2015 10:53 PM, Rallavagu wrote:
Also, I have noticed that memory consumption goes very high. For
instance, each node is configured with 48G of memory while the Java heap is
configured with 12G. Almost 46G of the available physical memory is
consumed, while the heap size is well within the
https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which
bq: The issue is caused by a huge query with many wildcards and phrases in it.
Well, the very first thing I'd do is look at whether this is necessary.
For instance:
leading and trailing wildcards are an anti-pattern. You should investigate
using ngrams instead.
trailing wildcards usually
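A rough illustration of why n-grams can replace a substring wildcard (this is a toy model of what an n-gram filter emits, not actual Solr analysis code): a leading wildcard forces a scan of the term dictionary, whereas with n-grams indexed the same substring match becomes ordinary term lookups.

```python
def ngrams(token: str, n: int = 3) -> set:
    """All character n-grams of a token, roughly what an NGram filter would emit."""
    return {token[i:i + n] for i in range(len(token) - n + 1)}

# A query like *sol* must scan every term; with n-grams it is set containment:
indexed = ngrams("solrcloud")
query = ngrams("solr")
print(query <= indexed)  # True: every query n-gram is an indexed term
```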
First, do not think in terms of cores, think replicas ;). And do not
use the core admin bits of the admin UI to do any SolrCloud-related
operations. It's possible, but far too easy to get wrong.
Use the collections API instead.
Second, 600 collections, assuming all on a single cluster is
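For reference, a Collections API CREATE call can be sketched like this (host, port, and collection name are illustrative; the parameters mirror the ones used elsewhere in the thread):

```python
from urllib.parse import urlencode

# Build the Collections API CREATE request (do not use the core admin UI for this):
params = {
    "action": "CREATE",
    "name": "mycollection",
    "numShards": 3,
    "replicationFactor": 3,
    "maxShardsPerNode": 3,
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```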
Lot of stuff here, let me reply to a few things:
If you're faceting on high-cardinality fields, this is expensive.
How many unique values are there in the fields you facet on?
Note, I am _not_ asking about how many values are in the fields
of the selected set, but rather how many values
Cloudera has back-ported a _bunch_ of Solr JIRAs to their
release, so depending on which CDH version you have, the
functionality may or may not be there. I suggest you contact
Cloudera support to see what's been backported to the
version of CDH you're using because it may not be just Solr
4.4.
Hi Erick,
Two facets are probably demanding:
departure_date has 365 distinct values and hotel_code can have 800 distinct
values.
The docValues setting definitely helped me a lot even when all the queries
had the above two facets. I will test a list of queries with or without the
two facets
Hello,
I'm a bit confused about how SolrCloud recovery is supposed to work
exactly in the case of losing a single node completely.
My 600 collections are created with
numShards=3&replicationFactor=3&maxShardsPerNode=3
However, how do I configure a new node to take the place of the dead
node,
those are not that high. I was thinking of facets with thousands to
tens-of-thousands of unique values. I really wouldn't expect this to
be a huge hit unless you're querying all docs.
Let us know what you find.
Best,
Erick
On Tue, Aug 18, 2015 at 11:31 AM, wwang525 wwang...@gmail.com wrote:
Hi
Question:
Can I configure solr to highlight the keyword also? The search results are
correct, but the highlighting is not complete.
*
Example:
Keyword: stocks
Request: (I only provided the url parameters below.)
hl=true
hl.fl=spell
hl.simple.pre=%5BHIGHLIGHT%5D
Where is Zookeeper running? Is it running as an independent service on a
separate box?
Also, 4.0 is very old now - the code has matured a LOT since then.
Upayavira
On Tue, Aug 18, 2015, at 09:54 PM, Erick Erickson wrote:
You might be hitting: https://issues.apache.org/jira/browse/SOLR-7361
Hi all,
Sorry if this has been asked before, my online searching is not bringing up any
answers.
If I have two shards on different servers with zookeeper, Core1 and Core2, in a
collection that are identical to each other, why won't Core1 return any results
while Core2 is starting up? If
bq: can I turn off the three cache and send a lot of queries to Solr
I really think you're missing the easiest way to do that.
To not put anything in the filter cache, just don't send any fq clauses.
As far as the doc cache is concerned, by and large I just wouldn't
worry about it. With
Hi Erick,
I just tested 10 different queries with and without the faceting search on
the two properties: departure_date and hotel_code. Under a cold cache
scenario, they have pretty much the same response time, and the faceting
took much less time than the query time. Under a cold cache scenario,
You might be hitting: https://issues.apache.org/jira/browse/SOLR-7361
Note that the fix is in the (currently releasing) 5.3 and trunk code,
with virtually no possibility of back-porting to 4.0, unfortunately.
Best,
Erick
On Tue, Aug 18, 2015 at 1:19 PM, Gilles Comeau
gilles.com...@polecat.com
Hello Solr experts,
I'm writing a query expansion QueryComponent which takes web-app
parameters (e.g. profile information) and turns them into a solr query.
Thus far I've used Lucene TermQueries with success.
Now, I would like to use something a bit more elaborate. Either I write
it with
Ahmet/Chris! Thanks for your replies.
Ahmet, I think net.agkn.hll.serialization is used by the hll() function
implementation in Solr.
Chris, I will try to create sample data and create a JIRA ticket with
details.
Regards,
Modassar
On Tue, Aug 18, 2015 at 9:58 PM, Chris Hostetter
: My current expansion expands from the
:user-query
: to the
:+user-query favouring-query-depending-other-params overall-favoring-query
: (where the overall-favoring-query could be computed as a function).
: With the boost parameter, i'd do:
:(+user-query
Hmm... so I think I have things set up correctly. I have a custom
QParserPlugin building a custom query that wraps the query built from the
base parser and stores the user who is executing the query. I've added the
username to the hashCode and equals checks, so I think everything is set up
properly.
On Tue, Aug 18, 2015 at 9:51 PM, Jamie Johnson jej2...@gmail.com wrote:
Thanks, I'll try to delve into this. We are currently using the parent
query parser, within which we could use {!secure}, I think. Ultimately I
would want the Solr qparser to actually do the work of parsing and I'd just wrap
On Tue, Aug 18, 2015 at 7:11 PM, Jamie Johnson jej2...@gmail.com wrote:
Yes, my use case is security. Basically I am executing queries with
certain auths and when they are executed multiple times with differing
auths I'm getting cached results.
If it's just simple stuff like top N docs
when you say a security filter, are you asking if I can express my security
constraint as a query? If that is the case then the answer is no. At this
point I have a requirement to secure Terms (a nightmare I know). Our
fallback is to aggregate the authorizations to a document level and secure
On Tue, Aug 18, 2015 at 8:19 PM, Jamie Johnson jej2...@gmail.com wrote:
when you say a security filter, are you asking if I can express my security
constraint as a query? If that is the case then the answer is no. At this
point I have a requirement to secure Terms (a nightmare I know).
Heh -
Thanks, I'll try to delve into this. We are currently using the parent
query parser, within which we could use {!secure}, I think. Ultimately I
would want the Solr qparser to actually do the work of parsing and I'd just
wrap that. Are there any examples that I could look at for this? It's not
clear
You can comment out (some) of the caches.
There are some caches, like field caches, that are more at the Lucene
level and can't be disabled.
Can I ask what you are trying to prevent from being cached and why?
Different caches are for different things, so it would seem to be an
odd usecase to
Doug Turnbull wrote:
I'm not sure if you mean organizing function queries under the hood in a
query component or externally.
Externally, I've always followed John Berryman's great advice for working
with Solr when dealing with complex/reusable function queries and boosts
The boost parameter is part of the edismax query parser. If you have your
own query parser you could introduce your own argument boost and
interpret it as a value source. Here's the code that parses the external
function query in edismax
On Tue, Aug 18, 2015 at 8:38 PM, Jamie Johnson jej2...@gmail.com wrote:
I really like this idea in concept. My query would literally be just a
wrapper at that point, what would be the appropriate place to do this?
It depends on how much you are trying to make everything transparent
(that there
I see that if Solr is in realtime mode, caching is disabled within the
SolrIndexSearcher that is created in SolrCore, but is there any way to
disable caching without being in realtime mode? Currently I'm implementing
a NoOp cache that implements SolrCache but returns null for everything and
Yes, my use case is security. Basically I am executing queries with
certain auths and when they are executed multiple times with differing
auths I'm getting cached results. One option is to have another
implementation that has a number of caches based on the auths, something
that I suspect we
I'm not sure if you mean organizing function queries under the hood in a
query component or externally.
Externally, I've always followed John Berryman's great advice for working
with Solr when dealing with complex/reusable function queries and boosts
I really like this idea in concept. My query would literally be just a
wrapper at that point, what would be the appropriate place to do this?
What would I need to do to the query to make it behave with the cache?
Again thanks for the idea, I think this could be a simple way to use the
caches.
We sometimes get a spike in Solr, and we get like 3K threads and then
timeouts...
In Solr 5.2.1 the default Jetty setting for threads is kinda crazy, since
the value is HIGH!
What do others recommend?
Fusion jetty settings for Threads:
<Get name="ThreadPool">
<Set name="minThreads">
Hi,
http://stackoverflow.com/questions/11627427/solr-query-q-or-filter-query-fq
The above link suggests using a filter query, but we observed that the
filter query is slower than the normal query in our case. Are we doing
something wrong?
SLOW WITH FILTER QUERY (takes more than 1 second)
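For readers comparing the two shapes, here is the same logical query expressed both ways (field names are illustrative, built with Python's standard library). The relevant difference is that each fq clause is looked up in and stored to the filterCache, so a filter that never repeats pays the caching overhead without any reuse:

```python
from urllib.parse import urlencode

# Same logical query as a single q, and as q plus a filter query:
as_q = urlencode({"q": "color:red AND size:large"})
as_fq = urlencode({"q": "color:red", "fq": "size:large"})
print(as_q)
print(as_fq)
```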
Hello,
I just wonder what's wrong with highlighting?
On Tue, Aug 18, 2015 at 4:19 PM, Basheer Shaik shaikb...@hotmail.com
wrote:
Hi,
I am new to Solr. We have a requirement to carry out fuzzy search. I am
able
to do this and figure out the documents that meet the fuzzy search
criteria.
Is
SOLR version - 4.10.3
We have SOLR Cloud cluster, each node has documents only for several
categories.
Queries look like ...fq=cat:(1 3 89 ...)...
So, only some nodes need to process, others can answer with zero as soon as
they check cat.
The problem is to keep separate cache for cat values on
Hi,
I am new to Solr. We have a requirement to carry out fuzzy search. I am able
to do this and figure out the documents that meet the fuzzy search criteria.
Is there a way to find out the list of terms from each selected document
that matched this search criteria?
Appreciate any help on this.
Solr Cloud Document Routing described at
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
allows you to omit hitting certain shards, but they need to be assigned
with the different prefixes beforehand.
Do I get your point right?
On Tue, Aug 18, 2015 at 4:57
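The prefix-based routing mentioned above can be sketched as follows. With SolrCloud's compositeId router, a "prefix!" on the document id co-locates documents sharing the prefix on the same shard(s), and a query can then add _route_=prefix! to hit only those shards (category and id values below are illustrative):

```python
def routed_id(shard_key: str, doc_id: str) -> str:
    """Build a compositeId: the part before '!' determines shard placement."""
    return f"{shard_key}!{doc_id}"

print(routed_id("cat1", "doc42"))  # cat1!doc42
# Query side (illustrative): ...&_route_=cat1!  restricts the query to cat1's shard(s).
```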
I'm sorry for being so unclear.
The problem is speed: while a node holds only several cats, it can answer
with numFound=0 if these cats are missing from the query.
It looks like:
node 1 - cats 1,2,3
node 2 - cats 3,4,5
node 3 - cats 50,70
...
Query q=cat:(1 4)
QTime per node now is like
node1 -
Thanks for the response. Does this cache behavior influence the delay in
catching up with the cloud? How can we explain SolrCloud replication, and
what are the options to monitor and take proactive action (such as
initializing, pausing, etc.) if needed?
On 8/18/15 5:57 AM, Shawn Heisey wrote:
On
On 8/18/2015 2:30 AM, Daniel Collins wrote:
I think this is expected. As Shawn mentioned, your hard commits have
openSearcher=false, so they flush changes to disk, but don't force a
re-open of the active searcher.
By contrast softCommit, sets openSearcher=true, the point of softCommit is
to
I second that question! Inquiring minds want to know!
On 8/18/2015 7:19 AM, Basheer Shaik wrote:
Hi,
I am new to Solr. We have a requirement to carry out fuzzy search. I am able
to do this and figure out the documents that meet the fuzzy search criteria.
Is there a way to find out the list of
I am not sure I understand the problem statement. Is it speed? Memory
usage? Something very specific about SolrCloud?
To me it seems the problem is that your 'fq' _are_ getting cached when
you may not want them as the list is different every time. You could
disable that cache.
Or you could try
Have you tried this with Cache=false?
https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
Because the internal representation of the field value may already be
doing what you want. And the caching of non-repeating filters is what is
slowing it down.
I would just do that as a
Thanks Shawn.
All participating cloud nodes are running Tomcat and as you suggested
will review the number of threads and increase them as needed.
Essentially, what I noticed was that two of the four nodes caught up
with bulk updates instantly while the other two nodes took almost 3 hours
to
On Tue, Aug 18, 2015 at 12:23 PM, naga sharathrayapati
sharathrayap...@gmail.com wrote:
Is it possible to clear the cache through query?
I need this for performance evaluation.
No, but you can prevent a query from being cached:
q={!cache=false}my query
What are you trying to test the
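The {!cache=false} local param shown above, expressed as URL parameters (a minimal sketch using Python's standard library; the query text and rows value are illustrative):

```python
from urllib.parse import urlencode

# {!cache=false} asks Solr not to store this clause's result in a cache:
params = {"q": "{!cache=false}my query", "rows": 10}
query_string = urlencode(params)
print(query_string)  # q=%7B%21cache%3Dfalse%7Dmy+query&rows=10
```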
Is it possible to clear the cache through query?
I need this for performance evaluation.
: I am getting following exception for the query :
: *q=field:query&stats=true&stats.field={!cardinality=1.0}field*. The
: exception is not seen once the cardinality is set to 0.9 or less.
: The field is *docValues enabled* and *indexed=false*. The same exception
: I tried to reproduce on non
Couple of things:
1> Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical memory:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
2> It's possible that your transaction log was huge. Perhaps not
Thanks for the response. Will take a look into using CloudSolrServer
for updates and review the tlog mechanism.
On 8/18/15 9:29 AM, Erick Erickson wrote:
Couple of things:
1> Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical
On 8/18/2015 8:18 AM, Rallavagu wrote:
Thanks for the response. Does this cache behavior influence the delay
in catching up with the cloud? How can we explain SolrCloud replication
and what are the options to monitor and take proactive action (such as
initializing, pausing, etc.) if needed?
I don't
Maybe a specialized highlighter could be produced that simply lists the
matched terms in a form that apps can easily consume.
-- Jack Krupansky
On Tue, Aug 18, 2015 at 11:11 AM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Hello,
I just wonder what's wrong with highlighting?
On Tue,
I did try Highlighting, but it is highlighting only those words which are
part of the query, not the matching phrase.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Matched-Terms-tp4223649p4223688.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 8/18/2015 7:21 AM, Norgorn wrote:
SOLR version - 4.10.3
We have SOLR Cloud cluster, each node has documents only for several
categories.
Queries look like ...fq=cat:(1 3 89 ...)...
So, only some nodes need to process, others can answer with zero as soon as
they check cat.
The problem is
Check out https://issues.apache.org/jira/browse/SOLR-4722, which will
return matching terms (and their offsets). The patch can be applied cleanly
to Solr 4; it doesn't appear to have been tried with Solr 5.
-Simon
On Tue, Aug 18, 2015 at 11:30 AM, Jack Krupansky jack.krupan...@gmail.com
wrote:
Maybe a
Hi All,
I am working on a search service based on Solr (v5.1.0). The data size is
15M records. The size of the index files is 860MB. The test was performed
on a local machine that has 8 cores with 32G memory, and the CPU is a
3.4 GHz Intel Core i7-3770.
I found out that setting docValues=true for