Re: Solr Query Explain Plan

2016-02-14 Thread Binoy Dalal
There is another resource to help analyze your queries: splainer.io. As for query tuning, that is a vast topic and there is no straightforward answer. You'll have to experiment and find the settings that suit you best. Here are a few resources to help you get started:

Re: Solr Query Explain Plan

2016-02-14 Thread Shahzad Masud
Thank you Binoy. Are there any pointers for tuning similar queries, as this one is taking a huge amount of time? Shahzad On Mon, Feb 15, 2016 at 10:18 AM, Binoy Dalal wrote: > Append =true to your query. > It isn't exactly like a SQL execution plan but will give you the

Re: Negating multiple array fileds

2016-02-14 Thread Salman Ansari
@Binoy: The query does work, but only for one term (-persons:[* TO *]). It does not work for multiple terms, such as http://[Myserver]/solr/[Collection]/select?q=(-persons:[* TO *])AND(-orgs:[* TO *]) This returns zero records, although I do have records that have both persons and orgs empty. @Jack:

Re: "pf" not supported by edismax?

2016-02-14 Thread Derek Poh
Hi Jack, sorry, I am confused. In my case, it seems that "pf" only works with dismax. With dismax: +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) *(spp_keyword_exact:dvd bracket)* With edismax: +((spp_keyword_exact:dvd) (spp_keyword_exact:bracket)) () On 2/15/2016 1:26 PM, Jack

Re: "pf" not supported by edismax?

2016-02-14 Thread Jack Krupansky
Maybe because the tokenized phrase produces only a single term it is ignored. In any case, it won't be a phrase. pf only does something useful for phrases. IOW, where a PhraseQuery can be generated. A PhraseQuery for more than a single term would never match when the field value is a single term.
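The reasoning above can be illustrated with a toy model. This is not Solr code: the two tokenizer functions are simplified stand-ins written for this example, and the "more than one term" rule is only the hypothesis stated in the message, not confirmed parser behavior.

```python
def keyword_tokenize(text):
    # Toy stand-in for a keyword-style tokenizer: the whole input is one token.
    return [text] if text else []


def whitespace_tokenize(text):
    # Toy stand-in for a whitespace tokenizer: one token per word.
    return text.split()


def can_form_phrase(tokens):
    # A phrase needs at least two terms; a single term is just a term query.
    return len(tokens) > 1


user_query = "dvd bracket"
print(can_form_phrase(keyword_tokenize(user_query)))     # single token
print(can_form_phrase(whitespace_tokenize(user_query)))  # two tokens
```

Under this model, a field analyzed into a single token never yields a phrase, which would match the empty phrase clause in the edismax debug output quoted in this thread.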

Re: Solr Query Explain Plan

2016-02-14 Thread Binoy Dalal
Append =true to your query. It isn't exactly like a SQL execution plan but will give you the details of how the query was parsed, scored and how much time was taken by each module used by the request handler. On Mon, 15 Feb 2016, 10:42 Shahzad Masud < shahzad.ma...@northbaysolutions.net> wrote:
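A minimal sketch of the suggestion above. The parameter name is cut off in the archived message; Solr's `debugQuery=true` request parameter returns this kind of parsing, scoring and timing detail, so it is assumed here. The host, collection name and query are placeholders.

```python
from urllib.parse import urlencode


def debug_url(base_url, collection, query):
    """Build a /select URL that asks Solr to include debug output."""
    params = {
        "q": query,
        "debugQuery": "true",  # assumed: the parameter name is elided in the archived mail
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder host and collection, for illustration only.
print(debug_url("http://localhost:8983/solr", "mycollection", "title:solr"))
```

The debug section of the response can then be compared between the slow query and a fast one to see which component accounts for the time.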

Re: "pf" not supported by edismax?

2016-02-14 Thread Derek Poh
It is using KeywordTokenizerFactory. Is it still considered tokenized? Here's the field definition: type="gs_keyword_exact" multiValued="true"/> positionIncrementGap="100"> On 2/15/2016 12:43 PM, Jack

Solr Query Explain Plan

2016-02-14 Thread Shahzad Masud
Please pardon my ignorance, but I just wanted to check whether there is anything like an explain plan for executing a query on Solr. I have one query which is taking a lot of time (56-68 seconds) with very heavy network activity, while most queries are taking less than 4 seconds.

Re: Need to move on SOlr cloud (help required)

2016-02-14 Thread Midas A
Erick, We are using PHP for our application, so which client would you suggest? Currently we are using the PECL Solr client. But I want to understand: suppose we send a request to a node and that node is down at that time, how does SolrJ figure out where the request should go? On Fri, Feb 12, 2016 at 9:44

Default max number of connections

2016-02-14 Thread Anil
Hi, I am using SolrCloud with ZooKeeper. Is 20 the default maximum number of connections per host? Is there any way to use connection pooling, like the Solr HTTP connection? Please clarify. Regards, Anil

Re: "pf" not supported by edismax?

2016-02-14 Thread Jack Krupansky
pf stands for phrase boosting, which implies tokenized text... spp_keyword_exact sounds like it is not tokenized. -- Jack Krupansky On Sun, Feb 14, 2016 at 10:08 PM, Derek Poh wrote: > Hi > > Correct me If I am wrong, edismax is an extension of dismax, so it will >

"pf" not supported by edismax?

2016-02-14 Thread Derek Poh
Hi, correct me if I am wrong: edismax is an extension of dismax, so it will support "pf". But from my testing I noticed "pf" is not working with edismax. From the debug information of a query using "pf" with edismax, there is no phrase match for the "pf" field "spp_keyword_exact". If I

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Mark Ehle
Is all the text being indexed? Check to make sure that the data you are looking for is actually in the index. Is there a setting in Tika that limits how much is indexed? I seem to remember confronting this problem myself once, and the data that I wanted just wasn't in the index because it was

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Binoy Dalal
What you've done so far will highlight every instance of "nietava" found in the field, and return it, i.e., your entire field will return with all the "nietava"s in tags. If you do not want the entire field, only portions of your field containing the matched terms, then use hl.snippets parameter

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Binoy, You are the man! =) Thank you very much! Would you by chance know how I could get the second highlight of the same word in the same file? Like: file_1.pdf (has the word "nietava" three times), so how can I bring back the highlights for the three occurrences? I am pretty new around, should I send

Document Routing based on clientid (and null clientid)

2016-02-14 Thread Brian Narsi
My current design: All clients' data in a 2-shard, 2-replica-each, 2-node Solr cluster. The data contains records both with a clientid value and with clientid=null (clientid=null is used for search across all clients). When searching I use fq: clientid = null or clientid =

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Binoy, thanks! Still not working, check the output: { "responseHeader":{ "status":0, "QTime":58, "params":{ "q":"nietava", "hl":"true", "hl.simple.post":"", "indent":"true", "fl":"id", "hl.flagsize":"0", "hl.fl":"content",

Re: Adding nodes

2016-02-14 Thread McCallick, Paul
These are excellent questions and give me a good sense of why you suggest using the Collections API. In our case we have 8 shards of product data with an even distribution of data per shard, no hot spots. We have very different load at different points in the year (Cyber Monday), and we tend to

Re: Adding nodes

2016-02-14 Thread Susheel Kumar
Hi Paul, For auto-scaling, it depends on how you plan to design it and what/how you want to scale. Which scenario do you think makes the CoreAdmin API easy to use for a sharded SolrCloud environment? Isn't if in a sharded environment (assume 3 shards A,B & C) and shard B has having higher or

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Binoy Dalal
Are you sure you've typed in the parameters correctly? In your response it says flagsize instead of fragsize and maxanalzyedchars instead of maxanalyzedchars. Ohh wait, I see that I made the analyzed typo. Awfully sorry for that, I'm using my phone to send the mail out. On Sun, 14 Feb 2016,
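Misspelled request parameters like these are easy to miss, since the response simply echoes them back. As a hedged sketch, a small client-side check could flag unknown `hl` parameter names before the request is sent. The list of known names below is a partial, hand-picked selection for illustration, not Solr's full set, and `check_hl_params` is a hypothetical helper, not part of any Solr client.

```python
import difflib

# Partial list of highlighting parameters, chosen for this example only.
KNOWN_HL_PARAMS = [
    "hl",
    "hl.fl",
    "hl.snippets",
    "hl.fragsize",
    "hl.maxAnalyzedChars",
    "hl.simple.pre",
    "hl.simple.post",
]


def check_hl_params(params):
    """Map each unrecognized hl* parameter name to its closest known name (or None)."""
    suggestions = {}
    for name in params:
        if not name.startswith("hl") or name in KNOWN_HL_PARAMS:
            continue
        close = difflib.get_close_matches(name, KNOWN_HL_PARAMS, n=1)
        suggestions[name] = close[0] if close else None
    return suggestions


request_params = {"q": "nietava", "hl": "true", "hl.flagsize": "0", "hl.fl": "content"}
print(check_hl_params(request_params))
```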

Negating multiple array fileds

2016-02-14 Thread Salman Ansari
Hi, I think what I am asking should be easy to do, but for some reason I am facing issues in making that happen. The issue is that I want to include/exclude some fields from my Solr query. All the fields that I need to include are multi-valued int fields. When I include the fields I have the

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Binoy Dalal
From the Solr wiki: hl.maxAnalyzedChars: How many characters into a document to look for suitable snippets (Solr 1.3). This parameter makes sense for the original Highlighter only. The default value is "51200". You can assign a large value to this parameter and use hl.fragsize=0 to return
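A rough sketch of a request that applies the two parameters quoted above. The query term and field name are taken from elsewhere in this thread; the 409600 limit is only an example of "a large value", and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def highlight_params(query, field, max_chars):
    """Request parameters that raise the highlighter's analysis limit."""
    return {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.maxAnalyzedChars": str(max_chars),  # default is 51200 per the wiki text
        "hl.fragsize": "0",
    }


params = highlight_params("nietava", "content", 409600)
print("/select?" + urlencode(params))
```

Sending these as request parameters, rather than placing them in the fragmenter definitions in solrconfig.xml, matches the advice given later in this thread.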

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Binoy, I could not find this option in my solrconfig.xml file. I tried to add this setting and nothing changed... Here is the code, I might have misplaced it: 400 409600 200

Re: Negating multiple array fileds

2016-02-14 Thread Binoy Dalal
Try negating by using a range query like (-persons:[* TO *]) I've always used this and it has always worked for me. On Sun, 14 Feb 2016, 18:51 Salman Ansari wrote: > Hi, > > I think what I am asking should be easy to do but for some reasons I am > facing issues in

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Binoy Dalal
Don't add this parameter to the searchComponent definition, because the components where you've added it, GapFragmenter and RegexFragmenter, simply don't use it. Instead, add it to your request handler (/select etc.) if you've configured highlighting in the handler or append it to your query: *=*.

Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Hi all, This doesn’t really answer the following question: What is the suggested way to add a new node to a collection via the apis? I am specifically thinking of autoscale scenarios where a node has gone down or more nodes are needed to handle load. The coreadmin api makes this easy. The

Re: Adding nodes

2016-02-14 Thread McCallick, Paul
Then what is the suggested way to add a new node to a collection via the apis? I am specifically thinking of autoscale scenarios where a node has gone down or more nodes are needed to handle load. Note that the ADDREPLICA endpoint requires a shard name, which puts the onus of how to scale

Re: Negating multiple array fileds

2016-02-14 Thread Jack Krupansky
Due to a bug (or poorly designed feature), you need to explicitly include a non-negative query term in a purely negative sub-query. Usually this means using *:* to select all documents. Note that the use of parentheses introduces a sub-query. So, (-persons:*) should be (*:* -persons:*). -- Jack
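The workaround described above can be sketched as a small query builder. The field names come from this thread; the helper names are hypothetical, and the range form `[* TO *]` follows the queries quoted earlier in the thread.

```python
def missing_field_clause(field):
    """Sub-query matching documents with no value in `field`.

    The leading *:* gives the purely negative clause something to subtract from.
    """
    return f"(*:* -{field}:[* TO *])"


def all_missing(fields):
    """Require every listed field to be empty."""
    return " AND ".join(missing_field_clause(f) for f in fields)


print(all_missing(["persons", "orgs"]))
```

The resulting string is meant to be passed as the `q` parameter (URL-encoded) in place of the purely negative sub-queries.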

Re: Adding nodes

2016-02-14 Thread Susheel Kumar
Hi Paul, Shawn is suggesting using the Collections API https://cwiki.apache.org/confluence/display/solr/Collections+API rather than the Core Admin API https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API for SolrCloud. Hope that clarifies. You mentioned ADDREPLICA, which is the collections
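For illustration, a Collections API ADDREPLICA call could be assembled like this. The host, collection and shard names are placeholders; `node` is optional and, when omitted, leaves the choice of node to Solr.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API URL that adds a replica to one shard."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node  # target node name as known to the cluster
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder values, for illustration only.
print(add_replica_url("http://localhost:8983/solr", "products", "shard1"))
```

Because the call needs a shard name, an autoscaling script would still have to decide which shard gets the new replica, which is the concern raised in this thread.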

[ANNOUNCE] Luke 5.4.0 released

2016-02-14 Thread Dmitry Kan
This is a major release supporting lucene / solr 5.3.0. Download the release zip here: https://github.com/DmitryKey/luke/releases/tag/luke-5.4.0 Fixed in this release: #43 Build failure due to missing org.restlet-2.3.0 #46 upgrade to 5.4 (thanks to this pull request: #47) Also released

Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi There, I have a situation where I started the techproducts example, without any modification, and posted a pdf file. When searching as: q=text:search_word hl=true hl.fl=content It shows the highlight accordingly! =) BUT... *if the "search_word" is after the first pages* in my pdf file, such as page 15... It

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Evert R.
Hi Paul, Sorry for my late reply. All the content is inside the docs. It brings back the docs and the pdf file that has the search word in it. But the highlight is not showing if the search word is after a few pages. Evert 2016-02-14 8:36 GMT-02:00 Paul Libbrecht : > This

Re: Highlight brings the content from the first pages of pdf

2016-02-14 Thread Paul Libbrecht
This looks like the stored content is shortened. Can it be? Can you see that inside the docs? paul > Evert R. > 14 February 2016 at 11:26 > Hi There, > > I have a situation where started a techproducts, without any modification, > post a pdf file. When searching

Re: Adding nodes

2016-02-14 Thread Shawn Heisey
On 2/13/2016 6:01 PM, McCallick, Paul wrote: > - When creating a new collection, SOLRCloud will use all available nodes for > the collection, adding cores to each. This assumes that you do not specify a > replicationFactor. The number of nodes that will be used is numShards multiplied by
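The archived sentence is cut off before the second factor. Assuming it is replicationFactor (with Solr's default of 1), the arithmetic can be sketched as follows; the function name is hypothetical.

```python
def cores_needed(num_shards, replication_factor=1):
    """Cores created for a collection: one per shard per replica.

    Assumption: the second factor in the truncated sentence is replicationFactor.
    """
    return num_shards * replication_factor


# The 8-shard setup mentioned in this thread, without and with a second replica.
print(cores_needed(8))
print(cores_needed(8, replication_factor=2))
```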