Re: Need help with auto-suggester

2017-04-13 Thread Binoy Dalal
You can create a copy field and copy to it from all the fields you want to retrieve the suggestions from and then use that field with the suggester. On Thu 13 Apr, 2017, 23:21 OTH, wrote: > Hello, > > I've followed the steps here to set up auto-suggest: > https://lucidworks.com/2015/03/04/solr-s
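The copy-field suggestion above could be sketched with Solr's Schema API, which accepts an `add-copy-field` command as JSON. This is only an illustration: it assumes a managed schema, and the field names (`title`, `author`, `suggest_text`) are hypothetical, not taken from the thread. The destination field would still need to exist in the schema and be referenced by the suggester configuration.

```python
import json


def copy_field_commands(source_fields, dest_field):
    """Return one Schema API add-copy-field command per source field."""
    return [
        {"add-copy-field": {"source": source, "dest": dest_field}}
        for source in source_fields
    ]


if __name__ == "__main__":
    # Hypothetical field names, used purely for illustration.
    for command in copy_field_commands(["title", "author"], "suggest_text"):
        # Each command would be POSTed as JSON to the collection's /schema endpoint.
        print(json.dumps(command))
```

Issuing one command per source field keeps each request small and makes failures easy to attribute to a single field.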

Re: Error when adding user for Solr Basic Authentication

2017-04-13 Thread Zheng Lin Edwin Yeo
I found on StackOverflow that we have to escape the single quotes and use double quotes around the JSON string. http://stackoverflow.com/questions/43387719/error-when-adding-user-for-solr-basic-authentication/43387895#43387895 curl --user user:password http://localhost:8983/solr/admin/auth
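One way to sidestep shell quoting altogether is to build the request in a script rather than on the command line. The sketch below is not from the thread; it reuses the endpoint, content type, and example users from the original curl command, lets `json.dumps` handle the quoting, and only constructs the request. Sending it is left as a comment because that needs a running Solr instance.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl command quoted in the thread.
AUTH_URL = "http://localhost:8983/solr/admin/authentication"


def build_set_user_request(admin_user, admin_password, new_users):
    """Build (but do not send) a POST request that adds Basic Auth users."""
    # json.dumps produces valid JSON, so no shell escaping is involved.
    body = json.dumps({"set-user": new_users}).encode("utf-8")
    credentials = f"{admin_user}:{admin_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        AUTH_URL,
        data=body,
        method="POST",
        headers={
            "Content-type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    request = build_set_user_request(
        "user", "password", {"tom": "TomIsCool", "harry": "HarrysSecret"}
    )
    print(request.data.decode("utf-8"))
    # To send it against a running Solr: urllib.request.urlopen(request)
```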

Solr 4.10 and Distributed pivot faceting in Non-Solr cloud mode

2017-04-13 Thread Natarajan, Rajeswari
Hi, I would like to know whether Solr 4.10 supports distributed pivot faceting in non-SolrCloud mode. According to the JIRA below, it looks like it was fixed in 4.10, but we use Solr in non-cloud mode. https://issues.apache.org/jira/browse/SOLR-2894 Thank you, Raji

Re: Enable Gzip compression Solr 6.0

2017-04-13 Thread Rick Leir
Hi Mahmoud Beware of using a proxy. Your web application will get attacked, and you should only forward the parameters that are needed for your app features. But you thought of that already. Cheers -- Rick On April 12, 2017 11:39:57 PM EDT, Mahmoud Almokadem wrote: >Thanks Rick, > >I already r

Re: keywords not found - google like feature

2017-04-13 Thread GW
After reading everyone's posts, my thought is that sometimes things are better achieved with smoke and mirrors. I achieved something similar by measuring my scores with no keyword hits. I wrote a simple jQuery script to do a CSS strike-through on the returned message if the score was poor, + I returned

Re: DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

2017-04-13 Thread Ishan Chattopadhyaya
Why are you adding these update processors (esp. the AddSchemaFieldsUpdateProcessor) after DistributedUpdateProcessor? Try adding them before DUP, and it has a better chance to work. On Wed, Apr 12, 2017 at 3:44 PM, Pratik Thaker < pratik.tha...@smartstreamrdu.com> wrote: > Hi All, > > I am facin

Re: keywords not found - google like feature

2017-04-13 Thread Erick Erickson
bq: he searches he wants to know what keywords were not found in results. We need to distinguish between words not found in the returned documents and words not found at all. The solutions above tell you about documents returned. If the keyword was found in a document not returned (say the 11th d

RE: keywords not found - google like feature

2017-04-13 Thread Markus Jelsma
Hi - That is not going to be that easy out-of-the-box. In regular setups the output you find in debugging mode contains stemmed versions of the original input text. At best you use KeepWordsFilterFactory to get unstemmed terms, but those tokens would, in usual cases, also have passed through fi

Re: keywords not found - google like feature

2017-04-13 Thread Nilesh Kamani
Thanks for your input guys. I will look into it. On Thu, Apr 13, 2017 at 4:07 PM, simon wrote: > Regardless of the business case (which would be good to know) you might > want to try something along the lines of > http://stackoverflow.com/questions/25038080/how-can-i- > tell-solr-to-return-the-h

Re: keywords not found - google like feature

2017-04-13 Thread simon
Regardless of the business case (which would be good to know) you might want to try something along the lines of http://stackoverflow.com/questions/25038080/how-can-i-tell-solr-to-return-the-hit-search-terms-per-document - basically generate pseudo-fields using the exists() function query which wil
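The pseudo-field idea above can be sketched as follows. This is an illustration under assumptions, not the code from the linked answer: the `has_term_<n>` aliases and the `text` field are hypothetical names, and terms are assumed to be plain words that need no query escaping. It builds one `exists(query(...))` pseudo-field per term and then reads the resulting booleans back from a returned document.

```python
def build_params(terms, field, rows=10):
    """Build Solr query parameters with one exists() pseudo-field per term."""
    fl = ["id", "score"]
    for index, term in enumerate(terms):
        # has_term_<index> is a hypothetical alias chosen for this sketch.
        fl.append(f"has_term_{index}:exists(query({{!v='{field}:{term}'}}))")
    return {"q": " ".join(terms), "df": field, "fl": ",".join(fl), "rows": rows}


def missing_terms(doc, terms):
    """Return the terms whose pseudo-field is false or absent for this document."""
    return [
        term
        for index, term in enumerate(terms)
        if not doc.get(f"has_term_{index}", False)
    ]


if __name__ == "__main__":
    terms = ["solr", "spring", "trump"]
    print(build_params(terms, "text")["fl"])
    # A made-up document, shaped like one entry of a Solr JSON response.
    sample_doc = {"id": "1", "has_term_0": True, "has_term_1": True, "has_term_2": False}
    print(missing_terms(sample_doc, terms))
```

Because the check runs per returned document, it reports terms missing from the documents shown, not terms absent from the whole index.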

Re: keywords not found - google like feature

2017-04-13 Thread David Hastings
Another ugly solution would be to use the debugQuery=true option, then analyze the results in explain; if the word isn't in the explain, then you strike it out. On Thu, Apr 13, 2017 at 4:01 PM, Markus Jelsma wrote: > Hi - There is no such feature out-of-the-box in Solr. But you probably > could mo

RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
Thanks, but I am not going to be brave this time :) I have tried reclaimDeletesWeight on an ordinary index some time ago and it was very aggressive with slightly higher values than default. I think setting this weight in this situation would be analogous to a forceMerge every time, which makes s

RE: keywords not found - google like feature

2017-04-13 Thread Markus Jelsma
Hi - There is no such feature out-of-the-box in Solr. But you probably could modify a highlighter implementation to return this information, the highlighter is the component that comes closest to that feature. Regards, Markus -Original message- > From:Nilesh Kamani > Sent: Thursday

Re: keywords not found - google like feature

2017-04-13 Thread Nilesh Kamani
Here is the example. https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=off&q=solr+spring+trump You will see this under search results. Missing: trump I am not asking for a visual representation of such a feature. Is there any way Solr returns such info in the response?

Re: keywords not found - google like feature

2017-04-13 Thread Alexandre Rafalovitch
Are you asking for a visual representation or an actual feature? Because if all your keywords/clauses are optional (default SHOULD) then Solr automatically tries to match the maximum number of them and then fewer and fewer. So, if not all words match, it will return results that match fewer words.

Re: keywords not found - google like feature

2017-04-13 Thread Erick Erickson
Pasted images are generally stripped out, you'll have to provide an external link. On Thu, Apr 13, 2017 at 12:04 PM, Nilesh Kamani wrote: > Something like this. Does SOLR have such feature ? > > [image: Inline image 1] > > On Thu, Apr 13, 2017 at 1:49 PM, Nilesh Kamani > wrote: > >> Hello All,

Re: keywords not found - google like feature

2017-04-13 Thread Nilesh Kamani
Something like this. Does Solr have such a feature? [image: Inline image 1] On Thu, Apr 13, 2017 at 1:49 PM, Nilesh Kamani wrote: > Hello All, > > When we search google, sometimes google returns results with mention of > keywords not found (mentioned as strike-through) > > Does Solr provide such

Re: Autosuggestion

2017-04-13 Thread OTH
Hello, so, from what I've picked up so far: FST only matches from the beginning of the input, but can handle spelling errors and do stemming. AnalyzingInfix can't handle spelling errors or stemming but can match from the middle of the string. (Is there any way to achieve both of the functionalities a

Need help with auto-suggester

2017-04-13 Thread OTH
Hello, I've followed the steps here to set up auto-suggest: https://lucidworks.com/2015/03/04/solr-suggester/ So basically I configured the auto-suggester in solrconfig.xml, where I told it which field in my index needs to be used for auto-suggestion. The problem is: When the user searches in th

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-13 Thread Chetas Joshi
Hi Shawn, Thanks for the insights into the memory requirements. Looks like cursor approach is going to require a lot of memory for millions of documents. If I run a query that returns only 500K documents still keeping 100K docs per page, I don't see long GC pauses. So it is not really the number o
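For reference, the cursor approach discussed above follows Solr's `cursorMark` protocol: start with `cursorMark=*`, pass back each response's `nextCursorMark`, and stop when the cursor no longer changes. The sketch below is a minimal illustration, not the poster's code. `fetch_page` is a hypothetical caller-supplied function that sends the parameters to Solr and returns the parsed JSON response, and the sort is assumed to include the collection's uniqueKey field, as cursors require.

```python
def iter_cursor(fetch_page, query, sort, rows):
    """Yield documents page by page using Solr's cursorMark protocol.

    fetch_page is a caller-supplied (hypothetical) function that takes a dict
    of request parameters and returns the parsed JSON response as a dict.
    """
    cursor = "*"  # initial cursorMark value
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch_page(params)
        for doc in response["response"]["docs"]:
            yield doc
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Since `rows` is a plain parameter, a caller can experiment with the page size without touching the loop; only one page of documents is held by this generator at a time.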

keywords not found - google like feature

2017-04-13 Thread Nilesh Kamani
Hello All, When we search Google, sometimes Google returns results with a mention of keywords not found (shown as strike-through). Does Solr provide such a feature? Thanks, Nilesh Kamani

Re: keyword-in-content for PDF document

2017-04-13 Thread Alexandre Rafalovitch
The boundary scanner supports sentence as per: https://cwiki.apache.org/confluence/display/solr/Highlighting So, the word in context should - if I remember correctly - give you the sentence that word is in even if the field has longer text. Regards, Alex. http://www.solr-start.com/ - Reso

RE: keyword-in-content for PDF document

2017-04-13 Thread Allison, Timothy B.
If you don't care about sentence boundaries, but just want a window around target terms and you want concordance functionality (sort before, after, etc), you might check out LUCENE-5317, which is available as a standalone jar on my github site [1] and is available through maven central. Using a

Re: keyword-in-content for PDF document

2017-04-13 Thread ankur
Thanks Alex. Yes, I am using TIKA. So, to some extent it preserves the text flow. There is something interesting in your reply, "Or you could try using highlighter to return only the sentence. ". I didn't understand that bit. How do we use Highlighter to return the sentence? To make sure, I want

Re: Filtering results by minimum relevancy score

2017-04-13 Thread Walter Underwood
BM25 came out of work on probabilistic engines, but using BM25 in Solr doesn’t automatically make it probabilistic. I read a paper once that showed the two models are not that different, maybe by Karen Sparck-Jones. Still, even with a probabilistic model, relevance cutoffs don’t work. It is st

Re: keyword-in-content for PDF document

2017-04-13 Thread Alexandre Rafalovitch
With great difficulty. PDF does not usually preserve the text flow, it uses instead absolute positioning for text fragments. Extraction will try to approximate the right thing, but it is an approximation. And if you have two columns, it is harder again. Some documents may have accessibility layer,

Re: Grouped Result sort issue

2017-04-13 Thread alessandro.benedetti
I had the chance to investigate on the code side, and I can basically confirm what Erick hypothesized and what Diego Ceccarelli mentioned in this other thread [1]. Grouping happens with a two-phase collector strategy: 1) the first phase retrieves and sorts the groups 2) the second phase retrieves the top d

Re: maxDoc ten times greater than numDoc

2017-04-13 Thread Erick Erickson
If you want to be brave: through a clever bit of reflection, the parameters that TieredMergePolicy uses to decide what segments to reclaim are settable in solrconfig.xml (undocumented, so use at your own risk). You could try bumping reclaimDeletesWeight in your TieredMergePolicy configuration

Re: AW: What does the replication factor parameter in collections api do?

2017-04-13 Thread Erick Erickson
bq: Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API Because MODIFYCOLLECTION just changes properties in the collection definition generically and replicationFactor just happens to be one. IOW there's no overarching reason. It would be extra work to d

Re: Autosuggestion

2017-04-13 Thread Erick Erickson
bq: FST-based vs AnalyzingInfix They are two totally different things. FST-based suggesters are very fast and compact. But they only match from the beginning of the input. AnalyzingInfix creates a "sidecar" index that's searched like a normal index and the _field_ is returned. Thus analyzinginfi

keyword-in-content for PDF document

2017-04-13 Thread ankur
If I search for the word "growth" in a PDF, I want to output all the sentences with the word "growth" in them. How can that be done? -- View this message in context: http://lucene.472066.n3.nabble.com/keyword-in-content-for-PDF-document-tp4329754.html Sent from the Solr - User mailing list archiv

Re: keyword-in-context for PDF document

2017-04-13 Thread ankur
Apologies, I meant "keyword-in-context". -- View this message in context: http://lucene.472066.n3.nabble.com/keyword-in-content-for-PDF-document-tp4329754p4329756.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Using BasicAuth with SolrJ Code

2017-04-13 Thread Zheng Lin Edwin Yeo
The security.json which I'm using is the default one that is available from the Solr Documentation https://cwiki.apache.org/confluence/display/solr/Basic+Authentication+Plugin. { "authentication":{ "blockUnknown": true, "class":"solr.BasicAuthPlugin", "credentials":{"solr":"IV0EHq1OnNrj6

Re: Autosuggestion

2017-04-13 Thread OTH
Thanks, that's very helpful! The third link especially is quite helpful. Is there any recommendation regarding using FST-based vs AnalyzingInfix suggesters? Thanks On Wed, Apr 12, 2017 at 6:23 PM, Andrea Gazzarini wrote: > Hi, > I think you got an old post. I would have a look at the built-in fe

Re: SOLR - 6.4.0 SolrCore Initialization Failures

2017-04-13 Thread Shawn Heisey
On 4/13/2017 4:37 AM, Uchit Patel wrote: > I have recently moved my cores from SOLR 5.1.0 to 6.4.0. I am using windows > environment. I have large data in cores. I have total 6 cores with total data > 142 GB. All cores are migrated perfectly but one is giving error: > > SolrCore Initialization F

Re: AW: What does the replication factor parameter in collections api do?

2017-04-13 Thread Shawn Heisey
On 4/13/2017 3:22 AM, Johannes Knaus wrote: > Ok. Thank you for your quick reply. Though I still feel a little > uneasy. Why is it possible then to alter replicationFactor via > MODIFYCOLLECTION in the collections API? What would be the use case > for this parameter at all then? If you use a very

Re: Using BasicAuth with SolrJ Code

2017-04-13 Thread Noble Paul
That looks good. can you share the security.json (commenting out anything that's sensitive of course) On Wed, Apr 12, 2017 at 5:10 PM, Zheng Lin Edwin Yeo wrote: > This is what I get when I run the code. > > org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error > from server

Re: maxDoc ten times greater than numDoc

2017-04-13 Thread Alexandre Rafalovitch
Maybe not every entry got deleted and it was holding up the segment. E.g. a child or parent record abandoned. If, for example, the parent record has a date field and the child does not, then deleting with a date-based query may trigger this. I think there was a bug about abandoned child or somethin

SOLR - 6.4.0 SolrCore Initialization Failures

2017-04-13 Thread Uchit Patel
Hi, I have recently moved my cores from SOLR 5.1.0 to 6.4.0. I am using a Windows environment. I have a large amount of data in the cores: 6 cores with 142 GB of data in total. All cores migrated perfectly but one is giving an error: SolrCore Initialization Failures - core_name: org.apache.solr.c

RE: maxDoc ten times greater than numDoc

2017-04-13 Thread Markus Jelsma
I forced a merge yesterday and went back to one segment. One indexer program reindexes (most or all) every 20 minutes or so. There is nothing custom at that particular point. There is no autoCommit, the indexer program is responsible for a hard commit, it is the single source of reindexed d

AW: What does the replication factor parameter in collections api do?

2017-04-13 Thread Johannes Knaus
Ok. Thank you for your quick reply. Though I still feel a little uneasy. Why is it possible then to alter replicationFactor via MODIFYCOLLECTION in the collections API? What would be the use case for this parameter at all then? -Ursprüngliche Nachricht- Von: Erick Erickson [mailto:eric

Re: Filtering results by minimum relevancy score

2017-04-13 Thread alessandro.benedetti
Hi Koji, strictly talking about TF-IDF ( and BM25 which is an evolution of that approach) I would say it is a weighting function/numerical statistic that can be used for ranking functions and is based on probabilistic concepts ( such as IDF) but it is not a probabilistic function[1]. Indeed a BM25

The resolution of the pf parameter with edismax and dismax is not consistent

2017-04-13 Thread yihl
Hi: I have a question about edismax and dismax. I'm using SOLR 6.3.0. Both types of query statements are the same, but the results differ. Now there is such a document: pj_title:word1 word2 word3 1. edismax q=word1 word2&qf=pj_title&pf=pj_title&defType=edi

streaming expressions parallel merge

2017-04-13 Thread Damien Kamerman
Hi, With Solr streaming expressions, is there a way to parallel merge a number of Solr streams? Or a way to apply the parallel function to something like this? merge( search(collection1, ...), search(collection2, ...), ..., on="id asc" ) Cheers, Damien.

Error when adding user for Solr Basic Authentication

2017-04-13 Thread Zheng Lin Edwin Yeo
Hi, When I try to add the user for the Solr Basic Authentication using the following method in curl curl --user user:password http://localhost:8983/solr/admin/authentication -H 'Content-type:application/json' -d '{ "set-user": {"tom" : "TomIsCool" , "harry":"HarrysSecret"}}' I g