User Authentication

2015-08-24 Thread LeZotte, Tom
Hi Solr Community, I have been trying to add user authentication to our Solr 5.3.1 RedHat install. I’ve found some examples of user authentication on the Jetty side, but they have failed. Does anyone have a step-by-step example of authentication for the admin screen? And for a core? Thanks, Tom

Solr relevancy score order

2015-08-24 Thread Steven White
Hi Everyone, When I search for a term in Solr, and it happens that 10 docs end up with the same score, what's the order of doc ranking in the set of those 10 equally scored docs and what is it based on? Is there a link where I can read more about this? Thanks, Steve

Re: Solr relevancy score order

2015-08-24 Thread Ahmet Arslan
Hi Steven, When scores produce a tie, internal Lucene document IDs are used to break it. However, internal Lucene IDs can change when the index changes (merges, updates, etc.). You can see those values with [docid] - DocIdAugmenterFactory. If you want 100% stable sorting, use a second sorting
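
The advice above (add a second sort so ties never fall back to internal doc IDs) can be sketched as request parameters. This is a minimal sketch, not a definitive recipe: the helper name `tie_safe_params` is hypothetical, and it assumes the schema's unique key field is called `id`. The `[docid]` entry in `fl` is the transformer mentioned in the message.

```python
from urllib.parse import urlencode


def tie_safe_params(query, unique_key="id"):
    """Build select parameters that sort by score, then by a unique key.

    `unique_key` is an assumption: use whatever field your schema
    declares as its unique key.
    """
    return {
        "q": query,
        # Secondary sort breaks score ties deterministically.
        "sort": f"score desc, {unique_key} asc",
        # [docid] exposes the internal Lucene doc ID for inspection only.
        "fl": f"{unique_key},score,[docid]",
    }


params = tie_safe_params("title:solr")
print(urlencode(params))
```

The encoded string can be appended to a core's `/select` URL. Sorting on the unique key gives a stable order, though not necessarily a most-recent-first order.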

Re: User Authentication

2015-08-24 Thread Alexandre Rafalovitch
Thanks for the email from the future. It is good to start to prepare for 5.3.1 now that 5.3 is nearly out. Joking aside (and assuming Solr 5.2.1), what exactly are you trying to achieve? Solr should not actually be exposed to the users directly. It should be hiding in a backend only visible to

how to index document with multiple words (phrases) and words permutation?

2015-08-24 Thread afrooz
I need to find a solution to index my documents based on a dictionary. This dictionary contains 3 million phrases. I have one big challenge, and that is: I need to index documents based on this dictionary only, with the consideration that word permutations are also accepted. For example: I have a phrase

Retreive new inserted data !

2015-08-24 Thread Pegazus
Hi all, I would like to know if it is possible to retrieve added documents not yet flushed to hard disk. Here is my case: I insert two documents in Solr and, just after this insert, I try to retrieve them. Three results are possible: - Success in retrieving those two documents. - Retrieve only one. -

Re: exclude folder in dataimport handler.

2015-08-24 Thread coolmals
I used this to exclude files from the templatedata folders, but it still couldn't remove these files from indexing: <field column="$skipDoc" regex=".*\\avon-br\\templatedata\\.*" replaceWith="true" sourceColName="fileAbsolutePath"/> When I save the value of this expression in temp

Re: Solr relevancy score order

2015-08-24 Thread Steven White
Thanks Ahmet. Steve On Mon, Aug 24, 2015 at 10:09 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Steven, When scores produce a tie, internal Lucene document IDs are used to break it. However, internal Lucene Ids can change when index changes. (merges, updates etc). You can see

Re: Solr relevancy score order

2015-08-24 Thread Ahmet Arslan
Hi Steven, Here is the relevant Jira ticket : https://issues.apache.org/jira/browse/LUCENE-6057 Ahmet On Monday, August 24, 2015 5:09 PM, Ahmet Arslan iori...@yahoo.com.INVALID wrote: Hi Steven, When scores produce a tie, internal Lucene document IDs are used to break it. However, internal

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Shawn Heisey
On 8/24/2015 12:48 AM, Pavel Hladik wrote: we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you please recommend tuning of those GC parameters? The performance is not a issue, sometimes during peaks we have OOM and we use 50G of heap memory, the server has 64G of ram.

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-24 Thread Erick Erickson
bq: As a follow up, the default is set to NRTCachingDirectoryFactory for DirectoryFactory but not MMapDirectory. It is mentioned that NRTCachingDirectoryFactory caches small files in memory for better NRT performance. NRTCachingDirectoryFactory also uses MMapDirectory under the covers as well as

Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi All, I am working on improving the performance of queries that are based on 15M records, and all the queries have a list of about 6 filter queries with grouping and faceting requirements. So far, I found that the cache setting in solrconfig.xml is helpful after the Solr server is warmed

Re: User Authentication

2015-08-24 Thread LeZotte, Tom
Alex, I got a super secret release of Solr 5.3.1, wasn’t supposed to say anything. Yes, I’m running 5.2.1; I will check out the release notes for 5.3. Was looking for three types of user authentication, I guess. 1. the Admin Console 2. User auth for each Core (and select and update) on a server.

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-24 Thread Erick Erickson
This feels a little like an XY problem, from Hossman's apache page: Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue.

Re: Retreive new inserted data !

2015-08-24 Thread Erick Erickson
This is what the real time get handler is all about, see: https://cwiki.apache.org/confluence/display/solr/RealTime+Get Best, Erick On Mon, Aug 24, 2015 at 8:19 AM, Pegazus yoann.kla...@thalesgroup.com wrote: Hi all, I would like to now if it is possible to retreive added document not yet
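
As a rough illustration of the real-time get handler mentioned above, the sketch below builds a `/get` request for the two freshly inserted documents. The base URL, core name, and document IDs are hypothetical placeholders, and the sketch assumes the handler is registered at `/get` and that the update log is enabled in solrconfig.xml, which real-time get relies on.

```python
from urllib.parse import urlencode


def realtime_get_url(base_url, core, doc_ids):
    """Return a real-time get URL for the given document IDs.

    `ids` takes a comma-separated list; a single document could
    use the `id` parameter instead.
    """
    query = urlencode({"ids": ",".join(doc_ids), "wt": "json"})
    return f"{base_url}/{core}/get?{query}"


# Hypothetical host, core, and IDs, for illustration only.
url = realtime_get_url("http://localhost:8983/solr", "mycore", ["doc1", "doc2"])
print(url)
```

Fetching this URL should return the latest version of each document even before a commit has opened a new searcher, which is the situation described in the original question.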

Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread Erick Erickson
bq: Does that make sense? In a word, yes. Without {!cache=false}, each and every document in the entire corpus is examined and a bitset constructed that represents that result set, then the entry in the filter cache is constructed. With cache=false, only docs that make it through the rest of the
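
To make the `{!cache=false}` discussion concrete, here is a small sketch that prefixes filter queries with the local parameter before sending them. The helper name `uncached` and the example filters are hypothetical. `cost` is an optional local parameter that influences the order in which non-cached filters are evaluated, and it is included here only as an option.

```python
from urllib.parse import urlencode


def uncached(fq, cost=None):
    """Prefix a filter query with {!cache=false}, optionally with a cost."""
    local = "cache=false" if cost is None else f"cache=false cost={cost}"
    return f"{{!{local}}}{fq}"


# Hypothetical filters: one left cacheable, two marked as non-cached.
filters = [
    "category:books",
    uncached("price:[10 TO 100]"),
    uncached("in_stock:true", cost=50),
]

# A list of pairs lets the same parameter name (fq) repeat.
query_string = urlencode([("q", "*:*")] + [("fq", f) for f in filters])
print(query_string)
```

Filters that repeat across many requests are usually better left cacheable, while one-off or highly variable filters are the natural candidates for `cache=false`.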

RE: Retreive new inserted data !

2015-08-24 Thread Pegazus
Thanks a lot !! Pegazus [@@ THALES GROUP INTERNAL @@] De : Erick Erickson [via Lucene] [mailto:ml-node+s472066n422494...@n3.nabble.com] Envoyé : lundi 24 août 2015 17:59 À : KLAUSZ Yoann Objet : Re: Retreive new inserted data ! This is what the real time get handler is all about, see:

Re: Disable caching

2015-08-24 Thread Jamie Johnson
I ran into another issue that I am having trouble running to ground. My implementation on Solr 4.x worked as I expected, but in trying to migrate this to Solr 5.x, it looks like some of the faceting is delegated to DocValuesFacets, which ultimately caches things at a field level in the FieldCache.DEFAULT

Re: Solr relevancy score order

2015-08-24 Thread Steven White
A follow-up question. Is the sub-sorting on the Lucene internal doc IDs in ascending or descending order? That is, do the most recently indexed docs show up first in this set of docs that have tied scores? If not, how can I have the most recent be first? Do I have to sort on Lucene's internal doc

Re: Exception while using {!cardinality=1.0}.

2015-08-24 Thread Chris Hostetter
: Can you please explain how having the same field for query and stat can : cause some issue for my better understanding of this feature? I don't know if it can, it probably shouldn't, but in terms of trying to understand the bug and reproduce it, any pertinent facts may be relevant -

Re: Solr relevancy score order

2015-08-24 Thread Chris Hostetter
: A follow up question. Is the sub-sorting on the lucene internal doc IDs : ascending or descending order? That is, do the most recently index doc you can not make any generic assumptions about the order of the internal lucene doc IDs -- the secondary sort on the internal IDs is stable (and

Re: User Authentication

2015-08-24 Thread Noble Paul
did you manage to look at the reference guide? https://cwiki.apache.org/confluence/display/solr/Securing+Solr On Mon, Aug 24, 2015 at 9:23 PM, LeZotte, Tom tom.lezo...@vanderbilt.edu wrote: Alex I got a super secret release of Solr 5.3.1, wasn’t suppose to say anything. Yes I’m running 5.2.1,

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-24 Thread afrooz
Thanks Erick, I will explain the detailed scenario so you might give me a solution: I want to annotate a medical document based only on a medical dictionary. I don't need to annotate non-medical words of the document at all. The medical dictionary contains terms which contain multiple words, and these

Re: Solr relevancy score order

2015-08-24 Thread Steven White
Thanks Hoss. I understand the dynamic nature of doc-IDs. All that I care about is the most recent docs be at the top of the hit list when there is a tie. From your reply, it is not clear if that's what happens. If not, then I have to sort, but this is something I want to avoid so it won't add

Re: User Authentication

2015-08-24 Thread Steven White
Hi Noble, Is everything in the link you provided applicable to Solr 5.2.1? Thanks Steve On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul noble.p...@gmail.com wrote: did you manage to look at the reference guide? https://cwiki.apache.org/confluence/display/solr/Securing+Solr On Mon, Aug 24,

Re: Solr 4.10.3 cached grouping results but Solr 5.2.1 don't, why?

2015-08-24 Thread Pavel Hladik
Nobody knows or has the same issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-3-cached-grouping-results-but-Solr-5-2-1-don-t-why-tp4224396p4224812.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Yago Riveiro
Do you have docValues on for your fields? On Mon, Aug 24, 2015 at 7:48 AM, Pavel Hladik pavel.hla...@profimedia.cz wrote: Hi, we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you please recommend tuning of those GC parameters? The performance is not a issue, sometimes

Re: Multiple concurrent queries to Solr

2015-08-24 Thread Ashish Mukherjee
Thanks, everyone. Arcadius, that ticket is interesting. I was wondering if an implementation of SolrClient could be based on HttpAsyncClient instead of HttpSolrClient. Just a thought right now, which needs to be explored deeper. - Ashish On Mon, Aug 24, 2015 at 1:46 AM, Arcadius Ahouansou

GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Pavel Hladik
Hi, we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you please recommend tuning of those GC parameters? Performance is not an issue, but sometimes during peaks we have OOM. We use 50G of heap memory; the server has 64G of RAM. GC_TUNE=-XX:NewRatio=3 \ -XX:SurvivorRatio=4 \

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Toke Eskildsen
On Sun, 2015-08-23 at 23:48 -0700, Pavel Hladik wrote: we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you please recommend tuning of those GC parameters? The performance is not a issue, sometimes during peaks we have OOM and we use 50G of heap memory, the server has 64G

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Pavel Hladik
Very interesting, never heard about that. We tested on our x64 linux with java 1.8.0_51 and result is: java -jar -Xms31g -Xmx31g -Xmn50m memory.jar Total Memory (in bytes): 33279705088 Free Memory (in bytes): 33277314136 Max Memory (in bytes): 33279705088 Elements created and added to LinkedList:

Re: Solr 4.10.3 cached grouping results but Solr 5.2.1 don't, why?

2015-08-24 Thread Pavel Hladik
We use grouping, so will try collapsing. Thank you for ideas! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-10-3-cached-grouping-results-but-Solr-5-2-1-don-t-why-tp4224396p4224857.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Toke Eskildsen
On Mon, 2015-08-24 at 03:17 -0700, Pavel Hladik wrote: It seems that most elements are really on 31G, but can we say that Solr application is used like a number of elements from this result? Fortunately Solr uses a lot of bit packing, which often translates to arrays of longs, which take up the

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Pavel Hladik
On sorting fields we have: <dynamicField name="sortingRankCom2_fid*" type="int" indexed="false" stored="false" omitNorms="true" docValues="true" /> <dynamicField name="sortingRankEd2_fid*" type="int" indexed="false" stored="false" omitNorms="true" docValues="true" /> -- View this message in context:

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-24 Thread Rallavagu
As a follow up, the default is set to NRTCachingDirectoryFactory for DirectoryFactory but not MMapDirectory. It is mentioned that NRTCachingDirectoryFactory caches small files in memory for better NRT performance. Wondering if this would also consume physical memory to the amount of MMap

Re: Solr performance is slow with just 1GB of data indexed

2015-08-24 Thread Upayavira
I honestly suspect your performance issue is down to the number of terms you are passing into the clustering algorithm, not to memory usage as such. If you have *huge* documents and cluster across them, performance will be slower, by definition. Clustering is usually done offline, for example on

Re: DIH delta-import pk

2015-08-24 Thread CrazyDiamond
I have an autogenerated UUID for each document in Solr. It is not marked as the unique key field. I add <requestHandler name="/update" class="solr.UpdateRequestHandler"> <lst name="defaults"> <str name="update.chain">uuid</str> </lst> </requestHandler> in the config to generate the UUID when I add

Re: Solr 4.10.3 cached grouping results but Solr 5.2.1 don't, why?

2015-08-24 Thread Upayavira
Are you grouping or collapsing? Look at the {!collapse} post filter and the associated ExpandComponent, which may give you a similar outcome (depending upon what you are trying to achieve) but with better performance. Upayavira On Mon, Aug 24, 2015, at 07:42 AM, Pavel Hladik wrote: Nobody knows
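
The suggestion above can be sketched as request parameters: a `{!collapse}` filter query plus the expand component. This is only an illustrative sketch. The helper name `collapse_params` and the field name `group_id` are hypothetical, and whether collapsing gives an equivalent outcome to grouping depends on the use case, as the message itself notes.

```python
from urllib.parse import urlencode


def collapse_params(query, group_field, expand=True):
    """Build parameters that collapse results on one field.

    With expand=True the ExpandComponent is asked to return the
    collapsed group members alongside each group head.
    """
    params = {"q": query, "fq": f"{{!collapse field={group_field}}}"}
    if expand:
        params["expand"] = "true"
    return params


# `group_id` stands in for whatever field was used with group.field.
params = collapse_params("*:*", "group_id")
print(urlencode(params))
```

The response shape differs from grouped results: collapsed documents come back as a normal result list, with expanded members in a separate section, so client code written against the grouping response format would need adjusting.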

Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Rallavagu
One other item to check is non heap memory usage. This can be monitored from admin page. On 8/23/15 11:48 PM, Pavel Hladik wrote: Hi, we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you please recommend tuning of those GC parameters? The performance is not a issue,

TimeAllowed bug

2015-08-24 Thread Bill Bell
Weird fq caching bug when using timeAllowed. 1) Find a pwid (in this case YLGVQ). 2) Run a query w/ a FQ on the pwid and timeAllowed=1: http://hgsolr2devsl.healthgrades.com:8983/solr/providersearch/select/?q=*:*&wt=json&fl=pwid&fq=pwid:YLGVQ&timeAllowed=1 3) Ensure #2 returns 0 results. 4) Rerun the query

Re: User Authentication

2015-08-24 Thread Steven White
For my project, Kerberos is not a requirement. What I need is: 1) Basic Auth to Solr server (at all access levels) 2) SSL support My setup is not using ZK, it's a single core. Steve On Mon, Aug 24, 2015 at 4:12 PM, Don Bosco Durai bo...@apache.org wrote: Just curious, is Kerberos an option

Re: Multiple concurrent queries to Solr

2015-08-24 Thread Arcadius Ahouansou
BTW, google revealed that there is a 3rd-party Scala library for async calls which could be usable from Java. I have not tried it myself though https://github.com/inoio/solrs On 24 August 2015 at 21:35, Arcadius Ahouansou arcad...@menelic.com wrote: Hi Ashish. The Apache HttpAsyncClient uses

Re: Multiple concurrent queries to Solr

2015-08-24 Thread Arcadius Ahouansou
Hi Ashish. The Apache HttpAsyncClient uses Java Future to wrap a synchronous call into an async one. The above ticket does a similar thing by wrapping a SolrJ call in a Future. Feel free to submit any proposal you may have to the dev list. Arcadius On 24 August 2015 at 07:20, Ashish Mukherjee

Re: SOLR 5.3

2015-08-24 Thread Noble Paul
The release is underway. Incorporating some corrections suggested by others. Expect an announcement over the next few hours. On Sun, Aug 23, 2015 at 6:44 PM, Arcadius Ahouansou arcad...@menelic.com wrote: Solr-5.3 has been available for download from

Re: how to index document with multiple words (phrases) and words permutation?

2015-08-24 Thread Alexandre Rafalovitch
These look like requirements for a generic Solr search, maybe with focus on proximity and/or phrase matching. Perhaps some white-listing filter if you have a fixed set of words you care about. E.g. with KeepWordFilter in the analyzer chain.

Re: User Authentication

2015-08-24 Thread Don Bosco Durai
Just curious, is Kerberos an option for you? If so, mostly all your 3 use cases will be addressed. Bosco On 8/24/15, 12:18 PM, Steven White swhite4...@gmail.com wrote: Hi Noble, Is everything in the link you provided applicable to Solr 5.2.1? Thanks Steve On Mon, Aug 24, 2015 at 2:20 PM,

Using copyField with dynamicField

2015-08-24 Thread Zach Thompson
Hi All, Is it possible to use copyField with dynamicField? I was trying to do the following, <dynamicField name="*_text" type="text" indexed="true" stored="true"/> <copyField source="*_text" dest="text" maxChars="100" /> and getting a 400 error on trying to copy the first dynamic field. Without the copyField

Re: User Authentication

2015-08-24 Thread LeZotte, Tom
Bosco, We use CAS for user authentication, not sure if we have Kerberos working anywhere. Also we are not using ZooKeeper, because we are only running one server currently. thanks Tom LeZotte Health I.T. - Senior Product Developer (p) 615-875-8830 On Aug 24, 2015, at 3:12 PM, Don Bosco

Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread wwang525
Hi Erick, The earlier test was done through individual requests. However, my load test is even better. (1) load test (3 requests/per second/per core) immediately after restarting Solr: average response time: 122 ms (2) load test (5 requests/per second/per core) immediately after restarting Solr:

Performance improvements

2015-08-24 Thread naga sharathrayapati
In order to improve the query time of a nested faceting query (JSON Facet API), I have used 'docValues' in the schema, optimized the index, and increased cache sizes (no evictions). I still cannot bring the query time to less than 1 sec. Is there anything that I can do to improve the performance?

Re: Performance improvements

2015-08-24 Thread Shawn Heisey
On 8/24/2015 4:33 PM, naga sharathrayapati wrote: In order to improve the query time of nested faceting query (json facet api), have used 'docValues' in the schema,optimized index and increased cache sizes(no evictions) I still cannot be bring the query time to less than 1 sec. is there

Re: Performance improvements

2015-08-24 Thread Yonik Seeley
On Mon, Aug 24, 2015 at 6:33 PM, naga sharathrayapati sharathrayap...@gmail.com wrote: In order to improve the query time of nested faceting query (json facet api), have used 'docValues' in the schema,optimized index and increased cache sizes(no evictions) I still cannot be bring the query

Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-24 Thread Jamie Johnson
as mentioned in a previous email I have a need to provide security controls at the term level. I know that Lucene/Solr doesn't support this so I had baked something onto a 4.x baseline that was sufficient for my use cases. I am now looking to move that implementation to 5.x and am running into an

Re: Performance improvements

2015-08-24 Thread naga sharathrayapati
yes, re-indexed after changing the schema

Re: Using copyField with dynamicField

2015-08-24 Thread Alexandre Rafalovitch
It should work (at first glance). copyField does support wildcards. Do you have a field called text? Also, your field name and field type text have the same name. Not sure it is the best idea. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

Re: Solr performance is slow with just 1GB of data indexed

2015-08-24 Thread Zheng Lin Edwin Yeo
Thank you Upayavira for your reply. I would like to confirm: when I set rows=100, does it mean that it only builds the clusters based on the first 100 records that are returned by the search, and if I have 1000 records that match the search, all the remaining 900 records will not be considered for

Re: Performance gain with setting !cache=false in the query for complex queries

2015-08-24 Thread Erick Erickson
Well, It Depends (tm). I've certainly seen response times on that order, it all revolves around the complexity of the queries, how much faceting you're doing, all that kind of thing. If always specifying cache=false works for you, go for it. The only caution I would add is that randomly

Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-24 Thread Max Chadwick
Is there a way to append a set of words to the out-of-box Solr index when using the spellcheck / suggestions feature?

[ANNOUNCE] Apache Solr 5.2.0 released

2015-08-24 Thread Noble Paul
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

Re: [ANNOUNCE] Apache Solr 5.2.0 released

2015-08-24 Thread Noble Paul
sorry , screwed up the title On Tue, Aug 25, 2015 at 8:30 AM, Noble Paul noble.p...@gmail.com wrote: Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search,

Re: Solr relevancy score order

2015-08-24 Thread Erick Erickson
Getting the most recent doc first in the case of a tie will _not_ just happen. I don't think you really get the nuance here... You index doc1, and doc2 later. Let's claim that doc1 gets internal Lucene doc ID of 1 and doc2 gets an internal doc ID of 2. So far you're golden. Let's further claim
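
The point about tie order can be illustrated with a toy model in plain Python, with no Solr involved. Everything here is hypothetical: the two documents, their scores, the `indexed_at` values, and the assumption that an index change leaves doc1 with a higher internal ID than doc2. It only shows why a tie broken by internal ID is not the same thing as most-recent-first, and why an explicit sort on a date-like field is.

```python
# Two hypothetical hits with tied scores; doc2 was indexed later.
docs = [
    {"id": "doc1", "score": 1.0, "indexed_at": 100, "docid": 1},
    {"id": "doc2", "score": 1.0, "indexed_at": 200, "docid": 2},
]


def by_docid(hits):
    """Order by score desc, then internal doc ID asc (the implicit tie-break)."""
    return [d["id"] for d in sorted(hits, key=lambda d: (-d["score"], d["docid"]))]


def by_recency(hits):
    """Order by score desc, then an explicit indexed_at field desc."""
    return [d["id"] for d in sorted(hits, key=lambda d: (-d["score"], -d["indexed_at"]))]


before = by_docid(docs)

# Assumption for illustration: the index changes and doc1 ends up
# with a higher internal doc ID than doc2.
docs[0]["docid"] = 3
after = by_docid(docs)

print(before, after, by_recency(docs))
```

In this toy model the implicit order flips between `before` and `after`, while the explicit recency sort stays the same, which is the argument for sorting on a stored timestamp if most-recent-first matters.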

[ANNOUNCE] Apache Solr 5.3.0 released

2015-08-24 Thread Noble Paul
Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

Re: User Authentication

2015-08-24 Thread Noble Paul
no. Most of it is in Solr 5.3 On Tue, Aug 25, 2015 at 12:48 AM, Steven White swhite4...@gmail.com wrote: Hi Noble, Is everything in the link you provided applicable to Solr 5.2.1? Thanks Steve On Mon, Aug 24, 2015 at 2:20 PM, Noble Paul noble.p...@gmail.com wrote: did you manage to

Re: Using copyField with dynamicField

2015-08-24 Thread Erick Erickson
What is reported in the Solr log? That's usually much more informative. Best, Erick On Mon, Aug 24, 2015 at 5:26 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It should work (at first glance). copyField does support wildcards. Do you have a field called text? Also, your field name and

Re: Lucene/Solr 5.0 and custom FieldCahe implementation

2015-08-24 Thread Mikhail Khludnev
Hello Jamie, I don't understand how it could choose DocValuesFacets (it occurs on docValues=true) field, but then switches to UninvertingReader/FieldCache which means docValues=false. If you can provide more details it would be great. Beside of that, I suppose you can only implement and inject