Re: Cache for percentiles facets

2015-08-16 Thread Håvard Wahl Kongsgård
Hi, just a general question as I was unable to find any old posts relating to stats/percentile/facets performance/cache settings. I have been using Solr since version 4.0, now using the latest v. 5.2.1. What I have done: - Increase heap memory to 30gb - Experimented with the cache settings -

Re: Index very large number of documents from large number of clients

2015-08-16 Thread Toke Eskildsen
Troy Edwards tedwards415...@gmail.com wrote: 1) There are about 6000 clients 2) The number of documents from each client are about 50 (average document size is about 400 bytes) So roughly 3 billion documents / 1TB index size. So at least 2 shards, due to the 2 billion limit in Lucene. If
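The shard estimate in this reply can be checked with a short calculation. This is only a sketch: the 3 billion total comes from the message, while the exact per-index ceiling used below (Integer.MAX_VALUE - 128) is filled in as an assumption about what "the 2 billion limit in Lucene" refers to.

```python
# Lower bound on shard count from a per-index document limit.
# Assumption: the "2 billion limit" is Lucene's hard per-index cap,
# Integer.MAX_VALUE - 128.
LUCENE_MAX_DOCS = 2_147_483_519


def min_shards(total_docs: int, max_docs_per_shard: int = LUCENE_MAX_DOCS) -> int:
    """Smallest number of shards that keeps every shard under the limit."""
    if total_docs <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_docs // max_docs_per_shard)


total_docs = 3_000_000_000  # "roughly 3 billion documents" from the message
print(min_shards(total_docs))
```

This is a floor, not a sizing recommendation; practical shard counts would likely be higher for performance reasons.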

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread yura last
Thanks for your answers. Currently I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index around 60 million documents for a day - the index size is around 26GB. I do have customer-ID today and I use it for the queries. I don't split the customers but I get bad performance. If I will make

Re: phonetic filter factory question

2015-08-16 Thread Jamie Johnson
Thanks, I didn't know you could do this, I'll check this out. On Aug 15, 2015 12:54 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: From the teaching to fish category of advice (since I don't know the actual answer). Did you try Analysis screen in the Admin UI? If you check Verbose

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread yura last
I expect that the amount of concurrent customers will be low. Today I have 1 machine so I don't have the capacity for all the data. Because of that I am thinking on a new cluster solution. Today is 1 billion each day for 90 days = 90 billion (around 45TB data). I should prefer a lot of machines

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread Toke Eskildsen
yura last y_ura_2...@yahoo.com.INVALID wrote: I expect that the amount of concurrent customers will be low. Today I have 1 machine so I don't have the capacity for all the data. You aim for 90 billion documents in the first go and want to prepare for 10 times that. Your current test setup is

Re: Big SolrCloud cluster with a lot of collections

2015-08-16 Thread Toke Eskildsen
yura last y_ura_2...@yahoo.com.INVALID wrote: I have one machine (6 cores, 148 GB RAM, 2.5 TB HDD) and I index around 60 million documents for a day - the index size is around 26GB. So 1 billion documents would be approximately 500GB. ...and 10 billion/day in 90 days would be 450TB. I do
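The size estimates in this reply follow from linear scaling of the figures quoted (60 million documents per day, about 26GB of index). The sketch below reproduces the arithmetic. It assumes index size grows linearly with document count, which the thread does not state, and uses decimal units (1 TB = 1000 GB).

```python
# Figures quoted in the thread.
DOCS_MEASURED = 60_000_000   # documents indexed per day today
INDEX_GB_MEASURED = 26       # resulting index size in GB


def estimated_index_gb(docs: float) -> float:
    """Linear extrapolation of index size from the measured ratio."""
    return docs * INDEX_GB_MEASURED / DOCS_MEASURED


# 1 billion documents: about 433 GB, which the reply rounds to 500 GB.
gb_per_billion = estimated_index_gb(1_000_000_000)

# 10 billion documents/day for 90 days, using the rounded 500 GB per billion.
billions_total = 10 * 90
rounded_tb = billions_total * 500 / 1000

print(f"{gb_per_billion:.0f} GB per billion documents")
print(f"{rounded_tb:.0f} TB for {billions_total} billion documents")
```

Using the unrounded ratio instead of 500GB per billion gives roughly 390TB, so the 450TB in the reply is best read as a deliberately conservative figure.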

Re: Admin Login

2015-08-16 Thread Scott Derrick
Erik, After Walter's reply I started thinking along the lines you mentioned and realized the folly of doing that! Scott On 8/15/2015 9:57 PM, Erick Erickson wrote: Scott: You better not even let them access Solr directly.

Re: joins

2015-08-16 Thread Nagasharath
I exactly have the same requirement On 13-Aug-2015, at 2:12 pm, Kiran Sai Veerubhotla sai.sq...@gmail.com wrote: does solr support joins? we have a use case where two collections have to be joined and the join has to be on the faceted results of the two collections. is this possible?

Re: Query term matches

2015-08-16 Thread Toke Eskildsen
Scott Derrick sc...@tnstaafl.net wrote: Is there a way to get the list of terms that matched in a query response? Add debug=query to your request: https://wiki.apache.org/solr/CommonQueryParameters#debug You might also want to try http://splainer.io/ - Toke Eskildsen
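As a rough illustration of the suggestion above, a request with debug=query could be built like this. The host, port, and collection name are placeholders, not taken from the thread; only the `debug=query` parameter and the example query `mar*` come from the messages.

```python
from urllib.parse import urlencode


def build_debug_url(base_url: str, collection: str, query: str) -> str:
    """Build a select URL that asks Solr to include query debug output."""
    params = {
        "q": query,
        "debug": "query",  # parameter suggested in the reply
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
url = build_debug_url("http://localhost:8983/solr", "mycollection", "mar*")
print(url)
```

As the follow-up in this thread notes, the debug section reports how the query was parsed (for example `_text_:mar*`), not which indexed terms a wildcard actually matched.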

Re: Query term matches

2015-08-16 Thread Scott Derrick
With a query like q=mar* I tried debugQuery=true, but it just said rawquerystring: mar*, querystring: mar*, parsedquery: _text_:mar*, parsedquery_toString: _text_:mar*. I already know that! One document matches Mary, another matches Mary and martyr. I will look at splainer.io. Scott

Query term matches

2015-08-16 Thread Scott Derrick
Is there a way to get the list of terms that matched in a query response? I realize the q parameter is returned, but I'm looking for just the list of terms and not the operators. Scott -- To those leaning on the sustaining infinite, to-day is big with blessings. Mary Baker Eddy

Solr Cloud Security Question

2015-08-16 Thread Tarala, Magesh
I have a solr cloud with 3 nodes. I've added password protection following the steps here: http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password Now only one node is able to load the collections. The others are getting 401 Unauthorized error when loading the

No. of records mismatch

2015-08-16 Thread Pattabiraman, Meenakshisundaram
I did a dataimport with 'clean' set to false. The DIH status upon completion was: <str name="status">idle</str> <str name="importResponse"/> <lst name="statusMessages"> <str name="Total Requests made to DataSource">1</str> <str name="Total Rows Fetched">6843427</str> <str name="Total Documents Processed">6843427</str> <str

RE: No. of records mismatch

2015-08-16 Thread Pattabiraman, Meenakshisundaram
You almost certainly have a non-unique ID field. Yes it is not absolutely unique but do not think it is at this 1 to 6 ratio. Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action) I tried on a new instance - same effect. I do not

xsl error

2015-08-16 Thread Scott Derrick
I'm using a dataimporthandler: <requestHandler name="/update/html" startup="lazy" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">html-config.xml</str> </lst> </requestHandler> I'm using the xsl attribute on all the entities, but this one is

Re: Solr Cloud Security Question

2015-08-16 Thread Shawn Heisey
On 8/16/2015 12:09 PM, Tarala, Magesh wrote: I have a solr cloud with 3 nodes. I've added password protection following the steps here: http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password Now only one node is able to load the collections. The others are

Re: Query term matches

2015-08-16 Thread Erick Erickson
This isn't going to be easy. Why do you need to know? Especially with wildcards this'll be challenging. For the specific docs that are returned, highlighting will tell you _some_ of them. Why only some? Because usually only the best N snippets are returned, say 3 (it's configurable). And it's

Re: joins

2015-08-16 Thread naga sharathrayapati
Is there any chance of this feature (merge the results to create a composite document) coming out in the next release 5.3 ? On Sun, Aug 16, 2015 at 2:08 PM, Upayavira u...@odoko.co.uk wrote: You can do what are called pseudo joins, which are equivalent to a nested query in SQL. You get back data

RE: Solr Cloud Security Question

2015-08-16 Thread Tarala, Magesh
Thanks Shawn! We are on 4.10.4. Will consider 5.x upgrade shortly. -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: Sunday, August 16, 2015 9:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr Cloud Security Question On 8/16/2015 12:09 PM, Tarala, Magesh

Re: joins

2015-08-16 Thread Erick Erickson
bq: Is there any chance of this feature(merge the results to create a composite document) coming out in the next release 5.3 In a word no. And there aren't really any long-range plans either that I'm aware of. You could also explore streaming aggregation, if the need here is more batch-oriented.

Re: Query term matches

2015-08-16 Thread Scott Derrick
splainer doesn't return anything the debug parameter can. On 8/16/2015 11:39 AM, Toke Eskildsen wrote: Scott Derrick sc...@tnstaafl.net wrote: Is there a way to get the list of terms that matched in a query response? Add debug=query to your request:

Re: Query term matches

2015-08-16 Thread Scott Derrick
I'm searching a collection of documents. When I build my results page I provide a link to each document. If the user clicks the link I display the document with all the matched terms highlighted. I need to supply my highlighter a list of words to highlight in the doc. I thought the

Re: joins

2015-08-16 Thread Upayavira
You can do what are called pseudo joins, which are equivalent to a nested query in SQL. You get back data from one core, based upon criteria in the other. You cannot (yet) merge the results to create a composite document. Upayavira On Sun, Aug 16, 2015, at 06:02 PM, Nagasharath wrote: I exactly
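A pseudo join of the kind described here is typically written with Solr's join query parser in local-params form. The helper below only assembles the query string. The field names, core name, and inner query are hypothetical; the thread does not show the poster's schema.

```python
from typing import Optional


def join_query(from_field: str, to_field: str, inner_query: str,
               from_index: Optional[str] = None) -> str:
    """Build a {!join ...} query string.

    Documents matching inner_query are found in the "from" side; their
    from_field values are then matched against to_field in the core being
    searched. Only documents from the searched core are returned.
    """
    local_params = f"from={from_field} to={to_field}"
    if from_index is not None:
        # fromIndex names the other core when joining across cores.
        local_params += f" fromIndex={from_index}"
    return "{!join " + local_params + "}" + inner_query


# Hypothetical names, for illustration only.
q = join_query("customer_id", "id", "region:EU", from_index="customers")
print(q)
```

This matches the behaviour described in the message: the join filters one core by criteria in the other, and fields from the "from" side are not merged into the returned documents.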

Re: joins

2015-08-16 Thread naga sharathrayapati
https://issues.apache.org/jira/browse/SOLR-7090 I see this jira open in support of joins which might solve the problem. On Sun, Aug 16, 2015 at 2:51 PM, Erick Erickson erickerick...@gmail.com wrote: bq: Is there any chance of this feature(merge the results to create a composite document)

Re: No. of records mismatch

2015-08-16 Thread Upayavira
You almost certainly have a non-unique ID field. Some documents are overwritten during indexing. Try it with a clean index, and then review the number of deleted documents (updates are a delete then insert action). Deletes are calculated with maxDocs minus numDocs. Upayavira On Sun, Aug 16,
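The effect described here can be illustrated with a toy model. This is not Solr code, only a sketch of the bookkeeping, and it assumes a clean index with no segment merges (merges would purge deleted documents and lower maxDoc). The IDs are made up.

```python
def simulate_index(doc_ids):
    """Toy model of indexing documents keyed by a unique ID field.

    Each add consumes a new slot (maxDoc grows). Re-adding an existing ID
    marks the old copy deleted, so numDocs counts distinct IDs only.
    """
    live_ids = set()
    max_doc = 0
    for doc_id in doc_ids:
        max_doc += 1
        live_ids.add(doc_id)
    num_docs = len(live_ids)
    return {
        "maxDoc": max_doc,
        "numDocs": num_docs,
        "deletedDocs": max_doc - num_docs,  # maxDoc minus numDocs
    }


# Made-up IDs: "a" is imported three times, so two copies end up deleted.
stats = simulate_index(["a", "b", "a", "c", "a"])
print(stats)
```

If the import reports far more processed documents than numDocs, a large deletedDocs value on a fresh index points to repeated IDs in the source data.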