Re: Setting up to index multiple datastores

2017-03-03 Thread Daniel Miller
On 3/2/2017 5:14 PM, Shawn Heisey wrote: On 3/2/2017 2:58 PM, Daniel Miller wrote: I'm asking for some guidance on how I might optimize Solr. I use Solr for work. I use Dovecot for personal domains. I have not used them together. I probably should -- my personal mailbox is many gigabytes and

Re: How to update index after document expired.

2017-03-03 Thread XuQing Tan
On Fri, Mar 3, 2017 at 9:17 AM, Erick Erickson wrote: > you'd have to copy/paste or petition to make > DocExpirationUpdateProcessorFactory not final. > Yes, I copied DocExpirationUpdateProcessorFactory. An additional reason is that our XML content from the external source already contains expires in Da
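[Editor's note: since the stock factory is being copied, here is a minimal sketch of how DocExpirationUpdateProcessorFactory is normally wired into solrconfig.xml. The chain name, the 300-second sweep interval, and the use of an "expires" field are assumptions for illustration, not the poster's actual settings.]

    <updateRequestProcessorChain name="expire-docs" default="true">
      <!-- periodically deletes documents whose expiration timestamp has passed -->
      <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
        <str name="expirationFieldName">expires</str>
        <int name="autoDeletePeriodSeconds">300</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>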

Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-03 Thread Yago Riveiro
Hello, I have this log in my dmesg: possible SYN flooding on port 8983. Sending cookies. The Solr instance (6.3.0) is not accepting more HTTP connections. I ran this: _lsof -nPi |grep \:8983 | wc -l_ and the number of connections to port 8983 is about 14K in CLOSE_WAIT or ESTABLISHED state. An

Re: solr warning - filling logs

2017-03-03 Thread Satya Marivada
There is nothing else running on the port that I am trying to use: 15101. 15102 works fine. On Fri, Mar 3, 2017 at 2:25 PM Satya Marivada wrote: > Dave and All, > > The below exception is not happening anymore when I change the startup > port to something else apart from the one I had in the original start

Re: copyField match, but how?

2017-03-03 Thread nbosecker
You're on the money, Chris. Thank you so much, I didn't even realize "body" wasn't stored. Of course that is the reason!! -- View this message in context: http://lucene.472066.n3.nabble.com/copyField-match-but-how-tp4323327p4323335.html Sent from the Solr - User mailing list archive at Nab

Re: copyField match, but how?

2017-03-03 Thread Chris Hostetter
: In my schema.xml, I have these copyFields: you haven't shown us the field/fieldType definitions for any of those fields, so it's possible "simplex" was included in a field that is indexed=true but stored=false -- which is why you might be able to search on it, but not see it in the field
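[Editor's note: to illustrate Chris's point, a hypothetical schema.xml fragment (field and type names invented for the example) where a term copied into a catch-all field is searchable even though the source field is never returned in results.]

    <!-- searchable but not returned: values live only in the index, not in stored fields -->
    <field name="body"    type="text_general" indexed="true" stored="false"/>
    <!-- catch-all target, typically used as the default search field -->
    <field name="alltext" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="body" dest="alltext"/>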

Re: copyField match, but how?

2017-03-03 Thread Alexandre Rafalovitch
I think you are not using the default field, but rather the eDismax field definitions. Still, you seem to be matching on alltext anyway. What's the field definition? Did you check the index content with Maple or with the Admin Schema field content? Regards, Alex On 3 Mar 2017 5:07 PM, "nbosecker" wrote:

copyField match, but how?

2017-03-03 Thread nbosecker
I've got a confusing situation related to copyFields and search. In my schema.xml, I have these copyFields: [copyField definitions stripped by the list archive] and a defaultSearchField pointing to the 'alltext' copyField: alltext. In my index, this document with all these mapped fields - nothing to note except that the word "*simplex*" is *NOT

Re: Partial Name matching

2017-03-03 Thread Alexandre Rafalovitch
That's a curse of too much info. Could you extract just the relevant parts (field definitions, search configuration). And also explain *) What you expected to see with a couple of examples *) What you actually see *) Why is that "wrong" for you It is an interesting problem, but it is unusual enoug

Re: solr warning - filling logs

2017-03-03 Thread Satya Marivada
Dave and All, The below exception is not happening anymore when I change the startup port to something else apart from the one I had in the original startup. In the original startup, if I start without SSL enabled and then start up on the same port with SSL enabled, that is when this warning is happening

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
Commit is index global. So if you have overlapping timelines and a commit is issued, it will affect all changes done to that point. So, the aliases may be better for you. You could potentially also reload a core with changed solrconfig.xml settings, but that's heavy on caches. Regards, Alex On

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> > You have indicated that you have a way to avoid doing updates during the > full import. Because of this, you do have another option that is likely > much easier for you to implement: Set the "commitWithin" parameter on > each update request. This works almost identically to autoSoftCommit,
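[Editor's note: to make the quoted suggestion concrete, a minimal sketch of commitWithin in Solr's XML update format; the document fields and the 60-second window are made up for illustration. Documents sent this way become searchable within the stated window without relying on a global autoSoftCommit; the same value can also be passed as a commitWithin request parameter on the update URL.]

    <!-- ask Solr to make this update searchable within 60 seconds -->
    <add commitWithin="60000">
      <doc>
        <field name="id">PRODUCT-1234</field>
        <field name="price">19.95</field>
      </doc>
    </add>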

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Shawn Heisey
On 3/3/2017 10:17 AM, Sales wrote: > I am not sure how best to handle this. We use the data import handler to > re-sync all our data on a daily basis, which takes 1-2 hours depending on system > load. It is set up to commit at the end, so the old index remains until it’s > done, and we lose no access

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> On Mar 3, 2017, at 11:30 AM, Erick Erickson wrote: > > One way to handle this (presuming SolrCloud) is collection aliasing. > You create two collections, c1 and c2. You then have two aliases. When > you start, "index" is aliased to c1 and "search" is aliased to c2. Now > do your full import to

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Erick Erickson
One way to handle this (presuming SolrCloud) is collection aliasing. You create two collections, c1 and c2. You then have two aliases. When you start, "index" is aliased to c1 and "search" is aliased to c2. Now do your full import to "index" (and, BTW, you'd be well advised to do at least a hard co

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> > On Mar 3, 2017, at 11:22 AM, Alexandre Rafalovitch wrote: > > On 3 March 2017 at 12:17, Sales > wrote: >> When we enabled those, during the index, the data disappeared since it kept >> soft committing during the import process, > > This part does not quite make sense. Could you expand on

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Erick Erickson
Well, historically during the really old days, optimize made a major difference. As Lucene evolved that difference was smaller, and in recent _anecdotal_ reports it's up to 10% improvement in query processing with the usual caveats that there are a ton of variables here, including and especially ho

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
On 3 March 2017 at 12:17, Sales wrote: > When we enabled those, during the index, the data disappeared since it kept > soft committing during the import process, This part does not quite make sense. Could you expand on this "data disappeared" part so we can understand what the issue is. The main issue

Re: How to update index after document expired.

2017-03-03 Thread Erick Erickson
Right, you'd have to copy/paste or petition to make DocExpirationUpdateProcessorFactory not final. Although it was likely made that way for a good reason. It does raise the question, however: is this the right thing to do? What you have here is some record of when docs should expire. You had to know that

Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
I am not sure how best to handle this. We use the data import handler to re-sync all our data on a daily basis, which takes 1-2 hours depending on system load. It is set up to commit at the end, so the old index remains until it’s done, and we lose no access while the import is happening. But, we no
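[Editor's note: for context, the Data Import Handler described here is normally registered in solrconfig.xml roughly as sketched below; the handler path and config file name are the usual defaults, assumed rather than taken from the poster's setup. A daily rebuild like the one described is typically triggered against that handler with command=full-import (clean and commit default to true).]

    <!-- registers DIH; the entity/datasource mapping lives in data-config.xml -->
    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>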

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Caruana, Matthew
We index rarely and in bulk as we’re an organisation that deals in enabling access to leaked documents for journalists. The indexes are mostly static for 99% of the year. We only optimise after reindexing due to schema changes or when we have a new leak. Our workflow is to index on a staging se

Re: Using solr-core-4.6.1.jar on solr-5.5.4 server

2017-03-03 Thread Erick Erickson
bq: " do you think there could be compatibility problems" What API calls? Are you using some custom SolrJ from a client? This is _extremely_ risky IMO. Solr does not guarantee you can do rolling upgrades across major versions for instance. And the Solr<->Solr communications are through SolrJ (whi

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Erick Erickson
Matthew: What load testing have you done on optimized .vs. unoptimized indexes? Is there enough of a performance gain to be worth the trouble? Toke's indexes are pretty static, and in his situation it's worth the effort. Before spending a lot of cycles on making optimization work/understanding the

Re: Joining across collections with Nested documents

2017-03-03 Thread Walter Underwood
Make two denormalized collections. Just don’t join at query time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 3, 2017, at 1:01 AM, Preeti Bhat wrote: > > We can't, they are being used for different purposes and we have few cases > where we

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Caruana, Matthew
Thank you, you’re right - only one of the four cores is hitting 100%. This is the correct answer. The bottleneck is CPU exacerbated by an absence of parallelisation. > On 3 Mar 2017, at 12:32, Toke Eskildsen wrote: > > On Thu, 2017-03-02 at 15:39 +, Caruana, Matthew wrote: >> Thank you. Th

Re: What is the bottleneck for an optimise operation?

2017-03-03 Thread Toke Eskildsen
On Thu, 2017-03-02 at 15:39 +, Caruana, Matthew wrote: > Thank you. The question remains however, if this is such a hefty > operation then why is it walking to the destination instead of > running, so to speak? We only do optimize on an old Solr 4.10 setup, but for that we have plenty of exper

Re: Solr Query Suggestion

2017-03-03 Thread Emir Arnautovic
Hi Vrinda, You should use field collapsing (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results) or if you cannot live with its limitations, you can use results grouping (https://cwiki.apache.org/confluence/display/solr/Result+Grouping) HTH, Emir On 03.03.2017 10:5

Using solr-core-4.6.1.jar on solr-5.5.4 server

2017-03-03 Thread skasab2s
Hello, we want to upgrade our Solr server to *solr-5.5.4*. For the API calls on this server, we can only use *solr-core-4.6.1.jar*, because it is maintained by an external vendor and we cannot upgrade the library ourselves. I briefly tested this combination and it seemed to work fine. Even so, do yo

Solr Query Suggestion

2017-03-03 Thread vrindavda
Hello, I have indexed data of 3 categories, say Category-1, Category-2, Category-3. I need suggestions on how to form a query that gets the top 3 results from each category - Category-1(3), Category-2(3), Category-3(3) - 9 in total. Is this possible? Thank you, Vrinda Davda -- View this message in context:

FieldName as case insenstive

2017-03-03 Thread Preeti Bhat
Hi All, I have a field named "CompanyName" in one of my collections. When I try to search CompanyName:xyz or CompanyName:XYZ it gives me results. But when I try companyname:xyz, the query fails. Is there a way to make field names in Solr case insensitive, as the client is going to pa

RE: Joining across collections with Nested documents

2017-03-03 Thread Preeti Bhat
Thanks Mikhail, I will look into this option. Thanks and Regards, Preeti Bhat -Original Message- From: Mikhail Khludnev [mailto:m...@apache.org] Sent: Friday, March 03, 2017 1:03 PM To: solr-user Subject: Re: Joining across collections with Nested documents Related docs can be retrieved

RE: Joining across collections with Nested documents

2017-03-03 Thread Preeti Bhat
We can't, they are being used for different purposes and we have few cases where we would need data from both. Thanks and Regards, Preeti Bhat -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, March 03, 2017 12:02 PM To: solr-user@lucene.apache.org S

Re: OR condition between !frange and normal query

2017-03-03 Thread Emir Arnautovic
Hi Edwin, _query_ is not a field in your index but Solr syntax for subqueries. Not sure if that is the issue you are referring to, but the query you sent (the example I sent earlier) is not fully valid - it has an extra '('. Can you try: q=_query_:"{!frange l=1}ms(startDate_dt,endDate_dt)" OR _query_:

Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize

2017-03-03 Thread Caruana, Matthew
This is the current config (element names stripped by the list archive): 100 1 10 10. We index in bulk, so after indexing about 4 million documents over a week (OCR takes a long time) we normally
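[Editor's note: regarding the subject line's idea of optimizing down to multiple segments rather than one, a sketch of the XML update message form; the target of 10 segments is only an example. Limiting maxSegments leaves the largest segments untouched, which reduces both the temporary disk space and the rewrite time of a full optimize.]

    <!-- merge down to at most 10 segments instead of a single one -->
    <optimize maxSegments="10" waitSearcher="true"/>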