RE: optimize status

Reitzel, Charles Mon, 29 Jun 2015 13:50:26 -0700

Hi Garth,

Yes, I'm straying from OP's question (I think Steve is all set).   But his 
question, quite naturally, comes up often and a similar discussion ensues each 
time.

I take your point about shards and segments being different things.  I 
understand that the hash ranges per segment are not kept in ZK.   I guess I 
wish they were.

In this regard, I liked MongoDB, which uses a 2-level sharding scheme.   Each 
shard manages a list of "chunks", each of which has its own hash range, kept in 
the cluster state.   If data needs to be balanced across nodes, it works at the 
chunk level.  No record/doc level I/O is necessary.   It is much more targeted, 
and only the data that needs to move is touched.  Solr does most things better 
than Mongo, imo.  But this is one area where Mongo got it right.
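
To make the chunk idea concrete, here is a rough sketch in Python.  It is a toy 
model only, not MongoDB's actual balancer: the Chunk and Shard classes and the 
rebalance() function are names made up for illustration, and balancing by chunk 
count is an assumed simplification.  The point is that whole chunks change 
owner and no individual document is re-routed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    lo: int         # inclusive start of the chunk's hash range
    hi: int         # exclusive end of the chunk's hash range
    doc_count: int  # documents currently stored in the chunk


@dataclass
class Shard:
    name: str
    chunks: list = field(default_factory=list)


def rebalance(shards):
    """Move whole chunks from the shard with the most chunks to the shard
    with the fewest, until the counts differ by at most one.

    Returns the list of moves as (lo, hi, source, destination) tuples.
    Hypothetical toy balancer: only chunk ownership changes hands.
    """
    moves = []
    while True:
        src = max(shards, key=lambda s: len(s.chunks))
        dst = min(shards, key=lambda s: len(s.chunks))
        if len(src.chunks) - len(dst.chunks) <= 1:
            return moves
        chunk = src.chunks.pop()
        dst.chunks.append(chunk)
        moves.append((chunk.lo, chunk.hi, src.name, dst.name))


# Example: one crowded shard and one new, empty shard.
a = Shard("a", [Chunk(i * 100, (i + 1) * 100, 1000) for i in range(4)])
b = Shard("b")
moves = rebalance([a, b])
```

After the call, each shard owns two chunks, and the cluster state only needs 
to record the two ownership changes.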

As for your example, what benefit does an application gain by reducing 10 
segments, say, down to 1?   Even if the index never changes?   The gain _might_ 
be measurable, but it will be small compared to performance gains that can be 
had by maintaining a good data balance across nodes.

Your example is based on implicit routing.  So dynamic management of shards is 
less applicable.  I just hope you get similar volumes of data every year.   
Otherwise, some years will perform better than others due to unbalanced data 
distribution!

best,
Charlie

-----Original Message-----
From: Garth Grimm [mailto:gdgr...@yahoo.com.INVALID] 
Sent: Monday, June 29, 2015 1:15 PM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

" Is there really a good reason to consolidate down to a single segment?"

Archiving (as one example).  Come July 1, the collection for log 
entries/transactions in June will never be changed, so optimizing is actually a 
good thing to do.

Kind of getting away from OP's question on this, but I don't think the ability 
to move data between shards in SolrCloud (such as shard splitting) has much to 
do with the Lucene segments under the hood.  I'm just guessing, but I'd think 
the main issue with shard splitting would be to ensure that document route 
ranges are handled properly, and I don't think the value used for routing has 
anything to do with what segment they happen to be stored into.
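
A rough Python sketch of that point: routing depends only on which hash range 
a document's id falls into, so splitting a shard amounts to splitting its 
range.  This is an illustration under assumptions, not Solr's implementation: 
zlib.crc32 is a stand-in hash chosen for convenience, and route_hash(), 
split_range(), owner() and the shard names are made up for the example.  
Segments do not appear anywhere in it.

```python
import zlib


def route_hash(doc_id: str) -> int:
    """Stand-in routing hash (assumption): maps an id to 0 .. 2**32 - 1."""
    return zlib.crc32(doc_id.encode("utf-8"))


def split_range(lo: int, hi: int):
    """Split the inclusive range [lo, hi] into two adjacent halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def owner(ranges: dict, doc_id: str) -> str:
    """Return the name of the shard whose range contains the id's hash."""
    h = route_hash(doc_id)
    for name, (lo, hi) in ranges.items():
        if lo <= h <= hi:
            return name
    raise ValueError("hash %d is not covered by any range" % h)


# One shard covering the whole hash space, then split into two sub-shards.
full = (0, 2**32 - 1)
left, right = split_range(*full)
ranges = {"shard1_0": left, "shard1_1": right}
owners = {doc_id: owner(ranges, doc_id) for doc_id in ("doc-1", "doc-2", "doc-3")}
```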

-----Original Message-----
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org]
Sent: Monday, June 29, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

Is there really a good reason to consolidate down to a single segment?

Any incremental query performance benefit is tiny compared to the loss of
manageability.   

I.e. shouldn't segments _always_ be kept small enough to facilitate
re-balancing data across shards?   Even in non-cloud instances this is true.
When a collection grows, you may want to shard/split an existing index by
adding a node and moving some segments around.    Isn't this the direction
Solr is going?   With many, smaller segments, this is feasible.  With "one
big segment", the collection must always be reindexed.

Thus, "optimize" would mean, "get rid of all deleted records" and would, in
fact, optimize queries by eliminating wasted I/O.   Perhaps worth it for
slowly changing indexes.   Seems like the Tiered merge policy is 90% there
...    Or am I all wet (again)?
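
For reference, the update handler already separates the two operations: a 
commit with expungeDeletes=true merges away deleted documents without forcing a 
single segment, while optimize=true with maxSegments does the full merge.  The 
Python sketch below only builds the request URLs and sends nothing; the base 
URL, the collection name "logs" and the update_url() helper are placeholders 
made up for illustration.

```python
from urllib.parse import urlencode


def update_url(base: str, collection: str, **params) -> str:
    """Build a Solr /update URL; booleans are rendered as true/false."""
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{base}/{collection}/update?{query}"


BASE = "http://localhost:8983/solr"  # placeholder host and port

# Commit and merge away deleted documents only.
expunge = update_url(BASE, "logs", commit=True, expungeDeletes=True)

# Force merge ("optimize") down to a single segment.
force_merge = update_url(BASE, "logs", optimize=True, maxSegments=1)
```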

-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, June 29, 2015 10:39 AM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

"Optimize" is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted 
documents.
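
A toy illustration in Python of why a merge expunges deletes.  This is a 
simplified model, not Lucene code: segments are plain lists and merge() is a 
made-up helper.  The merged segment is written from the live documents only, so 
anything marked deleted simply is not copied.

```python
def merge(segments):
    """Merge several segments into one, copying only live documents."""
    return [doc for seg in segments for doc in seg if not doc["deleted"]]


segments = [
    [{"id": 1, "deleted": False}, {"id": 2, "deleted": True}],
    [{"id": 3, "deleted": False}],
]
merged = merge(segments)
live_ids = [doc["id"] for doc in merged]
```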

We really need to rename "optimize" to "force merge". Is there a Jira for that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White <swhite4...@gmail.com> wrote:

> Hi Upayavira,
> 
> This is news to me that we should not optimize an index.
> 
> What about disk space savings? Isn't optimization meant to reclaim disk 
> space, or does Solr somehow do that itself?  Where can I read more about this?
> 
> I'm on Solr 5.1.0 (may switch to 5.2.1)
> 
> Thanks
> 
> Steve
> 
> On Mon, Jun 29, 2015 at 4:16 AM, Upayavira <u...@odoko.co.uk> wrote:
> 
>> I'm afraid I don't understand. You're saying that optimising is 
>> causing performance issues?
>> 
>> Simple solution: DO NOT OPTIMIZE!
>> 
>> Optimisation is very badly named. What it does is squash all 
>> segments in your index into one segment, removing all deleted 
>> documents. It is good to get rid of deletes - in that sense the index 
>> is "optimized".
>> However, future merges become very expensive. The best way to handle 
>> this topic is to leave it to Lucene/Solr to do it for you. Pretend 
>> the "optimize" option never existed.
>> 
>> This is, of course, assuming you are using something like Solr 3.5+.
>> 
>> Upayavira
>> 
>> On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
>>> 
>>> Have to, because of performance issues.
>>> Just want to know if there is a way to tap into the status.
>>> 
>>>> On Jun 28, 2015, at 11:37 PM, Upayavira <u...@odoko.co.uk> wrote:
>>>> 
>>>> Bigger question, why are you optimizing? Since 3.6 or so, it 
>>>> generally hasn't been required; it can even be a bad thing.
>>>> 
>>>> Upayavira
>>>> 
>>>>> On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
>>>>> Hi All,
>>>>> 
>>>>> I have two indexers (independent processes) writing to a common 
>>>>> Solr core.
>>>>> If one indexer process issues an optimize on the core, I want the 
>>>>> second indexer to wait before adding docs until the optimize has 
>>>>> finished.
>>>>> 
>>>>> Are there ways I can do this programmatically?
>>>>> Pinging the core while the optimize is happening returns OK because 
>>>>> technically Solr allows you to update while an optimize is happening.
>>>>> 
>>>>> any suggestions ?
>>>>> 
>>>>> thanks,
>>>>> Summer
>> 

*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*************************************************************************
