Re: Deleted Docs increasing in Solr 6.1.0

2019-11-04 Thread Erick Erickson
First of all I wouldn’t worry about it unless you have a _significant_ number of deleted docs. The default TMP as of around Solr 7.5 should accumulate up to around 33% deleted docs. Prior to 7.5, the number of deleted docs could hover around 50% depending on the access pattern. expungeDeletes

Deleted Docs increasing in Solr 6.1.0

2019-11-04 Thread vishal patel
We have 2 shards and 2 replicas in a testing environment. Deleted Docs are 18749 for one collection [documents]. I have attached a screenshot of the Solr admin panel. (1) Would there be any impact on disk size if deleted docs increase? (2) We try to remove deleted docs by executing the command: curl

RE: Very high number of deleted docs, part 2

2018-01-11 Thread Markus Jelsma
ick Erickson <erickerick...@gmail.com> > Sent: Wednesday 10th January 2018 22:41 > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Very high number of deleted docs, part 2 > > There's some background here: > https://lucidworks.com/2017/10/13/segment-merging-d

Re: Very high number of deleted docs, part 2

2018-01-10 Thread Erick Erickson
geDeletes did not do the job in testing Surprising. What actually happened? Do note that expungeDeletes does not promise to remove all deleted docs, it merges segments with < (some percentage) deleted documents. Best, Erick On Wed, Jan 10, 2018 at 9:45 AM, Markus Jelsma <markus.jel...@op

RE: Very high number of deleted docs, part 2

2018-01-10 Thread Markus Jelsma
y 2018 17:56 > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Very high number of deleted docs, part 2 > > I'm not 100% sure that playing with maxSegments will work. > > what will work is to re-index everything. You can re-index into the > existing collection,

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Erick Erickson
w about optimizing it again, with maxSegments set to ten, it should > recover right? > > -Original message- > > From:Shawn Heisey <apa...@elyograg.org> > > Sent: Friday 5th January 2018 14:34 > > To: solr-user@lucene.apache.org > > Subject: Re: Very hig

RE: Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
: Friday 5th January 2018 14:34 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs, part 2 > > On 1/5/2018 5:33 AM, Markus Jelsma wrote: > > Another collection, now on 7.1, also shows this problem and has default TMP > > settings. Thi

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Shawn Heisey
, then there will never be a segment larger than 5GB, and the deleted document percentage would be less likely to get out of control.  The optimize operation ignores the maximum segment size and reduces the index to a single large segment with zero deleted docs. TMP's behavior with really big segments

Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
[1] http://lucene.472066.n3.nabble.com/Very-high-number-of-deleted-docs-td4357327.html

Re: max docs, deleted docs optimization

2017-11-01 Thread kshitij tyagi
notice or not is an open question. In an > index with only 10 lakh docs, it's unlikely even having 50% deleted > documents is going to make much of a difference. > > 3> Yes, the deleted docs are in segment until it's merged away. Lucene > is very efficient (according to Mike McC

Re: max docs, deleted docs optimization

2017-10-31 Thread Erick Erickson
e deleted docs are in segment until it's merged away. Lucene is very efficient (according to Mike McCandless) at skipping deleted docs. 4> It rewrites all segments, purging deleted documents. However, it has some pitfalls, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-

max docs, deleted docs optimization

2017-10-31 Thread kshitij tyagi
Hi, I am using atomic updates to update one of the fields, and I want to know: 1. If the total docs in the core are 10 lakh and I partially update 2 lakh docs, what will be the number of deleted docs? 2. Does a higher number of deleted docs have an effect on query time? That is, does query time increase
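A rough worked example for the first question. An atomic update re-indexes the whole document, so the previous version stays in its segment, marked as deleted, until that segment is merged away. The sketch below is an upper bound under two assumptions: each document is updated exactly once, and no merge runs in the meantime. The function name is made up for illustration; 1 lakh = 100,000.

```python
LAKH = 100_000

def deleted_doc_stats(live_docs, updated_docs):
    """Upper bound on deleted docs after atomically updating `updated_docs`
    documents once each, assuming no segment merge has purged anything yet."""
    if updated_docs > live_docs:
        raise ValueError("cannot update more documents than exist")
    deleted = updated_docs            # one stale copy per updated document
    max_doc = live_docs + deleted     # live docs plus not-yet-purged copies
    return {
        "numDocs": live_docs,
        "deletedDocs": deleted,
        "maxDoc": max_doc,
        "deletedPct": round(100.0 * deleted / max_doc, 2),
    }

stats = deleted_doc_stats(10 * LAKH, 2 * LAKH)
print(stats)
```

In practice the number seen in the admin UI is usually lower, because background merges reclaim deleted documents as indexing continues.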

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Well, that made a difference! Now we're back at 64 MB per replica. Thanks, Markus -Original message- > From:Erick Erickson <erickerick...@gmail.com> > Sent: Wednesday 4th October 2017 16:19 > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Very hig

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
rgeMerge after the periodic update cycle, but > i preferred Lucene to do it for me. > > Thanks, > Markus > > -Original message- >> From:Erick Erickson <erickerick...@gmail.com> >> Sent: Wednesday 4th October 2017 14:56 >> To: solr-user <solr-

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
update cycle, but i preferred Lucene to do it for me. Thanks, Markus -Original message- > From:Erick Erickson <erickerick...@gmail.com> > Sent: Wednesday 4th October 2017 14:56 > To: solr-user <solr-user@lucene.apache.org> > Subject: Re: Very high number of deleted doc

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Ah thanks for that! -Original message- > From:Emir Arnautović <emir.arnauto...@sematext.com> > Sent: Wednesday 4th October 2017 15:03 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > It is passed but not expl

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
is correct, you have to either periodically optimize/forceMerge or expungeDeletes regularly. At that point, though, you might as well optimize/forceMerge. expungeDeletes would only save you re-writing segments with < 20% deleted docs (at least I think that's the cutoff). Or reindex from scratch and ne
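Both options mentioned here go through the update handler. The sketch below only builds the request URLs and does not send anything. The base URL and collection name are placeholders, and `commit`, `expungeDeletes`, `optimize`, and `maxSegments` are the update request parameters as I understand them, so check the reference guide for your Solr version before relying on them.

```python
from urllib.parse import urlencode

def update_url(base_url, collection, **params):
    """Build an update-handler URL; booleans are rendered as true/false."""
    query = urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )
    return f"{base_url}/{collection}/update?{query}"

BASE = "http://localhost:8983/solr"   # placeholder host and port
COLLECTION = "documents"              # placeholder collection name

# Commit and ask the merge policy to merge away segments with many deletes.
expunge_url = update_url(BASE, COLLECTION, commit=True, expungeDeletes=True)

# Force-merge (optimize) down to a single segment.
optimize_url = update_url(BASE, COLLECTION, optimize=True, maxSegments=1)

print(expunge_url)
print(optimize_url)
```

Either URL could then be requested with curl or any HTTP client, preferably in a low-traffic window, since both operations rewrite segments.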

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
onfig. > > Thanks, > Markus > > -Original message- >> From:Amrit Sarkar <sarkaramr...@gmail.com> >> Sent: Wednesday 4th October 2017 14:42 >> To: solr-user@lucene.apache.org >> Subject: Re: Very high number of deleted docs >> >> Hi Markus, >

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Did you _ever_ do a forceMerge/optimize or expungeDeletes? Here's the problem: TieredMergePolicy (TMP) has a maximum segment size it will allow, 5G by default. No segment is even considered for merging unless it has < 2.5G (or half whatever the default is) non-deleted docs, the logic be
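The rule described in this message can be written as a toy predicate. This is a simplification of the behaviour described here, not Lucene's actual TieredMergePolicy code. The function name is invented, and treating live data as segment size times the live fraction is an assumption.

```python
GB = 1024 ** 3

def eligible_for_merge(segment_bytes, deleted_fraction, max_segment_bytes=5 * GB):
    """True if the segment's live (non-deleted) data is under half the
    maximum segment size, which is the condition the message describes."""
    live_bytes = segment_bytes * (1.0 - deleted_fraction)
    return live_bytes < max_segment_bytes / 2

# A segment already at the 5G cap, at increasing percentages of deleted docs.
for pct in (10, 40, 60):
    print(pct, eligible_for_merge(5 * GB, pct / 100))
```

Under these assumptions a segment at the size cap is not considered until more than half of it is deleted, which is consistent with the high deleted-doc percentages reported in this thread after an earlier optimize.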

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
-Original message- > From:Amrit Sarkar <sarkaramr...@gmail.com> > Sent: Wednesday 4th October 2017 14:42 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > > Emir already mentioned tuning *reclaimDeletesWeight w

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
> To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > You can set reclaimDeletesWeight in merge settings to some higher value than > default (I think it is 2) to favor segments with deleted docs when merging. > > HTH, > Emir > -

Re: Very high number of deleted docs

2017-10-04 Thread Amrit Sarkar
Hi Markus, Emir already mentioned tuning reclaimDeletesWeight, which affects the priority of segments about to be merged. Optimise the index from time to time, preferably scheduled weekly / fortnightly / ..., at a low-traffic period, so that you are never in the odd position of having 80% deleted docs in the total index. Amrit Sarkar

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus, You can set reclaimDeletesWeight in merge settings to some higher value than default (I think it is 2) to favor segments with deleted docs when merging. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training -

Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Hello, Using 6.6.0, I just spotted one of our collections having a core in which over 80% of the total number of documents were deleted documents. It is configured with no non-default settings. Is this supposed to happen? How can I prevent these kinds of numbers? Thanks, Markus

Re: Solr Deleted Docs Issue

2015-03-19 Thread vicky desai
have observed is that, though the merge factor seems to work, we always end up with around 6 lakh deleted docs in the index daily. On optimizing, all these deleted docs are removed. We benefit in memory as well as query speed from optimization. But as I understand it, it's a short-term gain and the situation repeats itself

Re: Solr Deleted Docs Issue

2015-03-19 Thread Shawn Heisey
On 3/19/2015 12:24 AM, vicky desai wrote: I fail to understand why these deleted docs are not removed from the index on merging. Is there good documentation which explains how exactly merging is done? What can I do to solve this problem other than optimization? Deleted docs *are* removed

Re: Solr Deleted Docs Issue

2015-03-16 Thread Erick Erickson
bq: If this operation is continuously done I would end up with a large set of deleted docs which will affect the performance of the queries I hit on this solr. No, you won't. They'll be merged away as background segments are merged. Here's a great visualization of the process, the third one down

Solr Deleted Docs Issue

2015-03-16 Thread vicky desai
docs merging is done after every 10th update and so the max Segment Count I can have is 10 which is fine. However even when merging happens deleted docs are not cleared and I end up with 100 deleted docs in index. If this operation is continuously done I would end up with a large set of deleted docs

Re: Solr Deleted Docs Issue

2015-03-16 Thread Shawn Heisey
and so the max Segment Count I can have is 10 which is fine. However even when merging happens deleted docs are not cleared and I end up with 100 deleted docs in index. If this operation is continuously done I would end up with a large set of deleted docs which will affect the performance

Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
entries for deleted docs, so to filter them out, one has to manually check livedocs. Is this the expected behaviour? I don't understand why the cache would be bothering to load data for deleted docs. This is on SOLR4.0 Thanks! roman

Re: Caches contain deleted docs (?)

2013-11-27 Thread Erick Erickson
, false); the resulting arrays *will* contain entries for deleted docs, so to filter them out, one has to manually check livedocs. Is this the expected behaviour? I don't understand why the cache would be bothering to load data for deleted docs. This is on SOLR4.0 Thanks! roman

Re: Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
I understand that changes would be expensive, but shouldn't the cache simply skip the deleted docs? In the same way as the cache for multivalued fields (that accepts livedocs bits). Thanks, roman On Wed, Nov 27, 2013 at 6:26 PM, Erick Erickson erickerick...@gmail.com wrote: Yep, it's

Deleted Docs

2013-07-09 Thread Katie McCorkell
Hello, I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some documents that number decreased, eventually to 18

Re: Deleted Docs

2013-07-09 Thread Jack Krupansky
Krupansky -Original Message- From: Katie McCorkell Sent: Tuesday, July 09, 2013 5:38 PM To: solr-user@lucene.apache.org Subject: Deleted Docs Hello, I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought

Re: Deleted Docs

2013-07-09 Thread Shawn Heisey
On 7/9/2013 3:38 PM, Katie McCorkell wrote: I am curious about the Deleted Docs: statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some documents

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Grijesh
optimize ensures that deleted docs and terms will not be displayed. - Thanx: Grijesh www.gettinhahead.co.in -- View this message in context: http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178670.html Sent from the Solr - User mailing list

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Nagendra Nagarajayya
.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178179.html Sent from the Solr - User mailing list archive at Nabble.com.

Deleted docs in IndexWriter Cache (NRT related)

2011-07-17 Thread Nagendra Nagarajayya
Hi! If a document with a unique id is added again, the new document is added and the older doc is marked as deleted. So when a search is made with an IndexReader obtained from the IndexWriter (for NRT) both the docs show up, the older doc and the newer updated doc. To prevent the

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-17 Thread pravesh
commit would be the safest way for making sure the deleted content doesn't show up. Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178179.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Remove the deleted docs from the Solr Index

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 12:10 AM, Mohamed Parvez par...@gmail.com wrote: Ditto. There should have been a DIH command to re-sync the Index with the DB. But there is such a command; it is called full-import. -- Regards, Shalin Shekhar Mangar.

Re: Remove the deleted docs from the Solr Index

2010-01-03 Thread Ravi Gidwani
Lance: At times we don't have the freedom to make these database changes. Currently I am in this situation. Hence the requirement on the DIH. ~Ravi. On Sat, Jan 2, 2010 at 3:44 PM, Lance Norskog goks...@gmail.com wrote: The other option is to have a 'deleted' column in your table, and

Re: Remove the deleted docs from the Solr Index

2010-01-02 Thread Lance Norskog
The other option is to have a 'deleted' column in your table, and have the application 'delete' operation set that field. In the DIH you query this column with 'deletedPkQuery'. Or, you can use triggers to maintain a new table with the IDs of deleted rows. This will allow you to have a batch job
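The 'deleted' column approach can be sketched with an in-memory SQLite table. The table and column names (`item`, `deleted`, `last_modified`) and the timestamps are hypothetical. The final SELECT is the kind of query one would put in `deletedPkQuery`, which is expected to return the primary keys of rows removed since the last import.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT,"
    " deleted INTEGER NOT NULL DEFAULT 0,"   # soft-delete flag
    " last_modified TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO item (id, title, last_modified) VALUES (?, ?, ?)",
    [(100, "first", "2010-01-01 10:00:00"),
     (101, "second", "2010-01-01 10:00:00")],
)

def soft_delete(conn, item_id, when):
    """Application-level 'delete': flag the row instead of removing it."""
    conn.execute(
        "UPDATE item SET deleted = 1, last_modified = ? WHERE id = ?",
        (when, item_id),
    )

soft_delete(conn, 100, "2010-01-02 09:00:00")

# Hypothetical time of the previous import; DIH supplies its own value
# (assumed here to be its last_index_time variable).
last_index_time = "2010-01-01 12:00:00"
deleted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM item WHERE deleted = 1 AND last_modified > ? ORDER BY id",
        (last_index_time,),
    )
]
print(deleted_ids)
```

The rows stay in the table, so a delta import can still discover which ids to remove from the index. A periodic batch job can physically delete flagged rows later.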

Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Ravi Gidwani
Hi Shalin: I get your point about not knowing what has been deleted from the database. So this is what I am looking for: 0) A document (id=100) is currently part of the Solr index. 1) Let's say the application deleted a record with id=100 from the database. 2) Now I need to execute

Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Mohamed Parvez
Ditto. There should have been a DIH command to re-sync the Index with the DB. Right now it looks like a one-way street from DB to Index. On Tue, Dec 29, 2009 at 3:07 AM, Ravi Gidwani ravi.gidw...@gmail.com wrote: Hi Shalin: I get your point about not knowing what has been deleted

Remove the deleted docs from the Solr Index

2009-12-28 Thread Mohamed Parvez
I am using Solr 1.4 and DIH to build the index from a table. I use full import once to create the index and then I keep using delta import to update the index. All works fine as long as the table only gets new rows added. If there are some rows in the table that get deleted then the index

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Mauricio Scheffer
Here's a couple more options: http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/ Cheers, Mauricio On Mon, Dec 28, 2009 at 5:51 PM, Mohamed Parvez par...@gmail.com wrote: I am

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Mohamed Parvez
I have looked at that thread earlier, but there is no option there for a solution from the Solr side. I mean the two options there are: 1] Use database triggers instead of DIH to manage updating the index: this is out of the question as we can't run 1000-odd triggers every hour to delete. 2] Some

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Shalin Shekhar Mangar
On Tue, Dec 29, 2009 at 3:03 AM, Mohamed Parvez par...@gmail.com wrote: I have looked at that thread earlier, but there is no option there for a solution from the Solr side. I mean the two options there are: 1] Use database triggers instead of DIH to manage updating the index: This