: The guys was using delta import anyway, so maybe the problem is
: different and not related to the clean.

that's not what the logs say.

Here's what i see...

Log begins with server startup @ "Jun 10, 2015 11:14:56 AM"

The DeletionPolicy for the "shopclue_prod" core is initialized at "Jun 
10, 2015 11:15:04 AM" and we see a few interesting things here we note 
for the future as we keep reading...

1) There is currently "commits:num=1" commits on disk
2) the current index dir in use is "index.20150311161021822"
3) the current segment & generation are "segFN=segments_1a,generation=46"

Immediately after this, we see some searcher warming using a searcher with 
this same segments file, and then this searcher is registered ("Jun 10, 
2015 11:15:05 AM") and the core is registered.

Next we see some replication polling, and we see what look like some 
simple monitoring requests for "q=*" which return hits=85898 being 
repeated over and over.

At "Jun 10, 2015 11:16:30 AM" we see some requests for /dataimport that 
look like they are coming from the UI. and then at "Jun 10, 2015 11:17:01 
AM" we see a request for a full import started.

We have no idea what the data import configuration file looks like, so we 
have no idea if clean=false is being used or not.  it's certianly not 
specified in the URL.

We see some more monitoring URLs returning hits=85898 and some more 
/repliation status calls, and then @ "Jun 10, 2015 11:18:02 AM" we see the 
first commit executed since hte server started up.  

there's no indication that this commit came from an external request (eg 
"/update") so probably was made by some internal request.  One 
possiblility is that it came from DIH finishing -- but i doubt it, i'm 
fairly sure that would have involved more logging then this.  A more 
probably scenerio is that it came from an autoCommit setting -- the fact 
that it is almost exactly 60 seconds after DIH started -- and almost 
exactly 60 seconds after DIH may have done a deleteAll query due to 
clean=true -- makes it seem very likely that this was a 1 minute 
autoCommit)

(but since we don't have either hte data import config, or the 
solrconfig.xml, we have no way of knowing -- it's all just guess work.)

Very importantly, note that this commit is not opening a new searcher...

Jun 10, 2015 11:18:02 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Here are some other interesting things to note from the logging 
that comes from the DeletionPolicy when this commit happens...

1) it now notes that there are commits:num=2 on disk
2) the current index dir hasn't changed (index.20150311161021822) so 
some weird replication command didn't swap the world out from under us
3) the newest segment/generation are "segFN=segments_1b,generation=47"
4) the newest commit has no other files in it besides the segments file.

this means, with out a doubt, there are no documents in this commits view 
of the index.  they have all been deleted by something.


At this point the *old* searcher (for commit generation 46) is still in 
use however -- nothing has done an openSearcher=true.

we see more /dataimport status requests, and other requests that appear to 
come from the Solr UI, and more monitoring queries that still return 
hits=85898 because the same searcher is in use.

At "Jun 10, 2015 11:27:04 AM" we see another commit happen -- again, no 
indication that this came from an outside /update request, so it might be 
from DIH, or it might be from an autoCommit setting.  the fact that it is 
nearly exactly 10 minutes after DIH started (and probably did a clean=true 
deleteAll query) makes it seem extremely likely this is an autoSoftCommit 
setting kicking in.

Very importantly, note that this softCommit *does* open a new searcher...

Jun 10, 2015 11:27:04 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}


In less then a second, this new searcher is warmed up and the next time we 
see a q=* monitoring query get logged, it returns hits=0.

Note that at no point in the logs, after the DataImporter is started, do 
we see it log anything other then that it has initiated the request to 
MySQL -- we do see some logs starting ~ "Jun 10, 2015 11:41:19 AM" 
indicating that someone was using the Web UI to look at the dataimport 
handler's status report.  it would be really nice to know what that person 
saw at that point -- because my guess is DIH was still running and was 
staled waiting for MySql, and hadn't even started adding docs to Solr (if 
it had, i'm certian there would have been some log of it).

So instead, the combination of a (probable) DIH clean=true option and a 
(near certainty) autoCommit=60sec and autoSoftCommit=10min ment that a new 
commit was created after the clean, and that commit was opened as a new 
searcher before the DIH ever finished.

Or at least ... that's my best guess based solely on the Solr logs, w/o 
any concrete info on what the configs look like, or what the User observed 
at the time when looking at the DIH status.



-Hoss
http://www.lucidworks.com/

Reply via email to