So I jumped back on this.

I have not been using the optimize option on this new set of tests.
If I run the full index on the leader I seem to get all of the items in the 
database minus 3 that have a missing field.

Indexing completed. Added/Updated: 903,990 documents. Deleted 0 documents. 
(Duration: 25m 11s)
Requests: 1 (0/s), Fetched: 903,993 (598/s), Skipped: 0, Processed: 903,990

Last Modified:2 minutes ago
Num Docs:903990
Max Doc:903990
Heap Memory Usage:2625744
Deleted Docs:0
Version:3249
Segment Count:7
Optimized:
Current:

If I run it on the other node I get:

Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. 
(Duration: 27m 08s)
Requests: 1 (0/s), Fetched: 903,993 (555/s), Skipped: 0, Processed: 903,993 
(555/s)

Last Modified:about a minute ago
Num Docs:897791
Max Doc:897791
Heap Memory Usage:2621072
Deleted Docs:0
Version:3285
Segment Count:7
Optimized:
Current:

Any ideas?

If there is any more info that is needed let me know.

AJ


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, October 31, 2014 1:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

Sorry to say this, but I don't think the numDocs/maxDoc numbers are telling you 
anything..... because it looks like you've optimized which purges any data 
associated with deleted docs, including the internal IDs which are the 
numDocs/maxDocs figures. So if there were deletions, we can't see any evidence 
of same.


Siiggggh.


On Fri, Oct 31, 2014 at 9:56 AM, AJ Lemke <aj.le...@securitylabs.com> wrote:
> I have run some more tests so the numbers have changed a bit.
>
> Index Results done on Node 1:
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0 
> documents. (Duration: 31m 47s)
> Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: 
> 903,993
>
> Node 1:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Node 2:
> Last Modified: 44 minutes ago
> Num Docs: 824216
> Max Doc: 824216
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1051
> Segment Count: 1
> Optimized: checked
> Current: checked
>
> Search results are the same as the doc numbers above.
>
> Logs only have one instance of an error:
>
> ERROR - 2014-10-31 10:47:12.867; 
> org.apache.solr.update.StreamingSolrServers$1; error
> org.apache.solr.common.SolrException: Bad Request
>
>
>
> request: 
> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
>         at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> Some info that may be of help
> This is on my local vm using jetty with the embedded zookeeper.
> Commands to start cloud:
>
> java -DzkRun -jar start.jar
> java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar
>
> sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir 
> ~/development/configs/inventory/ -confname config_ inventory sh 
> zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory 
> -confname config_ inventory
>
> curl 
> "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4";
> curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name= 
> inventory "
>
> AJ
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, October 31, 2014 9:49 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Missing Records
>
> OK, that is puzzling.
>
> bq: If there were duplicates only one of the duplicates should be removed and 
> I still should be able to search for the ID and find one correct?
>
> Correct.
>
> Your bad request error is puzzling, you may be on to something there.
> What it looks like is that somehow some of the documents you're 
> sending to Solr aren't getting indexed, either being dropped through 
> the network or perhaps have invalid fields, field formats (i.e. a date 
> in the wrong format,
> whatever) or some such. When you complete the run, what are the maxDoc and 
> numDocs numbers on one of the nodes?
>
> What else do you see in the logs? They're pretty big after that many adds, 
> but maybe you can grep for ERROR and see something interesting like stack 
> traces. Or even "org.apache.solr". This latter will give you some false hits, 
> but at least it's better than paging through a huge log file....
>
> Personally, in this kind of situation I sometimes use SolrJ to do my indexing 
> rather than DIH, I find it easier to debug so that's another possibility. In 
> the worst case with SolrJ, you can send the docs one at a time....
>
> Best,
> Erick
>
> On Fri, Oct 31, 2014 at 7:37 AM, AJ Lemke <aj.le...@securitylabs.com> wrote:
>> Hi Erick:
>>
>> All of the records are coming out of an auto numbered field so the ID's will 
>> all be unique.
>>
>> Here is the the test I ran this morning:
>>
>> Indexing completed. Added/Updated: 903,993 documents. Deleted 0 
>> documents. (Duration: 28m)
>> Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0, Processed:
>> 903,993 (538/s)
>> Started: 33 minutes ago
>>
>> Last Modified:4 minutes ago
>> Num Docs:903829
>> Max Doc:903829
>> Heap Memory Usage:-1
>> Deleted Docs:0
>> Version:1517
>> Segment Count:16
>> Optimized: checked
>> Current: checked
>>
>> If there were duplicates only one of the duplicates should be removed and I 
>> still should be able to search for the ID and find one correct?
>> As it is right now I am missing records that should be in the collection.
>>
>> I also noticed this:
>>
>> org.apache.solr.common.SolrException: Bad Request
>>
>>
>>
>> request: 
>> http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
>>         at 
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>>
>> AJ
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: Thursday, October 30, 2014 7:08 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Missing Records
>>
>> First question: Is there any possibility that some of the docs have 
>> duplicate IDs (<uniqueKey>s)? If so, then some of the docs will be replaced, 
>> which will lower your returns.
>> One way to figuring this out is to go to the admin screen and if numDocs < 
>> maxDoc, then documents have been replaced.
>>
>> Also, if numDocs is smaller than 903,993 then you probably have some docs 
>> being replaced. One warning, however. Even if docs are deleted, then this 
>> could still be the case because when segments are merged the deleted docs 
>> are purged.
>>
>> Best,
>> Erick
>>
>> On Thu, Oct 30, 2014 at 3:12 PM, S.L <simpleliving...@gmail.com> wrote:
>>> I am curious , how many shards do you have and whats the replication 
>>> factor you are using ?
>>>
>>> On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke <aj.le...@securitylabs.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We have a SOLR cloud instance that has been humming along nicely 
>>>> for months.
>>>> Last week we started experiencing missing records.
>>>>
>>>> Admin DIH Example:
>>>> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s) A
>>>> *:* search claims that there are only 903,902 this is the first 
>>>> full index.
>>>> Subsequent full indexes give the following counts for the *:* 
>>>> search
>>>> 903,805
>>>> 903,665
>>>> 826,357
>>>>
>>>> All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 
>>>> 0,
>>>> Processed: 903,993 (x/s) every time. ---records per second is 
>>>> variable
>>>>
>>>>
>>>> I found an item that should be in the index but is not found in a search.
>>>>
>>>> Here are the referenced lines of the log file.
>>>>
>>>> DEBUG - 2014-10-30 15:10:51.160;
>>>> org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE 
>>>> add{,id=750041421} 
>>>> {{params(debug=false&optimize=true&indent=true&commit=true&clean=tr
>>>> u
>>>> e
>>>> &wt=json&command=full-import&entity=ads&verbose=false),defaults(con
>>>> f
>>>> i
>>>> g=data-config.xml)}}
>>>> DEBUG - 2014-10-30 15:10:51.160;
>>>> org.apache.solr.update.SolrCmdDistributor; sending update to 
>>>> http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0 
>>>> add{,id=750041421} 
>>>> params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.
>>>> 5 7%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>>>>
>>>> --- there are 746 lines of log between entries ---
>>>>
>>>> DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire;  
>>>> >> 
>>>> "[0x2][0xc3][0xe0]&params[0xa2][0xe0].update.distrib(TOLEADER[0xe0]
>>>> ,
>>>> d
>>>> istrib.from?[0x17]
>>>> http://192.168.20.57:8983/solr/inventory_shard1_replica1/[0xe0]&del
>>>> B
>>>> y
>>>> Q[0x0][0xe0]'docsMap[0xe][0x13][0x10]8[0x8]?[0x80][0x0][0x0][0xe0]#
>>>> Z
>>>> i
>>>> p%51106[0xe0]-IsReelCentric[0x2][0xe0](HasPrice[0x1][0xe0]*Make_Low
>>>> e r 'ski-doo[0xe0])StateName$Iowa[0xe0]-OriginalModel/Summit
>>>> Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp@[0xe0]#lat(42.
>>>> 4
>>>> 8929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0
>>>> ]
>>>> 2
>>>> DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]
>>>> * P hotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
>>>> [0xe0]*ClassFacet.12|Snowmobiles[0xe0]$Make'Ski-Doo[0xe0]$City*Siou
>>>> x 
>>>> City[0xe0]#lng*-96.369302[0xe0]-Certification!N[0xe0]0EmotionalTagline0162"
>>>> Long Track
>>>> [0xe0]*IsEnhanced[0x1][0xe0]*SubClassID$1704[0xe0](NetPrice$4500[0x
>>>> e
>>>> 0
>>>> ]1IsInternetSpecial[0x2][0xe0](HasPhoto[0x1][0xe0]/DealerSortOrder!
>>>> 2 [ 0xe0]+Description?VThis Bad boy will pull you through the 
>>>> deepest snow!With the 162" track and 1000cc of power you can fly up 
>>>> any hill!![0xe0],DealerRadius+8046.720000[0xe0],Transmission
>>>> [0xe0]*ModelFacet7Ski-Doo|Summit
>>>> Highmark[0xe0]/DealerNameFacet9Certified
>>>> Auto,
>>>> Inc.|4150[0xe0])StateAbbr"IA[0xe0])ClassName+Snowmobiles[0xe0](Deal
>>>> e r ID$4150[0xe0]&AdCode$DX1Q[0xe0]*DealerName4Certified
>>>> Auto,
>>>> Inc.[0xe0])Condition$Used[0xe0]/Condition_Lower$used[0xe0]-Exterior
>>>> C
>>>> o
>>>> lor+Blue/Yellow[0xe0],DivisionName,Recreational[0xe0]$Trim(1000
>>>> SDI[0xe0](SourceID!1[0xe0]0HasAdEnhancement!0[0xe0]'ClassID"12[0xe0].
>>>> FuelType_Lower%other[0xe0]$Year$2005[0xe0]+DealerFacet?[0x8]4150|Ce
>>>> r
>>>> t
>>>> ified Auto, Inc.|Sioux
>>>> City|IA[0xe0],SubClassName+Snowmobiles[0xe0]%Model/Summit
>>>> Highmark[0xe0])EntryDate42011-11-17T10:46:00Z[0xe0]+StockNumber&000
>>>> 1
>>>> 0 5[0xe0]+PriceRebate!0[0xe0]+Model_Lower/summit
>>>> highmark[\n]"
>>>> What could be the issue and how does one fix this issue?
>>>>
>>>> Thanks so much and if more information is needed I have preserved 
>>>> the log files.
>>>>
>>>> AJ
>>>>

Reply via email to