I have run some more tests, so the numbers have changed a bit.

Index results (run from Node 1):
Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. 
(Duration: 31m 47s)
Requests: 1 (0/s), Fetched: 903,993 (474/s), Skipped: 0, Processed: 903,993

Node 1:
Last Modified: 44 minutes ago
Num Docs: 824216
Max Doc: 824216
Heap Memory Usage: -1
Deleted Docs: 0
Version: 1051
Segment Count: 1
Optimized: checked
Current: checked

Node 2:
Last Modified: 44 minutes ago
Num Docs: 824216
Max Doc: 824216
Heap Memory Usage: -1
Deleted Docs: 0
Version: 1051
Segment Count: 1
Optimized: checked
Current: checked

Search results are the same as the doc numbers above.
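Since numDocs equals maxDoc on both nodes, no documents show as overwritten by duplicate IDs; the gap is documents that never made it into the index. A quick sketch of that arithmetic (counts copied from the stats above):

```shell
# Counts copied from the DIH report and core stats above.
fetched=903993   # documents DIH reports as processed
numdocs=824216   # numDocs on each node
maxdoc=824216    # maxDoc on each node

# maxDoc - numDocs counts not-yet-purged deletes (overwrites by duplicate IDs).
replaced=$((maxdoc - numdocs))
# fetched - numDocs is the number of documents that never reached the index.
missing=$((fetched - numdocs))

echo "replaced=$replaced missing=$missing"
```

One caveat: the index here is optimized (segment count 1), and optimizing purges deleted docs, so maxDoc == numDocs is expected even if overwrites did occur; the 79,777 gap could therefore still include silently replaced documents as well as rejected ones.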

The logs have only one instance of an error:

ERROR - 2014-10-31 10:47:12.867; org.apache.solr.update.StreamingSolrServers$1; error
org.apache.solr.common.SolrException: Bad Request



request: http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Some info that may be of help:
This is on my local VM, using Jetty with the embedded ZooKeeper.
Commands to start cloud:

java -DzkRun -jar start.jar
java -Djetty.port=7574 -DzkRun -DzkHost=localhost:9983 -jar start.jar

sh zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir ~/development/configs/inventory/ -confname config_inventory
sh zkcli.sh -zkhost localhost:9983 -cmd linkconfig -collection inventory -confname config_inventory

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=inventory&numShards=1&replicationFactor=2&maxShardsPerNode=4"
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=inventory"

AJ


-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, October 31, 2014 9:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Missing Records

OK, that is puzzling.

bq: If there were duplicates, only one of the duplicates should be removed, and I should still be able to search for the ID and find one, correct?

Correct.

Your bad request error is puzzling; you may be on to something there.
What it looks like is that some of the documents you're sending to Solr 
aren't getting indexed: either they're being dropped by the network, or 
they have invalid fields or field formats (i.e. a date in the wrong format, 
whatever), or some such. When you complete the run, what are the maxDoc and 
numDocs numbers on one of the nodes?

What else do you see in the logs? They're pretty big after that many adds, but 
maybe you can grep for ERROR and see something interesting like stack traces, 
or even "org.apache.solr". The latter will give you some false hits, but at 
least it's better than paging through a huge log file....
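For illustration, a minimal sketch of that grep; the log path and sample lines below are fabricated so the commands run standalone, so point them at your real Jetty/Solr log instead:

```shell
# Hypothetical log file name; substitute your actual Solr log.
LOG=solr-sample.log

# Fabricated sample lines so the grep commands below are self-contained.
printf '%s\n' \
  'INFO  - 2014-10-31 10:47:11.001; org.apache.solr.core.SolrCore; commit' \
  'ERROR - 2014-10-31 10:47:12.867; org.apache.solr.update.StreamingSolrServers$1; error' \
  'INFO  - 2014-10-31 10:47:13.010; org.apache.solr.update.UpdateLog; add' \
  > "$LOG"

# ERROR lines plus a couple of lines of trailing context, to catch stack traces.
grep -A 2 'ERROR' "$LOG"

# The broader net: count every line mentioning org.apache.solr (expect false hits).
grep -c 'org.apache.solr' "$LOG"
```

On a real multi-gigabyte log, the `-A` context number usually needs to be larger (stack traces run tens of lines).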

Personally, in this kind of situation I sometimes use SolrJ to do my indexing 
rather than DIH; I find it easier to debug, so that's another possibility. In 
the worst case with SolrJ, you can send the docs one at a time....

Best,
Erick

On Fri, Oct 31, 2014 at 7:37 AM, AJ Lemke <aj.le...@securitylabs.com> wrote:
> Hi Erick:
>
> All of the records are coming out of an auto-numbered field, so the IDs will 
> all be unique.
>
> Here is the test I ran this morning:
>
> Indexing completed. Added/Updated: 903,993 documents. Deleted 0 documents. (Duration: 28m)
> Requests: 1 (0/s), Fetched: 903,993 (538/s), Skipped: 0, Processed: 903,993 (538/s)
> Started: 33 minutes ago
>
> Last Modified: 4 minutes ago
> Num Docs: 903829
> Max Doc: 903829
> Heap Memory Usage: -1
> Deleted Docs: 0
> Version: 1517
> Segment Count: 16
> Optimized: checked
> Current: checked
>
> If there were duplicates, only one of the duplicates should be removed, and I 
> should still be able to search for the ID and find one, correct?
> As it is right now, I am missing records that should be in the collection.
>
> I also noticed this:
>
> org.apache.solr.common.SolrException: Bad Request
>
>
>
> request: http://192.168.20.57:7574/solr/inventory_shard1_replica1/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica2%2F&wt=javabin&version=2
>         at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>
> AJ
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, October 30, 2014 7:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Missing Records
>
> First question: Is there any possibility that some of the docs have duplicate 
> IDs (<uniqueKey>s)? If so, then some of the docs will be replaced, which will 
> lower your returns.
> One way to figure this out is to go to the admin screen: if numDocs < 
> maxDoc, then documents have been replaced.
>
> Also, if numDocs is smaller than 903,993 then you probably have some docs 
> being replaced. One warning, however: even if the admin screen shows no 
> deleted docs, replacement could still have happened, because the deleted docs 
> are purged when segments are merged.
>
> Best,
> Erick
>
> On Thu, Oct 30, 2014 at 3:12 PM, S.L <simpleliving...@gmail.com> wrote:
>> I am curious: how many shards do you have, and what's the replication 
>> factor you are using?
>>
>> On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke <aj.le...@securitylabs.com> wrote:
>>
>>> Hi All,
>>>
>>> We have a SOLR cloud instance that has been humming along nicely for months.
>>> Last week we started experiencing missing records.
>>>
>>> Admin DIH Example:
>>> Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)
>>> A *:* search claims that there are only 903,902; this is the first full 
>>> index.
>>> Subsequent full indexes give the following counts for the *:* search:
>>> 903,805
>>> 903,665
>>> 826,357
>>>
>>> All the while, the admin returns Fetched: 903,993 (x/s), Skipped: 0,
>>> Processed: 903,993 (x/s) every time (records per second is variable).
>>>
>>>
>>> I found an item that should be in the index but is not found in a search.
>>>
>>> Here are the referenced lines of the log file.
>>>
>>> DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.processor.LogUpdateProcessor; PRE_UPDATE add{,id=750041421} {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
>>> DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.SolrCmdDistributor; sending update to http://192.168.20.57:7574/solr/inventory_shard1_replica2/ retry:0 add{,id=750041421} params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F
>>>
>>> --- there are 746 lines of log between entries ---
>>>
>>> DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire;  >> 
>>> "[0x2][0xc3][0xe0]&params[0xa2][0xe0].update.distrib(TOLEADER[0xe0],
>>> d
>>> istrib.from?[0x17]
>>> http://192.168.20.57:8983/solr/inventory_shard1_replica1/[0xe0]&delB
>>> y 
>>> Q[0x0][0xe0]'docsMap[0xe][0x13][0x10]8[0x8]?[0x80][0x0][0x0][0xe0]#Z
>>> i 
>>> p%51106[0xe0]-IsReelCentric[0x2][0xe0](HasPrice[0x1][0xe0]*Make_Lowe
>>> r 'ski-doo[0xe0])StateName$Iowa[0xe0]-OriginalModel/Summit
>>> Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp@[0xe0]#lat(42.
>>> 4
>>> 8929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0]
>>> 2 
>>> DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]*
>>> P hotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
>>> [0xe0]*ClassFacet.12|Snowmobiles[0xe0]$Make'Ski-Doo[0xe0]$City*Sioux
>>> City[0xe0]#lng*-96.369302[0xe0]-Certification!N[0xe0]0EmotionalTagline0162"
>>> Long Track
>>> [0xe0]*IsEnhanced[0x1][0xe0]*SubClassID$1704[0xe0](NetPrice$4500[0xe
>>> 0 
>>> ]1IsInternetSpecial[0x2][0xe0](HasPhoto[0x1][0xe0]/DealerSortOrder!2
>>> [ 0xe0]+Description?VThis Bad boy will pull you through the deepest 
>>> snow!With the 162" track and 1000cc of power you can fly up any 
>>> hill!![0xe0],DealerRadius+8046.720000[0xe0],Transmission
>>> [0xe0]*ModelFacet7Ski-Doo|Summit
>>> Highmark[0xe0]/DealerNameFacet9Certified
>>> Auto,
>>> Inc.|4150[0xe0])StateAbbr"IA[0xe0])ClassName+Snowmobiles[0xe0](Deale
>>> r ID$4150[0xe0]&AdCode$DX1Q[0xe0]*DealerName4Certified
>>> Auto,
>>> Inc.[0xe0])Condition$Used[0xe0]/Condition_Lower$used[0xe0]-ExteriorC
>>> o
>>> lor+Blue/Yellow[0xe0],DivisionName,Recreational[0xe0]$Trim(1000
>>> SDI[0xe0](SourceID!1[0xe0]0HasAdEnhancement!0[0xe0]'ClassID"12[0xe0].
>>> FuelType_Lower%other[0xe0]$Year$2005[0xe0]+DealerFacet?[0x8]4150|Cer
>>> t
>>> ified Auto, Inc.|Sioux
>>> City|IA[0xe0],SubClassName+Snowmobiles[0xe0]%Model/Summit
>>> Highmark[0xe0])EntryDate42011-11-17T10:46:00Z[0xe0]+StockNumber&0001
>>> 0 5[0xe0]+PriceRebate!0[0xe0]+Model_Lower/summit
>>> highmark[\n]"
>>> What could be the issue and how does one fix this issue?
>>>
>>> Thanks so much, and if more information is needed, I have preserved the 
>>> log files.
>>>
>>> AJ
>>>
