RE: Index size increases disproportionately to size of added field when indexed=false

Howe, David Thu, 15 Feb 2018 03:12:54 -0800

Hi Alessandro,

Some interesting testing today that seems to have gotten me closer to what the 
issue is.  When I run the version of the index that is working correctly 
against my database table that has the extra field in it, the index suddenly 
increases in size.  This is even though the data importer is running the same 
SELECT as before (which doesn't include the extra column) and loads the same 
number of rows.


After scratching my head for a bit and browsing through both versions of the 
table I am loading from (with and without the extra field), I noticed that the 
natural ordering of the tables is different.  These tables are "staging" tables 
that I populate with another set of queries and inserts to get the data into a 
format that is easy to ingest into Solr.  When I add the extra field to these 
queries, it changes the Oracle query plan as the field is contained in a 
different table that I need to join to.  As I don't specify an "ORDER BY" on 
the query (as I didn't think it would make a difference and would slow the 
query down), Oracle is free to choose how it orders the result set.  Adding the 
extra field changes that natural ordering, which affects the order things go 
into my staging table.  As I don't specify an "ORDER BY" when I select things 
out of the staging table, my data in the scenario that is working is being 
loaded in a different order to the scenario which doesn't work.
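
To illustrate the ordering point, here is a minimal sketch that uses Python's 
built-in sqlite3 module as a stand-in for Oracle.  The table name "staging" 
and its columns are hypothetical, not the real schema.  Without an ORDER BY 
the database may return rows in any order it likes; with one, the load order 
is the same on every run, whatever the query plan.

```python
import sqlite3


def load_ids(conn, ordered):
    """Read ids from the (hypothetical) staging table.

    With ordered=False the row order is whatever the engine happens to
    produce; with ordered=True it is fixed by the ORDER BY clause.
    """
    sql = "SELECT id, payload FROM staging"
    if ordered:
        sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, payload TEXT)")
# Insert rows out of key order to mimic a staging table filled by a
# query whose plan (and so its output order) has changed.
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],
)

unordered_ids = load_ids(conn, ordered=False)  # order not guaranteed
ordered_ids = load_ids(conn, ordered=True)     # always sorted by id
```

Both calls return the same set of rows; only the ORDER BY version promises the 
same sequence every time, at the cost of a sort on the database side.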

I am currently running full loads to verify this under each scenario, as I have 
now forced the data in the scenario that doesn't work to be in the same order 
as the scenario that does.  I will see how this load goes overnight.

This leads to the question: what difference does it make to Solr what order I 
load the data in?

I also noticed that the .cfs file is quite large in the second scenario, even 
though the compound file format is supposed to be disabled by default in Solr.  
I checked my Solr config and there is no override of the default.

In answer to your questions:

1) same number of documents - YES ~14,000,000 documents
2) identical documents ( + 1 new field each not indexed) - YES, the second 
scenario has one extra field that is stored but not indexed
3) same number of deleted documents - YES, there are zero deleted documents in 
both scenarios
4) they both were born from scratch ( an empty index) - YES, both start from a 
brand new virtual server with a brand new installation of Solr

I am using the default auto commit setting, which I think is 15000 ms.

Thanks again for your assistance.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au

