On 9/11/2018 7:07 PM, John Smith wrote:
header:      223,580

child1:      124,978
child2:      254,045
child3:      127,917
child4:    1,009,030
child5:      225,311
child6:      381,561
child7:      438,315
child8:       18,850


Trying to index that into solr with a flatfile schema, blows up into
5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a

I think you're not getting what I'm suggesting.  Or maybe there's an aspect of your data that I'm not understanding.

If we add up all those numbers for the child docs, there are 2.5 million of them.  So you would have 2.5 million docs in Solr.  I have created Solr indexes far larger than this, and I do not consider my work to be "big data".  Solr can handle 2.5 million docs easily, as long as the hardware resources are sufficient.
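Just to check that arithmetic (the per-type counts are the numbers quoted above):

```python
# Child document counts quoted above, one entry per child type.
child_counts = [124_978, 254_045, 127_917, 1_009_030,
                225_311, 381_561, 438_315, 18_850]

total = sum(child_counts)
print(total)  # 2580007 -- the "2.5 million" figure, rounded
```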

The data duplication comes from additional fields in those 2.5 million docs.  Each one will contain some (or maybe all) of the data that WOULD have been in the parent document.  The amount of data balloons, but the number of documents (rows) doesn't.
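A minimal sketch of what I mean, in Python (field names and sample data here are made up for illustration):

```python
# Denormalization: each child record becomes its own Solr document,
# with the parent's fields copied into it.  Hypothetical sample data.
parents = {
    "p1": {"parent_name": "Acme", "parent_region": "EU"},
}
children = [
    {"id": "c1", "parent_id": "p1", "value": 42},
    {"id": "c2", "parent_id": "p1", "value": 7},
]

docs = []
for child in children:
    doc = dict(child)                        # the child's own fields
    doc.update(parents[child["parent_id"]])  # duplicate the parent fields
    docs.append(doc)

# One Solr document per child: the doc count equals the child count,
# even though the parent data is repeated inside each one.
print(len(docs))  # 2
```

The parent data is repeated in every child doc, which is where the size balloons, but the row count stays at the number of children.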

That kind of arrangement is usually enough to accomplish whatever is needed.  I cannot assume that it will work for your use case, but it does work for most.

Thanks,
Shawn
