On 9/11/2018 7:07 PM, John Smith wrote:
> header: 223,580
> child1: 124,978
> child2: 254,045
> child3: 127,917
> child4: 1,009,030
> child5: 225,311
> child6: 381,561
> child7: 438,315
> child8: 18,850
>
> Trying to index that into solr with a flatfile schema, blows up into
> 5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a
I think you're not getting what I'm suggesting. Or maybe there's an
aspect of your data that I'm not understanding.
If we add up all those numbers for the child docs, there are 2.5 million
of them. So you would have 2.5 million docs in Solr. I have created
Solr indexes far larger than this, and I do not consider my work to be
"big data". Solr can handle 2.5 million docs easily, as long as the
hardware resources are sufficient.
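For reference, here's a quick sanity check of that total, summing the child counts quoted above:

```python
# Child document counts quoted from the original message.
child_counts = [
    124_978,    # child1
    254_045,    # child2
    127_917,    # child3
    1_009_030,  # child4
    225_311,    # child5
    381_561,    # child6
    438_315,    # child7
    18_850,     # child8
]

total = sum(child_counts)
print(f"{total:,}")  # 2,580,007 -- roughly 2.5 million child docs
```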
The data duplication comes in as additional fields on those 2.5 million
docs. Each one will contain some (or maybe all) of the data that WOULD
have been in the parent document. The amount of data balloons, but the
number of documents (rows) doesn't.
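A minimal sketch of that denormalization, assuming hypothetical field names (your actual schema will differ):

```python
# Denormalization sketch: copy the parent (header) fields onto each
# child record, producing one flat Solr document per child. The field
# names below are illustrative assumptions, not the poster's schema.

def flatten(parent: dict, children: list[dict]) -> list[dict]:
    """Return one flat doc per child, with the parent's fields repeated."""
    docs = []
    for child in children:
        doc = dict(parent)   # duplicate the parent data...
        doc.update(child)    # ...then merge in the child's own fields
        docs.append(doc)
    return docs

parent = {"order_id": "A-1001", "customer": "ACME", "order_date": "2018-09-11"}
children = [
    {"id": "A-1001-1", "sku": "widget", "qty": 2},
    {"id": "A-1001-2", "sku": "gadget", "qty": 5},
]

flat_docs = flatten(parent, children)
# The number of Solr docs equals the number of children (2 here),
# even though the parent's data is now stored once per child.
print(len(flat_docs))
```

Each flat doc is then what you'd send to Solr, so the row count stays at the child count while the parent's fields are repeated in every one.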
That kind of arrangement is usually enough to accomplish whatever is
needed. I cannot assume that it will work for your use case, but it
does work for most.
Thanks,
Shawn