I am trying to optimize time-based data, such as a newsfeed.
I've been running tests with data broken into indices by month, week, and
day. I'm using aliases to query the entire set or smaller ranges such as
last-month or last-quarter.
I'm still trying to figure out what will be
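
For illustration, here is a minimal sketch of how range aliases over daily indices could be generated. It assumes Elasticsearch-style alias actions (an "actions" list of "add" entries); the "newsfeed" prefix, the date-suffix naming scheme, the "last-week" alias, and the sample date are all hypothetical and not taken from the setup described above.

```python
from datetime import date, timedelta


def daily_index_names(prefix, start, end):
    """Return one index name per day from start to end, inclusive."""
    names = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Hypothetical naming scheme: <prefix>-YYYY.MM.DD
        names.append(f"{prefix}-{day:%Y.%m.%d}")
    return names


def alias_actions(alias, index_names):
    """Build an alias-update body with one 'add' action per index."""
    return {
        "actions": [
            {"add": {"index": name, "alias": alias}} for name in index_names
        ]
    }


# Arbitrary example date, chosen only to make the output reproducible.
today = date(2013, 3, 10)
last_week = daily_index_names("newsfeed", today - timedelta(days=6), today)
body = alias_actions("last-week", last_week)
```

The resulting body would be sent to the alias endpoint by whatever client you use. Regenerating it on a schedule keeps a rolling alias such as last-week pointed at the right indices without touching the data itself.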
Sharding is useful when you have multiple nodes: each node then holds a
small number of shards that can be queried in parallel, rather than one
(or a few) queried sequentially. However, you will get similar results
by having many smaller indices across multiple nodes. The key thing between
the
Ok. Makes sense.
I'd like to set up an indexing strategy for time-based data that will hold
for some time without needing to reshuffle everything.
One advantage I've found of small indices and shards is that there is no
fixed limit on the total number of shards, since every new index adds its
own. Aliasing strategies have more power than
basic
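
To make the shard-count point concrete, here is a rough back-of-the-envelope sketch of how the total number of shards grows with index granularity over one year. The figures of 2 primary shards per index and 1 replica are made-up assumptions for illustration, not recommendations or numbers from this thread.

```python
def total_shards(index_count, primaries_per_index, replicas=0):
    """Total shard copies: each primary shard plus its replicas, per index."""
    return index_count * primaries_per_index * (1 + replicas)


# Approximate number of indices created in one year at each granularity.
indices_per_year = {"monthly": 12, "weekly": 52, "daily": 365}

# Assumed settings (hypothetical): 2 primaries per index, 1 replica each.
yearly_totals = {
    granularity: total_shards(count, primaries_per_index=2, replicas=1)
    for granularity, count in indices_per_year.items()
}

for granularity, shards in yearly_totals.items():
    print(f"{granularity}: {shards} shards")
```

The total keeps growing as new indices are added, so the per-index shard count can stay small while the cluster as a whole still spreads work across nodes.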