Hello,

I've stored weather data in two tables with different schemes. Scheme1 uses the month and station ID as the row key (e.g. 201601_GME00102292) and the day of the month (1-31) in the version (timestamp) field. Scheme2 uses the year and station ID as the row key (e.g. 2016_GME00102292) and the day of the year (1-366) in the version field. Of course, the versioning iterator has been removed from the tables. Because I have several metrics per day, like the minimum and maximum temperature, I'm using locality groups, one group per metric (e.g. setgroups TMIN=TMIN, TMAX=TMAX). Additionally, I've pre-split the tables by year (e.g. 2014, 2015, 2016, ...).
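To make the structural difference between the two layouts concrete, here is a small plain-Java sketch (no Accumulo client code). It assumes the row formats exactly as given above, one value per day per metric, and split points that are the bare year strings ("2014", "2015", "2016", ...). The helpers `scheme1Row`, `scheme2Row`, `countRows` and `inYearSplit` are hypothetical illustrations, not Accumulo APIs. For one station and one year, Scheme1 spreads the 366 daily values over 12 rows, while Scheme2 keeps them in a single row. Both key formats still sort into the same per-year tablet. This only illustrates the layout. It does not by itself explain the measured scan times.

```java
import java.time.LocalDate;
import java.util.TreeSet;

public class RowKeyLayout {
    // Hypothetical helper: Scheme1 row key yyyyMM_station (day of month goes in the timestamp).
    static String scheme1Row(LocalDate day, String station) {
        return String.format("%04d%02d_%s", day.getYear(), day.getMonthValue(), station);
    }

    // Hypothetical helper: Scheme2 row key yyyy_station (day of year goes in the timestamp).
    static String scheme2Row(LocalDate day, String station) {
        return String.format("%04d_%s", day.getYear(), station);
    }

    // Returns {distinct Scheme1 rows, distinct Scheme2 rows, days stored} for one station-year.
    static int[] countRows(int year, String station) {
        TreeSet<String> rows1 = new TreeSet<>();
        TreeSet<String> rows2 = new TreeSet<>();
        int days = 0;
        for (LocalDate d = LocalDate.of(year, 1, 1); d.getYear() == year; d = d.plusDays(1)) {
            rows1.add(scheme1Row(d, station));
            rows2.add(scheme2Row(d, station));
            days++;
        }
        return new int[] {rows1.size(), rows2.size(), days};
    }

    // Assumes split points are the bare year strings: true if the row sorts after
    // split point "year" and before split point "year + 1".
    static boolean inYearSplit(String row, int year) {
        return row.compareTo(Integer.toString(year)) > 0
            && row.compareTo(Integer.toString(year + 1)) < 0;
    }

    public static void main(String[] args) {
        String station = "GME00102292";
        int[] c = countRows(2016, station);
        System.out.println("Scheme1 rows: " + c[0] + ", Scheme2 rows: " + c[1] + ", days: " + c[2]);
        LocalDate mar1 = LocalDate.of(2016, 3, 1);
        System.out.println(inYearSplit(scheme1Row(mar1, station), 2016) + " "
            + inYearSplit(scheme2Row(mar1, station), 2016));
    }
}
```

Running `main` should print `Scheme1 rows: 12, Scheme2 rows: 1, days: 366` followed by `true true`. In other words, both schemes store the same number of values per metric (2016 is a leap year). Scheme1 simply starts a new row every month.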
Now to my question: if I do a full table scan with a BatchScanner, Scheme2 is always faster than Scheme1 (with 2.5 billion entries, the Scheme1 scan took 24 minutes and the Scheme2 scan took 21 minutes). Why is that? Is it because fewer seeks are made with Scheme2? It would be nice if someone could help me understand what's happening here.

Yours faithfully,
Oliver Swoboda