Thanks for this very useful info.

On 19 Oct 2017 11:28 pm, "Saurabh Mahapatra" <saurabhmahapatr...@gmail.com> wrote:
> I do not think you will get that kind of benchmark information from
> customers running production workloads. But from the customers I have
> worked with who have taken Drill to production, here is some information
> that may be of use to you:
>
> 1. The trend universally has been to use beefier machines for in-memory
> query engines. We see 256 GB RAM and 32 cores as the most frequent
> configuration. On the network side, it is 2x10GbE.
>
> 2. The most common size for a dedicated cluster when starting out with
> Drill in production has been around 16-20 nodes with the above
> configuration. I have several customers who have deployed it on 200+ nodes
> as well, but in those scenarios Drill is one service among many.
>
> 3. The concurrency we see in the above settings is a function of the size
> of the dataset and the complexity of the customer's queries. In general,
> Little's law (L = λW) holds: for a given level of concurrency, the smaller
> the chunk of work each query has to process, the higher the throughput.
> Our understanding of this will change further with new releases of Drill,
> where spill-to-disk features will make it more of a pessimistic execution
> engine. The use of queues can also change this picture.
>
> 4. On my company's side, we do have TPC-H and TPC-DS benchmarks that I
> share with customers. But such benchmarks are flawed because they come
> from the world of traditional warehousing, where the competition was among
> general-purpose query engines. For example, our tests show that at
> increasingly large data scales, Drill beats Impala on these benchmarks. The
> Hive LLAP folks make the same kind of claim. But such results do not
> necessarily mean that an engine is the best tool for a given production
> environment. That is why I am reluctant to get into the query-engine wars,
> in which every engine beats the others under a given set of primed
> conditions.
>
> 5. It is an absolute must that you understand the query patterns the
> system will have to withstand, along with the data characteristics
> specific to your use case. I would only trust that. Big data systems are
> going to be application-specific and will require tuning. This also means
> that you have to revisit the kinds of analytics you want to give your end
> users, which in turn raises the question: what kinds of analytics truly
> generate value for the BI user?
>
> Best,
> Saurabh
>
> On Wed, Oct 18, 2017 at 10:26 PM, PROJJWAL SAHA <proj.s...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is there any public performance benchmark that users have achieved using
> > Drill in production scenarios? It would be useful if someone could pass
> > me links to customer user stories.
> >
> > Regards