Hi Dinushka! That's awesome that you're working with Apache Impala for your internship! You should know that the Impala website has some incredible documentation. In particular, check out the performance section to learn how to benchmark queries: https://impala.apache.org/docs/build/html/topics/impala_performance.html
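If it helps, here is a minimal Python sketch of a fairer way to compare two queries. It runs each one several times, discards a warm-up run, and compares the medians instead of single timings. The `sql1`/`sql2` functions are hypothetical stand-ins that just sleep. Swap in whatever client call actually executes your queries against Impala.

```python
import statistics
import time
from typing import Callable


def median_latency(run: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time, in seconds, of calling run().

    Warm-up calls run first and are discarded so that one-off costs
    (for example, caches being populated) do not skew the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage by which candidate_s is slower than baseline_s."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Hypothetical stand-ins: replace with code that executes SQL 1
    # (no ORDER BY) and SQL 2 (with ORDER BY) through your own client.
    def sql1() -> None:
        time.sleep(0.01)

    def sql2() -> None:
        time.sleep(0.013)

    base = median_latency(sql1)
    with_order_by = median_latency(sql2)
    print(f"ORDER BY slowdown: {slowdown_pct(base, with_order_by):.1f}%")
```

The median is less sensitive than the mean to a single slow outlier run. That matters when differences are in the 15% to 35% range.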
As you mentioned, since it's running in "standalone mode", all sorts of things could be going on. However, the easiest way to get some insight is to look at the explain plan and the profile for the queries you run. For examples, take a look at this page: https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html

Any chance you can share the EXPLAIN and PROFILE output for your queries? You should also look at the output of the SUMMARY command yourself, in case anything obvious stands out.

Thanks!

On Wed, Oct 30, 2019 at 9:10 PM Dinushka <[email protected]> wrote:
> Hi,
>
> I'm a student doing an internship, I have been given a task to do DB
> performance testing for kudu with Impala for our data and use case.
>
> Sample dataset is about 150 million records with 150 columns and total
> size of kudu is 55GB. composite primary key (X,Y,Z) and partitioning by
> hash (X =4,Y=2,Z=2)
>
> SQL 1= "select A from table where G="value""
> SQL 2= "select A from table where G="value" order by Z"
>
> I'm testing kudu and Impala in standalone mode and have 2 queries which
> will only return one row. One with "order by" and other without "order by".
>
> When I do testing, I found that Impala with order by is about 15% to 35%
> slow. when you have order by in the SQL.
>
> In large row counts queries, it's time can be about 2-20 times more.
>
> 1) Why is Impala slow with order by?
>
> 2) Can order by be made faster in clustered mode, that mean made to be
> parallelized?
>
> 3) Is it a good idea to use order by with Impala? if so have any body use
> it with a larger data set with good performance.
>
> 4) Is there any other solutions to do fast order by queries within few
> seconds. (Interactive query engines)
>
> Thank you
