Hi Dinushka!

That's awesome that you're working with Apache Impala for your internship!
You should know that the Impala website provides some incredible
documentation. In particular, check out the performance section to learn how
to benchmark queries:
https://impala.apache.org/docs/build/html/topics/impala_performance.html

As you mentioned, since it's in "standalone mode" all sorts of things can be
going on. However, the easiest way to get some insight is to look at the
explain plan and profile for the queries you run. For examples, take a look
at this page:
https://impala.apache.org/docs/build/html/topics/impala_explain_plan.html

Any chance you can share the EXPLAIN and PROFILE statement output of your
queries? You should also take a look at the output of the SUMMARY statement
for yourself in case anything obvious stands out to you.
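
In case it helps, here is a rough sketch of how that could look in
impala-shell for your second query. The table name my_table is a placeholder
I made up; A, G, Z and 'value' are taken from your message. SUMMARY and
PROFILE report on the most recently run query in the session, so issue them
right after the SELECT.

```sql
-- my_table is a hypothetical placeholder; substitute your real table name.

-- Show the plan without running the query.
EXPLAIN SELECT A FROM my_table WHERE G = 'value' ORDER BY Z;

-- Run the query itself.
SELECT A FROM my_table WHERE G = 'value' ORDER BY Z;

-- impala-shell commands: per-operator timing overview of the query just run,
-- followed by the full low-level profile.
SUMMARY;
PROFILE;
```

Running the same sequence for the query without ORDER BY and comparing the
two SUMMARY outputs should show where the extra time goes.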

Thanks!

On Wed, Oct 30, 2019 at 9:10 PM Dinushka <[email protected]> wrote:

> Hi,
>
> I'm a student doing an internship, I have been given a task to do DB
> performance testing for kudu with Impala for our data and use case.
>
> Sample dataset is about 150 million records with 150 columns  and total
> size of kudu is 55GB. composite primary key (X,Y,Z) and partitioning by
> hash (X =4,Y=2,Z=2)
>
> SQL 1= "select  A from table where  G="value""
> SQL 2= "select  A from table where  G="value" order by Z"
>
> I'm testing kudu and Impala in standalone mode and have 2 queries which
> will only return one row. One with "order by" and other without "order by"
> .
>
> When I do testing, I found that Impala with order by is about 15% to 35%
> slow. when you have order by in the SQL.
>
> In large row counts queries, it's time can be about 2-20 times more.
>
> 1) Why is Impala slow with order by?
>
> 2) Can order by be made faster in clustered mode, that mean made to be
> parallelized?
>
> 3) Is it a good idea to use order by with Impala? if so have any body use
> it with a larger data set with good performance.
>
> 4) Is there any other solutions to do fast order by queries within few
> seconds. (Interactive query engines)
>
>
> Thank you
>
