Re: performance issue on big table join

2017-11-02 Thread
AM, Hongxu Ma wrote: > > > Thanks LL. Your query options look good. > > > > As Xu Cheng mentioned, I also noticed that Impala does hash joins slowly in > > some big data situations. > > Very curious about the root cause. > > > > > > On 02/11/2017 10:00, 俊杰陈 wrote:

Re: performance issue on big table join

2017-11-01 Thread
+user list 2017-11-02 9:57 GMT+08:00 俊杰陈: > Hi Mostafa > > Cheng already put the profile in the thread. > > Here is another profile for the Impala release version. You can also see the > attachment. > > > 2017-11-02 9:30 GMT+08:00 Mostafa Mokhtar: > >> Attaching th

Re: performance issue on big table join

2017-11-01 Thread
e profile from the WebUI on the coordinator node it > would be great. > > On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈 wrote: > > > Thanks Hongxu, > > > > Here are configurations on my cluster, most of them are default values.

Re: performance issue on big table join

2017-11-01 Thread
all query options, e.g.: > BATCH_SIZE: [0] > DISABLE_CODEGEN: [0] > RUNTIME_FILTER_MODE: GLOBAL > > Just a guess, thanks. > > On 27/10/2017 10:25, 俊杰陈 wrote: > The profile file is damaged. Here is a screenshot for exec summary

Re: How many threads impala start for handling partitioned join?

2017-10-26 Thread
eneral idea of the multithreading effort is to start multiple fragment > instances per host. A fragment instance may contain an exchange node. > > > On Wed, Oct 25, 2017 at 7:22 PM, 俊杰陈 wrote: > > > Thanks for the reply. > > > > I saw IMPALA-3902 <https://issues.apache.org

Re: performance issue on big table join

2017-10-26 Thread
The profile file is damaged. Here is a screenshot of the exec summary. 2017-10-27 10:04 GMT+08:00 俊杰陈: > Hi Devs > > I ran into a performance issue on a big table join. The query takes more than 3 > hours on Impala and only 3 minutes on Spark SQL on the same 5-node > cluster. When runn

performance issue on big table join

2017-10-26 Thread
Hi Devs, I ran into a performance issue on a big table join. The query takes more than 3 hours on Impala and only 3 minutes on Spark SQL on the same 5-node cluster. When running the query, the left scanner and exchange node are very slow. Did I miss some key arguments? You can see the profile file in attachme

Re: How many threads impala start for handling partitioned join?

2017-10-25 Thread
e join (without > regard for the number of partitions that fit into memory). > > HTH > > On 25 October 2017 at 05:44, 俊杰陈 wrote: > > Hi > > > > When Impala does a partitioned join on a node, it splits the build input > > into partitions until a partition can fit int

How many threads impala start for handling partitioned join?

2017-10-24 Thread
Hi, When Impala does a partitioned join on a node, it splits the build input into partitions until a partition can fit into memory, then consumes the probe input, does the join, and outputs rows. My question is: will Impala schedule multiple tasks to do the join if multiple partitions fit into memory, or i
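To make the question above concrete, here is a minimal single-threaded sketch of the general technique described: split the build input by a hash of the join key until a partition fits a memory budget, then build a hash table and probe it. This is not Impala's implementation. The function name, `max_rows_in_memory`, `fanout`, and `MAX_LEVELS` are all hypothetical, and the memory budget is modeled simply as a row count.

```python
from collections import defaultdict

MAX_LEVELS = 8  # hypothetical guard: stop repartitioning heavily skewed keys


def _bucket(value, level, fanout):
    # Mix the level into the hash so each repartitioning pass splits differently.
    return hash((level, value)) % fanout


def partitioned_hash_join(build_rows, probe_rows, key,
                          max_rows_in_memory=4, fanout=4, level=0):
    """Inner-join two lists of dicts on `key`, returning (build, probe) pairs."""
    # Base case: the build side "fits in memory" (or we gave up splitting),
    # so build a hash table on it and stream the probe rows through.
    if len(build_rows) <= max_rows_in_memory or level >= MAX_LEVELS:
        table = defaultdict(list)
        for row in build_rows:
            table[row[key]].append(row)
        return [(b, p) for p in probe_rows for b in table.get(p[key], [])]

    # Otherwise split both inputs with the same hash function, so matching
    # keys always land in the same pair of partitions.
    build_parts = [[] for _ in range(fanout)]
    probe_parts = [[] for _ in range(fanout)]
    for row in build_rows:
        build_parts[_bucket(row[key], level, fanout)].append(row)
    for row in probe_rows:
        probe_parts[_bucket(row[key], level, fanout)].append(row)

    # Join each partition pair independently; this sketch does it serially.
    result = []
    for b_part, p_part in zip(build_parts, probe_parts):
        result.extend(partitioned_hash_join(
            b_part, p_part, key, max_rows_in_memory, fanout, level + 1))
    return result
```

Because each partition pair is joined independently, the final loop is the point where an engine could in principle run several joins in parallel; whether and how Impala does so is exactly what this thread asks, and the sketch takes no position on it.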

Re: vim / Eclipse setups for new developers, on the C++ side

2017-09-13 Thread
I use NetBeans to view the code; its "show call graph" feature is useful to me. 2017-09-14 5:44 GMT+08:00 Tim Armstrong: > For a long time I've just used GNU screen + VIM with syntax highlighting. > Then "git grep" or search in VIM as needed to find things. Obviously not > ideal for everyone. > > I've t

Re: Impala Sorter just sort small partition?

2017-08-04 Thread
It does > quicksort recursively then switches to insertion sort once the partitions > are less than INSERTION_THRESHOLD = 16. > > Sorter also supports an external merge sort - if the full input doesn't fit > in memory, it sorts in-memory runs with SortHelper() then does merge
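The in-memory strategy described in this reply (recursive quicksort that switches to insertion sort below a threshold) can be sketched as follows. This is an illustration of the general technique, not the code in Sorter.cc. Only the threshold value 16 comes from the thread. The function names, the Hoare partition scheme, and the "recurse into the smaller side, loop on the larger" structure are assumptions; that structure is a common way to bound recursion depth.

```python
INSERTION_THRESHOLD = 16  # value quoted in the thread


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def partition(a, lo, hi):
    """Hoare partition of a[lo:hi]; returns mid with a[lo:mid] <= a[mid:hi]."""
    pivot = a[(lo + hi - 1) // 2]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j + 1
        a[i], a[j] = a[j], a[i]


def hybrid_sort(a, lo=0, hi=None):
    """Quicksort a[lo:hi], finishing small ranges with insertion sort."""
    if hi is None:
        hi = len(a)
    while hi - lo >= INSERTION_THRESHOLD:
        mid = partition(a, lo, hi)
        # Recurse only into the smaller side and keep looping on the larger
        # one, so the recursion depth stays logarithmic in the input size.
        if mid - lo < hi - mid:
            hybrid_sort(a, lo, mid)
            lo = mid
        else:
            hybrid_sort(a, mid, hi)
            hi = mid
    insertion_sort(a, lo, hi)
```

In this structure the recursive call covers only the smaller partition, while the larger one is handled by the next iteration of the loop, so both sides are still fully sorted.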

Impala Sorter just sort small partition?

2017-08-03 Thread
Hi, I'm looking at Sorter.cc and found that Sorter::SortHelper just sorts the smaller partition. Is there anything I missed? -- Thanks & Best Regards

Re: material for impala newbie

2017-08-02 Thread
acolyer.org/2015/02/05/impala-a-modern-open-source-sql-engine-for-hadoop/ > > ) > > > > There's also an old paper on code generation: > > https://pdfs.semanticscholar.org/bac4/169d6b6f713c76271b5ccf3d45293351f785.pdf > > > > But the very be

material for impala newbie

2017-08-02 Thread
Hi, I’m learning the Impala code now. Does anyone have any Impala docs/PPTs on the computing workflow (such as ORDER BY), vectorization, and codegen? Thanks in advance. -- Thanks & Best Regards