Hi All,

In the Spark UI, the progress is displayed as 198/200 and is almost frozen. During the shuffle stage of one task, most of the executors have 0 bytes, but one executor has 1 GB. Moreover, in several of the join operations, one table or pair RDD has only 40 keys while the other table has 10,000 keys.

Could this be a data skew issue?

Any help or comment will be deeply appreciated. Thanks in advance~

--------------------------------------------------------------------------------

Here we have one application that needs to extract different columns from 6 Hive tables and then do some simple calculations. There are around 100,000 rows in each table. Finally, it needs to output another table or file (with a consistent column format).

However, after many days of trying, the Spark Hive job is unthinkably slow, sometimes almost frozen. There are 5 nodes in the Spark cluster.

Could anyone offer some help? Any idea or clue would also be good.

Thanks in advance~

On Tuesday, July 19, 2016 11:05 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote:

Hi Mungeol,

Thanks a lot for your help. I will try that.

On Tuesday, July 19, 2016 9:21 AM, Mungeol Heo <mungeol....@gmail.com> wrote:

Try running an action at an intermediate stage of your job, such as save or insertInto. I hope it helps.

On Mon, Jul 18, 2016 at 7:33 PM, Zhiliang Zhu <zchl.j...@yahoo.com.invalid> wrote:
> Thanks a lot for your reply.
>
> In fact, we tried running the SQL on Kettle, Hive, and Spark Hive (via
> HiveContext) respectively, and the job seems frozen and does not finish.
>
> Across the 6 tables, we need to read different columns from different
> tables for specific information, then do some simple calculation
> before output.
> Join operations are used most in the SQL.
>
> Best wishes!
>
> On Monday, July 18, 2016 6:24 PM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi,
> What about the network (bandwidth) between Hive and Spark?
> Did it run in Hive before you moved to Spark?
> Because it's complex, you can use something like the EXPLAIN command to
> show what is going on.
>
> On Jul 18, 2016, at 5:20 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID>
> wrote:
>
> The SQL logic in the program is very complex, so I will not describe the
> detailed code here.
>
> On Monday, July 18, 2016 6:04 PM, Zhiliang Zhu <zchl.j...@yahoo.com.INVALID>
> wrote:
>
> Hi All,
>
> Here we have one application that needs to extract different columns from
> 6 Hive tables and then do some simple calculations. There are around
> 100,000 rows in each table. Finally, it needs to output another table or
> file (with a consistent column format).
>
> However, after many days of trying, the Spark Hive job is unthinkably
> slow, sometimes almost frozen. There are 5 nodes in the Spark cluster.
>
> Could anyone offer some help? Any idea or clue would also be good.
>
> Thanks in advance~
>
> Zhiliang
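
On the data skew question at the top of the thread: one rough way to check is to count records per join key and see how they would spread over the shuffle partitions. The sketch below is plain Python, not Spark code. The key names, row counts, and the crc32 hash are made up for illustration (Spark uses its own hash function), and 200 partitions is an assumption chosen because it matches the 198/200 seen in the UI and the default value of spark.sql.shuffle.partitions.

```python
from collections import Counter
import zlib


def partition_sizes(keys, num_partitions=200):
    """Count how many records land in each partition under hash partitioning."""
    sizes = Counter()
    for key in keys:
        # crc32 is a deterministic stand-in hash, not the one Spark uses.
        pid = zlib.crc32(str(key).encode("utf-8")) % num_partitions
        sizes[pid] += 1
    return sizes


def skew_report(keys, num_partitions=200):
    """Summarize how unevenly the keys spread across partitions."""
    sizes = partition_sizes(keys, num_partitions)
    total = sum(sizes.values())
    largest = max(sizes.values())
    return {
        "non_empty_partitions": len(sizes),
        "largest_partition_share": largest / total,
        "top_keys": Counter(keys).most_common(3),
    }


# Hypothetical data: 40 distinct keys, one of which ("k0") dominates.
keys = ["k0"] * 90_000 + [f"k{i}" for i in range(1, 40)] * 250
report = skew_report(keys)
print(report)
```

With only 40 distinct keys, no more than 40 of the 200 partitions can receive any rows, so many empty tasks next to one heavy task would be consistent with skew. Running the equivalent count in Spark (for example, a groupBy on the join key followed by count) would show whether the real tables look like this.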