Thanks for your response. We were evaluating Spark and were curious to know how it is used today and the lowest latency it can provide.
> On Mar 14, 2016, at 8:37 AM, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > > Hi Wlodeck, > > Let us look at this. > > In Oracle I have two tables channels and sales. This code works in Oracle > > 1 select c.channel_id, sum(c.channel_id * (select count(1) from sales s > WHERE c.channel_id = s.channel_id)) As R > 2 from channels c > 3* group by c.channel_id > s...@mydb.mich.LOCAL> / > CHANNEL_ID R > ---------- ---------- > 2 516050 > 3 1620984 > 4 473664 > 5 0 > 9 18666 > > I have the same tables In Hive but the same query crashes! > > hive> select c.channel_id, sum(c.channel_id * (select count(1) from sales s > WHERE c.channel_id = s.channel_id)) As R > > from channels c > > group by c.channel_id > > ; > NoViableAltException(232@[435:1: precedenceEqualExpression : ( ( LPAREN > precedenceBitwiseOrExpression COMMA )=> precedenceEqualExpressionMutiple | > precedenceEqualExpressionSingle );]) > > The solution is to use a temporary table to keep the sum/group by from sales > table as an intermediate stage (temporary tables are session specific and > they are created and dropped after you finish the session) > > hive> create temporary table tmp as select channel_id, count(channel_id) as > total from sales group by channel_id; > > > Ok the rest is pretty easy > > hive> select c.channel_id, c.channel_id * t.total as results > > from channels c, tmp t > > where c.channel_id = t.channel_id; > > 2.0 2800432.0 > 3.0 8802300.0 > 4.0 2583552.0 > 9.0 104013.0 > > HTH > > > > > > > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmich.wordpress.com > > >> On 14 March 2016 at 14:22, ws <wlodek_sie...@yahoo.com> wrote: >> Hive 1.2.1.2.3.4.0-3485 >> Spark 1.5.2 >> Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production >> >> ### >> SELECT >> f.description, >> f.item_number, >> sum(f.df_a * (select count(1) from e.mv_A_h_a where hb_h_name = >> r.h_id)) as df_a >> FROM e.eng_fac_atl_sc_bf_qty f, wv_ATL_2_qty_df_rates r >> where f.item_number NOT LIKE 'HR%' AND f.item_number NOT LIKE 'UG%' AND >> f.item_number NOT LIKE 'DEV%' >> group by >> f.description, >> f.item_number >> ### >> >> This query works fine in oracle but not Hive or Spark. >> So the problem is: "sum(f.df_a * (select count(1) from e.mv_A_h_a where >> hb_h_name = r.h_id)) as df_a" field. >> >> >> Thanks, >> Wlodek >> -- >> >> >> On Sunday, March 13, 2016 7:36 PM, Mich Talebzadeh >> <mich.talebza...@gmail.com> wrote: >> >> >> Depending on the version of Hive on Spark engine. >> >> As far as I am aware the latest version of Hive that I am using (Hive 2) has >> improvements compared to the previous versions of Hive (0.14,1.2.1) on Spark >> engine. >> >> As of today I have managed to use Hive 2.0 on Spark version 1.3.1. So it is >> not the latest Spark but it is pretty good. >> >> What specific concerns do you have in mind? >> >> HTH >> >> >> Dr Mich Talebzadeh >> >> LinkedIn >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> >> http://talebzadehmich.wordpress.com >> >> >> On 13 March 2016 at 23:27, sjayatheertha <sjayatheer...@gmail.com> wrote: >> Just curious if you could share your experience on the performance of spark >> in your company? How much data do you process? And what's the latency you >> are getting with spark engine? >> >> Vidya >