Thanks for this update. I can create a Hive ORC transactional table with Spark no problem, and the whole workflow in Hive on Spark, including updates, works fine.
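For reference, this is the kind of statement that succeeds when issued from the Hive side (a sketch only; the table and column names are the ones from the Spark session below, and I am assuming a beeline session against HiveServer2):

```sql
-- Run from Hive (e.g. beeline), where ACID UPDATE is supported on a
-- bucketed, transactional ORC table. Table and column names match the
-- session transcript; the client/connection details are assumptions.
USE oraclehadoop;
UPDATE orctest SET amount_sold = 1300 WHERE prod_id = 13;
```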
My Spark is 1.6.1 and Hive is version 2, but updates of an ORC transactional table through Spark fail, I am afraid:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
HiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@eb9b76c

//
// My source table, a plain text table called oraclehadoop.sales_staging
//
scala> HiveContext.sql("use oraclehadoop")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> val s = HiveContext.table("sales_staging")
s: org.apache.spark.sql.DataFrame = [prod_id: bigint, cust_id: bigint, time_id: timestamp, channel_id: bigint, promo_id: bigint, quantity_sold: decimal(10,0), amount_sold: decimal(10,0)]

//
// Register it as a temp table
//
scala> s.registerTempTable("tmp")

//
// Create a new ORC transactional table through Spark
//
scala> HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.orctest")
res2: org.apache.spark.sql.DataFrame = []

scala> var sqltext : String = ""
sqltext: String = ""

scala> sqltext = """
     | CREATE TABLE orctest
     | (
     |   PROD_ID       bigint,
     |   CUST_ID       bigint,
     |   TIME_ID       timestamp,
     |   CHANNEL_ID    bigint,
     |   PROMO_ID      bigint,
     |   QUANTITY_SOLD decimal(10,0),
     |   AMOUNT_SOLD   decimal(10,0)
     | )
     | CLUSTERED BY (PROD_ID) INTO 256 BUCKETS
     | STORED AS ORC
     | TBLPROPERTIES (
     |   "orc.compress"="SNAPPY",
     |   "transactional"="true",
     |   "orc.create.index"="true",
     |   "orc.stripe.size"="16777216",
     |   "orc.row.index.stride"="10000"
     | )
     | """

scala> HiveContext.sql(sqltext)
res3: org.apache.spark.sql.DataFrame = [result: string]

//
// Put data in the Hive table
//
scala> sqltext = """
     | INSERT INTO TABLE oraclehadoop.orctest
     | select * from tmp
     | """

scala> HiveContext.sql(sqltext)
res4: org.apache.spark.sql.DataFrame = []

//
// Rows are there
//
scala> sql("select count(1) from oraclehadoop.orctest").show
+------+
|   _c0|
+------+
|918843|
+------+

//
// Now let us try updating a few rows. This works fine in Hive; however, it fails here
//
scala> sql("update orctest set amount_sold = 1300 where prod_id = 13")
org.apache.spark.sql.AnalysisException: Unsupported language features in query: update orctest set amount_sold = 1300 where prod_id = 13
TOK_UPDATE_TABLE 1, 0,18, 7
  TOK_TABNAME 1, 2,2, 7
    orctest 1, 2,2, 7
  TOK_SET_COLUMNS_CLAUSE 1, 4,10, 31
    = 1, 6,10, 31
      TOK_TABLE_OR_COL 1, 6,6, 19
        amount_sold 1, 6,6, 19
      1300 1, 10,10, 33
  TOK_WHERE 1, 12,18, 52
    = 1, 14,18, 52
      TOK_TABLE_OR_COL 1, 14,14, 44
        prod_id 1, 14,14, 44
      13 1, 18,18, 54
scala.NotImplementedError: No parse rules for TOK_UPDATE_TABLE: [same AST as above]
        at org.apache.spark.sql.hive.HiveQl$.nodeToPlan(HiveQl.scala:1217)
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:326)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
        [... scala.util.parsing.combinator frames elided ...]
        at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
        at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:295)
        at org.apache.spark.sql.hive.HiveQLDialect$$anonfun$parse$1.apply(HiveContext.scala:66)
        at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
        at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:65)
        at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:43)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:231)
        at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:331)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
        [... REPL and spark-submit frames elided ...]

HTH

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 6 June 2016 at 19:15, Alan Gates <alanfga...@gmail.com> wrote:

> This JIRA https://issues.apache.org/jira/browse/HIVE-12366 moved the
> heartbeat logic from the engine to the client.
> AFAIK this was the only issue preventing working with Spark as an engine.
> That JIRA was released in 2.0.
>
> I want to stress that to my knowledge no one has tested this combination
> of features, so there may be other problems. But at least this issue has
> been resolved.
>
> Alan.
>
>
> On Jun 2, 2016, at 01:54, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Hi,
>>
>> Spark does not support transactions because, as I understand it, there is
>> a piece on the execution side that needs to send heartbeats to the Hive
>> metastore saying "a transaction is still alive". That has not been
>> implemented in Spark yet, to my knowledge.
>>
>> Any idea on the timelines when we are going to have support for
>> transactions in Spark for Hive ORC tables? This would really be useful.
>>
>> Thanks,
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
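As a footnote on the heartbeat point: the open-transaction state that those heartbeats keep alive can be inspected from the Hive side. A sketch, assuming a Hive 2.x beeline session (these are standard Hive statements, not Spark ones):

```sql
-- List transactions known to the metastore, with their state. A client
-- must heartbeat an open transaction or the metastore will eventually
-- time it out and mark it aborted.
SHOW TRANSACTIONS;

-- Show the compaction queue for ACID tables (the delta files produced
-- by UPDATE/DELETE are merged by compactions).
SHOW COMPACTIONS;
```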